
> Every string is internally encoded in utf-8, all the string operations are Unicode-safe

This seems very slightly disingenuous, if my memory is correct. I don't remember all the details, but I ran into an issue with a Tcl application a while back and Unicode support. Digging into it, I recall that the Tcl interpreter actually represents every character as a predefined number of bytes, set by a preprocessor definition. The default was 2, and cut off any Unicode character that needed more than 2 bytes to encode, unless you were willing to recompile the interpreter to use 4 bytes, at the cost of doubling memory consumption for every string.

Very nitpicky, but I think it's important to point out it's not "quite" utf-8, because Tcl needs each codepoint to be O(1) indexable in an array, something normal utf-8 can't do.
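To make the indexing point concrete, here is a minimal Python sketch (not Tcl's actual implementation; `utf8_index` and `utf32_index` are made-up helper names). Finding the n-th codepoint in UTF-8 means scanning from the start, while a fixed-width encoding lets you jump straight to the right offset:

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 bytes by scanning: O(n)."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for i, b in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts a code point.
        if b & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


def utf32_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE bytes by offset: O(1)."""
    if not 0 <= n < len(data) // 4:
        raise IndexError(n)
    return data[4 * n : 4 * n + 4].decode("utf-32-le")


sample = "aé😀z"  # 1-, 2-, 4-, and 1-byte sequences in UTF-8
utf8 = sample.encode("utf-8")
utf32 = sample.encode("utf-32-le")

print(len(utf8), len(utf32))  # UTF-8 is smaller, UTF-32 is directly indexable
print(utf8_index(utf8, 2), utf32_index(utf32, 2))
```

The trade-off is the one described above: the fixed-width form pays for O(1) indexing with extra memory per character.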



> Digging into it, I recall that the Tcl interpreter actually represents every character as a predefined number of bytes, set by a preprocessor definition.

Ah, that’s exactly what old Python did. Wonder if it was inspired by the Tcl solution.

Fwiw, because they refused to give up O(1) indexing, recent CPython (3.3+, PEP 393) picks a fixed-width encoding per string based on its contents (possibilities are ISO-8859-1, UCS-2, or UCS-4). That does mean adding an astral codepoint to an ASCII string quadruples its size.
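You can see the width switch from Python itself. A rough sketch, assuming CPython 3.3 or later (`sys.getsizeof` is implementation-specific, and `bytes_per_char` is just a helper name made up here). It measures how much a string grows per extra ASCII character, depending on the widest codepoint it contains:

```python
import sys


def bytes_per_char(suffix: str) -> int:
    """Per-code-point storage CPython uses for an ASCII string ending in `suffix`."""
    short = "a" * 1000 + suffix
    long = "a" * 2000 + suffix
    # Header size and the suffix cancel out; what is left is 1000 extra characters.
    return (sys.getsizeof(long) - sys.getsizeof(short)) // 1000


print(bytes_per_char(""))            # pure ASCII
print(bytes_per_char("\u00e9"))      # é, fits in ISO-8859-1
print(bytes_per_char("\u20ac"))      # €, needs UCS-2
print(bytes_per_char("\U0001F600"))  # 😀, astral, needs UCS-4
```

On CPython the four results should be 1, 1, 2 and 4 bytes per character, so one emoji at the end makes every "a" in the string cost four bytes.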

PyPy, on the other hand, uses utf-8.


That's pretty much what every older language did.

I assume it's because the original Unicode only supported 2-byte characters, and even after astral characters became a thing it took a while for them to be used for non-CJK things.


> That's pretty much what every older language did.

The fixed size, yes, but I’m referring to the compile-time width switch.

I may be wrong, but I was under the impression most older languages simply remained on their original character width (or left the issue ill-defined and/or added an ancillary type, like C).



