> Every string is internally encoded in utf-8, all the string operations are Unicode-safe
This seems very slightly disingenuous, if my memory is correct. I don't remember all the details, but a while back I ran into a Unicode-support issue with a Tcl application. Digging into it, I recall that the Tcl interpreter actually represents every character as a predefined number of bytes, set by a preprocessor definition. The default was 2, and it cut off any Unicode character that needed more than 2 bytes to encode, unless you were willing to recompile the interpreter to use 4 bytes, at the cost of doubling memory consumption for every string.
Very nitpicky, but I think it's important to point out it's not "quite" utf-8, because Tcl needs each codepoint to be O(1) indexable in an array, something normal utf-8 can't do.
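To illustrate the indexing point, here's a rough sketch (not Tcl's actual implementation): finding the n-th codepoint in raw UTF-8 means walking lead bytes from the start of the string, whereas a fixed-width representation is a single offset calculation.

```python
# Illustrative sketch only (not Tcl's code): why raw UTF-8 can't give O(1)
# codepoint indexing, while a fixed-width representation can.

def utf8_codepoint_at(data: bytes, index: int) -> str:
    """Return the index-th codepoint of well-formed UTF-8 `data` by scanning: O(n)."""
    seen = 0
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells us how many bytes this codepoint occupies.
        if lead < 0x80:
            width = 1
        elif lead < 0xE0:
            width = 2
        elif lead < 0xF0:
            width = 3
        else:
            width = 4
        if seen == index:
            return data[i:i + width].decode("utf-8")
        seen += 1
        i += width
    raise IndexError(index)

def utf32_codepoint_at(data: bytes, index: int) -> str:
    """Fixed-width (UTF-32) indexing is just an offset multiplication: O(1)."""
    return data[4 * index:4 * index + 4].decode("utf-32-le")

s = "héllo\U00010348"  # last codepoint is outside the BMP
print(utf8_codepoint_at(s.encode("utf-8"), 5))       # found after a linear scan
print(utf32_codepoint_at(s.encode("utf-32-le"), 5))  # found with direct arithmetic
```

That linear scan is what a fixed-bytes-per-character representation avoids, at the memory cost described above.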
> Digging into it, I recall that the Tcl interpreter actually represents every character as a predefined number of bytes, set by a preprocessor definition.
Ah, that’s exactly what old Python did. Wonder if it was inspired by the Tcl solution.
Fwiw, because they didn't want to give up O(1) indexing, recent CPython uses an encoding chosen per string based on its contents (the possibilities are ISO-8859-1, UCS-2, or UCS-4). That does mean adding a single astral codepoint to an ASCII string quadruples its size.
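You can see that effect with sys.getsizeof (a rough check; the exact byte counts depend on the CPython version and build):

```python
import sys

ascii_s = "a" * 1000
wide_s = ascii_s + "\U0001F600"  # one astral codepoint appended

print(sys.getsizeof(ascii_s))  # ~1000 bytes of payload plus header: 1 byte per codepoint
print(sys.getsizeof(wide_s))   # ~4004 bytes of payload plus header: every codepoint now stored as UCS-4
```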
I assume it's because the original Unicode only supported 2-byte characters, and even after astral characters became a thing it took a while for them to be used for non-CJK things.
> That's pretty much what every older language did.
The fixed size, but I’m referring to the compile-time width switch.
I may be wrong, but I was under the impression most older languages simply remained on their original character width (or left the issue ill-defined and/or added an ancillary type, as C did).