
> Clearly there's enough left to keep adding more and more characters for a really long time.

And then what? It's already 11% full.




So? In this 11% it already covers almost all written languages in use and tons of dead ones. It also has lots of classic symbols, from math, book ornaments, standard typographic stuff (left arrow, etc.).

So all the basics are covered.

We could fill the remaining 89% with variations of the turd emoticon and we'd still be perfectly fine.


And then we expand it again, like we did in the early 2000s.

UTF-8 will support it by default, UTF-16 will stay broken, and UTF-32 will break, but nobody uses the latter.


Ah, being able to expand and keep using UTF-8 sounds great.

I didn't know that UTF-16 was considered broken. In what way is it so?


The original UTF-16 can only represent 65,536 code points, which is less than half the number of Unicode code points in use today. It broke when Unicode expanded around a decade ago.

There's a new, incompatible ("mostly compatible" may explain it better) UTF-16 encoding that represents all Unicode code points, but, well, two formats with the same name is even more broken than a single broken one.

UTF-32 will suffer the same fate as UTF-16 if Unicode expands. And UTF-8 is capable of representing an absolutely huge number of code points, requiring only non-breaking extensions.
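To make the "mostly compatible" part concrete, here is a minimal Python sketch of how current UTF-16 handles a code point above 65,535: it splits it into two 16-bit units (a surrogate pair), while anything below that limit is still a single unit, exactly as in the old 16-bit format. The function name `utf16_code_units` is made up for illustration; the cross-check uses Python's built-in `utf-16-be` codec.

```python
def utf16_code_units(cp):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x10000:
        # Fits in one unit, same as the original 16-bit encoding.
        return [cp]
    # Above 16 bits: spread the remaining 20 bits over a surrogate pair.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


pile_of_poo = 0x1F4A9
units = utf16_code_units(pile_of_poo)
print([hex(u) for u in units])  # ['0xd83d', '0xdca9']

# Cross-check against Python's own encoder.
raw = b"".join(u.to_bytes(2, "big") for u in units)
assert raw == chr(pile_of_poo).encode("utf-16-be")
```

Software that assumes one 16-bit unit per character will see that emoji as two "characters", which is roughly where the compatibility trouble comes from.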
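A rough sketch of what "non-breaking extensions" could look like, assuming the original UTF-8 layout that allowed sequences of up to six bytes (about 31 bits of payload). Today's standard caps UTF-8 at four bytes and U+10FFFF, so the five- and six-byte forms below are not valid UTF-8 right now; the function name `utf8_extended` is hypothetical, and surrogate checks are deliberately left out.

```python
def utf8_extended(cp):
    """Encode a value with the original 1-6 byte UTF-8 layout (up to 2**31 - 1)."""
    if cp < 0:
        raise ValueError("negative value")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper bound, total byte count, lead-byte marker)
    layouts = (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),    # not valid in today's UTF-8
        (0x80000000, 6, 0xFC),   # not valid in today's UTF-8
    )
    for limit, n, lead in layouts:
        if cp < limit:
            # Continuation bytes carry 6 bits each, most significant first.
            tail = [0x80 | ((cp >> (6 * i)) & 0x3F) for i in range(n - 2, -1, -1)]
            return bytes([lead | (cp >> (6 * (n - 1)))] + tail)
    raise ValueError("value needs more than 31 bits")


# Within today's range the result matches Python's encoder byte for byte.
assert utf8_extended(0x1F4A9) == chr(0x1F4A9).encode("utf-8")

# A value past the current Unicode limit still encodes under this layout.
print(utf8_extended(0x7FFFFFFF).hex())  # fdbfbfbfbfbf
```

The point is that every existing one- to four-byte sequence keeps its meaning; only new, longer lead bytes would be added.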



