> Clearly there's enough left to keep adding more and more characters for a real...

coldtea · on June 23, 2016

So? In this 11% it already covers almost all written languages in use and tons of dead ones. It also has lots of classic symbols, from math, book ornaments, standard typographic stuff (left arrow, etc.).

So all the basics are covered.

We could fill the remaining 89% with variations of the turd emoticon and we'd still be perfectly fine.

marcosdumay · on June 23, 2016

And then we expand it again, like we did in the early 2000s.

UTF-8 will support it by default, UTF-16 will stay broken, and UTF-32 will break, but nobody uses the latter.

Bromskloss · on June 23, 2016

Ah, being able to expand and keep using UTF-8 sounds great.

I didn't know that UTF-16 was considered broken. In what way is it so?

marcosdumay · on June 24, 2016

The original UTF-16 (the fixed-width 16-bit form now called UCS-2) can only represent 65,536 code points, which is far fewer than the 1,114,112 code points Unicode allows today. It was broken when Unicode expanded beyond the 16-bit range.

There's a new, incompatible ("mostly compatible" may explain it better) UTF-16 encoding that can represent all Unicode code points, but, well, having two formats with the same name is even more broken than having a single broken one.
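
To make the "mostly compatible" part concrete, here is a minimal Python sketch of how the newer UTF-16 handles code points above U+FFFF with surrogate pairs. The function name `to_utf16_units` is made up for illustration, and the sketch skips validation (for example, it does not reject code points inside the surrogate range itself).

```python
def to_utf16_units(cp):
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if cp < 0x10000:
        # Fits in one unit, exactly as in the old fixed-width 16-bit encoding.
        return [cp]
    # Above U+FFFF: subtract 0x10000, leaving a 20-bit value,
    # then split it into a high and a low surrogate of 10 bits each.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


# U+1F4A9 (the turd emoticon mentioned upthread) needs two units.
print([hex(u) for u in to_utf16_units(0x1F4A9)])
```

Software that assumes one 16-bit unit per character reads that pair as two separate characters, which is where the breakage shows up.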

UTF-32 will suffer the same fate as UTF-16 if Unicode expands. UTF-8, on the other hand, is capable of representing an absolutely huge number of code points, requiring only non-breaking extensions.
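
As a rough illustration of the UTF-8 point, the sketch below follows the original UTF-8 design, which allowed sequences of up to 6 bytes and values up to 2^31 - 1. Note the assumption: today's standard (RFC 3629) restricts UTF-8 to 4 bytes and U+10FFFF, so the 5- and 6-byte forms here are not valid UTF-8 right now and only show what an extension could look like. The name `encode_utf8_extended` is hypothetical.

```python
def encode_utf8_extended(cp):
    """Encode a code point using the original (up to 6-byte) UTF-8 scheme.

    Assumption: 5- and 6-byte sequences are NOT valid in current UTF-8;
    they are shown only to illustrate how the scheme extends.
    """
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, one byte
    # (number of bytes, exclusive upper limit, lead-byte prefix)
    forms = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for n, limit, prefix in forms:
        if cp < limit:
            # Lead byte carries the top bits; each continuation byte carries 6 bits.
            lead = prefix | (cp >> (6 * (n - 1)))
            tail = [0x80 | ((cp >> (6 * i)) & 0x3F) for i in range(n - 2, -1, -1)]
            return bytes([lead] + tail)
    raise ValueError("code point does not fit in 31 bits")


# Up to U+10FFFF the result matches ordinary UTF-8.
print(encode_utf8_extended(0x1F4A9) == chr(0x1F4A9).encode("utf-8"))
# Beyond that, the same pattern simply keeps going with longer sequences.
print(len(encode_utf8_extended(0x200000)))
```

Existing 1- to 4-byte sequences keep their meaning under this scheme, which is the sense in which the extension is non-breaking.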