Hacker News

UTF-16 is the worst of both worlds when compared to UTF-8 and UTF-32. The only reason it exists (and, unfortunately, remains prevalent) is that a number of popular technologies (Java, JavaScript, Windows) thought they were being smart when they built their Unicode support on UCS-2, and now here we are.
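A minimal Java sketch of the UCS-2 legacy the comment describes: Java's String counts UTF-16 code units, so any code point outside the BMP (here U+1F600, chosen just as an example) occupies two chars via a surrogate pair, and none of the three "length" notions agree.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 encoded as a surrogate pair

        // UTF-16 code units: 2
        System.out.println(s.length());

        // Actual Unicode code points: 1
        System.out.println(s.codePointCount(0, s.length()));

        // Bytes when re-encoded as UTF-8: 4
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

So even "how long is this string?" has three different answers depending on which layer you ask.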

Now, the issue of clipping or reversing strings is a problem not just because of encoding. It simply doesn't work even with UTF-32: you're going to end up cutting off combining characters, for example. Manipulating strings is very difficult, and software should never really try to do it unless the authors know what they are doing, and even then you need a library to help you do it.
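The reversal pitfall above can be shown in a few lines of Java. `StringBuilder.reverse` keeps surrogate pairs intact (i.e. it is code-point aware), but it is not grapheme-cluster aware: "é" written as 'e' plus U+0301 (combining acute accent) comes apart under reversal.

```java
public class ReverseDemo {
    public static void main(String[] args) {
        // "aé", with the accent as a separate combining mark: 'a', 'e', U+0301
        String s = "ae\u0301";

        // Reverses code points, not grapheme clusters
        String reversed = new StringBuilder(s).reverse().toString();

        // Result is U+0301, 'e', 'a': the combining mark is now detached
        // from its base character instead of staying glued to the 'e'.
        System.out.println(reversed.equals("\u0301ea")); // true
    }
}
```

This is why encoding choice alone doesn't save you: the same breakage happens with a UTF-32 array of code points, because grapheme clusters span multiple code points regardless of encoding.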




> thought they were being smart when building their Unicode support on UCS-2, and now here we are.

Not sure what you mean by 'being smart' when all of those were released before Unicode 2.0.


I said they thought they were smart. I'm not going to judge whether it actually was smart based on the situation then.

That said, UTF-8 was already 4 years old by the time Java came out. Surrogate pairs were added to Unicode in 1996, one year prior to the release of Java.
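For reference, the surrogate-pair mechanism mentioned here is simple arithmetic: a supplementary code point U in 0x10000..0x10FFFF maps to a high surrogate 0xD800 + ((U - 0x10000) >> 10) and a low surrogate 0xDC00 + ((U - 0x10000) & 0x3FF). A sketch, checked against Java's own Character helpers:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        int cp = 0x1F600;       // an arbitrary supplementary code point
        int u = cp - 0x10000;   // 20-bit offset into the supplementary range

        char high = (char) (0xD800 + (u >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (u & 0x3FF)); // bottom 10 bits

        // Prints: D83D DE00
        System.out.printf("%04X %04X%n", (int) high, (int) low);

        // Java's standard library computes the same values
        System.out.println(high == Character.highSurrogate(cp)); // true
        System.out.println(low == Character.lowSurrogate(cp));   // true
    }
}
```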

I joined Sun Microsystems around that time, and Unicode really wasn't a thing in the Solaris world for a few more years, so the fact that people weren't aggressively pushing good Unicode support at the time is understandable. People just didn't have much experience with it.



