
Or use UTF-8 everywhere to fix the problem. Code points have their own issues (as does UTF-8, but on balance these seem like better engineering tradeoffs).


Using UTF-8 only has better engineering tradeoffs if you represent text as UTF-8. If you don’t (as in JavaScript, whose strings are UTF-16), then it’s just extra complexity, and it’s no better than UTF-16. Its only advantage is that it’s more common among newer languages.

Code points are the only thing that makes sense for a multi-language protocol. They’re unambiguous, and every Unicode client can talk in code points, even if it uses some exotic encoding internally.
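
To make the disagreement concrete, here is a rough sketch of how the same position in a string comes out as three different numbers depending on the unit you count in. The helper names (`offsets`, `utf16_to_code_point`) and the sample string are made up for illustration and are not from any particular protocol; it relies on Python's `str` being indexed by code point.

```python
def offsets(text: str, cp_index: int) -> dict[str, int]:
    """Express the position `cp_index` (counted in code points) in other units."""
    prefix = text[:cp_index]  # Python str slices by code point
    return {
        "code_points": len(prefix),
        # Every UTF-16 code unit is 2 bytes, so bytes // 2 = code units.
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
        "utf8_bytes": len(prefix.encode("utf-8")),
    }


def utf16_to_code_point(text: str, unit_offset: int) -> int:
    """Convert a UTF-16 code unit offset (e.g. from a JavaScript client)
    into a code point offset. Raises if the offset splits a surrogate pair."""
    units = 0
    for i, ch in enumerate(text):
        if units == unit_offset:
            return i
        # Code points above U+FFFF need a surrogate pair (2 units) in UTF-16.
        units += 2 if ord(ch) > 0xFFFF else 1
    if units == unit_offset:
        return len(text)
    raise ValueError("offset is out of range or inside a surrogate pair")


# "naïve 😀 text", written with escapes so the exact code points are unambiguous.
sample = "na\u00efve \U0001F600 text"

print(offsets(sample, 8))
# {'code_points': 8, 'utf16_units': 9, 'utf8_bytes': 12}
print(utf16_to_code_point(sample, 9))
# 8
```

Whichever unit a protocol picks, clients whose native string representation differs have to do a conversion like the second function. The argument above is really about who should pay that cost.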



