
> We should stop assuming any string data is a fixed-length encoding. This is a major disadvantage of UTF-8, because it allows for this conflation.

So what do you suggest? UTF-16 and UTF-32 encourage this even more.
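To make the point concrete, here is a small Python sketch (Python is just my pick; the thread doesn't name a language) that counts the same text as code points, UTF-8 bytes, UTF-16 code units and UTF-32 code units. `unit_counts` is a made-up helper for illustration, not a library function.

```python
# Compare the "length" of the same visible text under different Unicode encodings.
samples = {
    "ascii": "abc",
    "accented (precomposed)": "\u00e9",   # é as a single code point
    "accented (combining)": "e\u0301",    # e + COMBINING ACUTE ACCENT
    "emoji (outside BMP)": "\U0001F600",  # GRINNING FACE
}

def unit_counts(text: str) -> dict:
    """Hypothetical helper: count code points and code units per encoding."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # The -le variants avoid the byte-order mark that plain "utf-16"/"utf-32" prepend.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }

for name, text in samples.items():
    print(f"{name:24} {unit_counts(text)}")
```

The emoji needs two UTF-16 code units (a surrogate pair), and the combining é needs two UTF-32 code units, so neither encoding actually gives you one unit per visible character.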



Yeah, ASCII is such a powerful mental model that I think anyone working with Unicode made a lot of concessions to convert people, no argument there. But I think we need to say we're done with that and move on to phase 2. Here's what I advocate:

- Encodings should be configurable. Programmers get to decide what format their strings are internally, users get to decide what encoding programs use when dealing with filenames or saving data to disk, etc. Defaults matter, and we should employ smarts, but we should never say "I know best" and remove those knobs.

- Engineers need to internalize that "strings" conceal mountains of complexity (because written language is complex), and default to using libraries to manage them. We should start viewing manual string manipulation as an anti-pattern. There isn't an encoding out there that we can all standardize on that makes this untrue, again because written language is complex.
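A rough Python illustration of how hand-rolled string manipulation goes wrong, using only the standard library. `truncate_utf8` is a hypothetical helper written for this example, not an existing API.

```python
import unicodedata

word = "cafe\u0301"  # "café" spelled with a combining acute accent (U+0301)

# 1. Equality: two spellings of the same visible text compare unequal...
print(word == "caf\u00e9")                                # False
# ...until normalized (NFC composes e + U+0301 into U+00E9).
print(unicodedata.normalize("NFC", word) == "caf\u00e9")  # True

# 2. Reversing by code point leaves the accent at the front, attached to no letter.
print("".join(reversed(word)))

# 3. Truncating the UTF-8 bytes can cut a multi-byte sequence in half.
def truncate_utf8(text: str, max_bytes: int, errors: str = "replace") -> str:
    """Hypothetical helper: naive byte-level truncation, then decode."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors=errors)

# "cafe\u0301" is 6 bytes in UTF-8: c a f e 0xCC 0x81
print(truncate_utf8(word, 5))                   # "cafe" + U+FFFD replacement char
print(truncate_utf8(word, 5, errors="ignore"))  # "cafe": the accent silently vanishes
```

Even the "safer" `errors="ignore"` variant only avoids garbage bytes; it still splits what the reader sees as one character. Grapheme-aware operations aren't in Python's standard library, which is the argument for reaching for something like ICU or the third-party `regex` module (its `\X` pattern matches a grapheme cluster) instead of slicing by hand.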



