> If I understand you correctly, it means that the encoding itself serves as the metadata that indicates Chinese/Japanese.

Only if you have some unicodebrained mentality where you consider a Chinese character that looks sort of similar to a Japanese character to be "the same". If you think of them as two different characters, then they're just different characters, which may or may not be present in a particular encoding. That's completely normal if you write a program that handles multiple encodings: not every character exists in every encoding.
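To make that concrete, here's a rough Python sketch. The codec names (`gb2312`, `shift_jis`) are Python's standard ones; the helper `can_encode` and the three example characters are my own picks for illustration, not anything from the parent comment.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative characters (my choice, written as escapes to stay ASCII-safe):
SIMPLIFIED_ONLY = "\u4eec"  # 们: simplified Chinese form, not in JIS X 0208
JAPANESE_ONLY = "\u50cd"    # 働: Japanese-coined kanji, not in GB2312
SHARED = "\u76f4"           # 直: present in both legacy character sets

for ch in (SIMPLIFIED_ONLY, JAPANESE_ONLY, SHARED):
    support = {enc: can_encode(ch, enc) for enc in ("gb2312", "shift_jis")}
    print(f"U+{ord(ch):04X}", support)

# The shared character has different bytes in each legacy encoding, but
# decoding either one yields the same single Unicode code point. After
# decoding, nothing records which encoding (and so which language) it came from.
gb_bytes = SHARED.encode("gb2312")
sjis_bytes = SHARED.encode("shift_jis")
print(gb_bytes != sjis_bytes)
print(gb_bytes.decode("gb2312") == sjis_bytes.decode("shift_jis"))
```

The first two characters each fail to encode in one of the two encodings, which is the "not every character exists in every encoding" case. The last two lines show what the unified view does to a character that exists in both.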

> In which case, why is it unreasonable to ask for the same for UTF-8, except using some more clearly specified way to indicate this (like lang="ja" etc), rather than encoding it all into separate characters?

Firstly, you have to "draw the rest of the fucking owl" and actually implement that language selection mechanism. Secondly, if you implement some clever extension mechanism on top of UTF-8 that's only needed for Japanese and can only really be tested by people who read Chinese or Japanese, then realistically, even if you implement it perfectly, most app makers won't use it or will use it wrong. Whereas if you implement encoding-awareness in the standard way that we've been doing for decades, and test with even one non-default encoding, your application will most likely work fine for Japanese, Chinese and every other language, even if you never test it with Japanese.
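By "encoding-awareness in the standard way" I mean roughly this: bytes always travel with a declared encoding, you decode at the boundary, and you fail loudly instead of guessing. A minimal Python sketch, where `transcode` and `decodes_cleanly` are hypothetical helper names of mine and not from any library:

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode `data` using the declared source encoding, then re-encode it.

    Both steps use Python's default strict error handling, so bytes that do
    not fit the declared encoding raise UnicodeError instead of being guessed.
    """
    text = data.decode(src)
    return text.encode(dst)


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Testing with one non-default encoding is enough to exercise the code path.
sample = "\u65e5\u672c\u8a9e"  # 日本語
sjis = sample.encode("shift_jis")

print(transcode(sjis, "shift_jis", "euc_jp").decode("euc_jp") == sample)
# Treating the same bytes as the "default" encoding is caught right away.
print(decodes_cleanly(sjis, "utf-8"))
```

Nothing in there is specific to Japanese, which is the point: if the round trip works for one non-default encoding, the same code path handles the others.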



