
A text file, a webpage, or a database table can only contain textual data in a given encoding.

That's because every byte stored in the file, for example byte number 188, either means "¼" (as it does in ISO/IEC 8859-1, a.k.a. Latin-1, often loosely called "ANSI"), or it means "ỳ" (as in ISO/IEC 8859-14), or "シ" (in JIS X 0201, one of the many Japanese encodings that were devised over the years).
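To make that concrete, here is a rough Python sketch that decodes the same single byte under three encodings. The codec names are Python's own; `shift_jis` is my stand-in for JIS X 0201 (an assumption: Python has no standalone JIS X 0201 codec that I know of, but Shift_JIS keeps its single-byte katakana range). Note that Python hands back the half-width katakana form of the character.

```python
# One byte, three meanings: decode byte 188 (0xBC) under different encodings.
raw = bytes([188])

# "shift_jis" is used here as a stand-in for JIS X 0201 (assumption, see above).
codecs_to_try = ("latin-1", "iso8859_14", "shift_jis")

decoded = {name: raw.decode(name) for name in codecs_to_try}

for name, ch in decoded.items():
    # Show the character and the Unicode code point it maps to.
    print(f"{name:>10}: {ch!r} (U+{ord(ch):04X})")
```

Nothing in `raw` itself says which of the three readings is the right one, which is the whole problem.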

How do you know which encoding a certain file uses? In general YOU CAN'T and this was the source of many problems and "solutions" which caused even more problems over the years.

Well then, how did you mix symbols from different alphabets, say in a dictionary or in a post that talks about them, like this very post? YOU COULDN'T, short of doing ugly hacks and other subterfuges, like using GIFs for all foreign characters.

Unicode gave a distinct number (or "codepoint") to every character and symbol known to man (within reasonable limits), and this allowed a lot of things that we take for granted nowadays, including this very post, where I just copied and pasted various symbols from their Wikipedia pages and expected it to work.
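As a small illustration of the codepoint idea, this Python sketch looks up the number and official name of the three characters mentioned earlier, then stores all of them in one UTF-8 string. The choice of UTF-8 is mine for the example; the point is only that a single Unicode encoding can hold all three at once.

```python
import unicodedata

# The three characters from the example, written as escapes to avoid
# any ambiguity about how the source file itself is encoded.
chars = ["\u00bc", "\u1ef3", "\u30b7"]  # ¼, ỳ, シ

# Each character has exactly one code point, regardless of legacy encodings.
codepoints = {ch: ord(ch) for ch in chars}
for ch, cp in codepoints.items():
    print(f"U+{cp:04X}  {unicodedata.name(ch)}")

# All three can live side by side in a single UTF-8 byte string.
mixed = "".join(chars)
encoded = mixed.encode("utf-8")
roundtrip = encoded.decode("utf-8")
```

The round trip only works because the decoder is told the bytes are UTF-8; the encoding still has to be known, there is just one that covers everything now.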




Fun fact: Japanese even has a word for foreign characters, "gaiji", and it's extremely common in Japanese ePUBs to use small square images very frequently for characters not in the current font, using the term "gaiji" in the CSS class names used for these characters. And at least one mainstream ePUB reader has special code to detect these gaiji and adjust its rendering to make them behave better.


Huh, you've just caused me to reexamine the word gaijin. gai (外) = outside, jin (人) = person/nationality. gaiji (外字) is outside + character. Neat!


Similarly, the word "loanword" (used to describe words borrowed from other languages) is gairaigo 外来語 which is literally 外来 (gairai) "foreign" + 語 (go) "word/language". Japanese is filled with words where you can often figure out the meaning purely from the characters used!



