
  Ok, but everything handles the UTF-8 "BOM" just fine.
For small values of "everything". I can't even count the bizarre errors I've run into, only to discover that one of the files I'm working on has been corrupted by somebody carelessly using Notepad. Usually it's not obvious what's going on: the error will be something like "Error parsing file: illegal byte before start of message body", which is not much help when it's 23:30 and I'm staring at a text editor wondering what the hell byte it's talking about.


And therein lies the insanity of a BOM: it's a character that's meant to be invisible, even in your UTF-8-capable text editor. Given that, it's not clear what is supposed to be able to view or edit this character, short of a hex editor. Somebody else mentioned that even cat doesn't show it; how are you supposed to easily tell whether it's there or not?
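One way to check for the invisible character without a hex editor is to look at the first three bytes of the file directly. A minimal Python sketch, assuming the file fits in memory; the helper names has_utf8_bom and strip_utf8_bom are made up for illustration:

```python
import codecs

# codecs.BOM_UTF8 is the byte string b'\xef\xbb\xbf', i.e. U+FEFF encoded as UTF-8.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes start with a UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM if present; otherwise return the data unchanged."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python bomcheck.py file1 file2 ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode, so nothing is decoded or hidden
            raw = f.read()
        print(f"{path}: {'BOM' if has_utf8_bom(raw) else 'no BOM'}")
```

Reading in binary mode matters here: decoding first could silently drop or hide the BOM, which is exactly the problem.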

IMHO a BOM is completely ridiculous on a UTF-8 file, because UTF-8 has no endianness ambiguity for a BOM to resolve, and it breaks the principle that UTF-8 text can serve as ASCII when all its code points are <= 0x7F.
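To make the endianness point concrete: U+FEFF is the code point used as the BOM, and its byte order only carries information in the multi-byte-unit encodings like UTF-16. A small Python illustration (the sample text is arbitrary):

```python
# U+FEFF (ZERO WIDTH NO-BREAK SPACE) is the character used as a byte order mark.
BOM_CHAR = "\ufeff"

# In UTF-16 the order of the two BOM bytes tells a reader which endianness was used.
utf16_be = BOM_CHAR.encode("utf-16-be")  # b'\xfe\xff'
utf16_le = BOM_CHAR.encode("utf-16-le")  # b'\xff\xfe'

# UTF-8 is byte-oriented, so there is only one possible byte sequence: nothing to resolve.
utf8 = BOM_CHAR.encode("utf-8")  # b'\xef\xbb\xbf'

# Text made only of code points 0x00..0x7F is byte-identical in ASCII and UTF-8...
text = "key=value\n"
assert text.encode("ascii") == text.encode("utf-8")

# ...but prepending a BOM adds three non-ASCII bytes, so the file is no longer plain ASCII.
with_bom = (BOM_CHAR + text).encode("utf-8")
print(with_bom)  # b'\xef\xbb\xbfkey=value\n'
```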

Is there any point to using it in UTF-8 besides pain, suffering, and "well UTF-16 and UTF-32 have it"?

The answer, straight from the horse's mouth.

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf , pg. 36

"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature."
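For what it's worth, Python's standard library models exactly this "UTF-8 signature" use: the "utf-8-sig" codec skips a leading BOM when decoding and writes one when encoding. A short sketch (the byte strings are made-up sample data):

```python
raw = b"\xef\xbb\xbfhello"

# The plain "utf-8" codec keeps the BOM as an ordinary (invisible) character.
plain = raw.decode("utf-8")
print(repr(plain))  # '\ufeffhello'

# The "utf-8-sig" codec treats a leading BOM as a signature and drops it...
sig = raw.decode("utf-8-sig")
print(repr(sig))  # 'hello'

# ...and decodes BOM-less input unchanged, so it is safe for either kind of file.
print(repr(b"hello".decode("utf-8-sig")))  # 'hello'

# When encoding, "utf-8-sig" prepends the BOM, which is usually what you want to avoid.
print("hello".encode("utf-8-sig"))  # b'\xef\xbb\xbfhello'
```

So when reading files of unknown origin, open(path, encoding="utf-8-sig") accepts both; when writing, plain "utf-8" avoids adding a BOM in the first place.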

That's right, the UTF-8 BOM is neither required nor recommended. I rest my case.


"cat -v" saved me a lot of time in cases like that.

I had a similar problem with nbsp (non-breaking space) until I moved nbsp from the second level (Shift+Space) to the third level in the XKB configuration on my GNOME/Fedora/Linux setup.
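cat -v helps because it prints bytes with the high bit set in "M-" notation (the high bit stripped, "M-" prepended) and control characters as "^X", so invisible characters like the BOM or an nbsp become visible. A rough Python imitation of that notation, written for illustration (cat_v is a made-up name; this follows GNU cat's documented ^ and M- notation, not its actual code):

```python
def cat_v(data: bytes) -> str:
    """Approximate the ^ and M- notation of `cat -v` (plain tabs and newlines pass through)."""
    out = []
    for b in data:
        prefix = ""
        if b >= 0x80:  # high bit set: shown as "M-" followed by the low 7 bits
            prefix = "M-"
            b -= 0x80
        if b == 0x7F:  # DEL is shown as ^?
            out.append(prefix + "^?")
        elif b < 0x20 and not (prefix == "" and b in (0x09, 0x0A)):
            out.append(prefix + "^" + chr(b + 0x40))  # control chars as ^@ .. ^_
        else:
            out.append(prefix + chr(b))
    return "".join(out)


print(cat_v(b"\xef\xbb\xbfhello"))          # M-oM-;M-?hello  (UTF-8 BOM)
print(cat_v("a\u00a0b".encode("utf-8")))    # aM-BM- b  (nbsp is C2 A0 in UTF-8)
```

So "M-oM-;M-?" at the start of a file is the telltale sign of a UTF-8 BOM, and "M-BM- " in the middle of a line is an nbsp.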



