
> As of utf-8 encoding... It is a variable encoding that is capable to encode any 32-bit number: from 0 to 0xFFFFFFFF. Not just current set of 21-bit unicode code points.

You're thinking of an older version of UTF-8, which allowed sequences of up to 6 bytes to encode a single code point. UTF-8 is now defined (RFC 3629) to disallow code point values above 0x10FFFF and values from 0xD800 to 0xDFFF inclusive (the surrogate range), so that it can encode exactly the same set of values as UTF-16.

https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

https://tools.ietf.org/html/rfc3629
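
To make the restriction concrete, here is a rough Python sketch of an encoder that enforces both limits. The names `is_encodable` and `encode_utf8` are made up for illustration and are not from any library; in practice you would just use `str.encode("utf-8")`.

```python
MAX_CODE_POINT = 0x10FFFF            # highest value RFC 3629 allows
SURROGATES = range(0xD800, 0xE000)   # 0xD800..0xDFFF, reserved for UTF-16


def is_encodable(cp: int) -> bool:
    """True if cp is a value that modern UTF-8 is allowed to encode."""
    return 0 <= cp <= MAX_CODE_POINT and cp not in SURROGATES


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as 1 to 4 bytes, rejecting disallowed values."""
    if not is_encodable(cp):
        raise ValueError(f"not encodable in UTF-8: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

With this, `encode_utf8(0x20AC)` matches `"€".encode("utf-8")`, while `encode_utf8(0xD800)` and `encode_utf8(0x110000)` both raise `ValueError`. There is no 5- or 6-byte branch because nothing above 0x10FFFF is allowed to reach it.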


