
Perhaps - convention is a powerful thing. I am very confident that any future byte would be a power of two, but I'm not sure that 8 will remain ascendant. A 32-bit byte might be practical - even English-language text is commonly no longer composed of 8-bit characters, so why bother with 8-bit addressing, especially when the majority of the world needs more than 8 bits per character? Memory is cheap, and a little "wasted" space could reduce errors and simplify text handling.


> ... and simplify text handling

That ship has largely sailed. UTF-8/UTF-16 will be around for a long time to come. It's encoded into data that is archived. It's built into practically everything we use today. It's reasonably space- and transmission-efficient. It's standardized across many locales. Of course, you can use all the bytes in memory that you want to. Some languages even do!
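To illustrate the space-efficiency point: UTF-8 is variable-width, so ASCII still costs one byte per character while other scripts cost more (a quick Python sketch; the sample characters are just illustrative):

```python
# UTF-8 is variable-width: 1 byte for ASCII, up to 4 for other scripts.
samples = {"a": "ASCII", "é": "accented Latin", "中": "CJK", "🙂": "emoji"}
for ch, label in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{label}: {ch!r} -> {len(encoded)} byte(s): {encoded.hex()}")
# 'a' encodes to 1 byte, 'é' to 2, '中' to 3, '🙂' to 4.
```

So mostly-ASCII data pays almost nothing compared to a fixed 32-bit encoding, which is a big part of why UTF-8 won on the wire.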

My mind was blown when I found out that there are invalid UTF-8 sequences. I was then impressed to learn that some exploits were built on exactly this, targeting software that didn't understand or protect against it. What a mess indeed.
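For the curious, here's a small Python sketch of what "invalid UTF-8" looks like. The three byte strings below are illustrative examples of ill-formed sequences: a stray continuation byte, an "overlong" encoding of '/' (the kind of sequence famously abused in directory-traversal exploits against decoders that accepted it), and an encoded surrogate. A strict decoder rejects all of them:

```python
# Three classic ill-formed UTF-8 sequences:
bad_sequences = [
    b"\x80",          # a continuation byte with no leading byte
    b"\xc0\xaf",      # overlong encoding of '/' - forbidden by the spec
    b"\xed\xa0\x80",  # an encoded UTF-16 surrogate, also forbidden
]
for raw in bad_sequences:
    try:
        raw.decode("utf-8")  # strict error handling is the default
        print(raw.hex(), "-> decoded (decoder is too lenient!)")
    except UnicodeDecodeError as e:
        print(raw.hex(), "-> rejected:", e.reason)
```

Lenient decoders that mapped overlong sequences to their "intended" character let attackers smuggle characters like '/' past input filters that only checked the canonical one-byte form, which is why the spec now requires rejecting them.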




