
Let me respond to you again in a different way, this time referencing some Unicode definitions I like (https://stackoverflow.com/a/27331885).

I don't think we can have a meaningful conversation in terms of characters, so I'm going to ignore that and respond to your last paragraph. You seem to be arguing that string as a type is useful when viewed as a collection of methods that give access to Code Points on top of an underlying storage of Code Units. The article is arguing that unless you're writing a Unicode encoder/decoder, you probably don't care about manipulating Code Units (except that modern languages hand you byte arrays whose length you check for memory purposes). What you usually care about is searching, replacing, concatenating, and cutting collections of Code Points. But languages have only given you this hodgepodge of Code Unit arrays plus specialty methods for Code Point access, so that's what you're used to dealing with. Of course you then want some kind of abstraction, like a string type, so you don't end up in the scenario you describe, where you corrupt a Code Unit sequence while trying to manipulate a Code Point.

So the final point is that unless you're working with Unicode encoding/decoding, you really only care about Code Points. And once you create a String class that only exposes Code Points, you have something equivalent to a simple array.
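
To make the Code Unit vs. Code Point distinction concrete, here is a rough Python sketch. It is my own illustration, not code from the article: the sample text and the helper names `cut_code_points` and `to_text` are made up for this example. It shows the same text as a list of Code Points and as UTF-8 / UTF-16 Code Units, and what happens when you cut each one.

```python
# Hypothetical sample text: U+1F984 lies outside the Basic Multilingual Plane,
# so it needs more than one Code Unit in both UTF-8 and UTF-16.
text = "hi \U0001F984!"

# Code Points: a plain list of integers, one per Code Point.
code_points = [ord(ch) for ch in text]

# Code Units: depends entirely on the encoding chosen for storage.
utf8_units = text.encode("utf-8")  # 8-bit Code Units
utf16_bytes = text.encode("utf-16-le")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "little")  # 16-bit Code Units
    for i in range(0, len(utf16_bytes), 2)
]


def cut_code_points(points, start, stop):
    """Cutting Code Points is ordinary list slicing (hypothetical helper)."""
    return points[start:stop]


def to_text(points):
    """Turn a list of Code Points back into a Python str (hypothetical helper)."""
    return "".join(chr(p) for p in points)


# Slicing the Code Point list can never split a Code Point.
fourth = to_text(cut_code_points(code_points, 3, 4))

# Slicing the Code Unit array can: the first 4 UTF-8 bytes end mid-sequence.
try:
    utf8_units[:4].decode("utf-8")
    cut_was_valid = True
except UnicodeDecodeError:
    cut_was_valid = False

print(len(code_points), len(utf8_units), len(utf16_units), cut_was_valid)
```

Search, replace, concatenate, and cut on `code_points` are just the usual list operations, which is the sense in which a Code-Point-only String is "a simple array". The Code Unit views only matter at the encode/decode boundary.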


