
Some platforms (e.g., Android) have methods specifically for asking how to edit a string following a backspace. However, there's no standard Unicode algorithm to answer the question (and I strongly suspect that it's something that's actually locale-dependent to a degree).

On further reflection, probably the best starting point for string editing on backspace is to operate on codepoints, not grapheme clusters. For most written languages, the various elements that make up a character are likely to be separate codepoints. In Latin text, diacritics are generally precomposed (in theory you can have a + combining diacritic instead of precomposed ä, but the IME system is going to spit out ä anyway, even if dead keys are used). But if you have Indic characters or Hangul, the grapheme cluster algorithm is going to erroneously combine multiple characters into a single unit. The catch is that the biggest false positive for a codepoint-based algorithm is emoji, and if you're a monolingual speaker whose only exposure to complex written scripts is Unicode emoji, you're going to incorrectly generalize from emoji to all written languages.
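To make that concrete, here's a rough Python sketch of codepoint-level backspace. The function name `backspace_codepoint` is made up for illustration, and this isn't how any particular platform does it. It just relies on Python `str` being indexed by codepoint. The Hangul example assumes the text is stored as decomposed conjoining jamo (NFD); a precomposed syllable is a single codepoint and would be deleted whole.

```python
import unicodedata


def backspace_codepoint(text: str) -> str:
    """Hypothetical helper: delete the last Unicode codepoint.

    Python strings are sequences of codepoints, so slicing off one
    element removes exactly one codepoint, not one grapheme cluster.
    """
    return text[:-1]


# Devanagari "ki": consonant KA (U+0915) + vowel sign I (U+093F).
# A grapheme-cluster backspace would delete both; this keeps the consonant.
ki = "\u0915\u093F"
ki_after = backspace_codepoint(ki)

# Hangul "han" (U+D55C), decomposed to three conjoining jamo via NFD.
# Removing the last jamo and recomposing gives "ha" (U+D558).
han_jamo = unicodedata.normalize("NFD", "\uD55C")
ha = unicodedata.normalize("NFC", backspace_codepoint(han_jamo))

# The emoji false positive: man + ZWJ + woman + ZWJ + girl.
# Codepoint backspace strips only the girl and leaves a dangling ZWJ.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
family_after = backspace_codepoint(family)

if __name__ == "__main__":
    for label, s in [("ki", ki_after), ("ha", ha), ("family", family_after)]:
        print(label, [f"U+{ord(c):04X}" for c in s])
```

The first two cases are where per-codepoint deletion does what a typist would likely expect; the third is the emoji case where it leaves behind a broken sequence.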



