
> some people do consider using the same code point for apostrophe and citations problematic (it's definitely annoying when doing word segmentation)

And those people are a menace. "Ball bearings" is a single word with a space in the middle. The only way you're going to get reliable word segmentation is with a natural language parser and a lexicon with an entry for "ball bearing". At that point, you're already recognizing "aren't" as a word; the punctuation isn't really relevant.

On the other hand, if you're not particularly upset about messing up space-including words, there's no real reason to be upset about apostrophe-including ones either.
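To illustrate the point, here is a minimal sketch. It is not a natural language parser: it is a toy greedy longest-match lookup against a hypothetical lexicon (`LEXICON`, `segment`, and the regex are all made up for illustration). Once multiword entries like "ball bearing" come from a lexicon, treating an apostrophe between letters as word-internal is a one-line regex detail, and quote marks at word edges simply fall away.

```python
import re

# Hypothetical toy lexicon of multiword entries; a real system would need
# a full lexicon and a parser, not this greedy lookup.
LEXICON = {("ball", "bearing"), ("ball", "bearings")}
MAX_LEN = max(len(entry) for entry in LEXICON)

# A straight (') or curly (\u2019) apostrophe counts as part of a word only
# when it sits between letters; leading/trailing quote marks are dropped.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['\u2019][A-Za-z]+)*")


def segment(text):
    """Split text into words, merging multiword lexicon entries greedily."""
    tokens = TOKEN_RE.findall(text)
    words = []
    i = 0
    while i < len(tokens):
        # Try the longest possible lexicon entry starting at token i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in LEXICON:
                words.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword entry matched: keep the single token as a word.
            words.append(tokens[i])
            i += 1
    return words


print(segment("The ball bearings aren't 'oiled'."))
# ['The', 'ball bearings', "aren't", 'oiled']
```

The hard part is entirely in the lexicon lookup; the apostrophe never needed its own code point for this to work.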


