
Author of the Python port of Unidecode here. I previously wrote a comment pointing out that Unidecode does the reverse of Mimic. But then I actually checked the character tables that Mimic uses and deleted my comment.

Mimic chooses replacement characters based solely on their visual similarity to ASCII characters. Unidecode, while still doing character-by-character replacements without deeper analysis, tries to optimize the replacement tables for transliteration of natural languages.

For example, Mimic will replace Latin capital H with Greek capital eta (U+0397), because they look similar. However, Unidecode will replace U+0397 with Latin capital E, because Latin E is typically used in place of Greek eta when transliterating Greek text to Latin.
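To make the H / eta example concrete, here is a minimal Python sketch. The two tables are toy one-entry excerpts built only from the example above, and `substitute` is a hypothetical helper, not the API of either project; the real Mimic and Unidecode tables are far larger.

```python
# Toy one-entry tables built from the H / eta example; these are
# assumptions for illustration, not the real Mimic or Unidecode tables.
MIMIC_TABLE = {"H": "\u0397"}      # Latin capital H -> Greek capital eta (looks alike)
UNIDECODE_TABLE = {"\u0397": "E"}  # Greek capital eta -> Latin capital E (transliteration)


def substitute(text, table):
    """Replace each character using the table; leave unmapped characters alone."""
    return "".join(table.get(ch, ch) for ch in text)


original = "H"
disguised = substitute(original, MIMIC_TABLE)       # visually similar replacement
recovered = substitute(disguised, UNIDECODE_TABLE)  # transliteration back to ASCII

print(f"U+{ord(disguised):04X}")  # U+0397
print(recovered)                  # E
print(recovered == original)      # False
```

The round trip turns H into E rather than back into H, which is why one tool is not simply the inverse of the other.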



I used the PHP port long ago when creating a simple website search engine... Great project!



