
It's ... interesting.

I asked

how do i remove diacritics from unicode characters https://www.phind.com/search?cache=bd2b33eb-9454-4d38-975e-1... and it answered with multiple Python code snippets. The second one gets close to the real solution, but it is slow and incorrect.

Now if I ask the question I actually want answered

how do i remove diacritics from unicode characters with php https://www.phind.com/search?cache=c29fe466-16cc-4795-94bf-0... then you can see it misunderstanding the question: the first answer produces the desired output (though I suspect it only works for Latin letters), apart from the iconv problems mentioned, but the second is completely incorrect. The third answer is close but no cigar, as it only works with Latin characters.

The answer I wanted to see uses the ICU library to run the rules "NFD; [:Nonspacing Mark:] Remove; NFC", as mentioned on https://unicode-org.github.io/icu/userguide/transforms/gener... and https://www.unicode.org/iuc/iuc22/a339.html and in a million other places, but I wanted to link only official Unicode documentation.
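
To make the three steps of that rule concrete, here is a rough sketch using only Python's standard unicodedata module. It is not the ICU transform itself: strip_nonspacing_marks is a made-up name, the general category Mn is used as a stand-in for [:Nonspacing Mark:], and Python's bundled Unicode data may be a different version from the one ICU ships, so results could differ at the edges.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Hypothetical helper mirroring 'NFD; [:Nonspacing Mark:] Remove; NFC'."""
    # Step 1 (NFD): split precomposed characters into base + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2 (Remove): drop every code point whose general category is Mn
    # (Nonspacing Mark). This is an assumption standing in for the ICU set.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Step 3 (NFC): recompose whatever is left.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    for sample in ["caf\u00e9", "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac", "\u00f8"]:
        print(sample, "->", strip_nonspacing_marks(sample))
```

Because it works on canonical decompositions rather than a Latin lookup table, this also handles non-Latin scripts such as Greek. Letters like ø are left alone, since they have no canonical decomposition into a base letter plus a mark.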



