
Thank you for posting this code on GitHub! There has been some reverse-engineering done on the language dictionaries bundled with macOS, and it's nice to know that the same format is being used on the Apple Watch! I look forward to seeing your dictionary app.

https://josephg.com/blog/reverse-engineering-apple-dictionar...

There's also a command-line tool that can query the dictionary:

https://github.com/takumakei/osx-dictionary

Something I haven't yet reverse-engineered is Apple's word segmentation. I can get the word breaks in Chinese by pressing option + right arrow + space, repeatedly. But I have no idea how the backend for that works.



>Apple's word segmentation

Unless they changed it, it's probably similar to CFStringTokenizer, which used ICU Boundary Analysis (and maybe MeCab for Japanese).


Thank you! The ICU Boundary Analysis documentation says it uses a dictionary to split Chinese, Japanese, Thai or Khmer.

https://unicode-org.github.io/icu/userguide/boundaryanalysis...

Is that the same as the macOS dictionary being parsed here? It seems like a pretty big file to grep every time!
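To make the "uses a dictionary to split" idea concrete, here is a toy sketch of greedy longest-match segmentation. This is not ICU's actual algorithm (its real break engines are more sophisticated); the `segment` function, the `max_len` parameter, and the four-word list are all made up for illustration.

```python
# Toy word list; an assumption for illustration, not taken from ICU's data.
TOY_WORDS = {"我们", "喜欢", "字典", "苹果"}


def segment(text, words, max_len=4):
    """Greedy forward maximal matching.

    At each position, take the longest dictionary word that starts there.
    If nothing matches, fall back to a single character.
    """
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                out.append(piece)
                i += size
                break
    return out


print(segment("我们喜欢苹果字典", TOY_WORDS))
# ['我们', '喜欢', '苹果', '字典']
```

Greedy matching gets ambiguous strings wrong fairly often, which is one reason real segmenters tend to weigh candidate words against each other instead of just grabbing the longest one.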


No, the ICU dictionaries live at: https://github.com/unicode-org/icu/tree/main/icu4c/source/da...

I assume they're converted to a more efficient query format at compile time.
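As a rough idea of what a "more efficient query format" could look like, here is a minimal sketch that builds a trie from a word list once and then answers longest-prefix queries without scanning the whole list. I am not claiming this is ICU's on-disk format; `build_trie`, `longest_match`, and the sample words are hypothetical names and data for illustration only.

```python
END = ""  # marker key; never collides with a real one-character key


def build_trie(words):
    """Build a nested-dict trie once, up front (the 'compile' step)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # a complete word ends at this node
    return root


def longest_match(trie, text, start):
    """Return the length of the longest dictionary word at text[start:].

    Returns 0 if no word starts at that position. The cost depends on
    the length of the match, not on the size of the dictionary.
    """
    node = trie
    best = 0
    for offset in range(start, len(text)):
        node = node.get(text[offset])
        if node is None:
            break
        if END in node:
            best = offset - start + 1
    return best


trie = build_trie(["苹果", "苹果手表", "字典"])
print(longest_match(trie, "苹果手表字典", 0))  # 4
print(longest_match(trie, "苹果手表字典", 4))  # 2
```

So a lookup walks a few nodes per character instead of grepping a big text file on every query.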



