
I definitely need to write more docs on this project! I will share them when I'm done. It looks like SymSpell is doing spelling correction, which is part of what my code does.

The "surprisal" component (an information-theory term, also called Shannon information) does statistical next-word prediction. For example, given a corpus in which "data augmentation" is a common phrase, the prefix "da" could complete to "data augmentation".

"data augmentation" starts with "da", and it would rank as a likely candidate for the next word (that is, one with low surprisal) because it was common in the dataset on which the statistical model was "fine-tuned".

This Wikipedia page covers the concept in more depth: https://en.wikipedia.org/wiki/Information_content
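To make the idea concrete, here is a minimal sketch of surprisal-ranked prefix completion. It is not my project's actual code: it assumes a toy corpus of whole phrases and a simple frequency model, where each phrase's probability is its count divided by the total. The function names (`build_model`, `surprisal`, `complete`) are hypothetical. Surprisal is the information content I(x) = -log2 p(x), so lower surprisal means a more probable completion.

```python
import math
from collections import Counter


def build_model(phrases):
    """Count phrase frequencies in a toy corpus (hypothetical stand-in for a trained model)."""
    counts = Counter(phrases)
    total = sum(counts.values())
    return counts, total


def surprisal(count, total):
    """Shannon information content in bits: I(x) = -log2 p(x), with p(x) = count / total."""
    return -math.log2(count / total)


def complete(prefix, counts, total, k=3):
    """Return up to k phrases that start with `prefix`, lowest surprisal (most likely) first."""
    candidates = [
        (surprisal(c, total), phrase)
        for phrase, c in counts.items()
        if phrase.startswith(prefix)
    ]
    candidates.sort()  # ascending surprisal; ties broken alphabetically by phrase
    return [phrase for _, phrase in candidates[:k]]


if __name__ == "__main__":
    # Toy corpus in which "data augmentation" is the most common "da..." phrase.
    corpus = (["data augmentation"] * 5 + ["data science"] * 2
              + ["dark matter"] + ["neural network"] * 4)
    counts, total = build_model(corpus)
    print(complete("da", counts, total))
```

With that toy corpus, "da" ranks "data augmentation" first, then "data science", then "dark matter". A real model would condition on the preceding context rather than use raw phrase counts, but the ranking principle (prefer the lowest-surprisal candidate that matches the prefix) is the same.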


