
There are quite a few n-gram datasets available: https://www.google.com/search?q=download+n-gram+dataset

... these are almost certainly used in many spelling and grammar checkers, to help detect cases where a correctly spelled word is used in the wrong context.

http://www.aclweb.org/anthology/W12-0304
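
As a rough sketch of that mechanism (toy counts and a hypothetical lookup table, not any particular checker's internals): given trigram frequencies, a checker can test whether a confusable alternative fits the surrounding words far better than the word that was actually written:

    # Toy trigram counts standing in for a real n-gram dataset.
    COUNTS = {
        ("in", "their", "house"): 50000,
        ("in", "there", "house"): 40,
    }

    def better_alternative(left, word, right, alternatives):
        # Suggest a replacement only if it is much more common in this context.
        best, best_count = None, COUNTS.get((left, word, right), 0)
        for alt in alternatives:
            c = COUNTS.get((left, alt, right), 0)
            if c > 10 * max(best_count, 1):
                best, best_count = alt, c
        return best

    print(better_alternative("in", "there", "house", ["their"]))  # -> their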



Yes, I remember trying to use the Google Books Ngram Dataset [1], but it was too tedious to set up and maintain a server with the data just for a quick-and-dirty tool (which is why I asked for a ready-made API). Still, using it is probably a nice idea for a more ambitious side project, or even a startup.

EDIT: Actually, I would happily pay for a tool that implements the idea. Grammarly has paid plans, but $30/month is too steep for my kind of usage, and the grammar checks it performs are not exactly what I need (checks based on how real people write in real situations).

[1] http://storage.googleapis.com/books/ngrams/books/datasetsv2....
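
For what it's worth, the v2 files are gzipped TSVs with one (ngram, year, match_count, volume_count) row per line, so a quick-and-dirty loader is short. A minimal sketch (the file name at the end is just an example from the dataset listing):

    import gzip
    from collections import Counter

    def load_counts(path, min_year=1980):
        # Aggregate match_count over years for each n-gram.
        counts = Counter()
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                ngram, year, match_count, _ = line.rstrip("\n").split("\t")
                if int(year) >= min_year:
                    counts[ngram] += int(match_count)
        return counts

    # e.g. counts = load_counts("googlebooks-eng-all-3gram-20120701-th.gz")

The tedious part is less the parsing than the volume: the full dataset is large enough that you need real storage and indexing behind it, which is what made a hosted API attractive in the first place.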


We (foxtype) actually have a dev tool that does exactly this.

If we published it as an online tool, do you think people would find it useful?

We have multiple corpora, some language models built with neural networks, etc.


LanguageTool has limited support for using Google's n-gram data to find spelling errors. It only uses 3-grams, and only for a list of commonly confused words. I'm not aware of any Free Software that does better.

http://wiki.languagetool.org/finding-errors-using-n-gram-dat...
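
In outline, that approach looks something like the following sketch (hypothetical count table; LanguageTool itself is written in Java and scores with smoothed probabilities rather than raw counts): for each word in a confusion pair, sum the corpus counts of every 3-gram window that covers it, and keep the better-supported candidate.

    # Hypothetical 3-gram counts; a real checker would query the full dataset.
    DATA = {
        ("did", "not", "affect"): 9000,
        ("not", "affect", "the"): 7000,
        ("affect", "the", "outcome"): 3000,
        ("did", "not", "effect"): 150,
        ("not", "effect", "the"): 90,
        ("effect", "the", "outcome"): 20,
    }

    def score(tokens, i, candidate):
        # Sum counts of every 3-gram window covering position i (up to three).
        t = tokens[:i] + [candidate] + tokens[i+1:]
        return sum(DATA.get(tuple(t[j:j+3]), 0)
                   for j in range(max(i - 2, 0), min(i, len(t) - 3) + 1))

    tokens = "did not effect the outcome".split()
    print(max(("affect", "effect"), key=lambda w: score(tokens, 2, w)))  # affect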



