barryhunter · on Oct 20, 2016

There are quite a few Ngram datasets available https://www.google.com/search?q=download+n-gram+dataset

... these are almost certainly used in many spelling and grammar checkers (to help with cases where the same spelled word is used in different contexts).

http://www.aclweb.org/anthology/W12-0304

twa927 · on Oct 20, 2016

Yes, I remember trying to use the Google Books Ngram Dataset [1], but it was too tedious for me to set up and maintain a server with the data for the purpose of a quick-and-dirty tool (that's why I asked for a ready API). Still, using it is probably a nice idea for a more ambitious side project or even a startup.

EDIT. Actually, I would happily pay for a tool that implements the idea. Grammarly has paid plans, but $30/month is too steep (for my kind of usage), and the types of grammar checks it performs are not exactly what I need (which is what real people in real situations use).

[1] http://storage.googleapis.com/books/ngrams/books/datasetsv2....

plusepsilon · on Oct 20, 2016

We (foxtype) actually have a dev tool that does exactly this.

If we publish it as an online tool do you think people will find it useful?

We have multiple corpora, some language models built in neural networks, etc.

mrob · on Oct 20, 2016

LanguageTool has limited support for using Google's n-gram data to find spelling errors. It only uses 3-grams, and only for a list of commonly confused words. I'm not aware of any Free Software that does better.

http://wiki.languagetool.org/finding-errors-using-n-gram-dat...
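
For illustration, the general idea of checking commonly confused words against 3-gram counts can be sketched as below. This is only a toy sketch and not LanguageTool's actual implementation: the counts in `TRIGRAM_COUNTS`, the `CONFUSION_SETS` list, the `suggest` function and the `min_ratio` threshold are all hypothetical, and a real checker would look counts up in a large n-gram index.

```python
from collections import Counter

# Hypothetical toy counts standing in for a real 3-gram dataset.
TRIGRAM_COUNTS = Counter({
    ("over", "there", "now"): 120,
    ("over", "their", "now"): 2,
    ("lost", "their", "keys"): 90,
    ("lost", "there", "keys"): 1,
})

# Hypothetical confusion sets: words that are often mistaken for one another.
CONFUSION_SETS = [{"their", "there"}]


def suggest(tokens, min_ratio=10):
    """Return (index, original, suggestion) where a rival word fits the context far better."""
    suggestions = []
    # Only words with both a left and a right neighbour can be checked.
    for i in range(1, len(tokens) - 1):
        word = tokens[i]
        for group in CONFUSION_SETS:
            if word not in group:
                continue
            left, right = tokens[i - 1], tokens[i + 1]
            own = TRIGRAM_COUNTS[(left, word, right)]
            for rival in sorted(group - {word}):
                other = TRIGRAM_COUNTS[(left, rival, right)]
                # Adding one to the denominator avoids division by zero.
                if other / (own + 1) >= min_ratio:
                    suggestions.append((i, word, rival))
    return suggestions
```

Calling `suggest("she lost there keys".split())` flags "there" at index 2 and proposes "their", while the correctly spelled sentence produces no suggestions.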

barryhunter · on April 11, 2016

Did you look carefully at the date this was posted on the mailing list? :)

barryhunter · on March 14, 2016

It's up to you if you want to take the risk :)

There is always a risk in trying something new. It might never pan out, or disappear in a puff of smoke.

Nothing would happen if nobody took a little risk.

barryhunter · on March 8, 2016

What does it use as the backend? That is, what is the actual classification system?

theodorton · on March 8, 2016

Looks like it uses Python on the backend, with nltk. http://www.nltk.org/

fatiherikli · on March 8, 2016

Using nltk for tokenizing and stemming text. There is no classification backend; I implemented a Naive Bayes classifier.
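
As a rough illustration of what a hand-written Naive Bayes text classifier can look like, here is a minimal multinomial sketch with add-one smoothing. It is not the project's actual code: the `NaiveBayes` class, its method names and the regex `tokenize` helper are assumptions made for this example, and the tokenizer is a plain stand-in for nltk's tokenizing and stemming.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (stand-in for nltk tokenizing; no stemming)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training documents
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in sorted(self.doc_counts):
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training is one `train(text, label)` call per example, after which `classify(text)` returns the label with the highest log-probability score.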

barryhunter · on Dec 9, 2015

Untick autoShuffle in the controls.

venning · on Dec 9, 2015

Not available on mobile.

jstanley · on Dec 9, 2015

Thanks!

barryhunter · on Sept 8, 2015

And then the link gets dropped on HN, to get even more traffic?

tinytrophies · on Sept 8, 2015

Slightly self-motivated, yes. That being said, I've spent two years developing applications that could greatly help communities across the world, and was having a hard time getting those ventures the proper attention. I theorized that something "less impactful" would get more attention and, in the long run, drive attention to more important projects. I've been sharing my Google Analytics on the Facebook page and with friends as a way for all of us to better understand the influence of different platforms. 32% of traffic has come from Facebook, which is pretty much free to all startups. I then ran an ad campaign for five days, which performed miserably. I will be carefully examining traffic sources and how to cheaply launch my app in the coming month. I don't expect a great deal from tiny tiny trophies; I put up a WordPress site in a night and have been watching it for the past week.

barryhunter · on Aug 25, 2015

Anyone know the criteria for what sites are included?

barryhunter · on March 2, 2015

Seems to be the premise of http://www.majestic12.co.uk/ - looks like it might be failing.

barryhunter · on Oct 29, 2014

There is a 'Technology' link in the footer.

Really nice implementation btw.

barryhunter · on April 29, 2014

The logger was obviously there; it was deliberately collecting the SSIDs and MAC addresses.

Possibly a debug option to log the whole packets was added during development, and it was accidentally left on in production.

Or the whole packet was always logged: a second process would then skim the logs, extracting just the SSID/MAC (correlated with GPS), and another process would delete the raw logs. That third process failed.

A few big drives in the data collection devices, and possibly nobody noticed they were filling up a little too quickly.