Yes, I'm working on a blog post describing how this works.
It won't go into a lot of detail, but it should be a good overview. The truth is that I still have lots of ideas on how to make it better, but it is also the case that for every eight things I try, maybe one works. It comes back to your point that part of the appeal of working on this class of problems is that it is somewhat hard. In general, I would prefer to work on things that have technical barriers to entry rather than network effects.
Although I'm not in love with the tone, I'm happy to answer the question.
The coding is all done in Python. I'm using NLTK, although the approach is more statistical inference than linguistics. That's really the only library. I'm very much a bottom-up guy, and I'm happy to reinvent the wheel because it sometimes gives me a better understanding of the texture of the problem I'm up against. For example, where I'm doing linear algebra or graph traversal, I'm using functions I've written myself rather than NumPy, SciPy, a graph database, etc.
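To give a sense of what hand-rolled versions of these operations can look like, here is a minimal sketch in plain Python. It is only an illustration, not the actual project code: the function names `dot`, `mat_vec`, and `bfs_order`, and the adjacency-dict graph representation, are assumptions made for the example.

```python
from collections import deque


def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, vec):
    """Multiply a matrix (a list of rows) by a vector."""
    return [dot(row, vec) for row in matrix]


def bfs_order(graph, start):
    """Return nodes reachable from `start` in breadth-first order.

    `graph` is assumed to be a dict mapping each node to an
    iterable of its neighbours (a hypothetical representation).
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order
```

Something like `mat_vec([[1, 0], [0, 2]], [3, 4])` multiplies a small matrix by a vector, and `bfs_order({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, "a")` walks a toy graph level by level. Writing these by hand is slower than calling a library, but every step stays visible, which is the point being made above.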