I'm really interested in how you did the automagic summarization. Is there some sort of public algorithm, or did you make something yourself? If so, what/how?
The reason I'm asking is that I was thinking of making a summarizer, but ended up realising I have no idea how to even begin making one.
Yes, I'm working on a blog post describing how this works.
It won't go into a lot of detail, but it should be a good overview. The truth is that I still have lots of ideas on how to make it better, but it is also the case that for every 8 things tried maybe 1 works. It comes back to your point that part of the appeal of working on this class of problems is that it is somewhat hard. In general, I would prefer to work on things that have technical barriers to entry, rather than network effects.
Although I'm not in love with the tone, I'm happy to answer the question.
The coding is all done in Python. I'm using NLTK, although the approach is more statistical inference than linguistics. That's really the only library. I'm very much a bottom-up guy, and am happy to reinvent the wheel because it sometimes gives me a better understanding of the texture of the problem I'm up against. For example, where I'm doing some linear algebra / graph traversal math, I'm using functions I've written rather than numpy, scipy, a graph DB, etc.
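For anyone wondering what "statistical inference plus hand-written graph math" can look like, here is a minimal sketch of one well-known public approach: a TextRank-style extractive summarizer. It is NOT the method described above (the thread doesn't spell that out). It only illustrates ranking sentences by their centrality in a word-overlap similarity graph, using plain Python lists instead of numpy. The function names, the naive regex sentence splitter, the damping factor of 0.85 and the 50 iterations are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    # Lowercased word tokens; no stemming or stopword removal in this sketch.
    return re.findall(r"[a-z']+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(words(s)) for s in sentences]
    # Weighted adjacency matrix as plain lists of lists (no numpy).
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # PageRank-style power iteration. Sentences with no word overlap
    # (out_sums == 0) pass nothing on, which is acceptable for ranking.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                            for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

For example, summarize(article_text, k=3) would return the three highest-ranked sentences in their original order. Swapping the regex splitter for nltk.sent_tokenize is a natural refinement, at the cost of downloading NLTK's punkt data.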
This is nice, though some formatting would make it easier to read. In IE7x there is almost no whitespace on the right margin, and on my big monitor it's close to wall-to-wall text.
Line breaks would be great. Even if you just added one every 3 sentences, it would improve readability greatly. It may seem totally arbitrary, but right now the wall of text is rather daunting.
Also, give .body a width and set margin: auto (e.g. .body { max-width: 40em; margin: auto; }); that will center the text on the screen.