Automagically Generated Summaries of MIT OCW Algorithms Lectures

Swizec · on Oct 3, 2010

I'm really interested in how you did the automagic summarization. Is there some sort of public algorithm, or did you make something yourself? If so, what/how?

I'm asking because I was thinking of making a summarizer but ended up realising I have no idea how to even begin making one.

riffer · on Oct 3, 2010

Yes, I'm working on a blog post describing how this works.

It won't go into a lot of detail, but it should be a good overview. The truth is that I still have lots of ideas on how to make it better, but it is also the case that for every 8 things tried maybe 1 works. It comes back to your point that part of the appeal of working on this class of problems is that it is somewhat hard. In general, I would prefer to work on things that have technical barriers to entry, rather than network effects.

mahmud · on Oct 3, 2010

While we're waiting, throw us a bone. Name any libs you're using.

riffer · on Oct 3, 2010

Although I'm not in love with the tone, I'm happy to answer the question.

The coding is all done in Python. I'm using NLTK, although the approach is more statistical inference than linguistics. That's really the only library. I'm very much a bottom-up guy, and am happy to re-invent the wheel because it sometimes gives me a better understanding of the texture of the problem I'm up against. For example, where I'm doing some linear algebra / graph traversal math I'm using functions I've written rather than NumPy, or SciPy, or a graph DB, etc.
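For readers wondering what "linear algebra / graph traversal" with hand-written functions might look like, here is a minimal sketch of one well-known graph-based approach (TextRank-style sentence ranking) in plain Python with no NumPy. This is not riffer's method, which the thread does not describe; the function names, the naive sentence splitter, the overlap-based similarity, and the damping value of 0.85 are all assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Set of lowercase word tokens; no stemming or stop-word removal."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Shared-word count normalized by the log of each sentence's size."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Weighted adjacency matrix as nested lists; no self-loops.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(text, max_sentences=3):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i],
                 reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(transcript, max_sentences=5) on a lecture transcript would return the five sentences most connected to the rest of the text. A real system would likely need better sentence splitting and stop-word handling than this sketch has.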

mahmud · on Oct 4, 2010

Thanks riffer, I have searched low and high for a usable automatic text summarization library, but everything was a high-dollar item for finance.

Also, the tone, if there was any, was friendly to neutral.

riffer · on Oct 4, 2010

"the tone, if there was any, was friendly to neutral"

No hard feelings, sorry if I was reading too much into it.

"I have searched low and high for a usable automatic text summarization library, but everything was a high-dollar item for finance"

Now you really have my attention. I'll shoot you an email.

naner · on Oct 4, 2010

This should get you started:

http://laughingmeme.org/2003/07/28/text-summarization/

http://stevehanov.ca/blog/index.php?id=52

http://libots.sourceforge.net/

astrofinch · on Oct 3, 2010

Why don't you just read Introduction to Algorithms?

riffer · on Oct 3, 2010

The concept is roughly that people shouldn't have to skim what software can summarize. Let the machines do the work.

mahmud · on Oct 3, 2010

I laud the initiative, but there is hardly any fluff in CLRS.

riffer · on Oct 3, 2010

As promised yesterday in the "MIT Video lectures - Introduction to Algorithms" discussion thread: http://news.ycombinator.com/item?id=1751181

Scott_MacGregor · on Oct 3, 2010

This is nice, though some formatting would make it easier to read. In IE7x there is almost no whitespace on the right margin. On my big monitor it is close to wall to wall text.

riffer · on Oct 3, 2010

Ouch, sorry about that, I'm not too proficient at this cross-browser stuff. Let me see what I can do.

makmanalp · on Oct 3, 2010

It'd also be nice if the summarizer separated the text into readable chunks, like paragraphs as opposed to a single wall of text!

zbanks · on Oct 3, 2010

Linebreaks would be great. Even if you just spaced them every 3 sentences, it would improve readability greatly. It may seem totally arbitrary, but right now the wall of text is rather daunting.

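Also, set .body { margin: auto; } together with a width or max-width -- it will center the text on the screen (auto margins have no effect on a block that already fills the full width).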
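The "break every 3 sentences" idea above could be sketched in Python, the language riffer says the summarizer is written in. The function name and the naive sentence splitter are hypothetical; a splitter this simple will break wrongly on abbreviations such as "e.g.", so treat it as a starting point only.

```python
import re


def add_paragraph_breaks(text, sentences_per_paragraph=3):
    """Insert a blank line after every N sentences of a summary."""
    # Naive split (assumption): a sentence ends at ., ! or ? plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```

Running the summary text through add_paragraph_breaks before rendering would turn a single wall of text into short paragraphs; when the output is HTML, each paragraph would still need to be wrapped in its own block element.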