Instapaper-like article extractor demo now online (jimplush.com)
29 points by beagledude on June 11, 2011 | 11 comments



Nice! I worked on something similar a year ago, but for the Ruby world. If you're on a Unix system with Ruby installed (e.g. OS X), you can reproduce most of the linked demo like so:

    gem install pismo
    pismo http://techcrunch.com/2011/06/09/twitter-ios/ title lede author datetime body
And then enjoy the output. There's no image picking, but the image is usually just the first IMG tag in 'html_body'. I never got around to implementing it because I didn't need that feature.
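For anyone who wants the "first IMG tag" behaviour, here is a minimal sketch in plain Ruby. It is not part of pismo; `first_image_src` is a hypothetical helper that assumes you already have an HTML string such as the 'html_body' output. It uses a simple regex for brevity, so it is naive (for example, it would also match a `data-src` attribute); a real HTML parser would be more robust.

```ruby
# Hypothetical helper, not part of pismo: return the src of the first <img>
# tag in an HTML fragment, or nil if there is none.
def first_image_src(html)
  # Case-insensitive match on the first <img ...> tag; capture the src value
  # whether it is wrapped in double or single quotes. /m lets . span newlines.
  match = html.match(/<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/im)
  match && match[2]
end

html_body = '<p>Intro</p><IMG class="hero" src="http://example.com/a.jpg"><img src=\'b.png\'>'
puts first_image_src(html_body) # prints http://example.com/a.jpg
```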

The downside is I haven't worked on it for months and it's in sore need of improvements. For its current in-production use though, it's proving sufficient and a reasonable option for Rubyists. More info at https://github.com/peterc/pismo

Not knocking Jim's work on Goose, btw. He's actively working on it, so if Java works out for you, stick with it! :)


One thing I've always wanted is something to extract multi-page forum threads and render them in a normalized readable way. For example, Reddit comment threads like IAMAs.

Anyone know of a service or library that does that?


I actually started playing with the concept one weekend, for IAmAs specifically. I was trying to do it all client-side, and the issue I kept running up against was that Reddit's JSONP responses got VERY slow on large threads.
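The normalizing step for a single Reddit thread can be sketched as a depth-first walk over the nested comment listing. This is a minimal sketch under stated assumptions: the sample below is hypothetical and only mimics the shape of the comment listing in Reddit's JSON output (comments of kind `t1` with `author`, `body`, and a `replies` field that is either an empty string or another listing). It skips collapsed `more` stubs and does not handle fetching, pagination, or other forum software.

```ruby
require "json"

# Hypothetical sample, trimmed to the fields used below; assumed to mirror the
# comment listing in Reddit's thread JSON.
SAMPLE = <<~JSON
  {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"author": "op", "body": "Ask me anything.",
      "replies": {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"author": "alice", "body": "First question", "replies": ""}},
        {"kind": "more", "data": {"count": 12}}
      ]}}}},
    {"kind": "t1", "data": {"author": "bob", "body": "Second thread", "replies": ""}}
  ]}}
JSON

# Walk the listing depth-first and emit one indented line per comment.
# "more" entries (collapsed comments) are skipped; expanding them would need
# further requests.
def flatten_comments(listing, depth = 0, out = [])
  listing.dig("data", "children").to_a.each do |child|
    next unless child["kind"] == "t1"

    data = child["data"]
    out << "#{'  ' * depth}#{data['author']}: #{data['body']}"
    replies = data["replies"]
    # In this assumed format, "replies" is an empty string when there are none.
    flatten_comments(replies, depth + 1, out) if replies.is_a?(Hash)
  end
  out
end

puts flatten_comments(JSON.parse(SAMPLE))
```

Building the whole tree into plain lines like this is what makes a "normalized readable" rendering possible; the slow part described above is fetching the JSON for very large threads, which this sketch does not address.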


It didn't extract images for the sites I tried, all WordPress blogs. EDIT: I just realized why: those blogs pull their images from Flickr.


How is this different from the myriad of other text extraction services, APIs and libraries out there?


1. it's open source
2. it's embeddable
3. it extracts images
4. it's one of the most accurate (http://www.readwriteweb.com/hack/2011/06/head-to-head-compar...)
5. it's named after the best Top Gun character


I agree. It works quite well! It would be nice if it could be formatted to look like Readability's default, which is really quite pleasant to read.


I'm sold, then, thanks!


Cool! I really like how accurate it is; works with almost every site I try. :)


Nice work!


Everything I try gives a 404.



