Show HN: Build a Kindle book of PG's essays

phreeza · on Nov 18, 2012

I did a script like this a while back, and asked pg before releasing it. He asked me not to do it so I refrained.

Edit: I just checked my email; I actually asked him about putting up the epub, not the script. Same difference, I suppose.

reitzensteinm · on Nov 18, 2012

I can't speak for how pg feels, but that is a very different kettle of fish.

In your case, you'd have been modifying and distributing his content, rather than providing users a tool for consuming content they have obtained directly from pg.

Think of hosting mirrored, ad-free sites versus creating AdBlock.

He might still have a problem with this (if only because of server load), and it would have been a good idea to check first, but your experience isn't evidence one way or another.

PanMan · on Nov 18, 2012

Impressive how short this can be with the right libraries: 40 lines of code to scrape an index, download the pages, get the right parts, and make them an ebook.
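As a rough illustration of the first step (scraping an index for essay links), here is a minimal sketch using only the Python standard library. It is not the script under discussion: the names `LinkCollector`, `extract_links`, and `SAMPLE_INDEX` are hypothetical, and the sample markup is made up rather than taken from the real index page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the index page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(index_html, base_url):
    """Return every link found in index_html as (url, title) tuples."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.links


# Made-up index snippet for illustration only.
SAMPLE_INDEX = (
    '<html><body>'
    '<a href="first.html">First Essay</a><br>'
    '<a href="second.html">Second Essay</a>'
    '</body></html>'
)

if __name__ == "__main__":
    for url, title in extract_links(SAMPLE_INDEX, "http://example.com/articles.html"):
        print(title, url)
```

Downloading each page and packaging the result would follow from the returned list; those steps are left out here.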

However, wouldn't it be more efficient if one person did this and published the result? As it stands, everybody has to scrape PG's site. Thanks for the code, though!

What is the licence on the articles?

gbraad · on Nov 18, 2012

The result is an invalid ePub book because some pages do not follow strict XHTML rules (mismatched tags). iBooks and some other readers fail to render them.

This page contains the following errors: error on line 13 at column 7: Opening and ending tag mismatch: font line 0 and p
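One way to catch this kind of problem before packaging is to run each page through a strict XML parser and report any page that fails. The sketch below uses the standard library's `xml.etree.ElementTree`; the function name `xhtml_error` and the two sample strings are hypothetical, and this is a well-formedness check only, not full ePub validation.

```python
import xml.etree.ElementTree as ET


def xhtml_error(markup):
    """Return None if markup is well-formed XML, otherwise the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Made-up samples: the second one closes <p> before <font>, like the error above.
GOOD = "<html><body><p><font>ok</font></p></body></html>"
BAD = "<html><body><p><font>broken</p></font></body></html>"

if __name__ == "__main__":
    for name, page in (("good", GOOD), ("bad", BAD)):
        print(name, "->", xhtml_error(page) or "well-formed")
```

Note that this parser is stricter than a browser in other ways too: named HTML entities such as `&nbsp;` are rejected unless they are declared, so real pages may need cleaning before they pass.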

wslh · on Nov 18, 2012

Use lxml.html instead of BeautifulSoup

kami8845 · on Nov 18, 2012

and PyQuery. Shameless plug: http://doda.co/7-python-libraries-you-should-know-about

harpb · on Nov 18, 2012

+1 for PyQuery - it is definitely my favorite out of them all.

chongli · on Nov 18, 2012

I tried running it and it chokes on "Chapter 1 of ANSI Common Lisp". I think that's due to the link being a txt file rather than html, causing an exception to be thrown: "Error: URL doesn't exist".
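For what it's worth, one possible guard against this is to skip links that don't look like HTML before trying to parse them. This is only a sketch under that assumption, not necessarily how the script handles it; `looks_like_html` and `HTML_EXTENSIONS` are hypothetical names.

```python
import posixpath
from urllib.parse import urlparse

# Extensions treated as HTML; an empty extension covers URLs like ".../essays/".
HTML_EXTENSIONS = {"", ".html", ".htm"}


def looks_like_html(url):
    """Guess from the URL path alone whether a link points at an HTML page."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in HTML_EXTENSIONS


if __name__ == "__main__":
    for link in ("http://example.com/essay.html", "http://example.com/chapter1.txt"):
        print(link, "->", "parse" if looks_like_html(link) else "skip")
```

Checking the `Content-Type` header of the response would be more reliable than the extension, at the cost of one request per link.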

olasitarska · on Nov 18, 2012

Fixed, thanks!

tangue · on Nov 18, 2012

Those essays are available for free on his site. If you want to scrape them for yourself, you're in a grey zone, but it will be fine. But hey, authors deserve respect: if an author wants to publish an ebook, he will.

georgeorwell · on Nov 18, 2012

Why am I in a grey zone for manipulating a bunch of bytes that I have downloaded into a format I find convenient?

The bytes were distributed legally by the copyright holder.

I downloaded the bytes legally using an ISP that I paid for.

The author did not use a robots.txt indicating his wishes that I not download the bytes using an automatic tool.

The bytes are unprotected by DRM.

I have no intention to distribute the bytes to anyone else.

I have not broken the DRM on my ebook reader.

tangue · on Nov 18, 2012

You can scrape for yourself (or in Instapaper, but that's another story), but sharing a script and telling others to do so is bad.

Call me old-fashioned, but I think an author should have a say in how his works are structured and distributed.

cowsaysoink · on Nov 18, 2012

Scraping content isn't bad if you aren't redistributing it; technically, your web browser is scraping content in the same sense.

http://paulgraham.com/robots.txt doesn't disallow robots from reading the essays.
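A script can make the same check itself before fetching anything. Here is a minimal sketch with the standard library's `urllib.robotparser`; the helper name `allowed_by_robots`, the user agent `essay-bot`, and the `SAMPLE_ROBOTS` rules are all made up for illustration and are not the contents of the real robots.txt.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules already in memory, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
]

if __name__ == "__main__":
    for url in ("http://example.com/essay.html", "http://example.com/private/notes.html"):
        print(url, "->", allowed_by_robots(SAMPLE_ROBOTS, "essay-bot", url))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the real file instead of the in-memory lines.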

tangue · on Nov 18, 2012

Let's not be naïve. This is geek-clad redistribution.

riffraff · on Nov 18, 2012

Isn't Kindle's format different from ePub?

davidw · on Nov 18, 2012

Yes, but you can convert an ePub by emailing it to yourself or by running it through KindleGen. Under the hood, the formats are pretty similar.

sebcioz · on Nov 18, 2012

Could you upload the resulting epub file?

dirkk0 · on Nov 18, 2012

This is the resulting ebook file: https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.e...

And this is the (Kindle compatible) .mobi conversion: https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.m...

The conversion was done through this service: http://www.2epub.com/

paulovsk · on Nov 23, 2012

Thank you, I always wanted to read it on my kindle.

smartial_arts · on Nov 18, 2012

Ola, this is brilliant!

olasitarska · on Nov 18, 2012

Thanks ;) It's a rather simple script, but I've always wanted to read PG's essays on my Kindle.

tchalla · on Nov 18, 2012

Super cool stuff. :-)

olasitarska · on Nov 18, 2012

Thanks! :)