Hacker News
Show HN: Build a Kindle book of PG's essays (gist.github.com)
53 points by olasitarska on Nov 18, 2012 | 23 comments



I did a script like this a while back, and asked pg before releasing it. He asked me not to do it so I refrained.

Edit: I just checked my email. I actually asked him about putting up the ePub, not the script, but same difference, I suppose.


I can't speak for how pg feels, but that is a very different kettle of fish.

In your case, you'd have been modifying and distributing his content, rather than providing users a tool for consuming content they have obtained directly from pg.

Think hosting mirrored, ad-free copies of sites vs. creating AdBlock.

He might still have a problem with this (if only because of server load), and it would have been a good idea to check first, but your experience isn't evidence one way or another.


Impressive how short this can be with the right libraries: 40 lines of code to scrape an index, download the pages, get the right parts, and make them an ebook.
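The first of those steps, pulling essay links out of the index page, fits in a few lines even with only the standard library. This is a minimal sketch using `html.parser` rather than the libraries the gist uses; the index URL `http://paulgraham.com/articles.html` and the helper names `LinkCollector` / `essay_links` are assumptions for illustration, and downloading and extracting each page are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed location of the essay index; relative hrefs are resolved against it.
INDEX_URL = "http://paulgraham.com/articles.html"


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def essay_links(index_html, base_url=INDEX_URL):
    """Return every link on the index page as (absolute_url, title)."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.links
```

In practice you would feed it the HTML fetched from the index (for example with `urllib.request.urlopen`) and then filter the result down to the essay pages before downloading each one.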

However, wouldn't it be more efficient if one person did this and published the result? As it is, everybody has to scrape PG's site. Thanks for the code, though!

What is the licence on the articles?




The result is an invalid ePub book, since some pages do not follow strict XHTML rules (tag mismatches). iBooks and some other readers fail to render them.

This page contains the following errors: error on line 13 at column 7: Opening and ending tag mismatch: font line 0 and p
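One way to catch such pages before they go into the book is to run each extracted fragment through a strict XML parser: a fragment that fails to parse as XML is a likely source of this kind of rendering error. This is a minimal stdlib sketch; `xhtml_error` is a hypothetical helper name, and the fragment is wrapped in a `<div>` because it may have several top-level elements. Note that HTML-only named entities such as `&nbsp;` will also be flagged, since XML predefines only five entities.

```python
import xml.etree.ElementTree as ET


def xhtml_error(fragment):
    """Return None if the fragment is well-formed XML, else the parser's message."""
    try:
        # Wrap in a single root element so multi-element fragments can parse.
        ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError as err:
        return str(err)
    return None
```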


Use lxml.html instead of BeautifulSoup
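lxml.html parses broken HTML leniently, and the resulting tree can be serialized back out as well-formed markup. If you'd rather stay within the standard library, the same idea can be approximated by parsing with `html.parser` and re-emitting every tag, closing whatever was left open. This is a swapped-in technique, not what lxml does internally, and `balance_html` and the `VOID_TAGS` set are assumptions; it only repairs nesting, not every XHTML rule.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content and must be self-closed in XHTML (partial list).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class _Balancer(HTMLParser):
    """Re-emit HTML with every opened element explicitly closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f' {name}="{escape(value or name)}"' for name, value in attrs)
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open: drop it
        # Close everything opened after `tag`, then `tag` itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def balance_html(html_text):
    """Return html_text with mismatched or unclosed tags repaired."""
    parser = _Balancer()
    parser.feed(html_text)
    parser.close()
    # Close any elements still open at the end of the input.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

For the error quoted above, `balance_html('<p><font size="2">text</p>')` yields `<p><font size="2">text</font></p>`, which a strict XML parser accepts.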



+1 for PyQuery - it is definitely my favorite out of them all.


I tried running it and it chokes on "Chapter 1 of ANSI Common Lisp". I think that's due to the link being a txt file rather than html, causing an exception to be thrown: "Error: URL doesn't exist".
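A general way to avoid that crash is to check what a link points to before treating it as an HTML page, and to wrap plain-text documents instead of parsing them. This is a minimal sketch, not the fix that was actually applied to the gist; `link_kind` and `text_to_xhtml` are hypothetical helpers. It prefers the server's `Content-Type` header when available and falls back to the URL's file extension.

```python
import posixpath
from html import escape
from urllib.parse import urlparse

HTML_TYPES = {"text/html", "application/xhtml+xml"}


def link_kind(url, content_type=None):
    """Classify a link as "html", "text" or "other"."""
    if content_type:
        # Ignore parameters such as "; charset=utf-8".
        mime = content_type.split(";")[0].strip().lower()
        if mime in HTML_TYPES:
            return "html"
        return "text" if mime == "text/plain" else "other"
    # No header available: guess from the extension of the URL path.
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in ("", ".html", ".htm"):
        return "html"
    return "text" if ext == ".txt" else "other"


def text_to_xhtml(text):
    """Wrap a plain-text document so it can be added to the book as a chapter."""
    return f"<pre>{escape(text, quote=False)}</pre>"
```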


Fixed, thanks!


Those essays are available for free on his site. If you want to scrape them for yourself, you're in a grey zone, but it will be fine. But hey, authors deserve respect: if an author wants to publish an ebook, he will.


Why am I in a grey zone for manipulating a bunch of bytes that I have downloaded into a format I find convenient?

The bytes were distributed legally by the copyright holder.

I downloaded the bytes legally using an ISP that I paid for.

The author did not use a robots.txt indicating his wishes that I not download the bytes using an automatic tool.

The bytes are unprotected by DRM.

I have no intention to distribute the bytes to anyone else.

I have not broken the DRM on my ebook reader.


You can scrape for yourself (or use Instapaper, but that's another story), but sharing a script and telling others to do so is bad.

Call me old-fashioned, but I think an author should have a say in how his works are structured and distributed.


Scraping content isn't bad if you aren't redistributing it; technically, your web browser is scraping content in the same sense.

http://paulgraham.com/robots.txt doesn't disallow robots from reading the essays.
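A scraper can check this itself rather than relying on someone reading the file by hand; Python's standard library ships a robots.txt parser. This is a minimal sketch: `allowed` and the user-agent string are hypothetical, and the rules in the example are illustrative, not the contents of PG's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, user_agent="pg-essays-kindle"):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # accepts an iterable of robots.txt lines
    return parser.can_fetch(user_agent, url)


# Illustrative rules only.
rules = ["User-agent: *", "Disallow: /private/"]
print(allowed(rules, "http://paulgraham.com/avg.html"))
```

Against a live site you would instead call `set_url("http://paulgraham.com/robots.txt")` and `read()` on the parser before `can_fetch`.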


Let's not be naïve. This is geek-clad redistribution.


Isn't Kindle's format different from ePub?


Yes, but if you email an ePub to yourself or use KindleGen, it gets converted. Under the hood, the two formats are pretty similar.
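On the ePub side, "under the hood" means a ZIP archive whose first entry is an uncompressed `mimetype` file, plus `META-INF/container.xml` pointing at an OPF package file that lists the XHTML chapters and their reading order. This is a minimal EPUB 2-style sketch with the standard library, not the gist's code; the `build_epub` helper, the `OEBPS/` file names and the metadata are assumptions, and the output has not been checked with a validator such as epubcheck, so treat it as an illustration of the layout.

```python
import io
import uuid
import zipfile
from xml.sax.saxutils import escape

# container.xml tells the reader where the OPF package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body><h1>{title}</h1>{body}</body>
</html>"""


def build_epub(book_title, chapters):
    """Return the bytes of a minimal ePub.

    chapters is a list of (chapter_title, body) pairs; body must already be
    well-formed XHTML, since it is inserted verbatim.
    """
    uid = f"urn:uuid:{uuid.uuid4()}"
    items, itemrefs, navpoints = [], [], []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for i, (title, body) in enumerate(chapters, start=1):
            name = f"chap{i}.xhtml"
            zf.writestr(f"OEBPS/{name}", CHAPTER_XHTML.format(title=escape(title), body=body))
            items.append(f'<item id="c{i}" href="{name}" media-type="application/xhtml+xml"/>')
            itemrefs.append(f'<itemref idref="c{i}"/>')
            navpoints.append(
                f'<navPoint id="n{i}" playOrder="{i}"><navLabel><text>{escape(title)}</text>'
                f'</navLabel><content src="{name}"/></navPoint>'
            )
        manifest, spine, nav_map = "".join(items), "".join(itemrefs), "".join(navpoints)
        # OPF package: metadata, manifest of every file, reading order (spine).
        zf.writestr("OEBPS/content.opf", f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(book_title)}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{uid}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    {manifest}
  </manifest>
  <spine toc="ncx">{spine}</spine>
</package>""")
        # NCX table of contents, which EPUB 2 readers expect.
        zf.writestr("OEBPS/toc.ncx", f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{uid}"/></head>
  <docTitle><text>{escape(book_title)}</text></docTitle>
  <navMap>{nav_map}</navMap>
</ncx>""")
    return buf.getvalue()
```

The result can be written to a `.epub` file and then handed to a converter such as KindleGen to get a Kindle-compatible `.mobi`.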


Could you upload the resulting ePub file?


This is the resulting ebook file: https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.e...

And this is the (Kindle compatible) .mobi conversion: https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.m...

The conversion was done through this service: http://www.2epub.com/


Thank you, I always wanted to read it on my kindle.


Ola, this is brilliant!


Thanks ;) It's a rather simple script, but I've always wanted to read PG essays on my Kindle.


Super cool stuff. :-)


Thanks! :)



