Briss Trims PDFs so They Fit Better, Are Easier to Read on Your Ereader

larsberg · on Jan 28, 2011

I know I should "just do it myself," but I keep waiting for something that can unsplit and unwrap PDFs generated in ACM double-column style with LaTeX word-breaking and turn them into an epub with graphics for the figures/tables. Trying to deal with that 9-ish pt. font is a huge pain for my old eyes. I ended up giving up on reading them on my iPad because keeping a reasonable zoom level and managing to scan down then over to the next column required the finger dexterity of a concert pianist (even on GoodReader, which is quite, well, Good).

rubidium · on Jan 28, 2011

Calibre was mentioned in the article as being able to convert PDFs into epub format. I had my hopes up for a second, so I downloaded it and tried it on a textbook and a smaller scientific publication.

It threw up on both the math equations and figures. It didn't handle the general formatting of the book too well either.

To my knowledge, a good PDF->epub converter has not yet been built. Any takers?

dpapathanasiou · on Jan 28, 2011

"To my knowledge, a good PDF->epub converter has not yet been built. Any takers?"

Check out eBookBurn.com, which is a site I launched last month (http://denis.papathanasiou.org/?p=468).

It lets you upload pdfs and attempts to parse them into editable text.

The pdf parsing is based on my experiments with pdf-miner (http://denis.papathanasiou.org/?p=343), and while still imperfect (in general parsing pdfs is a difficult problem), it works fairly well for certain types of whitepapers.

roel_v · on Jan 28, 2011

This is a wicked cool site, but you need to put in screenshots of the input (how it went in) and the output (what the output looked like in an epub reader).

What approach do your algorithms use? Do you do recognition of title, subtitles etc based on differences in fonts, spacing, line length etc.? Or do you need to enter regexps to recognize those?

Do you recognize paragraphs correctly?

Can you filter out front- and back filler like the ToC, and extract only the 'content' pages?

If so, it's 90% of what I'm looking for and I think good enough to pay for :)

I have some notes on how to approach it from when I tried to make it myself, including what functionality I consider necessary for an MVP. Let me know if you're interested...

dpapathanasiou · on Jan 28, 2011

I'm working on an FAQ/Help page which will show some of those features in more detail.

The algorithm I use is a variation of the code described here: http://denis.papathanasiou.org/?p=343 except the output is html, not text, so that I can take into account things like font sizes and paragraph breaks.
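To give a rough idea of what "taking font sizes and paragraph breaks into account" can look like, here is a minimal sketch. It is not the actual eBookBurn code: the `spans_to_html` function, the `(text, font_size, y_top)` tuple layout, and the two thresholds are all assumptions made up for illustration, and it expects some PDF parser to have already produced the text lines.

```python
from collections import Counter
from html import escape


def spans_to_html(spans, heading_ratio=1.2, para_gap=1.5):
    """Turn parsed text lines into simple HTML.

    spans: list of (text, font_size, y_top) tuples in reading order,
           with y_top increasing down the page (hypothetical layout).
    heading_ratio: a line this much larger than body text is a heading.
    para_gap: a vertical jump larger than para_gap * body size starts
              a new paragraph.
    """
    if not spans:
        return ""
    # Assume the most common font size on the page is the body text.
    body_size = Counter(size for _, size, _ in spans).most_common(1)[0][0]
    blocks = []   # finished HTML blocks
    para = []     # lines of the paragraph currently being built
    prev_y = None

    def flush():
        # Close the current paragraph, if any.
        if para:
            blocks.append("<p>" + escape(" ".join(para)) + "</p>")
            para.clear()

    for text, size, y in spans:
        if size >= body_size * heading_ratio:
            flush()
            blocks.append("<h2>" + escape(text) + "</h2>")
        else:
            if prev_y is not None and para and (y - prev_y) > para_gap * body_size:
                flush()
            para.append(text)
        prev_y = y
    flush()
    return "\n".join(blocks)
```

Real PDFs need much more than this (hyphenation, footnotes, multi-column layouts), which is part of why the results are still imperfect.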

If you sign up and try it (it's free for the first 3 days), you'll see that the parser renders each pdf page as text, and it's up to you to decide which range of pages you want to use in your book.

Feel free to contact me by the form on that site, and I can reply in more detail.

felixc · on Jan 28, 2011

A similar tool is my own PDFMunge, previously discussed on HN here: http://news.ycombinator.com/item?id=1089068 and in more depth on my site here: http://www.felixcrux.com/posts/pdfmunge-improve-pdfs-ebook-r...

But this looks quite a bit more polished and user-friendly.

afshin · on Jan 28, 2011

Clever name ... a little crass.

chopsueyar · on Jan 28, 2011

I thought Briss was the client app, and Mohel was server side.

cschmidt · on Jan 28, 2011

The name made me wince a bit when I saw the "trim" part.

patrocles · on Jan 28, 2011

yeah, the article title should have been:

Briss Trims PDFs so They Fit Better, Look More Impressive in Your Hand

gaiusparx · on Jan 28, 2011

PDF is ill-suited to be the format for mobile devices due to its format-for-print purpose with no text reflow. It's time epub and mobi took over.

omaranto · on Jan 28, 2011

I read most non-mathematical text in epub (usually converted from something else) because, as you say, it is better. But there is no tool support for making good epubs of math text, so I still need PDFs. When I have LaTeX source for what I read, I just compile with appropriate margin settings.

sliverstorm · on Jan 28, 2011

Really PDF is just ill-suited for distribution of text. The only reasonable exceptions are when that text is explicitly meant for printing (à la fliers or posters), or when said text is not computerized, e.g. a scan of written script that has yet to be OCR'd.

stcredzero · on Jan 28, 2011

Is the name of this product a commentary on those who send PDF to mobile users?

EliRivers · on Jan 28, 2011

Alternatively, provide documents in latex or similar and people can do the final compilation themselves, dictating exactly the details of the physical medium (be that printed paper or an electronic display of some kind) they will be using to view it.

This would require people learning how to do something, though.

fdb · on Jan 28, 2011

On the iPad, GoodReader can do margin cropping on the fly, and remembers the margins you've set up for a document so they're reapplied when you open the document again.

http://www.goodreader.net/goodreader.html

larsberg · on Jan 28, 2011

I just wish it had some smarts for two-column PDFs (easily 90% of what I read). I often resize, read down the left column, then shift to the top, which moves the crop window and confuses GR horribly.
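The kind of "smarts" meant here could be as simple as splitting each page into per-column crop windows and stepping through them in order. This is only a sketch under assumptions: it is not how GoodReader works, `column_views` and `reading_order` are hypothetical names, and it assumes equal-width columns with a known gutter, which real papers (full-width titles, spanning figures) often violate.

```python
def column_views(page_box, gutter=0.0, n_cols=2):
    """Split a page box (x0, y0, x1, y1) into equal-width column crop boxes.

    Returns the crop boxes left to right. gutter is the blank space
    between columns, in the same units as page_box.
    """
    x0, y0, x1, y1 = page_box
    col_w = (x1 - x0 - gutter * (n_cols - 1)) / n_cols
    views = []
    for i in range(n_cols):
        left = x0 + i * (col_w + gutter)
        views.append((left, y0, left + col_w, y1))
    return views


def reading_order(page_boxes, gutter=0.0):
    """Yield (page_index, crop_box) so each column is read top to bottom
    before moving to the next column, then the next page."""
    for idx, box in enumerate(page_boxes):
        for view in column_views(box, gutter):
            yield idx, view
```

A reader could then advance one crop box per "next" tap instead of requiring a manual pan from the bottom of the left column to the top of the right one.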

kraemate · on Jan 28, 2011

I've been wanting something like this for ages - particularly to print ebooks and latex stuff with their huge side-margins. The basic aim is to trim all margins and print 2 pages side-by-side (landscape).

While Briss trims the margins just fine, printing the (trimmed) document as PDF (or PS) restores the margins. (Tried on Okular/Evince.) What gives?
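For the trim-then-2-up goal itself, the arithmetic is small enough to sketch. This is a sketch under assumptions, not what Briss does internally: `content_bbox` and `two_up_scale` are hypothetical helpers, the content boxes are assumed to come from some other tool, and the default sheet size is only roughly A4 landscape in points.

```python
def content_bbox(boxes):
    """Union of (x0, y0, x1, y1) boxes: the tightest crop that keeps
    all of the given content."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def two_up_scale(crop, sheet=(842.0, 595.0)):
    """Scale factor that fits two copies of the crop side by side on a
    landscape sheet. sheet is (width, height); the default is an
    assumed, approximate A4 landscape size in points."""
    w = crop[2] - crop[0]
    h = crop[3] - crop[1]
    sheet_w, sheet_h = sheet
    # Each page gets half the sheet width and the full sheet height.
    return min((sheet_w / 2) / w, sheet_h / h)
```

Using one shared crop (the union over all pages) keeps the text the same size from page to page, at the cost of a slightly looser trim on pages with less content.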

chopsueyar · on Jan 28, 2011

Good name.

siculars · on Jan 28, 2011

Oy vey.