Hacker News

I have experience scanning my own books, and I also try to shrink the resulting files, since I'm concerned about bloat on my (older) mobile reading devices. Unfortunately, for various reasons I cannot upload those scans, but the procedure might still be helpful for existing ones.

Use ScanTailor to clean them up. If there is no need for color or grayscale, set the output to strictly black and white.
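
"Strictly black and white" means every pixel is thresholded to pure black or pure white. ScanTailor has its own adjustable thresholding; the sketch below is not its algorithm but uses Otsu's method, a common automatic global threshold, to show the idea on a flat list of 8-bit grayscale values. The function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit grayscale values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]          # pixels at or below t (background class)
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels above t (foreground class)
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Otsu picks the t that maximizes the between-class variance.
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (black) or 255 (white) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold like this works for evenly lit pages; unevenly lit scans usually need local (adaptive) thresholding, which is one reason to let a dedicated tool handle this step.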

OCR them with Adobe Acrobat ClearScan (or another OCR tool; Acrobat is simply what I have).

Convert to black-and-white DjVu (see the DjVu spec).
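
The comment doesn't say which DjVu encoder is used. One option, assumed here, is DjVuLibre: `cjb2` encodes a single bitonal page (PBM or bilevel TIFF) as JB2, and `djvm -c` bundles the pages into one multi-page file. This sketch only builds the command lines; `djvu_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def djvu_commands(pages, output, dpi=300):
    """Build (but do not run) DjVuLibre commands for a bitonal DjVu book.

    pages  -- bitonal page images (PBM or bilevel TIFF), in reading order
    output -- path of the bundled multi-page .djvu file
    dpi    -- resolution recorded in each page; should match the scan
    """
    commands = []
    page_files = []
    for page in pages:
        page_djvu = str(Path(page).with_suffix(".djvu"))
        # cjb2 encodes one bitonal image as a single-page DjVu file;
        # -lossless requests lossless JB2 (-lossy trades fidelity for size).
        commands.append(["cjb2", "-dpi", str(dpi), "-lossless", str(page), page_djvu])
        page_files.append(page_djvu)
    # djvm -c creates a bundled multi-page document from the single pages.
    commands.append(["djvm", "-c", str(output), *page_files])
    return commands


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(djvu_commands(["p001.pbm", "p002.pbm"], "book.djvu", dpi=600))` would encode two ScanTailor pages and bundle them, assuming DjVuLibre is installed and on the PATH.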

Dealing with color is another matter and takes more time. I find that G'MIC's anisotropic smoothing helps with inkjet/halftone patterns, but it is too time-consuming to use on whole books.
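
To illustrate why anisotropic smoothing suits halftone patterns: it blurs strongly where neighbouring pixels differ a little (halftone dots) and barely at all where they differ a lot (text and line edges). G'MIC's filter uses its own, more elaborate algorithm; the sketch below is instead classic Perona-Malik diffusion on a 2-D grayscale grid in plain Python, with `kappa` and `step` as assumed, untuned parameters. The per-pixel Python loop also hints at why this is slow at book scale.

```python
import math


def perona_malik(image, iterations=10, kappa=20.0, step=0.2):
    """Perona-Malik anisotropic diffusion on a 2-D list of grayscale floats.

    Smooths within regions of small variation (e.g. halftone dots) while
    diffusing very little across strong edges (e.g. text outlines).
    step <= 0.25 keeps this 4-neighbour explicit scheme stable.
    """
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]

    def g(diff):
        # Edge-stopping function: near 1 for small differences, near 0 for large ones.
        return math.exp(-((diff / kappa) ** 2))

    for _ in range(iterations):
        new = [row[:] for row in img]
        for y in range(h):
            for x in range(w):
                c = img[y][x]
                flux = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    # Neighbours outside the image contribute no flux.
                    if 0 <= ny < h and 0 <= nx < w:
                        d = img[ny][nx] - c
                        flux += g(d) * d
                new[y][x] = c + step * flux
        img = new
    return img
```

Because the flux between two pixels is equal and opposite, the total brightness is preserved; only its distribution changes.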




I like ScanTailor! I've used ocrmypdf for the OCR and compression steps. By default it uses lossless JBIG2, producing about 2 to 3 KB per page; I'm curious how that compares to DjVu. (And my mistake: PDF and DjVu are competing container formats.)
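
The exact ocrmypdf invocation isn't given, so this is a sketch of a plain call, assuming English text. `--optimize 1` is ocrmypdf's default level of lossless optimizations, under which bitonal images can be recompressed as JBIG2 when the jbig2enc encoder is installed; levels 2 and 3 enable more aggressive, lossy optimizations. `ocrmypdf_command` is a hypothetical helper.

```python
def ocrmypdf_command(src, dst, language="eng", optimize=1):
    """Build an ocrmypdf command line; run it with subprocess.run(cmd, check=True).

    src/dst  -- input and output PDF paths
    language -- Tesseract language code passed via -l
    optimize -- ocrmypdf optimization level, 0 to 3 (1 is the default)
    """
    if optimize not in (0, 1, 2, 3):
        raise ValueError("ocrmypdf --optimize accepts 0, 1, 2 or 3")
    return ["ocrmypdf", "-l", language, "--optimize", str(optimize), str(src), str(dst)]
```

Spelling out `--optimize 1` even though it is the default makes the lossless choice visible when comparing runs against other levels.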


If the PDF comes from a scanned source, converting it to DjVu at the same DPI typically yields about half the file size (the exact figure depends on the specifics of the PDF source).
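
To check the "about half" figure on your own files, in particular against a JBIG2 PDF from ocrmypdf, you can compare per-page sizes and the overall ratio. In this sketch the byte counts would come from `os.path.getsize()`; `compare_sizes` is a hypothetical helper, and the numbers in the test are illustrative, not measurements.

```python
def compare_sizes(pdf_bytes, djvu_bytes, pages):
    """Compare two encodings of the same scanned book.

    Sizes are in bytes (e.g. from os.path.getsize). Returns KB per page for
    each file and the DjVu/PDF size ratio (0.5 means DjVu is half the size).
    """
    if pages <= 0:
        raise ValueError("pages must be positive")
    return {
        "pdf_kb_per_page": pdf_bytes / 1024 / pages,
        "djvu_kb_per_page": djvu_bytes / 1024 / pages,
        "djvu_to_pdf_ratio": djvu_bytes / pdf_bytes,
    }
```

For a hypothetical 300-page book at 3 KB per page (900 KB in total), a ratio of 0.5 would mean a DjVu of about 450 KB, or 1.5 KB per page.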





