

rjzzleep on Nov 22, 2022 | on: Frog: OCR Tool for Linux

You might be interested in https://github.com/ocrmypdf/OCRmyPDF then. It does quite a bit of preprocessing on the PDF pages before passing them on to Tesseract.
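
For anyone who wants to try it, here is a rough sketch of calling the OCRmyPDF command line from Python with a few of its preprocessing options turned on. The helper name `build_ocrmypdf_command` and the file names are made up for illustration, and the particular flags chosen here are just one plausible selection, not something the comment above specifies. `--clean` also needs `unpaper` installed.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build an ocrmypdf command line (hypothetical helper, for illustration).

    The flags are real OCRmyPDF options, but which ones to enable is an
    assumption made for this example.
    """
    return [
        "ocrmypdf",
        "--rotate-pages",  # auto-correct pages that are turned sideways or upside down
        "--deskew",        # straighten slightly crooked scans before OCR
        "--clean",         # clean pages for OCR only; requires unpaper
        "-l", language,    # Tesseract language code
        input_pdf,
        output_pdf,
    ]


def run_ocr(input_pdf, output_pdf, language="eng"):
    """Run ocrmypdf if it is on PATH; return False when it is not installed."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, language), check=True)
    return True
```

Calling `run_ocr("scan.pdf", "scan_ocr.pdf")` (hypothetical file names) would write a searchable copy next to the original, assuming `ocrmypdf` is installed.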

grahameb on Nov 22, 2022

I've found ocrmypdf to be excellent: the only issue I've had is with PDFs with differing page sizes; it seems to scale everything up to the size of the largest page, which can be a bit of a pain.
