Hacker News
rjzzleep
on Nov 22, 2022
on:
Frog: OCR Tool for Linux
You might be interested in https://github.com/ocrmypdf/OCRmyPDF then.
It does quite a bit of preprocessing on the PDF pages before passing them on to Tesseract.
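For anyone who wants to try it, here is a rough sketch of how you might script a run with a couple of the preprocessing steps turned on. The helper name `build_ocrmypdf_command` and the file names are made up for illustration; `--deskew` and `--rotate-pages` are the only ocrmypdf options assumed here, and this is not a description of everything the tool does internally.

```python
def build_ocrmypdf_command(input_pdf, output_pdf, deskew=True, rotate_pages=True):
    """Assemble an ocrmypdf command line (hypothetical helper, for illustration)."""
    cmd = ["ocrmypdf"]
    if deskew:
        cmd.append("--deskew")        # straighten skewed scans before OCR
    if rotate_pages:
        cmd.append("--rotate-pages")  # fix pages that are sideways or upside down
    cmd += [input_pdf, output_pdf]
    return cmd


# Placeholder file names; this only prints the command, it does not run it.
print(" ".join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf")))
```

To actually run it, you could pass the returned list to `subprocess.run(cmd, check=True)`, assuming ocrmypdf is installed and on your PATH.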
grahameb
on Nov 22, 2022
I've found ocrmypdf to be excellent: the only issue I've had is with PDFs with differing page sizes; it seems to scale everything up to the size of the largest page, which can be a bit of a pain.