
In 2019 I was working on a project that involved OCRing millions of scanned historical documents. I evaluated Google, Azure, Amazon, Adobe, ABBYY, and Tesseract somewhat rigorously.

Google's was by far the best, especially for obscured or malformed characters. Azure was second, and I ended up merging the results from both.

For my use case (in spring 2019), Tesseract was not very accurate and struggled especially with slanted text. Hopefully that has changed.




