I recently needed to scan hundreds of low-quality invoices and run them through OCR to extract invoice numbers and dates. I had really taken for granted how seamless this is in some applications, and was shocked by how much work went into producing decent results.
I was obviously really naive. Either way, it gets me excited any time I see progress with OCR. I should give this a try against my (small) dataset.
I just ran Qwen against some of the invoices that my gnarly algorithm (with OpenAI fallback) really struggled with, and Qwen was able to extract all the relevant data without any issues. I'm pretty damn impressed to be honest.