Thanks for sharing — that article points at the same core tension. Determinism isn’t about rejecting probabilistic systems, it’s about deciding where uncertainty is allowed to live.
What keeps breaking in practice is when probabilistic reasoning leaks into places that expect reproducibility and accountability.
What matters most is how well OCR and structured data extraction tools handle documents with high variation at production scale. In real workflows like accounting, every invoice, purchase order, or contract can look different. The extraction system must still work reliably across these variations with minimal ongoing tweaks.
Equally important is how easily you can build a human-in-the-loop review layer on top of the tool. This is needed not only to improve accuracy, but also for compliance—especially in regulated industries like insurance.
Where data accuracy is paramount, I think a hybrid route is the safer path: non-LLM OCR for text parsing, then an LLM for structured data extraction. I've seen good results pairing LLMWhisperer (OCR) [1] with the latest Gemini.
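To make the division of labour concrete, here's a minimal sketch of that hybrid pipeline with the review gate bolted on. Everything in it is illustrative: `ocr_extract_text` stands in for a call to a non-LLM OCR engine, `llm_structured_extract` stands in for a JSON-schema prompt to an LLM, and both are stubbed so the sketch runs standalone.

```python
import json
import re

def ocr_extract_text(document: bytes) -> str:
    # Hypothetical: real code would send the document to an OCR engine
    # and get back layout-preserving text. Stubbed with a fixed invoice.
    return "Invoice No: INV-1042\nTotal Due: 1250.00 USD\nVendor: Acme Corp"

def llm_structured_extract(text: str) -> dict:
    # Hypothetical: real code would prompt the LLM with the OCR text and
    # a target schema. Stubbed with regexes so the example is runnable.
    patterns = {
        "invoice_no": r"Invoice No:\s*(\S+)",
        "total_due": r"Total Due:\s*([\d.,]+\s*\w+)",
        "vendor": r"Vendor:\s*(.+)",
    }
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            record[field] = match.group(1).strip()
    return record

REQUIRED_FIELDS = {"invoice_no", "total_due", "vendor"}

def extract_invoice(document: bytes) -> dict:
    text = ocr_extract_text(document)
    record = llm_structured_extract(text)
    # Human-in-the-loop gate: any incomplete extraction is routed to a
    # reviewer rather than written straight into the accounting system.
    record["needs_review"] = not REQUIRED_FIELDS <= record.keys()
    return record

print(json.dumps(extract_invoice(b"..."), indent=2))
```

The key design point is that the deterministic check (`REQUIRED_FIELDS`) sits outside the probabilistic components, so uncertainty lives in extraction, not in the decision to trust the result.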