
I'm running a large-scale object detection/classification and OCR pipeline at the moment, figuring out the properties of all doorbells, mailboxes and house number signs in a European country (don't ask lmao).

This article resonates a lot. We have OCR and "semantic" pipeline steps using a VLM, and while it works very well most of the time, there are absurdly weird edge cases. Structuring the outputs via tool calls helps a little in reducing these, but it's still clear that there is little reasoning and a lot of memorizing going on.
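To make the "structuring the outputs via tool calls" part concrete, here is a rough sketch of what that kind of guard can look like. This is not our actual pipeline: the tool name `report_house_number`, the fields `number` and `legible`, and the house number format are all made up for illustration. The idea is just that the VLM has to answer through a fixed schema, and anything that doesn't fit gets rejected instead of passed downstream.

```python
import json
import re

# Hypothetical tool definition handed to the VLM; names and fields are
# illustrative assumptions, not taken from a real pipeline.
HOUSE_NUMBER_TOOL = {
    "name": "report_house_number",
    "parameters": {
        "type": "object",
        "properties": {
            "number": {"type": "string"},
            "legible": {"type": "boolean"},
        },
        "required": ["number", "legible"],
    },
}

# Assumed format: 1 to 4 digits plus an optional letter suffix ("12", "12a").
HOUSE_NUMBER_RE = re.compile(r"[1-9][0-9]{0,3}[a-zA-Z]?")


def parse_tool_call(raw_arguments: str):
    """Return validated arguments, or None if the model output is unusable."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict):
        return None
    # Every field the schema marks as required must be present.
    for key in HOUSE_NUMBER_TOOL["parameters"]["required"]:
        if key not in args:
            return None
    number, legible = args["number"], args["legible"]
    if not isinstance(number, str) or not isinstance(legible, bool):
        return None
    # Only enforce the format when the model claims the sign is readable.
    if legible and not HOUSE_NUMBER_RE.fullmatch(number):
        return None
    return {"number": number, "legible": legible}
```

A check like this only catches outputs that are malformed. A confidently wrong but well-formed answer still gets through, which is why it only helps a little.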




Agreed. It would be even more dangerous if we were talking about weird edge cases in self-driving cars or medical imaging.



