That shouldn't be what causes this problem; if we can see it's wrong despite the low resolution, the AI isn't going to fully replace humans for all tasks involving this kind of thing.

That said, even with this kind of error rate an AI can speed *some* things up. Having one human whose sole job is to ask "is this AI correct?" is easier and cheaper than having one human do all these things by hand, followed by someone else whose sole job is to check "was this human output correct?". A human who has been on a production line for 4 hours and is about ready for a break also makes a certain number of mistakes.

But at the same time, why use a really expensive general-purpose AI like this instead of a dedicated image model for your domain? A special-purpose model is something you can train on a decent laptop, and once trained it will run on a phone at perhaps 10fps, give or take, depending on what the performance threshold is and how general you need it to be.

If you're in a factory and you're making a lot of some small widget or other (so, not a whole motherboard), having answers faster than the ping time to the LLM may be important all by itself.
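To put rough numbers on that, here is a back-of-the-envelope sketch. The line rate and the round-trip time are made up for illustration; only the ~10fps figure comes from the comment above. It deliberately assumes the remote model computes just as fast as the local one, so the only difference is the network hop.

```python
def per_part_budget_ms(parts_per_minute: float) -> float:
    """Time available to inspect each part, in milliseconds."""
    return 60_000.0 / parts_per_minute


def keeps_up(inference_ms: float, network_rtt_ms: float, parts_per_minute: float) -> bool:
    """True if one inspection (compute + network round trip) fits in the per-part budget."""
    return inference_ms + network_rtt_ms <= per_part_budget_ms(parts_per_minute)


# Hypothetical line: 600 small widgets per minute, i.e. 100 ms per widget.
LINE_RATE = 600

# On-device model at ~10fps (100 ms per frame), no network hop.
local_ok = keeps_up(inference_ms=100.0, network_rtt_ms=0.0, parts_per_minute=LINE_RATE)

# Same compute time, plus an assumed 50 ms round trip to a remote API.
remote_ok = keeps_up(inference_ms=100.0, network_rtt_ms=50.0, parts_per_minute=LINE_RATE)

print(local_ok, remote_ok)
```

With these assumed numbers the local model just barely keeps pace and the remote one falls behind purely because of the ping, before you even count the LLM's own inference time.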

And at this point, you can just ask the LLM to write the training setup for the image-to-bounding-box AI, and then you "just" need to feed in the example images.
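Whatever training setup you end up with, you'd probably want some way to check the predicted boxes against your hand-labelled examples. One common way to do that (my assumption, not something specific to any particular setup) is intersection-over-union. A minimal sketch, assuming boxes are `(x_min, y_min, x_max, y_max)` tuples:

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes; 0.0 if they don't overlap."""
    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

You'd then count a prediction as correct if its IoU with the labelled box is above some threshold you pick for your widget; 0.5 is a common starting point, but that's a judgment call.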