I find that modern OCR, audio transcription, etc. are beginning to have the opposite problem: they are too smart.
That means they make far fewer mistakes, but when they do, the errors can be subtle. For example, if the text is "the bat escaped by the window", a dumb OCR might write "dat" instead of "bat". When you read the resulting text, you notice it and, using outside clues, recover the original word. A smart OCR will notice that "dat" isn't a word and may change it to "cat", and indeed "the cat escaped by the window" is a perfectly good sentence. Unfortunately, it is wrong and confusing.
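To make the failure mode concrete, here is a minimal sketch (not any real OCR engine's pipeline; the dictionary and the corrector are made up for illustration) of how a naive "smart" post-processing step can silently swap a visible error for a plausible but wrong word:

```python
# Illustrative only: a toy dictionary-based corrector, not a real OCR system.
from difflib import get_close_matches

DICTIONARY = ["cat", "bat", "the", "escaped", "by", "window"]

def correct(token: str) -> str:
    """Return the token unchanged if known, else the closest dictionary word."""
    if token in DICTIONARY:
        return token
    matches = get_close_matches(token, DICTIONARY, n=1, cutoff=0.6)
    return matches[0] if matches else token

raw_ocr = "the dat escaped by the window".split()
print(" ".join(correct(t) for t in raw_ocr))
# A dumb system leaves the obvious non-word "dat" for the reader to catch.
# "dat" is one letter away from both "bat" and "cat"; with this dictionary
# and tie-breaking the corrector happens to emit "cat", producing a fluent
# sentence that hides the error.
```

The point is that once the output is fluent, the reader has no signal left that anything went wrong.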
Thankfully, most speech misrecognition events are still obvious. I have seen this in OCR and, as you say, it is bad. There are enough mistakes in the sources; let us not compound them.
High error rates and significant manual rescanning can be acceptable in some applications, as long as there’s no better alternative.