>"They claim impressive reductions in hallucinations. In my own usage I’ve not spotted a single hallucination yet, but that’s been true for me for Claude 4 and o3 recently as well—hallucination is so much less of a problem with this year’s models."
Could you give an estimate of how many "dumb errors" you've encountered, as opposed to hallucinations? I think many of your readers might read "hallucination" and assume you mean "hallucinations and dumb errors".
I mention one dumb error in my post itself - the table sorting mistake.
I haven't been keeping a formal count of them, but dumb errors from LLMs remain pretty common. I spot them and either correct them myself or nudge the LLM to do it, if that's feasible. I see that as a regular part of working with these systems.
That makes sense, and I think your definition of hallucinations is a technically correct one. Going forward, I think your readers might appreciate you tracking "dumb errors" alongside (but separate from) hallucinations. They're a regular part of working with these systems, but they impose some cognitive load on the user, so it's useful to know whether that load will rise, fall, or stay the same with a new model release.
As a user, when the model tells me things that are flat-out wrong, it doesn't really matter whether they would be categorized as hallucinations or dumb errors. From my perspective, those mean the same thing.