
> I still think complaining about "hallucination" is a pretty big "tell".

The conversation around LLMs is so polarized. Either they’re dismissed as entirely useless, or they’re framed as an imminent replacement for software developers altogether.

Hallucinations are worth talking about! Just yesterday, for example, Claude 4 Sonnet confidently told me Godbolt was wrong about how clang would compile something (Godbolt was right). That doesn't mean I didn't benefit heavily from the session, just that it's not a replacement for your own critical thinking.

Like any transformative tool, LLMs can offer a major productivity boost, but only if the user is realistic about the output. Hallucinations are real and a reason to be skeptical of what you get back; they don't make LLMs useless.

To be clear, I’m not suggesting you specifically are blind to this fact. But sometimes it’s warranted to complain about hallucinations!





That's not what people mean when they bring up "hallucinations". What the author apparently meant was that they had an agent generating Terraform for them, and that Terraform was broken. That's not surprising to me! I'm sure LLMs are helpful for writing Terraform, but I wouldn't expect agents to be at the point of reliably handing off Terraform that actually does anything, because I can't imagine an agent being given permission to iterate Terraform. Now have an agent write Java for you. That problem goes away: you aren't going to be handed code with API calls that literally don't exist (this is what people mean by "hallucination"), because that code wouldn't pass a compile or linter pass.

Are we using the same LLMs? I absolutely see cases of "hallucination" behavior when I'm invoking an LLM (usually Sonnet 4) in a loop of "1 generate code, 2 run linter, 3 run tests, 4 goto 1 if 2 or 3 failed".

Usually, such a loop just works. When it doesn't, it's often because the LLM decided it would be convenient if some method existed, and therefore that method exists. It then tries to call that method, fails at the linting step, decides that the linter is wrong, and changes the linter configuration (or fails at the test step and updates the tests instead). If I automatically revert all test and linter config changes before running tests, the LLM receives the failing test output, reports that the tests passed anyway, and ends the loop if it has control (or gets caught in a failure spiral if the scaffold automatically continues until the tests pass).
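
For concreteness, the scaffold is roughly the sketch below. The command strings, config paths, and the generate_patch hook are placeholders rather than my actual setup; the point is just that the loop reverts any edits to the linter config and tests before running the checks.

    import subprocess

    def run(cmd):
        # Run a shell command; return (exit code, combined output).
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr

    def agent_loop(generate_patch, max_iters=10):
        # generate_patch(feedback) stands in for whatever applies the
        # LLM's next edit to the working tree.
        feedback = ""
        for _ in range(max_iters):
            generate_patch(feedback)                    # 1. generate code

            # Revert any edits the model made to the linter config or the
            # tests themselves, so it can't "pass" by weakening the checks.
            # (Paths are illustrative.)
            run("git checkout -- .eslintrc.json tests/")

            code, out = run("npm run lint")             # 2. run linter
            if code != 0:
                feedback = out
                continue                                # 4. goto 1

            code, out = run("npm test")                 # 3. run tests
            if code != 0:
                feedback = out
                continue                                # 4. goto 1

            return True                                 # lint and tests passed
        return False                                    # gave up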

It's not an extremely common failure mode, as it generally only happens when you give the LLM a problem where it's both automatically verifiable and too hard for that LLM. But it does happen, and I do think "hallucination" is an adequate term for the phenomenon (though perhaps "confabulation" would be better).

Aside:

> I can't imagine an agent being given permission to iterate Terraform

LocalStack is great, and I have absolutely given an LLM free rein over Terraform config pointed at LocalStack. It has generally worked fine and written the same tf I would have written, but much faster.
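
For what it's worth, the harness is trivial; something like the sketch below, where the provider config in the working directory (not shown) points its endpoints at LocalStack's default edge port, 4566, so an apply never touches a real account. The function name and directory layout are illustrative, not a specific tool.

    import subprocess

    def terraform(args, workdir):
        # Run a terraform subcommand in the sandbox directory;
        # return (exit code, combined output).
        proc = subprocess.run(["terraform", *args], cwd=workdir,
                              capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr

    def check_generated_tf(workdir):
        # Shake down the agent's .tf output against LocalStack.
        # Assumes the provider block in workdir already targets LocalStack,
        # so a full apply is safe to run unattended.
        for step in (["init", "-input=false"],
                     ["validate"],
                     ["apply", "-auto-approve", "-input=false"]):
            code, output = terraform(step, workdir)
            if code != 0:
                return False, output   # feed this back to the model
        return True, "apply succeeded against LocalStack"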


With Terraform, using a property or a resource that doesn't exist is effectively the same as an API call that doesn't exist. It's almost exactly the same, really, because under the hood Terraform will try to make a Google Cloud/AWS API call with your parameter, and it will fail because that parameter doesn't exist. You are making a distinction without a difference. Just because it can be caught at runtime doesn't make it insignificant.

Anyway, I still see hallucinations in all languages, even javascript, attempting to use libraries or APIs that do not exist. Could you elaborate on how you have solved this problem?


> Anyway, I still see hallucinations in all languages, even javascript, attempting to use libraries or APIs that do not exist. Could you elaborate on how you have solved this problem?

Gemini CLI (it's free and I'm cheap) will run the build process after making changes. If an error occurs, it will interpret it and fix it. That takes care of cases where it uses functions that don't exist.

It can get stuck in a loop, but in general it'll get somewhere.


Yeah, again, zero trouble believing that agents don't reliably produce sane Terraform.

As if a compiler or linter is the sole arbiter of correctness.

Nobody said anything about "correctness". Hallucinations aren't bugs. Everybody writes bugs. People writing code don't hallucinate.

It's a pretty obvious rhetorical tactic: everybody associates "hallucination" with something distinctively weird and bad that LLMs do. Fair enough! But then they smuggle more meaning into the word, so that any time an LLM produces anything imperfect, it has "hallucinated". No. "Hallucination" means that an LLM has produced code that calls into nonexistent APIs. Compilers can and do in fact foreclose on that problem.


Speaking of rhetorical tactics, that's an awfully narrow definition of LLM hallucination designed to evade the argument that they hallucinate.

If, according to you, LLMs are so good at avoiding hallucinations these days, then maybe we should ask an LLM what hallucinations are. Claude, "in the context of generative AI, what is a hallucination?"

Claude responds with a much broader definition of the term than you have imagined -- one that matches my experiences with the term. (It also seemingly matches many other people's experiences; even you admit that "everybody" associates hallucination with imperfection or inaccuracy.)

Claude's full response:

"In generative AI, a hallucination refers to when an AI model generates information that appears plausible and confident but is actually incorrect, fabricated, or not grounded in its training data or the provided context.

"There are several types of hallucinations:

"Factual hallucinations - The model states false information as if it were true, such as claiming a historical event happened on the wrong date or attributing a quote to the wrong person.

"Source hallucinations - The model cites non-existent sources, papers, or references that sound legitimate but don't actually exist.

"Contextual hallucinations - The model generates content that contradicts or ignores information provided in the conversation or prompt.

"Logical hallucinations - The model makes reasoning errors or draws conclusions that don't follow from the premises.

"Hallucinations occur because language models are trained to predict the most likely next words based on patterns in their training data, rather than to verify factual accuracy. They can generate very convincing-sounding text even when "filling in gaps" with invented information.

"This is why it's important to verify information from AI systems, especially for factual claims, citations, or when accuracy is critical. Many AI systems now include warnings about this limitation and encourage users to double-check important information from authoritative sources."


What is this supposed to convince me of? The problem with hallucinations is (was?) that developers were getting handed code that couldn't possibly have worked, because the LLM unknowingly invented entire libraries to call into that don't exist. That doesn't happen with agents and languages with any kind of type checking. You can't compile a Rust program that does this, and agents compile Rust code.

Right across this thread we have the author of the post saying that when they said "hallucinate", they meant that, if they watched, they could see their async agent getting caught in loops trying to call nonexistent APIs, failing, and trying again. And? The point isn't that foundation models themselves don't hallucinate; it's that agent systems don't hand off code with hallucinations in it, because they compile before they hand the code off.


If I ask an LLM to write me a skip list and it instead writes me a linked list and confidently but erroneously claims it's a skip list, then the LLM hallucinated. It doesn't matter that the code compiled successfully.
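
To make that concrete, here is the shape of output I mean (a hypothetical example, not something I'm claiming any particular model produced). It parses, it lints, it even returns correct answers, but it's a plain sorted linked list with O(n) search, not a skip list:

    class SkipList:
        # Named SkipList, but there are no levels and no forward-pointer
        # towers: this is a plain sorted singly linked list, so search is
        # O(n) rather than the O(log n) a real skip list gives you.

        class _Node:
            def __init__(self, value, nxt=None):
                self.value = value
                self.next = nxt

        def __init__(self):
            self.head = None

        def insert(self, value):
            # Sorted insert into a single chain; no coin flips, no express lanes.
            node = self._Node(value)
            if self.head is None or value < self.head.value:
                node.next = self.head
                self.head = node
                return
            cur = self.head
            while cur.next is not None and cur.next.value < value:
                cur = cur.next
            node.next = cur.next
            cur.next = node

        def search(self, value):
            # Linear scan from the head, exactly what a skip list exists to avoid.
            cur = self.head
            while cur is not None and cur.value < value:
                cur = cur.next
            return cur is not None and cur.value == value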

Get a frontier model to write an slist when you asked for a skip list. I'll wait.


