> It's the new "serverless" and I would really like people to stop making the discussion be about the word. You know what it means, I know what it means, let's all move on.
Well, parent is lamenting the lack of a lower bound/upper bound for "hallucinations", something that cannot realistically exist because "hallucinations" don't exist. LLMs aren't fact-outputting machines, so when one outputs something a human would consider "wrong", like "the sky is purple", it isn't true/false/correct/incorrect/hallucination/fact, it's just the most probable character after the next.
That's why it isn't useful to ask "but how much does it hallucinate?" when what you're really after is something more like "does it only output facts?". Which, if it did, would make LLMs a lot less useful.
There is a huge gap between "facts" and "nonfacts", and that gap makes up the majority of human discourse. Statements, opinions, and questions, when properly qualified, are neither facts nor nonfacts nor hallucinations.
LLMs don't need to be perfect fact machines at all to be honest and non-hallucinating. They simply need to ground statements in other grounded statements and identify the parts that are speculative or ungrounded.
If you simply want to ground statements in other statements, you quickly get into GOFAI territory, where you need to build up the full semantics of a sentence (in every supported language) in order to prove that two sentences mean the same thing, have the same denotation, or that one entails the other.
Otherwise, how do you prove the grounding isn't "hallucinated"?
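As a toy illustration of why this slides into GOFAI territory, here's a minimal sketch (hypothetical mini knowledge base and hand-written rules, not any real system): to "ground" a sentence you end up needing an explicit semantic representation plus inference rules, and the part that's skipped below, mapping free-form text onto that representation for every phrasing and language, is where the coverage problem explodes.

```python
# Toy GOFAI-style grounding check: sentences reduced to hand-written
# predicate triples, grounding "proved" by an explicit rule.
# Hypothetical example data; real coverage would require full semantics.

FACTS = {
    ("sky", "has_color", "blue"),
    ("blue", "is_a", "color"),
}

# Rule: if X has_color C and C is_a color, then "X is C" is grounded.
def is_grounded(subject: str, attribute: str) -> bool:
    return (
        (subject, "has_color", attribute) in FACTS
        and (attribute, "is_a", "color") in FACTS
    )

# The hard part is the step skipped here: mapping arbitrary text onto
# these triples in the first place. This toy parser only knows two
# canned sentences.
def parse(sentence: str):
    canned = {
        "the sky is blue": ("sky", "blue"),
        "the sky is purple": ("sky", "purple"),
    }
    return canned.get(sentence.lower())

for s in ["The sky is blue", "The sky is purple"]:
    subj_attr = parse(s)
    ok = subj_attr is not None and is_grounded(*subj_attr)
    print(s, "->", "grounded" if ok else "not grounded")
```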
The root issue is that we humans perceive our own grasp on things as better than it is ("better" may be the wrong word, maybe just "different"), including how exactly concepts are tied to each other in our heads. That perception has been a primordial tool for our survival and for our day-to-day lives, but it's at odds with the task of building reasoning skills into a machine, because language evolved first and foremost to communicate among beings that share a huge amount of context. For example, our definition of the word "blue" in "the sky is blue" would be wildly different if humans were all blind (as the machine is, in a sense).
> it's just the most probable character after the next.
That's simply not true. You're confusing how they're trained with what they do. They don't have some store of exactly how likely each word is for every possible sentence (and it's worth stopping to think about what that would even mean).
No, it's fundamentally not true, because "most likely" here means the highest-valued output of the model, not what's most likely in the underlying data, nor what the training objective is actually aiming for.
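To make the disagreement concrete, here's a minimal sketch of a decode step (plain numpy, made-up numbers, not any particular model): the model produces logits for the next token conditioned on the current context, a softmax turns them into a distribution on the fly (nothing is looked up in a table of sentence probabilities), and with typical sampling settings the emitted token isn't necessarily the argmax "most probable" one at all.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_step(logits, temperature=0.8):
    """Toy decode step: logits -> distribution -> sampled token id.

    The distribution is computed fresh from the model's output for this
    context, not retrieved from a store of likelihoods. With temperature
    sampling, the chosen token need not be the most probable one.
    """
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                          # numerical stability
    probs = np.exp(z) / np.exp(z).sum()   # softmax
    return rng.choice(len(probs), p=probs), probs

# Pretend the model scored four candidate next tokens for "the sky is ...".
vocab = ["blue", "purple", "falling", "the"]
logits = [3.1, 1.2, 0.3, -1.0]            # toy numbers, not from a real model

token_id, probs = decode_step(logits)
print(dict(zip(vocab, probs.round(3))), "->", vocab[token_id])
# "blue" is the argmax, but "purple" still gets sampled some of the time.
```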