LLMs model their corpus, which for most models tends to be factually correct text (or subjective text with no factuality). Sure, there exist factually incorrect statements in the corpus, but for the vast majority of incorrect statements there exist many more equivalent but correct statements. If an LLM makes a statement that is not supported by the training data (either because it doesn't exist or because the equivalent correct statement is more strongly supported), I think that's an issue with the implementation of the model. I don't think it's an intrinsic feature/flaw in what the model is modeling.
Hallucination might not be the best word, but I don't think it's a bad word. If a weather model predicted a storm when there isn't a cloud in the sky, I wouldn't have a problem with saying "the weather model had a hallucination." 50 years ago, weather models made incorrect predictions quite frequently. That's not because they weren't modeling correct weather, it's because we simply didn't yet have good models and clean data.
Fundamentally, we could fix most LLM hallucinations with better model implementations and cleaner data. In the future we will probably be able to model factuality outside of the context of human language, and that will probably be the ultimate solution for correctness in AI, but I don't think that's a fundamental requirement.
This isn't going to happen with better data. Better data just means it will be better at predicting the next token.
For questions or interactions where you need to process, consider, and decompose a problem into multiple steps, then solve those steps, you need a goal, tools, and the ability to split your thinking and govern the outcome.
That isn't predicting the next token. I think it's easier to think of LLMs as doing decompression.
They take an initial set of tokens and decompress them into the most likely final set of tokens.
What we want is processing.
We would have to set up the reaction so that it somehow results in exactly the right next set of tokens, which in turn sets up the next set, and so on, until the system has an answer.
Or in other words, we have to figure out how to phrase an initial set of tokens so that each subsequent set looks similar enough to "logic" in the training data that the LLM expands it correctly.
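To make the "decompression" framing concrete, here is a toy Python sketch. It is not how any real LLM works internally; the corpus, names, and bigram counting are invented for illustration. The point is only the shape of the process: the prompt is expanded one token at a time by repeatedly appending whatever continuation the training data makes most likely.

```python
# Toy sketch: a bigram "model" built from a tiny corpus, plus greedy decoding
# that repeatedly appends the most likely next token. This stands in for the
# idea of "decompressing" a prompt into the corpus-supported continuation.
from collections import Counter, defaultdict

corpus = "the sky is blue . the grass is green . the sky is clear .".split()

# Count which token follows which (a stand-in for learned next-token probabilities).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(prompt, max_new_tokens=6):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = following.get(tokens[-1])
        if not candidates:
            break
        # Greedy step: always take the single most likely continuation.
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("the sky"))  # expands into whatever the corpus best supports
```

Nothing in this loop checks the output against the world; it only checks it against the statistics of the corpus, which is the gap the comment above is pointing at.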