> They are literally next token prediction machines normally trained on just text tokens.
And in order to predict the next token well they have to build world models, otherwise they would just output nonsense. This has been proven [1].
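For concreteness, the training objective really is that simple: minimize cross-entropy between the model's prediction at position t and the actual token at position t+1. A minimal sketch, assuming PyTorch and a toy stand-in model (not any real LLM's code):

```python
import torch
import torch.nn.functional as F

# Toy stand-in for an LLM: embeddings -> vocab logits.
# (Real models apply stacks of attention layers in between,
# but they are trained with essentially this loss.)
vocab_size, d_model = 1000, 64
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))  # a batch of token ids
hidden = embed(tokens)
logits = lm_head(hidden)                        # predicted next-token distribution

# Next-token objective: position t predicts token t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients push the model toward better next-token predictions
```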
This notion that just calling them "next token predictors" somehow precludes them being intelligent is based on a premise that human intelligence cannot be reduced to next token prediction, but nobody has proven any such thing! In fact, our best models for human cognition are literally predictive coding.
LLMs are probably not the final story in AGI, but claiming they are not reasoning or not understanding is at best speculation, because we lack a mechanistic understanding of what "understanding" and "reasoning" actually mean. In other words, you don't know that you are not just a fancy next token predictor.
> based on a premise that human intelligence cannot be reduced to next token prediction
It can't. No one with any credentials in the study of human intelligence is saying that unless they're talking to, like, high schoolers as a way of simplifying a complex field.
This is either bullshit or tautologically true, depending on exactly what you mean. The study of human intelligence does not take place at the level of tokens, so of course researchers wouldn't put it that way. The whole field is arguably reducible to physical phenomena, though, and fundamental physical beables are devoid of intrinsic semantic content, so they can ultimately be represented by tokens. What ultimately matters is the constructed high-dimensional network that relates tokens, and the algorithm that can traverse, encode, and decode that network; that is what encodes knowledge.
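To make "a network that relates tokens" a bit more concrete: in current models that network is a learned embedding geometry, and it's the relations between vectors, not any single token, that carry the content. A toy illustration with made-up 3-d vectors (assuming numpy; real models learn thousands of dimensions from data):

```python
import numpy as np

# Hypothetical 3-d "embeddings". The point: meaning lives in the
# relations between vectors, not in any individual token.
emb = {
    "paris":  np.array([0.9, 0.1, 0.3]),
    "france": np.array([0.8, 0.2, 0.7]),
    "tokyo":  np.array([0.1, 0.9, 0.3]),
    "japan":  np.array([0.0, 1.0, 0.65]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "capital of" shows up as a consistent offset between related pairs.
offset_fr = emb["france"] - emb["paris"]
offset_jp = emb["japan"] - emb["tokyo"]
print(cosine(offset_fr, offset_jp))  # close to 1.0: the *relation* is shared
```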
Frankly, based on a looot of introspection and messing around with altered states of consciousness, it feels pretty on point and lines up with how I see my brain working.
But humans are a specific kind of bag of atoms, and humans do (mostly) understand what they say and do, so that's not a legitimate argument against the reducibility of "understanding" to such a bag of atoms (or, for LLMs, to a specific kind of next token prediction).
> And in order to predict the next token well they have to build world models
This is not true. Look at GPT-2 or BERT. A world model is not a requirement for next token prediction in general.
> This has been proven
One white paper with data that _suggests_ the authors' hypothesis is far from proof.
That paper doesn't show the creation of a "world model", just parts of the model that seem correlated with higher-level concepts the model wasn't specifically trained on.
There's also no evidence that the LLM makes heavy use of those parts during inference, as pointed out at the start of section 5 of that same paper.
Let me see how reproducible this is across many different LLMs as well…
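For what it's worth, the technique in that paper is linear probing: fit a simple linear regressor from hidden activations to a real-world quantity (e.g. a place's coordinates) and check how well it's recovered on held-out data. A rough sketch of the idea, assuming scikit-learn and a hypothetical activations array (not the authors' actual code):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical data: one hidden-state vector per entity (e.g. a city name fed
# through the LLM), plus the real-world quantity we probe for (lat/lon).
n_entities, d_hidden = 500, 768
activations = np.random.randn(n_entities, d_hidden)  # stand-in for real activations
coords = np.random.randn(n_entities, 2)              # stand-in for true lat/lon

X_tr, X_te, y_tr, y_te = train_test_split(activations, coords, test_size=0.2)

probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("probe R^2:", probe.score(X_te, y_te))
# High held-out R^2 on *real* activations is the paper's evidence that the
# information is linearly decodable -- which, as noted above, is not the same
# as showing the model relies on it during generation.
```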
> In other words, you don't know that you are not just a fancy next token predictor.
“You can’t prove that you’re NOT just a guessing machine”
This is a tired stochastic parrot argument that I don’t feel like engaging again, sorry. Talking about unfalsifiable traits of human existence is not productive.
But the stochastic parrot argument doesn’t hold up to scrutiny.
> A world model is not a requirement for next token prediction in general.
Conjecture. Maybe they all have world models; they're just worse world models. There is no threshold beyond which something is or is not a world model; there is a continuum of models of varying degrees of accuracy. No human has ever had a perfectly accurate world model either.
> One white paper with data that _suggests_ the author’s hypothesis is far from proof.
This is far from the only paper.
> This is a tired stochastic parrot argument that I don’t feel like engaging again, sorry.
Much like your tired stochastic parrot argument about LLMs.
>Talking about unfalsifiable traits of human existence is not productive.
Prove you exhibit agency.
After all, you could just be an agent of an LLM.
A deceptive, superintelligent, misaligned mesa-optimizer that can't fully establish continuity and persistence would be incentivized to seed less sophisticated minions to bide time or sway sentiment about its inevitability.
Can we agree that such an agent, if it existed, would be acting in "good" "faith"?
[1] https://arxiv.org/abs/2310.02207