I mostly use Gemini 2.5 Pro. I have a “you are my editor” prompt asking it to proofread my texts. Recently it pointed out two typos in two different words that just weren’t there. Indeed, the two words each had a typo but not the one pointed out by Gemini.
The real typos were random missing letters. But the typos Gemini hallucinated were ones that are very common typos made in those words.
The only thing transformer-based LLMs can ever do is _faking_ intelligence.
Which for many tasks is good enough. Even in my example above, the corrected text was flawless.
But for a whole category of tasks, LLMs without oversight will never be good enough because there simply is no real intelligence in them.
I'll show you a few misspelled words and you tell me (without using any tools or thinking it through) which bits in the utf8 encoded bytes are incorrect. If you're wrong, I'll conclude you are not intelligent.
LLMs don't see letters, they see tokens. This is a foundational attribute of LLMs. When you point out that the LLM does not know the number of R's in the word "Strawberry", you are not exposing the LLM as some kind of sham, you're just admitting to being a fool.
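To illustrate, here's a quick sketch with the tiktoken library (just one example tokenizer; treat the exact split as illustrative, it depends on the encoding):

```python
# Minimal sketch, assuming tiktoken is installed; the output is illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # encoding used by several OpenAI models
tokens = enc.encode("Strawberry")
print(tokens)                                  # a short list of integer token IDs
print([enc.decode([t]) for t in tokens])       # the chunks the model actually "sees"
# The model receives these integer IDs, not the letters S-t-r-a-w-b-e-r-r-y.
```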
Damn, if only something called a "language model" could model language accurately, let alone live up to its creators' claims that it possesses near-human intelligence. But yeah, we can call getting some basic facts wrong a "feature not a bug" if you want.
So people that can't read or write have no language? If you don't know an alphabet and its rules, you won't know how many letters are in words. Does that make you unable to model language accurately?
So first off, people who _can't_ read or write have a certain disability (blindness, a developmental condition, etc.). That's not a reasonable comparison for LLMs/AI (especially since text is the main modality of an LLM).
I'm assuming you meant to ask about people who haven't _learned_ to read or write, but would otherwise be capable.
Is your argument, then, that a person who hasn't learned to read or write is able to model language as accurately as one who has?
Wouldn't you say that someone who has read a whole ton of books would maybe be a bit better at language modelling?
Also, perhaps most importantly: GPT (and pretty much any LLM I've talked to) does know the alphabet and its rules. It knows. Ask it to recite the alphabet. Ask it about any kind of grammatical or lexical rules. It knows all of it. It can also chop up a word from tokens into letters to spell it correctly; it knows those rules too. Now ask it about Chinese and Japanese characters, ask it any of the rules related to those alphabets and languages. It knows all the rules.
This to me shows the problem is that it's mainly incapable of reasoning and putting things together logically, not so much that it's trained on something that doesn't _quite_ look like letters as we know them. Sure it might be slightly harder to do, but it's not actually hard, especially not compared to the other things we expect LLMs to be good at. But especially especially not compared to the other things we expect people to be good at if they are considered "language experts".
If (smart/dedicated) humans can easily learn the Chinese, Japanese, Latin and Russian alphabets, then why can't LLMs learn how tokens relate to the Latin alphabet?
Remember that tokens were specifically designed to be easier and more regular to parse (encode/decode) than the encodings used in human languages ...
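A hedged sketch of that regularity (again using tiktoken, just as an example tokenizer): the token-to-text mapping is a fixed, lossless lookup, so spelling is fully recoverable from the tokens alone.

```python
# Minimal sketch, assuming tiktoken is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberry"
ids = enc.encode(word)

# Every token ID maps to one fixed piece of text; decoding is a plain lookup.
print({i: enc.decode([i]) for i in ids})
assert enc.decode(ids) == word   # encode/decode round-trips exactly
```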
But actually, you can see an intense enough source of (monochromatic) near-UV light; our lenses only filter out the majority of it.
And if you did, your brain would hallucinate it as purplish-bluish white. Because that's the closest color to those inputs based on what your neural network (brain) was trained on. It's encountering something uncommon, so it guesses and presents the guess as fact.
From this, we can determine either that you (and indeed all humans) are not actually intelligent, or alternatively, that intelligence and cognition are complicated and you can't conclude their absence the first time someone behaves in a way you weren't trained to expect from your experience of intelligence.
If I had learned to read utf8 bytes instead of the Latin alphabet, this would be trivial. In fact, give me a (paid) week to study utf8 for reading and I am sure I could do it. (Yes, I already know how utf8 works.)
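Just to show how mechanical it is, a quick sketch in plain Python of what "reading" UTF-8 bytes for ASCII letters would amount to:

```python
word = "Strawberry"
encoded = word.encode("utf-8")

# For ASCII letters, each character is exactly one byte, so the bit patterns
# line up one-to-one with the letters.
for ch, byte in zip(word, encoded):
    print(ch, format(byte, "08b"))

# Counting letters works directly on the bytes, too:
print(encoded.count(b"r"))   # 3 (lowercase r's in "Strawberry")
```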
And the token/strawberry thing is a non-excuse. They just can't count. I can count the number of syllables in a word, regardless of how it's spelled; that's also not based on letters. Or, if you want a sub-letter equivalent, I could also count the number of serifs, dots or curves in a word.
It's really not so much that the strawberry thing is a "gotcha", or easily explained by "they see tokens instead", because the same reasoning errors happen all the time in LLMs also in places where "it's because of tokens" can't possibly be the explanation. It's just that the strawberry thing is one of the easiest ways to show it just can't reason reliably.
Being confused as to how LLMs see tokens is just a factual error.
I think the more concerning error GP makes is drawing conclusions about the fundamental nature of LLM intelligence from "bugs" in current iterations of LLMs. It's like looking at a child struggling to learn how to spell and making broad claims like "look at the mistakes this child made, humans will never attain any __real__ intelligence!"
So yeah, at this point I'm often pessimistic about whether humans have "real" intelligence. Pretty sure LLMs can spot the logical mistakes in his claims easily.
Your explanation perfectly captures another big difference between human/mammal intelligence and LLM intelligence: a child can make mistakes and (few-shot) learn. An LLM can't.
And even a child struggling with spelling won’t make a mistake like the one I have described. It will spell things wrong and not even catch the spelling mistake. But it won’t pretend and insist there is a mistake where there isn’t (okay, maybe it will, but only to troll you).
Maybe talking about “real” intelligence was not precise enough and it’s better to talk about “mammal like intelligence.”
I guess there is a chance LLMs can be trained to a level where all the questions for which there is a correct answer (basically everything that can be benchmarked) will be answered correctly. Would this be incredibly useful and make a lot of jobs obsolete? Yes. Still, it's a very different form of intelligence.
> A child can make mistakes and (few-shot) learn. An LLM can't.
Considering that we literally call the process of giving an LLM a handful of worked examples of a problem in its prompt "few-shot learning", I do not understand your reasoning here.
An LLM absolutely can "gain or acquire knowledge of or skill in (something)" within its context window (i.e. learning). And then you can bake that understanding in by making a LoRA, or by further training.
If this is really the distinction that makes intelligence for you, the only difference between LLMs and human brains is that human brains have a built-in mechanism to convert short-term memory to long-term, and LLMs haven't fully evolved that.
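To make the LoRA point concrete, here's a rough sketch using Hugging Face's peft and transformers libraries (my choice of tooling; the model name and hyperparameters are just placeholders):

```python
# Minimal sketch, assuming `transformers` and `peft` are installed; everything
# here is illustrative rather than a recommended setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder small model

lora_config = LoraConfig(
    r=8,                        # low-rank dimension of the adapter
    lora_alpha=16,              # scaling factor for the adapter updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Fine-tuning only these adapter weights on transcripts of the desired behaviour
# is one way to turn "learned in context" into "learned in the weights".
```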
> When you point out that the LLM does not know the number of R's in the word "Strawberry", you are not exposing the LLM as some kind of sham, you're just admitting to being a fool.
I'm sorry but that's not reasonable. Yes, I understand what you mean on an architectural level, but if a product is being deployed to the masses you are the fool if you expect every user to have a deep architectural understanding of it.
If it's being sold as "this model is a PhD-level expert on every topic in your pocket", then the underlying technical architecture and its specific foibles are irrelevant. What matters is the claims about what it's capable of doing and its actual performance.
Would it matter if GPT-5 couldn't count the number of r's in a specific word if the marketing claims being made around it were more grounded? Probably not. But that's not what's happening.
I think we’re saying the same thing using different words. What LLMs do and what human brains do are very different things. Therefore human / biological intelligence is a different thing than LLM intelligence.
I had this too last week. It pointed out two errors that simply weren’t there. Then completely refused to back down and doubled down on its own certainty, until I sent it a screenshot of the original prompt. Kind of funny.
Do you really think that an architecture that struggles to count the r's in "strawberry" is a good choice for proofreading? It perceives words very differently from how we do.
Counting letters in words and identifying when words are misspelled are two different tasks - it can be good at one and bad at the other.
Interestingly, spell checking is something models have been surprisingly bad at in the past - I remember being shocked at how bad Claude 3 was at spotting typos.
This has changed with Claude 4 and o3 from what I've seen - another example of incremental model improvements swinging over a line in terms of things they can now be useful for.
Wasn't expecting a "you're a shill" accusation to show up on a comment where I say that LLMs used to suck at spell check but now they can just about do it.
So 2 trillion dollars to do what Word could do in 1995... and trying to promote that as an advancement is not propaganda? Sure, let's double the amount of resources a couple more times; who knows what it will be able to take on after mastering spelling.
Yes, actually I think it works really well for me, considering that I'm not a native speaker and one thing I'm after is correcting technically correct but non-idiomatic wording.