Comparisons to humans are ultimately misleading because 1) humans are not general intelligences most of the time, and 2) humans run on incredibly faulty hardware.
1) Attention is limited, human reasoning is slow, motivation is limited (System 1 vs. System 2 thinking). Many people will just tell you to fuck off, or get bored and give some random answer to make you go away. Etc. See difference 2.
2) People run on limited hardware in terms of error rate and memory.
2a) Brains make mistakes all the time. Ask someone to multiply a bunch of large numbers with pen and paper and they will get it wrong a lot of the time.
2b) Doing it in their head, they will run out of working memory pretty fast.
But you wouldn't say that humans can't multiply numbers. When they have the right algorithm, they can do it; they just have to use the right tools to extend their memory and check for errors. A human who notices that an input differs from one he already knows immediately knows to pay attention to that part and to every subsequent step that depends on it. Once a human has the right algorithm, he can apply it to any input.
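To make the "algorithm plus external memory" point concrete, here is a minimal sketch in Python (the function name and layout are mine, not anything from the argument above): schoolbook multiplication, where the "paper" is an explicit list of digits. Once the procedure is written down, it applies to inputs of any size, because the storage and the carry bookkeeping live outside the head of whoever executes it.

    # Schoolbook long multiplication, digit by digit, the way a human
    # does it on paper. The "paper" is explicit lists of digits:
    # external memory that never overflows or silently flips a value.
    def long_multiply(a: str, b: str) -> str:
        x = [int(d) for d in reversed(a)]  # least-significant digit first
        y = [int(d) for d in reversed(b)]
        result = [0] * (len(x) + len(y))   # enough columns for any carry
        for i, dx in enumerate(x):
            carry = 0
            for j, dy in enumerate(y):
                total = result[i + j] + dx * dy + carry
                result[i + j] = total % 10
                carry = total // 10
            result[i + len(y)] += carry
        digits = "".join(map(str, reversed(result))).lstrip("0")
        return digits or "0"

    assert long_multiply("123456789", "987654321") == str(123456789 * 987654321)

The same dozen lines handle two digits or two thousand; the only thing that grows is the paper.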
LLMs:
Comparison to 2a) Current LLMs also make a lot of mistakes. But theirs are not the result of faulty or limited hardware; they are the result of a faulty algorithm. Take away the random seeds and an LLM will make the same mistake over and over. Randomness is the smoke and mirrors that make LLMs seem more "alive" and less like machines imperfectly imitating humans.
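The determinism is easy to demonstrate at the decoding level. A toy sketch (the probability numbers are made up; a real model derives them from its logits): greedy decoding always picks the most likely token, so any mistake baked into the distribution reproduces on every run, while sampling merely hides that behind dice rolls.

    import random

    # Toy next-token distribution; a stand-in for a model's output.
    # (Hypothetical numbers, purely to illustrate decoding.)
    probs = {"cabbage": 0.55, "goat": 0.30, "wolf": 0.15}

    def greedy(p):
        # Deterministic: always the single most likely token.
        return max(p, key=p.get)

    def sample(p, rng):
        # Stochastic: draws a token according to the probabilities.
        return rng.choices(list(p), weights=list(p.values()), k=1)[0]

    print([greedy(probs) for _ in range(5)])       # same token five times
    rng = random.Random()                          # unseeded: varies per run
    print([sample(probs, rng) for _ in range(5)])  # a mix of tokens

If the most probable continuation is wrong, greedy decoding is wrong every single time; that is a property of the algorithm, not of noisy hardware.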
Comparison to 2b) Current LLMs do not store statements in an abstract, structured form where they could save and load information and perform steps such as inferring redundant information from the rest. They operate on the token stream, which is probably wasteful in terms of memory and less flexible in terms of what operations they can perform on it.
Most importantly, they are not limited by memory. The input clearly states "the wolf will eat the cabbage", yet just a few lines below the LLM generates "This is safe because the wolf won't eat the cabbage if they're together on the far side." It is unable to infer that those two statements are contradictory. The statistics of tokens simply worked out in a way that led to this.
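For contrast, here is a deliberately crude sketch of what "abstract, structured form" could mean (the representation is invented for illustration, not a description of any real system): once statements are stored as explicit propositions, the wolf/cabbage contradiction becomes a mechanical check rather than a statistical accident.

    # Toy fact store: statements as (subject, relation, object, polarity)
    # tuples instead of raw tokens. Purely illustrative.
    facts = set()

    def assert_fact(subj, rel, obj, positive=True):
        # The same triple with the opposite polarity is a contradiction.
        if (subj, rel, obj, not positive) in facts:
            raise ValueError(f"contradiction: {subj} {rel} {obj} "
                             f"asserted as both true and false")
        facts.add((subj, rel, obj, positive))

    assert_fact("wolf", "eats", "cabbage", positive=True)   # from the prompt
    assert_fact("wolf", "eats", "cabbage", positive=False)  # raises ValueError

Nothing about this is clever; the point is only that a structured representation makes "these two statements cannot both hold" checkable at all, whereas a raw token stream offers no such handle.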