Peak denialism? Answering LSAT questions requires general intelligence. They present real-life scenarios that the test-taker has to understand, and they demand "common sense" knowledge about the world as well as reasoning ability. It's not something you can memorize answers to or solve by following prescribed patterns or templates. And GPT-4 wasn't trained specifically to solve LSAT questions.
For the human brain, the LSAT requires reasoning. But not for an LLM. Do we even know exactly what data this was trained on? I have only seen vague references to the training data. If it was trained on large chunks of the internet, then it certainly saw LSAT practice questions. And because LSAT questions follow a common pattern, they are well suited to an LLM. There isn't any reasoning or general intelligence at all, just really good statistics applied to large amounts of data.
From the article: "We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative—see our technical report for details."
I’m skeptical. There is a lot of wiggle room in “no specific training”. It could just mean they didn’t fine-tune the model for any of the tests. Their training data probably included many past LSAT exams, and it certainly included many instances of people discussing how to solve LSAT problems.
As others have said elsewhere, the issue remains accuracy. I wish every response came with an honest estimate of how likely the answer is to be correct, because at the moment it gives wrong answers as confidently as right ones.
So the thing is, giving wrong answers with confidence is literally what we train students to do when they are unsure.
I can remember my GRE coach telling me that it was better to confidently choose an answer I only had 50% confidence in, rather than punt on the entire question.
AIs hallucinate because, statistically, it is 'rewarding' for them to do so (in RLHF).
> Answering LSAT questions requires general intelligence.
Obviously not, since GPT-4 doesn't have general intelligence. The same goes for "common sense," "knowledge about the world," and "reasoning ability."
As just one example, reasoning ability: GPT-4 failed at this problem I just came up with: "If Sarah was twice as old as Jimmy when Jimmy was 1/3 as old as Jane, and Jane is as much older than Sarah as Sarah is older than Jimmy, and Sarah is now 40, how old are Jane and Jimmy?"
First, every answer GPT-4 came up with contradicted the facts given: it was just wrong. But beyond that, it didn't recognize that there are many solutions to the problem. And later, when I gave it an additional constraint to narrow it down to one solution, it got the wrong answer again. And when I say "wrong," I mean that its answer clearly contradicted the facts given.
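For what it's worth, the under-determination is easy to check mechanically. Here's a minimal sketch (Python with sympy, assuming it's installed; the variable names are mine): treat Jimmy's and Jane's current ages and the years-ago offset as unknowns and solve the three stated constraints.

    # Sketch: let j, n be Jimmy's and Jane's current ages, and t the number
    # of years ago when "Sarah was twice as old as Jimmy"; Sarah is 40 now.
    from sympy import symbols, solve

    j, n, t = symbols('j n t')

    constraints = [
        (40 - t) - 2 * (j - t),   # Sarah was twice Jimmy's age, t years ago
        (j - t) - (n - t) / 3,    # ...when Jimmy was 1/3 as old as Jane
        (n - 40) - (40 - j),      # Jane - Sarah == Sarah - Jimmy
    ]

    # Solve for Jane's age and the offset; Jimmy's age j is left free.
    print(solve(constraints, [n, t], dict=True))
    # Expected: [{n: 80 - j, t: 2*j - 40}] -- a one-parameter family, i.e.
    # any Jimmy between 20 and 40 with Jane = 80 - Jimmy satisfies the
    # puzzle, so there is no single right answer.

The third constraint turns out to be implied by the first two, which is exactly why the problem has many solutions rather than one.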