This doesn't feel like a "reasoning" challenge. The mental skill required to solve most of these seems to be the ability to loop over all known members of a category like "popular brand names" or "well-known actors" and see if they fit the clue.
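To make that concrete, the whole "skill" is basically generate-and-test, something like this toy sketch (the brand list and the clue predicate here are invented for illustration):

```python
# Toy sketch of the generate-and-test loop these puzzles reward.
# The brand list and the clue check are invented for illustration.

BRANDS = ["Citgo", "Pepsi", "Nike", "Adidas", "Lego"]

def fits_clue(name: str) -> bool:
    # Stand-in for a real clue, e.g. "a five-letter brand starting with C".
    return len(name) == 5 and name.startswith("C")

# The hard part for a human is enumerating BRANDS at all;
# once a candidate comes to mind, the check is a trivial filter.
print([name for name in BRANDS if fits_clue(name)])  # ['Citgo']
```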
As a human, you'd expect to fail either because you didn't know a category member (e.g. as a non-American I have no idea WTF "Citgo" is; I could never get the answer to the first question because I have never seen that name before in my life) or because you weren't able to bring it to mind; the mental act of looping over all members of a category is quite challenging for a human.
Admittedly this is something an AI system could in principle be REALLY good at, and it's interesting to test and see that current ones are not! But it seems weird to me to call what's being tested "reasoning" when it's so heavily focused on memory recall (and evaluating whether a candidate answer works or not is trivial once you've brought it to mind and doesn't really require any intelligent thought).
(If the questions were multiple-choice, eliminating the challenge of bringing candidate answers to mind, which is the main difficulty for a human, then I'd agree it was a "reasoning" test.)
I had the same thought. It reminds me of solving Project Euler problems, where there is often an obvious naive approach which is guaranteed to produce the correct answer but would consume prohibitive memory/compute resources to execute to completion. I suspect the models would perform much better if prompted to formulate a strategy for efficiently solving these challenges rather than solving them directly… which suggests a direction for potential improvement, I suppose.
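A concrete instance, using the classic Project Euler problem 1 ("find the sum of all the multiples of 3 or 5 below 1000", whose answer is public): the naive loop is obviously correct but hopeless for huge inputs, while inclusion-exclusion over arithmetic series gives the same answer in constant time.

```python
# Naive vs. efficient approach to a Project-Euler-style problem:
# "sum of all multiples of 3 or 5 below n" (problem 1 is the classic instance).

def naive(n: int) -> int:
    # Obviously correct, but O(n): hopeless once n is astronomically large.
    return sum(k for k in range(n) if k % 3 == 0 or k % 5 == 0)

def clever(n: int) -> int:
    # Inclusion-exclusion plus the closed form for an arithmetic series: O(1).
    def multiples_sum(m: int) -> int:
        count = (n - 1) // m            # how many multiples of m lie below n
        return m * count * (count + 1) // 2
    return multiples_sum(3) + multiples_sum(5) - multiples_sum(15)

assert naive(1000) == clever(1000) == 233168
```

The benchmark analogue would be getting the model to produce `clever` instead of grinding through `naive`.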
I agree that recall seems to play an important role in solving these problems. Similar to how the ARC-AGI problems seem to depend on visual perception of shapes and colors. When I come up with the correct answers to such puzzles, I feel subjectively that the answers flashed into my mind, not that I reasoned my way to them.
But I do think this is reasoning. It requires recall, but so does anything other than a pure logic puzzle. For example, on a competition math problem or a programming problem, no person or LLM is inventing well-known lemmas and algorithms from first principles.
I think what you mean is that once you've managed to recall, checking constraints is easy. Remarkably, a few people are much better at this than others. They are able to think fast and execute an explicit mental search over a very small number of plausible candidates. Other people take forever. Seems to be the case for models too.
I think what you said is the same as what the parent comment said? "Requires no non-trivial thought besides recall" seems remarkably similar to "once you have recalled an item, checking that it fits the constraints is trivial".
Or are you pointing to a nuanced difference between "easy" and "trivial" that I'm not understanding? Or do you think it requires non-trivial thought before the recall step?