
This is solvable in roughly half an hour on pen and paper by a random person I picked with no special math skills (beyond a university education). This is far from a difficult problem. The "95%+" in math reasoning is a meaningless standard; it's like saying a model is better than 99.9% of the world population at Albanian, since fewer than 0.1% bother to learn Albanian.

Even ignoring the fact that this or a similar problem may have appeared in the training data, it's something careful brute-force logic should solve. It's neither difficult, nor interesting, nor useful. Yes, it may suggest a slight improvement in basic logic, but no more so than a million other benchmarks people quote.

This goes to show that evaluating models is not a trivial problem. In fact, it's a hard problem (in particular, it's far, far harder than this math puzzle).



The "random person" you picked is likely very, very intelligent and not at all a good random sample. I'm not saying this is difficult to the extent that it merits academic focus, but it is NOT a simple problem and I suspect less than 1% of the population could solve this in half an hour "with no special math skills." You have to be either exceedingly clever or trained in a certain type of reasoning or both.


I agree with your general point that this "random person" is probably not representative of anything close to an average person off the street, but I think the phrasing "very very intelligent" and "exceedingly clever" is kinda misleading.

In my experience, the difference between someone who solves this type of logic puzzle and someone who doesn't has more to do with persistence and the ability to maintain focus than with "intelligence" in terms of problem-solving ability per se. I've worked with college students helping them learn to solve these kinds of problems (e.g. as part of pre-interview test prep), and in most cases, those who solve it and those who don't have the same rate of progress towards the solution as long as they're actively working at it. The difference comes in how quickly they get frustrated (at themselves mostly), decide they're not capable of solving it, and give up on working on it further.

I mention this because this frustration itself comes from a belief that the ability to solve these belongs to some "exceedingly clever" people only, and not to someone like them. So, this kind of thinking ends up being a vicious cycle that keeps them from working on their actual issues.


I solved it in less than 15 minutes while walking my dog, no pen or paper. But I wouldn't claim to be a random person without math skills. And my very first guess was correct.

It was a fun puzzle though and I'm surprised I didn't know it already. Thanks for sharing.


My dog solved it in less than 14 minutes, no pen or paper, and no fingers.

Seriously though, nice work.


So in the three hours since you read the puzzle in the parent comment, you stopped what you were doing, managed to get some other "random" person to stop what they were doing and spend half an hour of their time on a maths puzzle that, at that point, prior experience suggested could take a day? All within three hours?

That's not to say that you didn't, or that you aren't recalling a previous occasion that happened to involve this exact puzzle (despite there being scant prior references to it, which is precisely the reason for using it). But you can see how some might see that as not entirely credible.

Best guess: this random person is someone who really likes puzzles, is presumably good at them, and is very, very far from being as representative as your argument requires.

Read: just a heavy flex about puzzle solving.


> This is solvable in roughly half an hour on pen and paper by a random person I picked with no special math skills (beyond a university).

I randomly answered this post and can't solve it in half an hour. Is the point LeetCode, but for AI? I'd rather it solve real problems than "elite problems".

Side note: couldn't even find pen and paper around in half an hour.



