
Yup, it flunked that one.

I also have a question that LLMs always got wrong until ChatGPT o3, and even o3 has a hard time with it (I just tried it again and it needed to run code to work it out). Qwen3 failed, and every time I asked it to look at its solution again, it would notice the error and try to solve it again, only to fail the same way:

> A man wants to cross a river, and he has a cabbage, a goat, a wolf and a lion. If he leaves the goat alone with the cabbage, the goat will eat it. If he leaves the wolf with the goat, the wolf will eat it. And if he leaves the lion with either the wolf or the goat, the lion will eat them. How can he cross the river?

I gave it a ton of opportunities to notice that the puzzle is unsolvable: whichever single passenger he takes on the first trip, at least one forbidden pair is left alone on a bank. (It assumes this is a standard one-passenger puzzle; I never said that, and I would also have been happy if it had pointed that out.) I kept trying to get it to notice that it was failing the same way over and over, asking it to step back and think about the big picture, but each time it would confidently start trying to solve it again. Eventually I ran out of free messages.



4o with thinking:

By systematic (BFS) search of the entire 32-state space under these rules, one finds no path from the start state to the goal state that stays safe at every step. Thus the puzzle has no solution—there is no way for the man to ferry all four items across without at least one of them being eaten.
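
For what it's worth, the unsolvability claim is easy to check mechanically. Below is a rough Python sketch (my own encoding of the puzzle, not anything a model produced) that runs the same kind of BFS over the 32 possible bank assignments; with a one-passenger boat it finds no safe schedule:

    from collections import deque
    from itertools import combinations

    ITEMS = ("cabbage", "goat", "wolf", "lion")
    # Pairs that must never be left together on a bank without the man.
    FORBIDDEN = (("goat", "cabbage"), ("wolf", "goat"),
                 ("lion", "wolf"), ("lion", "goat"))

    def is_safe(state):
        # state maps each name (including "man") to a bank, 0 or 1
        return not any(state[a] == state[b] != state["man"]
                       for a, b in FORBIDDEN)

    def solve(capacity=1):
        # Breadth-first search over the 2**5 = 32 bank assignments.
        names = ("man",) + ITEMS
        key = lambda s: tuple(s[n] for n in names)
        start = {n: 0 for n in names}   # everyone on the left bank
        goal = (1,) * len(names)        # everyone on the right bank
        queue = deque([(start, [])])
        seen = {key(start)}
        while queue:
            state, path = queue.popleft()
            if key(state) == goal:
                return path             # one passenger tuple per crossing
            here = [i for i in ITEMS if state[i] == state["man"]]
            # The man crosses alone or with up to `capacity` passengers.
            for count in range(capacity + 1):
                for passengers in combinations(here, count):
                    nxt = dict(state)
                    nxt["man"] ^= 1
                    for p in passengers:
                        nxt[p] ^= 1
                    if is_safe(nxt) and key(nxt) not in seen:
                        seen.add(key(nxt))
                        queue.append((nxt, path + [passengers]))
        return None

    print(solve(capacity=1))  # None: no one-passenger schedule is safe
    print(solve(capacity=2))  # a schedule: the two-passenger variant works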


You go with the cabbage, goat, wolf and lion all together!


O3 gave me basically that solution. "Below is the shortest safe schedule that really works ‒ but it assumes the boat can hold the man plus two passengers (three beings total). If your version of the puzzle only lets him move one passenger at a time, the puzzle has no solution: at the very first trip he would always leave at least one forbidden pair alone."
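
That matches what the BFS sketch above shows: solve(capacity=2) turns up a five-crossing schedule, consistent with o3's caveat that the puzzle only works if the boat holds the man plus two passengers.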


I tried Grok 3 with Think and it got it right too, with pretty good thinking.


I don't have access to Think, but I tried Grok 3 regular, and it was hilarious: one of the longest answers I've ever seen.

Just giving the headings, without any of the long text between each one where it realizes it doesn't work, I get:

    Solution
        [... paragraphs of text omitted each time]
    Issue and Revision
    Revised Solution
    Final Solution
    Correct Sequence
    Final Working Solution
    Corrected Final Solution
    Final Correct Solution
    Successful Solution
    Final answer
    Correct Final Sequence
    Final Correct Solution
    Correct Solution
    Final Working Solution
    Correct Solution
    Final Answer
    Final Answer

Each time it's so confident that it's worked out the issue, and now, finally, it has the correct, final, working solution. Then it blows it again.

I'm surprised I didn't start seeing heading titles such as "Working solution-FINAL (3) revised updated ACTUAL-FINAL (2)"



