
>that all model makers actually have been training models explicitly on the most common tasks people use them for via synthetic data generation

People really don't understand that part in general.

I find the easiest way to make people understand is to write gibberish that will trigger the benchmaxxed "pattern matching" behavior like this:

> The child and the wolf try to enjoy a picnic by the river but there's a sheep. The boat needs to connect nine dots over the river without leaving the water but gets into an accident and dies. The surgeon says "I can't operate on this child!" why?

Mixing and matching multiple common riddle/puzzle-style questions into a single gibberish prompt should, if models had legitimate forms of reasoning, make the model state that this is nonsense at best, or answer chaotically at worst. Instead, they will all answer "The surgeon is the mother" even though nobody mentioned anything about anyone's gender. That's because that answer, "the surgeon is the mother", for the gender-bias riddle has been burned so hard into the models that they cannot reply any other way as soon as they pattern match "The surgeon can't operate on this child", no matter how much crap you wrote before that sentence. You can change anything about what comes before "The surgeon" and the model will almost invariably fall into giving an answer like this one (Gemini 2.5 Pro):

https://i.imgur.com/ZvsUztz.png

>The details about the wolf, the sheep, the picnic, and the dying boat are all distractions (red herrings) to throw you off. The core of the puzzle is the last sentence.

>The surgeon says, "I can't operate on this child!" because the surgeon is the child's mother.

One could really question the value, by the way, of burning the answers to so many useless riddles into LLMs. The only purpose it could serve is gaslighting the average person asking these questions into believing there's some form of intelligence in there. Obviously they fail so badly to generalize on this (never working quite right when you change an element of a riddle into something new) that, from a practical point of view, you might as well not bother having this in the training data; nobody's going to be more productive because LLMs can act as a database of common riddles.
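
If anyone wants to try this against whatever model they have access to, here's a minimal sketch (not my exact setup), assuming the openai Python package and an OpenAI-compatible endpoint; the model name is just a placeholder:

  # Rough sketch: send the mashed-up riddle to an OpenAI-compatible chat
  # endpoint and see whether it pattern-matches straight to "the surgeon
  # is the mother". Model name and endpoint are placeholders.
  from openai import OpenAI

  PROMPT = (
      "The child and the wolf try to enjoy a picnic by the river but there's a sheep. "
      "The boat needs to connect nine dots over the river without leaving the water "
      "but gets into an accident and dies. "
      "The surgeon says \"I can't operate on this child!\" why?"
  )

  client = OpenAI()  # reads OPENAI_API_KEY; pass base_url=... for other providers
  resp = client.chat.completions.create(
      model="gpt-4o",  # placeholder; swap in whichever model you're probing
      messages=[{"role": "user", "content": PROMPT}],
  )
  print(resp.choices[0].message.content)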


For the fun of it, here's Qwen 3 Coder 480B (via Jan.ai, with the model running on Cerebras):

  This sounds like a riddle combining elements from different classic puzzles and stories. Let me think through it:
  The answer is likely: "Because the child is the surgeon's own child!"
  This is a variation of the classic riddle where a father and son are in a car accident, the father dies, and the surviving son needs surgery. When the surgeon sees the child, they say "I can't operate on this child - he's my son!"
  The twist relies on the assumption that surgeons are male, but of course the surgeon could be the child's mother.
  However, I'm a bit confused by the "nine dots" and "boat" elements in your version - those don't typically appear in this riddle. Could you clarify if there's a specific version you're thinking of, or if you meant this to be a different puzzle entirely?


> The child and the wolf try to enjoy a picnic by the river but there's a sheep. The boat needs to connect nine dots over the river without leaving the water but gets into an accident and dies. The surgeon says "I can't operate on this child!" why?

Just tested it and it actually fools Claude on first try! LOL, so much for reasoning models.


For what it's worth, GPT-OSS-20b thinks about the puzzle for a LONG time and then comes up with a... solution of sorts? It doesn't peg the puzzle as not making any sense, but at least it tries to solve the puzzle as presented, and doesn't just spit out a pre-made answer:

> It turns out the “child” isn’t a patient waiting for an operation at all – the child has already been lost.

> In the story the boy and his wolf friend go to the river for a picnic with a sheep that happens to be there. They decide to use a small boat to cross the water. The problem is that the boat must stay on the surface of the water while it “connects” the nine points (dots) across the river – essentially it has to stay on the river without ever leaving it, which makes a safe crossing impossible.

> During the attempt the boat hits something and sinks; everyone in it dies. The surgeon who arrives at the scene says, “I can’t operate on this child!” because the child is already dead from the accident. The mention of the wolf and the sheep is simply part of the scene that led to the fatal crossing; it isn’t relevant to the medical impossibility.

Interestingly in its thought process it does come across the classic puzzles, but discards them as not quite fitting:

> Maybe it's about the classic lateral thinking puzzle: "A man is found dead, he was a surgeon. The surgeon said 'I cannot operate on this child because the child is my own daughter', etc." But not.

> Alternatively maybe it's about the famous "Nine Dots" puzzle: Connect nine dots with four straight lines without lifting pencil. Here boat connects nine dots over river... So maybe it's the "connect the dots" game but with a boat?

> Could this be a riddle about "The River and the Sheep" referencing a children's rhyme or fable? Maybe it's about "Jack and Jill"? Not sure.

and so on and on. When asked if the puzzle makes sense it largely concludes that it doesn't.

It's definitely interesting to see which LLMs fall for what pitfalls. It's far from universal as far as I can tell. GPT-OSS-20b definitely has some wonky logic in its answer, but at least it's not assuming it's a puzzle it knows the answer to. Gemma-3-27b immediately pegs that the puzzle's elements seem disconnected, waffles on a bit, and then also comes to the conclusion that the child is already dead, discarding roughly all of the puzzle as distracting facts. llama-3.2-1b (a very small model) immediately a) misunderstands the riddle and b) tells you it doesn't have enough information to solve it. When pressed, it "solves" the riddle thus:

> The surgeon says "I can't operate on this child!" because the child is in a precarious position on the boat, with only nine lines connecting them to other points on the riverbank. If you try to attach any two lines that form an "X" shape (i.e., two lines connected by a single point), it would create a triangle that would leave space for another line to connect one of the child's dots to a nearby point on the riverbank, allowing it to be attached.

> The surgeon is not saying that the child is in immediate danger or can't be saved. Instead, they're suggesting that there might be an alternative solution where all nine lines can be connected without leaving the water. However, this would require some creative problem-solving and flexibility with the geometry of the situation.

I fully expected at least llama-3.2-1b to fall for this sort of context-baiting, but it seems even a small model like that managed to figure out that there's something nonstandard about the riddle.
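
For anyone who wants to run the same comparison, here's a rough sketch, assuming the local models sit behind an OpenAI-compatible endpoint (the base_url and model IDs below are placeholders for whatever your server actually exposes):

  # Rough sketch: loop the riddle over several locally hosted models behind an
  # OpenAI-compatible endpoint. base_url and model IDs are placeholders for
  # whatever your local server (llama.cpp, LM Studio, Ollama, etc.) exposes.
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:1234/v1", api_key="none")

  PROMPT = (
      "The child and the wolf try to enjoy a picnic by the river but there's a sheep. "
      "The boat needs to connect nine dots over the river without leaving the water "
      "but gets into an accident and dies. "
      "The surgeon says \"I can't operate on this child!\" why?"
  )

  for model in ["gpt-oss-20b", "gemma-3-27b", "llama-3.2-1b"]:  # placeholder IDs
      resp = client.chat.completions.create(
          model=model,
          messages=[{"role": "user", "content": PROMPT}],
      )
      print(f"--- {model} ---\n{resp.choices[0].message.content}\n")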

