At 0:52 in their demo video, there is a grammatical inconsistency in the agent's...

m_w_ · 2025-11-13T16:10:56 1763050256

I can't speak to the content of the actual game being played, but it wouldn't surprise me if there was an in-game text prompt:

> "The house that looks like a ripe tomato!"

that was transformed into a "user prompt" in a more instructional format:

> "Go to the tomato house"

And both were used in the agent output. At least the Y-axes on the graphs look more reasonable than some other recent benchmarks.

vessenes · 2025-11-13T20:57:27 1763067447

The scene just before the one you describe has the user write "ripe tomato" in the description - you can see it in the video. The summary elides it, but the "ripe tomato" instruction is also clearly part of the context.