From the post: "Evaluation began immediately after the 2025 IMO problems were re...

os2warpman · 2025-07-19T15:00:59 1752937259

What they mean is that in a couple of weeks there are going to be stories titled "LLMS NOW BETTER THAN HUMANS AT 2025 INTERNATIONAL MATH OLYMPIAD" (stories published as thinly veiled investment solicitations), but in reality they're still shitty; they've just had the answers fed in to be spit back out.

sorokod · 2025-07-19T15:17:23 1752938243

Companies will game metrics whenever they have the opportunity. What else is new?

esafak · 2025-07-19T15:41:22 1752939682

I suppose what's new is that the models aren't as smart as their companies claimed.

chvid · 2025-07-19T15:03:36 1752937416

Not really.