
From the post: "Evaluation began immediately after the 2025 IMO problems were released to prevent contamination."

Does this address your concern?



What they mean is that in a couple of weeks there are going to be stories titled "LLMS NOW BETTER THAN HUMANS AT 2025 INTERNATIONAL MATH OLYMPIAD" (stories published as thinly-veiled investment solicitations), but in reality the models are still shitty: they've just had the answers fed in to be spit back out.


Companies will game metrics whenever they have the opportunity. What else is new?


I suppose what's new is that the models aren't as smart as their companies claimed.


Not really.



