
As a point of interest and for comparison: Gemini 2.5 Pro can generate a Python program that, when run, outputs the complete correct solution, but it can't solve the problem in one shot if asked directly.
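
For concreteness, here's a minimal sketch of the two prompting modes being compared, assuming the google-generativeai Python client; the API key, the "gemini-2.5-pro" model id string, and the puzzle text are all placeholders, since the thread doesn't specify the actual problem:

  import google.generativeai as genai

  genai.configure(api_key="YOUR_API_KEY")          # placeholder key
  model = genai.GenerativeModel("gemini-2.5-pro")  # model id is an assumption

  puzzle = "..."  # the actual test problem isn't given in the thread

  # Mode 1: ask for the final answer directly (the "one-shot" attempt).
  direct = model.generate_content(
      "Solve this and reply with only the final answer:\n" + puzzle)

  # Mode 2: ask for a Python program that prints the answer when run.
  program = model.generate_content(
      "Write a Python program that prints the solution to:\n" + puzzle)

  print(direct.text)   # compare against the known solution
  print(program.text)  # run the returned code separately and check its output

The point of the comparison is that the same model can fail in mode 1 while succeeding in mode 2.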

This is just a for-fun test to get a sense of how models are progressing; it highlights the jagged nature of their intelligence and capabilities. None of the big AI labs is testing for such a basic problem type, which makes it an interesting check.

I think it's still interesting to see how Grok 4 performs, even if we don't use this test to draw broader conclusions about its capabilities.


