Hacker News

Your quote is accurate; it comes from here:

https://arcprize.org/blog/oai-o3-pub-breakthrough

They were talking about training on the public dataset -- OpenAI tuned the o3 model with 75% of the public dataset. There was some idea/hope that these LLMs would be able to gain enough knowledge in the latent space that they would automatically do well on the ARC-AGI problems. But using 75% of the public training set for tuning puts them at about the same challenge level as all other competitors (who use 100% of the training set).
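As a minimal sketch of what that kind of split looks like (hypothetical helper; the exact split procedure OpenAI used isn't public -- only the 75% figure, and ARC-AGI's public training set has 400 tasks):

```python
import random

def split_public_train(tasks, tune_frac=0.75, seed=0):
    """Hold out tune_frac of the public training tasks for fine-tuning,
    keeping the remainder unseen. Illustrative sketch only -- the function
    name, seed, and shuffling strategy are assumptions, not OpenAI's method."""
    tasks = list(tasks)
    random.Random(seed).shuffle(tasks)  # deterministic shuffle for reproducibility
    cut = int(len(tasks) * tune_frac)
    return tasks[:cut], tasks[cut:]

# ARC-AGI's public training set is 400 tasks, so 75% -> 300 for tuning
tune_tasks, held_out = split_public_train(range(400))
```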

In the post they were saying they didn't have a chance to test the o3 model's performance on ARC-AGI "out of the box", which is how the 14%-scoring R1-Zero was tested (no SFT, no search). They have been testing the LLMs out of the box like this to see if they are "smart" wrt the problem set by default.





