"Raising visibility on this note we added to address ARC "tuned" confusion:
> OpenAI shared they trained the o3 we tested on 75% of the Public Training set.
This is the explicit purpose of the training set. It is designed to expose a system to the core knowledge priors needed to beat the much harder eval set.
The idea is that each training task shows you a single, isolated prior, and the eval set requires you to recombine and abstract from those priors on the fly. Broadly, the eval tasks require utilizing 3-5 priors.
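For concreteness, here is a minimal sketch of what "trained on 75% of the Public Training set" means in terms of the public ARC data layout. Paths follow the public fchollet/ARC repo, where each task is a JSON file with "train" and "test" input/output grid pairs; the exact task IDs OpenAI used are not public, so the split below is purely illustrative.

```python
import json
import random
from pathlib import Path

# Illustrative only: directory layout follows the public fchollet/ARC repo.
ARC_DIR = Path("ARC/data")

def load_tasks(split: str) -> dict:
    """Load every task JSON in a split directory, keyed by task id."""
    tasks = {}
    for path in sorted((ARC_DIR / split).glob("*.json")):
        with path.open() as f:
            tasks[path.stem] = json.load(f)
    return tasks

# Public training tasks: each is meant to isolate a single core-knowledge prior.
public_training = load_tasks("training")

# A hypothetical 75% sample of the public training set, mirroring the proportion
# OpenAI described (the actual task ids they trained on are not known).
rng = random.Random(0)
task_ids = sorted(public_training)
seen_ids = set(rng.sample(task_ids, k=int(0.75 * len(task_ids))))
seen_tasks = {tid: public_training[tid] for tid in seen_ids}

# Public evaluation tasks are harder: solving one typically means recombining
# roughly 3-5 of the priors demonstrated across the training tasks.
public_eval = load_tasks("evaluation")
print(f"{len(seen_tasks)} of {len(public_training)} training tasks seen, "
      f"{len(public_eval)} eval tasks held out")
```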
Yeah, that makes this result a lot less impressive for me.