Legend2440, 9 months ago, on: Gemini 2.5

I would say they are a fairly good measure of how well the model has integrated information from pretraining. They are not so good at measuring reasoning, out-of-domain performance, or creativity.