I don't think they make it clear: do they mean testers prefer o3 mini 56% of the time when they express an opinion, or 56% of the time overall? Some percentage of people don't choose. If that number is 10% and those people aren't excluded, then people prefer o3 mini 56% of the time, prefer o1 mini 34% of the time, and don't choose 10% of the time. I'm not sure it would be reasonable to present the data that way, but it seems possible.
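To make the two readings concrete, here is a rough sketch of the arithmetic. The 10% no-preference rate is just my hypothetical from above, not a reported figure, and the function and parameter names are made up for illustration.

```python
def split_preferences(o3_share, no_pref_rate, share_is_among_decided):
    """Return (o3, o1, no_preference) as fractions of ALL comparisons.

    o3_share: the quoted preference rate for o3 mini (0.56 here).
    no_pref_rate: fraction of testers who did not choose (hypothetical 0.10).
    share_is_among_decided: True if the quoted rate excludes non-choosers.
    """
    decided = 1.0 - no_pref_rate
    if share_is_among_decided:
        # Quoted rate applies only to testers who picked a side.
        o3 = o3_share * decided
    else:
        # Quoted rate already covers everyone, non-choosers included.
        o3 = o3_share
    o1 = decided - o3
    return o3, o1, no_pref_rate


# Reading 1: 56% of all comparisons, non-choosers not excluded.
overall = split_preferences(0.56, 0.10, share_is_among_decided=False)

# Reading 2: 56% of the comparisons where the tester expressed an opinion.
among_decided = split_preferences(0.56, 0.10, share_is_among_decided=True)

# Head-to-head rate implied by reading 1, ignoring non-choosers.
head_to_head = overall[0] / (overall[0] + overall[1])

for label, (o3, o1, none) in [("overall", overall), ("among decided", among_decided)]:
    print(f"{label}: o3 mini {o3:.1%}, o1 mini {o1:.1%}, no choice {none:.1%}")
print(f"head-to-head under reading 1: {head_to_head:.1%}")
```

With that hypothetical 10%, reading 1 gives the 56 / 34 / 10 split and a head-to-head rate of about 62%, while reading 2 gives roughly 50.4 / 39.6 / 10 overall. So the same "56%" covers a noticeably different margin depending on which one they meant.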