But of course, after all the benchmark issues we've had thus far -- memorization, conflicts of interest, and plainly low-quality questions -- I think it's fair to be skeptical of how well these numbers will actually map to usability in the real world.