By the way, the fact that "average" and "best" human performance are presented as meaningful benchmarks is one of the biggest signs that modern AI is driven by hype, rather than science.
For example, speech recognition AI is supposedly within a fraction of a percent of "average human level", and yet auto-generated captions are awful. They have no punctuation, they don't distinguish between different speakers, they aren't visually grouped, and they fail miserably at dealing with slang. So it turns out researchers are measuring only the one aspect of the problem their algorithm is good at and ignoring the rest.
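To make the "one aspect" point concrete: speech recognition results are typically reported as word error rate (WER), and the text is commonly lowercased and stripped of punctuation before scoring. Here is a minimal sketch under that assumption. The `normalize` and `wer` helpers and the example sentences are illustrative, not taken from any particular toolkit or benchmark.

```python
import re

def normalize(text):
    # Assumption: lowercase and drop punctuation (keeping apostrophes)
    # before scoring, as many evaluation pipelines do.
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def wer(reference, hypothesis):
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra word inserted
                prev[j - 1] + (r != h),  # substitution, or match if equal
            ))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical two-speaker exchange and a caption-style transcript of it.
reference = "Are you coming? No, I can't."
caption = "are you coming no i can't"

print(wer(reference, caption))
```

Scored this way, the caption gets a perfect WER even though it has lost the punctuation and any hint that two people are speaking, which is exactly the stuff that makes captions readable.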
On the flip side, we have animal intelligence. Bees aren't nearly as smart as humans. So surely modern AI, which surpasses humans in this and that, would have no problem outperforming a bee with its 960 000 neurons, right? But in reality, there is nothing even approaching bees' versatile intelligence. Of course, modern AI researchers would just hand-wave this away by saying the problem is not well defined. Convenient.
> For example, speech recognition AI is supposedly within a fraction of a percent of "average human level", and yet auto-generated captions are awful. They have no punctuation, they don't distinguish between different speakers, they aren't visually grouped, and they fail miserably at dealing with slang. So it turns out researchers are measuring only the one aspect of the problem their algorithm is good at and ignoring the rest.
YouTube captioning != SOTA, any more than Google Translate for years and years represented anything close to the NMT SOTA.
Well, for a certain definition of 'human performance'... I believe that's carried over from the DQN paper and is something like 'an ordinary video game player given a few hours'. When it comes to ALE you should usually treat the 'human performance' numbers as being lower bounds.
(In this case, if an agent can beat 'human performance' by only clearing 1 of 9 total levels, one is entitled to a little skepticism about how useful 'human performance' is as a benchmark for this particular game. Focus on the improvement over other DRL agents, not that.)
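For reference, ALE results in the DQN line of work are usually reported as a human-normalized score, 100 * (agent - random) / (human - random). A minimal sketch of why the choice of "human" matters so much follows. Every score below is made up purely for illustration and does not come from any paper or game.

```python
def human_normalized(agent, random, human):
    # 100 means "matches the human baseline", 0 means "matches random play".
    return 100.0 * (agent - random) / (human - random)

# Hypothetical scores, chosen only to illustrate the effect.
random_score = 200.0
casual_human = 4000.0    # assumed: player with a few hours of practice
expert_human = 40000.0   # assumed: player who clears the whole game
agent_score = 5000.0     # assumed: agent that only clears the first level

vs_casual = human_normalized(agent_score, random_score, casual_human)
vs_expert = human_normalized(agent_score, random_score, expert_human)

print(f"vs casual baseline: {vs_casual:.1f}%")
print(f"vs expert baseline: {vs_expert:.1f}%")
```

The same agent score lands above 100% against the casual baseline and far below it against the expert one, so "superhuman" here says more about the baseline than about the agent.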