When he says "the algorithm is given just the pixels", does that mean it's not given any information about the game itself, like the objective? How does it know how to measure it's own success?
There is an objective, but it may not be exactly as you reason about it. There's a great video that made the rounds last year about building a neural net that plays Super Mario World, that may help visualize what's going on - https://www.youtube.com/watch?v=qv6UVOQ0F44
There's also a great snippet in the currently-ongoing AlphaGo videos that explains that when AlphaGo plays in ways that you may not expect, it's because it's strictly worried about _winning_ (even by the slimmest margin) with the greatest probability, and not necessarily by winning handily, like a human might.