When he says "the algorithm is given just the pixels", does that mean it's not g...

slpsys · on March 14, 2016

There is an objective, but it may not be exactly the one you have in mind. A great video made the rounds last year about building a neural net that plays Super Mario World, which may help you visualize what's going on: https://www.youtube.com/watch?v=qv6UVOQ0F44

There's also a great snippet in the currently-ongoing AlphaGo videos explaining that when AlphaGo plays in ways you may not expect, it's because it cares strictly about maximizing its probability of _winning_ (even by the slimmest margin), and not about winning handily, like a human might.
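
To make the "probability of winning vs. winning handily" distinction concrete, here's a toy sketch. The move names and numbers are made up for illustration, and this is not how AlphaGo is actually implemented; it only shows that the two objectives can pick different moves.

```python
# Hypothetical candidate moves:
# (name, estimated win probability, expected winning margin in points)
candidates = [
    ("safe", 0.92, 1.5),
    ("aggressive", 0.80, 12.0),
]

def pick_by_win_probability(moves):
    """Choose the move with the highest estimated chance of winning."""
    return max(moves, key=lambda m: m[1])[0]

def pick_by_margin(moves):
    """Choose the move with the largest expected margin of victory."""
    return max(moves, key=lambda m: m[2])[0]

print(pick_by_win_probability(candidates))  # the objective described above
print(pick_by_margin(candidates))           # closer to how a human might play
```

With these made-up numbers the first picker takes the low-margin move, since margin never enters its objective at all.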

qwertyuiop924 · on March 15, 2016

Yeah, I love SethBling. And the Lua for MarI/O is only one file, and a relatively small one at that.

rahimnathwani · on March 14, 2016

This paper[0] says "In addition it receives a reward r_t representing the change in game score."

[0] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
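
As a rough sketch of what "reward = change in game score" means (the function name and the score values here are my own, not from the paper):

```python
def rewards_from_scores(scores):
    """Given the game score observed at each time step, return the
    per-step reward: the change in score from one step to the next."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

# Hypothetical score readout over five consecutive frames.
scores = [0, 0, 10, 10, 30]
rewards = rewards_from_scores(scores)
print(rewards)
```

So the agent sees pixels as input, plus this one number per step telling it whether the score went up. The rewards over an episode add up to the final score minus the starting score.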