
When he says "the algorithm is given just the pixels", does that mean it's not given any information about the game itself, like the objective? How does it know how to measure its own success?


There is an objective, but it may not be the one you would reason about as a player. A video that made the rounds last year, about building a neural net that plays Super Mario World, may help visualize what's going on: https://www.youtube.com/watch?v=qv6UVOQ0F44

There's also a great segment in the currently ongoing AlphaGo videos explaining that when AlphaGo plays in ways you may not expect, it's because it cares strictly about maximizing its probability of _winning_, even by the slimmest margin, rather than about winning handily, as a human might.
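
To make the difference concrete, here is a toy sketch. The move names and numbers are invented for illustration and have nothing to do with how AlphaGo actually evaluates positions; the only point is that "maximize win probability" and "maximize expected margin" can pick different moves.

```python
# Hypothetical candidate moves with made-up evaluations (illustration only).
candidates = {
    "safe": {"win_prob": 0.92, "expected_margin": 1.5},
    "aggressive": {"win_prob": 0.80, "expected_margin": 12.0},
}


def pick_by_win_probability(moves):
    """Choose the move with the highest estimated chance of winning."""
    return max(moves, key=lambda name: moves[name]["win_prob"])


def pick_by_margin(moves):
    """Choose the move with the largest expected winning margin."""
    return max(moves, key=lambda name: moves[name]["expected_margin"])


print(pick_by_win_probability(candidates))  # safe
print(pick_by_margin(candidates))           # aggressive
```

An agent optimizing the first criterion happily takes the 1.5-point win, which is why its play can look oddly passive to a human.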


Yeah, I love SethBling. And the Lua for MarI/O is only one file, and a relatively small one at that.


This paper[0] says "In addition it receives a reward r_t representing the change in game score."

[0] https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
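
In other words, the success signal is just the score delta between steps. A minimal sketch of that idea, assuming you can read the score at each step (the function name and the sample scores are mine, not from the paper):

```python
def rewards_from_scores(scores):
    """Return per-step rewards as the change in game score between steps."""
    return [curr - prev for prev, curr in zip(scores, scores[1:])]


# Hypothetical score readings over five consecutive steps.
scores = [0, 0, 10, 10, 30]
rewards = rewards_from_scores(scores)
print(rewards)  # [0, 10, 0, 20]
```

The rewards sum to the total score gained, so maximizing cumulative reward is the same as maximizing final score, without the agent being told anything else about the game.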



