
Binary rewards (win/loss score at the end of the rollout) scored a "good" 70.

With a sparse reward (kills, health, etc.), it scored a better 80 and learned much faster.

Normally, "reward engineering" uses human knowledge to give more continuous, richer rewards. This was not used here.



Perhaps we are looking at a different graph, but in the one I am looking at, blue is "sparse" (plateaus at 70) and orange is "dense" (very quickly hits 80). I believe "dense" means they are doing reward engineering.


The "sparse blue graph" is just the binary win loss outcome - learns ok-ish but slow

The "dense orange graph" - uses more dense rewards - kills, health - and learns better. I referred to this as a "sparse reward" - since it is still a fairly lean and sparse function.

But this is just my opinion. Also note this is for the older 1v1 agent.

The current reward function is even more detailed, and they blend and anneal the five-agent team score, so I dunno...

https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae939...
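If I'm reading that gist right, the blending works roughly like this: each hero's individual reward gets mixed toward the team mean by a "team spirit" factor that is annealed up over training. A minimal sketch of that idea (my paraphrase; the schedule and numbers are made up):

    import numpy as np

    def blend_team_spirit(individual_rewards, team_spirit):
        """Blend each hero's reward toward the team mean.
        team_spirit=0 -> purely selfish; team_spirit=1 -> everyone gets the team average."""
        individual_rewards = np.asarray(individual_rewards, dtype=float)
        team_mean = individual_rewards.mean()
        return (1.0 - team_spirit) * individual_rewards + team_spirit * team_mean

    # Example: anneal team_spirit upward over training (schedule is hypothetical).
    for step, tau in [(0, 0.0), (100_000, 0.3), (1_000_000, 0.8)]:
        print(step, blend_team_spirit([2.0, -1.0, 0.5, 0.0, 1.5], tau))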



