
Binary rewards (win/loss score at the end of the rollout) scored a "good" 70.

With a sparse reward (kills, health, etc.), it scored a better 80 and learned much faster.

Normally, "reward engineering" uses human knowledge to give more continuous, richer rewards. This was not used here.



Perhaps we are looking at a different graph, but in the one I am looking at, blue is "sparse" (plateaus at 70) and orange is "dense" (very quickly hits 80). I believe "dense" means they are doing reward engineering.


The "sparse blue graph" is just the binary win loss outcome - learns ok-ish but slow

The "dense orange graph" - uses more dense rewards - kills, health - and learns better. I referred to this as a "sparse reward" - since it is still a fairly lean and sparse function.

But this is just my opinion. Also note this is for the older 1v1 agent.

The current reward function is even more detailed, and they blend and anneal the five-agent team score, so I dunno...

https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae939...
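If I'm reading that gist right, the blending works roughly like this: each hero's individual reward gets mixed toward the team mean by a "team spirit" factor that is annealed up over training. A minimal sketch of that idea (my paraphrase; the schedule and numbers are made up):

    import numpy as np

    def blend_team_spirit(individual_rewards, team_spirit):
        """Blend each hero's reward toward the team mean.
        team_spirit=0 -> purely selfish; team_spirit=1 -> everyone gets the team average."""
        individual_rewards = np.asarray(individual_rewards, dtype=float)
        team_mean = individual_rewards.mean()
        return (1.0 - team_spirit) * individual_rewards + team_spirit * team_mean

    # Example: anneal team_spirit upward over training (schedule is hypothetical).
    for step, tau in [(0, 0.0), (100_000, 0.3), (1_000_000, 0.8)]:
        print(step, blend_team_spirit([2.0, -1.0, 0.5, 0.0, 1.5], tau))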



