
This algorithm (in its regular form, as often used in games and examples) has one interesting "downside" that I was exploring some time ago: selection is performed using the UCB formula, so it basically tries to maximize the player's payout. But in most games this is in fact an impractical assumption, because we end up tending to expand branches that our opponent will most likely not choose. As in the example (I assume gray moves are "our" moves): we are much more likely to expand the 5/6 branch than the 2/4 branch, even though the latter is in fact the one our opponent is more likely to choose.


In a game state where your opponent will be choosing the next move, you should select the next state to expand based on a UCB formula involving your opponent's expected payout, not your own.
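A minimal sketch of that idea, assuming each node keeps a summed payout per player and records who moves at that node. The names `Node`, `ucb1` and `select_child` and the exploration constant are hypothetical and are not taken from the article:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    player_to_move: int          # who picks among this node's children
    visits: int = 0
    # Assumption: payouts are stored per player id, not only for "us".
    total_payout: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def ucb1(child, parent_visits, player, c=math.sqrt(2)):
    """UCB1 score of `child` from the point of view of `player`."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.total_payout.get(player, 0.0) / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node):
    """Pick the child that looks best to whoever moves at `node`."""
    return max(
        node.children,
        key=lambda ch: ucb1(ch, node.visits, node.player_to_move),
    )
```

With the 5/6 and 2/4 branches from the comment above (taken here as our wins, with the opponent winning the remaining playouts), scoring by our payout selects the 5/6 branch, while scoring by the opponent's payout at the opponent's node selects the 2/4 branch.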


But this requires storing and backpropagating this information for the other players, something I really haven't seen in any examples (nor in this article). We also cannot assume that every game is zero-sum, in which case this information would not be needed.
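For what it's worth, the extra bookkeeping can be small. A rough sketch, assuming a playout returns one payout per player (so the game does not have to be zero-sum); `Node` and `backpropagate` are hypothetical names, not something from the article:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    parent: Optional["Node"] = None
    visits: int = 0
    # Assumption: one running payout total per player id.
    total_payout: dict = field(default_factory=dict)


def backpropagate(leaf, payouts):
    """Add a playout result (player id -> payout) to every node up to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        for player, value in payouts.items():
            node.total_payout[player] = node.total_payout.get(player, 0.0) + value
        node = node.parent
```

Each node then holds enough information to compute a UCB score from any player's point of view, at the cost of one extra number per player per node.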



