
You can make AlphaZero learn a different value function by changing the reward it gets from the game. For example, a scheme that gives no reward for a draw and more reward for shorter games with more captures produces a different value function.
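As a rough sketch of what that reward change could look like: the function below replaces the usual +1/0/-1 terminal reward with one that zeroes out draws and scales decisive results by game length and capture count. The function name, weights, and inputs are all illustrative, not part of any actual AlphaZero implementation.

```python
# Hypothetical reward shaping for an AlphaZero-style trainer: draws are
# worth nothing, and decisive results are scaled so that short, sharp
# games score higher. All names and weights here are made up.
def shaped_reward(result, num_moves, num_captures,
                  length_weight=0.005, capture_weight=0.02):
    """result: +1 win, 0 draw, -1 loss from the player's perspective."""
    if result == 0:
        return 0.0  # no reward for a draw
    bonus = capture_weight * num_captures - length_weight * num_moves
    # Scaling by `result` means a quick, sharp win is worth more than
    # a slow one, and a quick, sharp loss is punished harder.
    return result * (1.0 + bonus)

# A decisive 30-move game with 12 captures outscores a quiet 80-move win.
quick_win = shaped_reward(+1, 30, 12)   # 1 * (1 + 0.24 - 0.15) = 1.09
slow_win = shaped_reward(+1, 80, 2)     # 1 * (1 + 0.04 - 0.40) = 0.64
assert quick_win > slow_win > 0
assert shaped_reward(0, 60, 10) == 0.0  # draws always score zero
```

Because self-play training optimizes whatever terminal signal it is given, a change like this propagates back through the learned value network and changes which positions the agent prefers.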


Or by training it against a flawed version of the algorithm that tends to overlook the best defence, like a human would.

This was what made the romantic style possible in the first place: humans perform worse under pressure, so they tend to crack once they're put on the ropes.

It's psychologically easier to attack than to defend accurately.


I don’t think that would work. A weird attacking style only works if your attacking moves, even the technically losing ones, lead to positions so complex that your opponent is unlikely to find the moves that win it for them.

AlphaZero would not have trouble finding the (likely fairly dull) winning moves, and certainly wouldn’t see any reason to play such a move itself.


The chess computing term for this is "contempt": the parameter that determines how weak a move the engine assumes its opponent might play. Classical chess engines typically expose this parameter so it can be changed easily. I'm sure the MCTS search in AlphaZero could be modified to allow this as well, if it doesn't already support it.
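A minimal sketch of how classical engines commonly apply contempt: positions the engine evaluates as drawn are scored slightly below equality from its own side, so it keeps playing for a win rather than steering into a draw. The function and constant names are illustrative, not from any particular engine.

```python
# Illustrative contempt adjustment, in centipawns. A drawn position is
# reported as slightly negative for the engine's own side, so the search
# prefers to keep playing rather than accept a draw.
DRAW_SCORE = 0

def score_with_contempt(static_eval, is_draw, contempt_cp=20):
    if is_draw:
        # From the engine's perspective a draw looks a bit worse than
        # equal, biasing the search away from drawish lines.
        return DRAW_SCORE - contempt_cp
    return static_eval

assert score_with_contempt(0, is_draw=True) == -20   # draw penalized
assert score_with_contempt(35, is_draw=False) == 35  # normal eval kept
```

A negative contempt value works the other way, making the engine happier to take a draw, which is how some engines are tuned when playing stronger opposition.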



