If anyone is interested in learning more on this topic, Mykel Kochenderfer's "Decision Making Under Uncertainty" offers a stellar treatment of reinforcement learning from the ground up. https://mitpress.mit.edu/decision-making-under-uncertainty
This game really is quite simple! My go-to example of a simple game is one called 21.
- There are N (usually 21) tokens in a pile.
- A turn consists of removing 1, 2, or 3 tokens from the pile.
- The player who removes the final token is the winner.
- The opponent always takes n mod 4 tokens (where n is the number of tokens remaining) if that is a valid move, and otherwise plays randomly (this is the optimal strategy).
- The AI plays first.
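For reference, the rules above fit in a few lines of Python. This is just my own sketch of the game loop, not code from the write-up, and the function names are mine:

```python
import random

def optimal_opponent_move(n):
    """Take n mod 4 tokens when that is a legal move (1-3); otherwise play randomly."""
    move = n % 4
    if 1 <= move <= 3:
        return move
    return random.randint(1, min(3, n))

def play(ai_policy, n=21):
    """The AI moves first; the player who removes the last token wins.
    Returns True if the AI wins."""
    while True:
        n -= ai_policy(n)   # AI's turn
        if n <= 0:
            return True
        n -= optimal_opponent_move(n)  # opponent's turn
        if n <= 0:
            return False
```

Since 21 is not a multiple of 4, the first player can force a win by always taking n mod 4 tokens, and against this opponent that policy wins every game.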
You can see my write-up here: [1]. One of the most interesting things for me was visually inspecting the action scores (at the end) to see how the agent learned the optimal strategy over time. My configuration took 3000 games to reach the optimal strategy against a strong opponent (opponent epsilon = 0.1), and substantially longer as the opponent's play degrades.
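For anyone who wants to tinker with something similar, here is a minimal tabular sketch in the spirit of that setup: an epsilon-greedy agent whose action scores are nudged toward the final win/loss signal after each game (an every-visit Monte Carlo update). To be clear, this is my own toy version under assumed parameters (`alpha`, `eps`, `opp_eps`), not the exact configuration from the write-up:

```python
import random
from collections import defaultdict

def train(episodes=3000, alpha=0.1, eps=0.1, opp_eps=0.1, start=21):
    """Learn action scores Q[(tokens_left, take)] by playing against an
    opponent that is optimal except with probability opp_eps.
    Reward is +1 for a win, -1 for a loss."""
    Q = defaultdict(float)
    for _ in range(episodes):
        n, history = start, []
        while True:
            legal = list(range(1, min(3, n) + 1))
            if random.random() < eps:
                a = random.choice(legal)                  # explore
            else:
                a = max(legal, key=lambda x: Q[(n, x)])   # exploit
            history.append((n, a))
            n -= a
            if n == 0:
                reward = 1   # agent took the last token
                break
            m = n % 4
            if random.random() < opp_eps or not 1 <= m <= 3:
                m = random.randint(1, min(3, n))          # opponent slips up
            n -= m
            if n == 0:
                reward = -1  # opponent took the last token
                break
        for s, a in history:  # every-visit Monte Carlo update
            Q[(s, a)] += alpha * (reward - Q[(s, a)])
    return Q
```

Printing Q per state is roughly what "inspecting the action scores" looks like: with enough games against a strong opponent, the score for taking n mod 4 tokens should pull ahead of the alternatives.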