>Each of OpenAI Five’s networks contain a single-layer, 1024-unit LSTM that sees the current game state (extracted from Valve’s Bot API)
This will likely dramatically simplify the problem vs. what the DeepMind/Blizzard framework does for StarCraft II, which provides a game state representation closer to what a human player would actually see. I would guess that the action API is also much more "bot-friendly" in this case, i.e., the agent does not need to perform low-level actions such as box-selecting units (rough sketch of the setup below).
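To make the contrast concrete, here is a minimal sketch of a policy that consumes a structured, pre-extracted game-state feature vector (rather than pixels) with a single-layer, 1024-unit LSTM, as the quoted description says. Everything beyond that one sentence is assumed for illustration: the feature dimension, the discrete action-space size, and the class/variable names are hypothetical, not OpenAI Five's actual architecture or API.

```python
# Illustrative sketch only: feature layout, sizes, and action space are assumptions;
# the only detail taken from the quote is "single-layer, 1024-unit LSTM over a
# structured (non-pixel) game state extracted from the bot API".
import torch
import torch.nn as nn

class StructuredStatePolicy(nn.Module):
    def __init__(self, state_dim=512, hidden_dim=1024, num_actions=64):
        super().__init__()
        # Embed the flat feature vector extracted from the game's bot API
        # (unit positions, health, cooldowns, ...), not raw screen pixels.
        self.encoder = nn.Linear(state_dim, hidden_dim)
        # Single-layer LSTM with 1024 hidden units, matching the quoted description.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=1, batch_first=True)
        # Head over a discrete, "bot-friendly" action space (hypothetical size):
        # no mouse movement or box-selection, just high-level commands.
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, states, hidden=None):
        # states: (batch, time, state_dim) sequence of per-tick feature vectors
        x = torch.relu(self.encoder(states))
        x, hidden = self.lstm(x, hidden)
        return self.action_head(x), hidden

# Usage: step one game tick at a time, carrying the recurrent state across ticks.
policy = StructuredStatePolicy()
obs = torch.randn(1, 1, 512)      # one tick's extracted features (dummy data)
logits, h = policy(obs)           # keep `h` and pass it back in on the next tick
```

The point of the sketch is that when the observation is already a clean feature vector and the action space is a short list of high-level commands, the whole perception and motor-control problem a pixel-based StarCraft II agent faces simply disappears.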
The problem they're trying to solve is also not how to recognise actions from pixels; it's how to outstrategise and outexecute human players at the game. A conceptual rather than mechanical advantage.