Quite a good read!
Impressive results, it seems.
Still, I think it would be much more useful to research learning complex things without absurd compute, sample inefficiency, or various hacks, e.g. reward shaping (which, let's be honest, this seems to have a lot of), but these are still interesting results.