Quite a good read! Impressive results, it seems. I still think it would be much more useful to research learning complex things without absurd compute, sample inefficiency, or various hacks, e.g. reward shaping (which, let's be honest, this seems to have a lot of), but the results are still interesting.