Curious, what led you to adjust the parameters this way? Also, have you guys experimented with ALiBi [1], which claims better extrapolation than rotary positional encoding?
It isn't just like `cp`; on some systems `ln` is just a symlink to `cp`! `cp -s` does the same thing as `ln -s`, although the other flags are generally different.
One logical continuation of adding more attention steps is to let the network itself decide how many attention steps to take, à la "Adaptive Computation Time for Recurrent Neural Networks". Are you planning to go in that direction?
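For anyone who hasn't read the ACT paper, here is a rough sketch of its halting rule as I understand it: at each step the network emits a halting probability, and it stops at the first step where the running sum reaches 1 - eps, with the final step receiving the leftover probability mass. The function name `act_halting_weights`, the `eps` default, and the example probabilities are my own illustrative assumptions, not anything from the model discussed here; in a real network the probabilities would come from a learned sigmoid unit, and the step cap would be a hyperparameter rather than the list length.

```python
def act_halting_weights(halt_probs, eps=0.01):
    """Per-step weights under an ACT-style halting rule (illustrative sketch).

    halt_probs: halting probabilities in [0, 1], one per candidate step.
                Hypothetical input; a real model would produce these itself.
    eps:        small slack so that halting after a single step is possible.
    """
    weights = []
    cumulative = 0.0
    last = len(halt_probs) - 1  # the list length acts as the step cap here
    for step, h in enumerate(halt_probs):
        # Halt once the running sum would reach 1 - eps, or at the cap.
        if cumulative + h >= 1.0 - eps or step == last:
            weights.append(1.0 - cumulative)  # remainder goes to the final step
            break
        weights.append(h)
        cumulative += h
    return weights


if __name__ == "__main__":
    w = act_halting_weights([0.1, 0.3, 0.7, 0.9])
    print(len(w), w)  # halts after three of the four candidate steps
```

The returned weights sum to 1, so they can be used to average the per-step states (or, here, the per-step attention outputs) into a single result.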
One of my students tried something along these lines for Natural Language Inference (NLI) last year [1]. The results were not conclusive, but perhaps Machine Translation is a better target? My reason for believing this is that the specific dataset for NLI most likely does not require multiple steps of inference in most cases (you can get away with simple token overlap), whereas the decoder in MT does, since it is constrained to output a single token at each step.
A few years ago SpringRTS, and in particular the BalancedAnnihilation mod [0], were active and playable. There is also Planetary Annihilation [1] (which I used to play competitively online; a few of the top players are ex-StarCraft II Grand Masters).
I've played Warzone, Zero-K and SC2. I think Zero-K is the best of the three. Warzone's huge tech tree is overwhelming and I believe it doesn't really add to the game, because it's likely that only a few tech paths are actually viable in competitive play. SC2 is pretty and all, but suffers from hard-to-solve balance problems. Zero-K has solid gameplay elements (like terrain deformation and good unit control), a great variety of units available for all players to master, and it nicely handles a dozen players without population caps. The only problem is the small player base.
I own Planetary Annihilation. It is a horrible successor to TA. No use of terrain, boring units and terrible UX. I currently play Total Mayhem (TA mod) and TA:Escalation.