snippyhollow's comments | Hacker News

We changed RoPE's theta from 10k to 1M and fine-tuned on 16k-token-long sequences.
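For readers wondering what the theta change does mechanically, here is a minimal sketch, assuming the standard RoPE formulation (inverse frequencies theta^(-2i/d)); the function name and dimensions are illustrative, not from the model:

```python
import numpy as np

def rope_inv_freqs(dim, theta):
    # Standard RoPE inverse frequencies: theta^(-2i/dim) for each
    # even dimension index i (one frequency per rotated pair).
    return theta ** (-np.arange(0, dim, 2) / dim)

f_10k = rope_inv_freqs(128, 10_000.0)
f_1m  = rope_inv_freqs(128, 1_000_000.0)

# Raising theta slows the rotation of the low-frequency dimensions,
# so positional phases wrap much later; that is what makes longer
# contexts (here, 16k tokens after fine-tuning) distinguishable.
ratio = f_10k[-1] / f_1m[-1]  # lowest frequency slows by roughly 100x
```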


Curious, what led you to adjusting the parameters this way? Also, have you guys experimented with ALiBi[1] which claims better extrapolative results than rotary positional encoding?

[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you’re skimming)


Undoubtedly, they have tried ALiBi…


Thanks for the feedback, we'll fix this.



It is like `cp`.


It isn't just like `cp`, on some systems `ln` is just a symlink to `cp`! `cp -s` does the same thing as `ln -s`, although the other flags are generally different.
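A quick demo of the equivalence, assuming GNU coreutils `cp` (BSD/macOS `cp` also has `-s`, but other flags differ between the two):

```shell
cd "$(mktemp -d)"            # scratch directory
echo hello > target.txt
ln -s target.txt via_ln      # the usual way
cp -s target.txt via_cp      # same result: a symlink, not a copy
```

Both `via_ln` and `via_cp` end up as symlinks pointing at `target.txt`.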


Quite usable, all of TorchCraftAI uses it https://torchcraft.github.io/TorchCraftAI/ :)



Yes. t-SNE is probably the only algorithm that might produce "sensible" 2D mappings.

BTW, matplotlib has a nicer facility than add_subplot() for making grid plots:

  import matplotlib.pyplot as plt

  fig, axes = plt.subplots(nrows=figdims, ncols=figdims)
  for dim, ax1 in zip(range(2, MAX_DIM), axes.flatten()[:(MAX_DIM - 2)]):
      ...




One logical continuation of adding more attention steps is to let the network itself decide how many steps to take, a la "Adaptive Computation Time for Recurrent Neural Networks". Are you planning to go in that direction?
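For context, the ACT halting rule (after Graves, 2016) can be sketched like this; the fixed `halt_probs` list stands in for a learned per-step sigmoid output, so everything here is illustrative:

```python
def act_step_weights(halt_probs, eps=0.01):
    """Accumulate per-step halting probabilities until their sum
    exceeds 1 - eps; the leftover mass (the "remainder") is assigned
    to the final step. Step outputs are then averaged with these
    weights, so the number of steps taken is input-dependent."""
    cum, weights = 0.0, []
    for p in halt_probs:
        if cum + p >= 1.0 - eps:
            weights.append(1.0 - cum)  # remainder goes to the last step
            break
        weights.append(p)
        cum += p
    return weights

w = act_step_weights([0.3, 0.4, 0.5, 0.9])  # halts after three steps
```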


One of my students tried something along these lines for Natural Language Inference (NLI) last year. [1] The results were not conclusive, but perhaps Machine Translation is a better target? My reason for believing this is that the NLI dataset in question most likely does not require multiple steps of inference in most cases (you can get away with simple token overlap), while the decoder in MT does, since it is constrained to output a single token at each step.

[1]: https://arxiv.org/abs/1610.07647



We really have no way of knowing how, or for what purpose, any data collected about us is used by any company, person, or entity.


A few years ago, SpringRTS, and in particular the Balanced Annihilation mod [0], was active and playable. There is also Planetary Annihilation [1] (which I used to play competitively online; a few of the top players are ex-StarCraft II Grand Masters).

[0] https://springrts.com/wiki/Balanced_Annihilation [1] http://www.uberent.com/pa/



I've played Warzone, Zero-K and SC2. I think Zero-K is the best of the three. Warzone's huge tech tree is overwhelming, and I believe it doesn't really add to the game, because only a few tech paths are likely to be viable in competitive play. SC2 is pretty and all, but suffers from hard-to-solve balance problems. Zero-K has solid gameplay elements (like terrain deformation and good unit control), a great variety of units available to all players to master, and it handles a dozen players nicely without population caps. The only problem is the small player base.


PS: The Spring-Engine team always welcomes experienced contributors. And yes, that's the engine's repo, not the Java Hibernate framework ;)

https://github.com/spring/spring


I own Planetary Annihilation. It is a horrible successor to TA: no use of terrain, boring units, and terrible UX. I currently play Total Mayhem (a TA mod) and TA: Escalation.


My earliest RTS was TA, and we played the heck out of it. I think SpringRTS is a worthy successor.

