
A higher parameter count is strictly better only IF the number of tokens trained on (and ideally the quality of those tokens) increases along with it, and if training runs for longer (most LLMs are way undertrained)

Most of the huuuuuge models failed on most or all of these fronts, and that's why they suck compared to Llama or Alpaca or Vicuna (see the rough numbers sketched below)
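A rough way to see the "undertrained" point is the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter for compute-optimal training. The exact ratio is an assumption here, and the token counts are approximate public figures, but the gap is what matters:

    # Sketch: compare actual training tokens against a ~20 tokens/parameter
    # compute-optimal rule of thumb (assumed Chinchilla-style ratio, not an exact law).
    TOKENS_PER_PARAM = 20

    models = {
        # name: (parameters, tokens actually trained on) -- approximate public figures
        "GPT-3 175B": (175e9, 300e9),
        "Chinchilla 70B": (70e9, 1.4e12),
        "LLaMA 13B": (13e9, 1.0e12),
    }

    for name, (params, tokens) in models.items():
        optimal = params * TOKENS_PER_PARAM
        print(f"{name}: trained on {tokens/1e9:.0f}B tokens, "
              f"~{optimal/1e9:.0f}B would be compute-optimal "
              f"({tokens/optimal:.1f}x of optimal)")

GPT-3-sized models come out at a small fraction of the token budget this rule of thumb suggests, while the smaller Llama-family models come out over-trained relative to it, which is the point being made here.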



That's not true. For the same number of training tokens, bigger is better. And for the same size, more tokens is better. So obviously more tokens and bigger is better.
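Both claims follow from the Chinchilla-style parametric loss fit, L(N, D) = E + A/N^alpha + B/D^beta, which is monotonically decreasing in both model size N and token count D. A minimal sketch, using the fitted constants reported by Hoffmann et al. (2022) as illustrative assumptions:

    # Sketch: Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta.
    # Constants are the fitted values reported by Hoffmann et al. (2022);
    # treat them as illustrative assumptions, not exact.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

    def loss(n_params: float, n_tokens: float) -> float:
        return E + A / n_params**alpha + B / n_tokens**beta

    # Same token budget, bigger model -> lower predicted loss:
    print(loss(7e9, 300e9), loss(70e9, 300e9))
    # Same model size, more tokens -> lower predicted loss:
    print(loss(70e9, 300e9), loss(70e9, 1.4e12))

The disagreement in the parent comment is really about compute: for a fixed compute budget, spending it on a smaller model trained on more tokens can beat a bigger undertrained one, even though scaling either axis alone always helps.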



