A higher parameter count is only strictly better IF the number of tokens trained on (and ideally their quality) increases along with it, and IF training runs long enough (most LLMs are way undertrained)
Most of the huuuuuge models failed on most or all of these fronts, and that's why they suck compared to Llama or Alpaca or Vicuna
That's not true. For the same number of training tokens, a bigger model is better. And for the same model size, more tokens are better. So more tokens AND a bigger model is obviously better still.
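One way to see why both axes matter is the Chinchilla-style loss fit from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, where loss falls as either the parameter count N or the token count D grows. Here's a minimal sketch using the paper's approximate fitted constants (illustrative values, not exact):

```python
# Sketch of the Chinchilla-style scaling fit L(N, D) = E + A/N^alpha + B/D^beta
# (Hoffmann et al. 2022). Constants are roughly the paper's fitted values and
# are illustrative only.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens: loss falls as either grows."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

# Growing either axis lowers the predicted loss; growing both lowers it most.
base   = predicted_loss(7e9, 300e9)      # 7B params, 300B tokens
bigger = predicted_loss(70e9, 300e9)     # 10x params, same tokens
longer = predicted_loss(7e9, 3000e9)     # same params, 10x tokens
both   = predicted_loss(70e9, 3000e9)    # 10x both
print(base, bigger, longer, both)        # each of the last three is below base
```

Under this kind of fit, an "undertrained" huge model (large N, small D) is stuck paying the B/D^β term, which is the failure mode being described above.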