Hacker News

Note that this is StableLM ALPHA (only 0.52 epochs into training).

The fully trained version should be substantially better.

Also, for a fair comparison you should benchmark GPT-3 Curie, which at roughly 6.7B parameters is about the same size as the 7B model.

How many epochs will they run?
