Hacker News

"Compared to GPT2 it’s on par" Any benchmarks or evidence to support this claim? If you try to find them, official benchmarks will tell you that this is not true. Even the smallest LLaMA model (7B) is far ahead of GPT-2, with something like an order of magnitude lower perplexity.
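(For anyone unfamiliar with the metric: perplexity is just the exponential of the mean per-token negative log-likelihood, so "an order of magnitude lower" means the model is vastly less surprised by held-out text. A minimal sketch with made-up NLL values, not real benchmark numbers:)

```python
import math

def perplexity(nll_per_token):
    # Perplexity = exp of the mean negative log-likelihood (in nats) per token.
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Toy illustration (hypothetical numbers, not actual GPT-2/LLaMA scores):
# an average NLL of 3.0 nats/token gives perplexity e^3 ~ 20.1,
# while 1.5 nats/token gives e^1.5 ~ 4.5 -- lower is better.
print(round(perplexity([3.0, 3.0, 3.0]), 1))
print(round(perplexity([1.5, 1.5, 1.5]), 1))
```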

