
I never thought I'd see the day when a 13B model was casually referred to in a comments section as a "toy model".


Compared to GPT-2 it's on par. Compared to GPT-3, 3.5, or 4, it's a toy. GPT-2 is 4 years old, and in terms of LLMs, that's several lifetimes ago. In 5-10 years, GPT-3 will be viewed as a toy. Note: progress going forward is unlikely to be as fast as it has been.


GPT-2's largest model was 1.5B params. LLaMA-65B was similar to the largest GPT-3 in benchmark performance, but that model was expensive in the API, so a number of people would use the cheaper one(s) instead, IIRC.

So this is similar to a mid-tier GPT-3 class model.

Basically, there's not much reason to pooh-pooh it. It may not perform quite as well, but I find it to be useful for the things it's useful for.


"Compared to GPT2 it's on par" Any benchmarks or evidence to support this claim? If you try to find them, official benchmarks will tell you that this is not true. Even the smallest LLaMA model (7B) is far ahead of GPT-2, like an order of magnitude better in perplexity.
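For anyone unfamiliar with the metric being compared here: perplexity is just the exponential of the model's average per-token negative log-likelihood on held-out text, so lower is better. A minimal sketch (the loss values below are made up for illustration, not real benchmark numbers):

```python
import math

def perplexity(neg_log_likelihoods):
    # Perplexity = exp(mean per-token negative log-likelihood),
    # using the natural log base, as most eval harnesses do.
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Hypothetical per-token losses for two models on the same text;
# a lower average loss yields exponentially lower perplexity.
weaker = perplexity([3.2, 2.9, 3.1])
stronger = perplexity([2.1, 1.8, 2.0])
print(weaker, stronger)
```

Because of the exponential, even a modest gap in average loss translates into a large perplexity gap, which is why the 7B-vs-GPT-2 comparison looks so lopsided on this metric.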


Start using it for tasks and you'll find limitations very quickly. Even ChatGPT excels at some tasks and fails miserably at others.


Oh, I've been using language models since before a lot (or at least some significant chunk) of HN knew the word LLM, I think.

I remember when going from 6B to 13B was crazy good. We've just normalized our standards to the latest models in the era.

They do have their shortcomings but can be quite useful as well, especially the LLaMA-class ones. They're definitely not GPT-4 or Claude+, for sure.



