
Wait, so there's a way to make a model as smart as GPT but with fewer parameters? Isn't the parameter count why it's so good?


This is an older paper, but DeepMind argues in their Chinchilla paper that far better performance can be extracted from fewer parameters. To quote:

"We find that current large language models are significantly under-trained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."

It's difficult to evaluate an LLM's performance since so much of it is qualitative, but Meta's LLaMA has been doing quite well even at 13B parameters.
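For a sense of what "undertrained" means in practice, here's a rough back-of-the-envelope sketch of the Chinchilla rule of thumb (roughly 20 training tokens per parameter, with training compute approximated as C ≈ 6·N·D FLOPs). The numbers below are illustrative, not taken from the paper's tables:

  # Back-of-the-envelope Chinchilla-style sizing. The headline heuristic
  # is roughly 20 training tokens per parameter; total training compute
  # is commonly approximated as C ~= 6 * N * D FLOPs.

  def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
      """Approximate compute-optimal number of training tokens for a model size."""
      return n_params * tokens_per_param

  def training_flops(n_params: float, n_tokens: float) -> float:
      """Standard approximation: ~6 FLOPs per parameter per training token."""
      return 6.0 * n_params * n_tokens

  if __name__ == "__main__":
      for n in (13e9, 70e9, 175e9):  # 13B (LLaMA), 70B (Chinchilla), 175B (GPT-3)
          d = chinchilla_optimal_tokens(n)
          print(f"{n / 1e9:>6.0f}B params -> ~{d / 1e9:.0f}B tokens, ~{training_flops(n, d):.2e} FLOPs")

By that heuristic, a 175B-parameter model would want on the order of 3.5T training tokens, far more than GPT-3 actually saw, which is the sense in which the paper calls such models undertrained.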


Chinchilla is aimed at finding a compute-performance tradeoff, not the optimal amount of training in any absolute sense. If cost is no barrier because the model will be used forever, then there's probably no amount of training that's "enough."


The rumor I've heard is that GPT4 didn't meaningfully increase the parameter count versus GPT3.5, but instead focused on training and structural improvements.


Well, the inference time of GPT-4 seems to be far greater than GPT-3's, so that could hint at a difference in parameter count.


If you watch their announcement livestream video, it looked just as fast as normal ChatGPT.

I think what we have access to is a fair bit slower.


You can train a small model to behave like the large model on a subset of tasks.
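To make that concrete, here's a minimal knowledge-distillation sketch (Hinton-style soft targets), where a small student is trained against a larger frozen teacher. The toy models, temperature, and mixing weight are placeholders for illustration, not anything GPT-specific:

  import torch
  import torch.nn.functional as F

  def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
      """Blend soft-target KL loss (teacher -> student) with ordinary cross-entropy."""
      soft = F.kl_div(
          F.log_softmax(student_logits / T, dim=-1),
          F.softmax(teacher_logits / T, dim=-1),
          reduction="batchmean",
      ) * (T * T)
      hard = F.cross_entropy(student_logits, labels)
      return alpha * soft + (1 - alpha) * hard

  # Toy usage: a small "student" learning to mimic a frozen "teacher".
  teacher = torch.nn.Linear(32, 10)   # stands in for the large model (much bigger in practice)
  student = torch.nn.Linear(32, 10)   # the smaller model being trained
  opt = torch.optim.Adam(student.parameters(), lr=1e-3)

  x = torch.randn(8, 32)
  labels = torch.randint(0, 10, (8,))
  with torch.no_grad():
      t_logits = teacher(x)

  loss = distillation_loss(student(x), t_logits, labels)
  loss.backward()
  opt.step()

The point is that the student sees the teacher's full output distribution, not just hard labels, which is why it can recover much of the large model's behavior on the tasks it's distilled on.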


That's a complicated question to answer. What I'd say is that more parameters make the model more robust, but there are diminishing returns, and optimizations are under way.



