They shat the bed. They bet everything on extremely fast compute with very little memory, assuming models would plateau at a few billion parameters.
Last year, 70B parameters was considered huge and a good size to standardize around.
Today we have 1T-parameter models, and performance is still scaling with parameter count.
So next year we might have 10T-parameter LLMs, and these guys will still be playing catch-up.
All that matters for inference right now is how much HBM you can stack, and that's it.
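
To put rough numbers on that (my own back-of-envelope, not from the linked post): just holding the weights at 8-bit precision takes about one byte per parameter, so HBM capacity alone dictates how many accelerators you need before you even get to KV cache or batching. A quick sketch, assuming an illustrative ~141 GB of HBM per device:

    # Back-of-envelope: HBM needed just to hold the weights.
    # Assumptions (mine): 1 byte/param (8-bit weights), 141 GB HBM per device,
    # ignoring KV cache, activations, and framework overhead.
    GB = 1024**3

    def devices_needed(params, bytes_per_param=1.0, hbm_per_device_gb=141.0):
        """Minimum number of accelerators whose combined HBM fits the weights."""
        return (params * bytes_per_param) / (hbm_per_device_gb * GB)

    for params in (70e9, 1e12, 10e12):
        print(f"{params/1e9:>6.0f}B params -> ~{devices_needed(params):5.1f} devices")
    # ~0.5 devices for 70B, ~6.6 for 1T, ~66 for 10T: capacity, not FLOPs,
    # sets the floor on how much hardware you need to serve the model at all.
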
[0]: https://xcancel.com/CerebrasSystems/status/19513503371867015...