Your implication is that we already have effectively unlimited compute, and can therefore conclude that LLMs have stalled.

Have you considered that compute might be the reason why LLMs are stalled at the moment?

What made LLMs possible in the first place? Right, compute! The Transformer architecture is 8 years old; technically GPT-4 could have been trained 5 years ago. What stopped it? Simple: the available compute was far too low.

Nvidia has improved compute by roughly 1000x in the past 8 years, but what if a single GPT-5 training run takes 6-12 months at the scale OpenAI is attempting?

What we see right now is that pre-training has reached the limits of Hopper, and Big Tech is waiting for Blackwell. Blackwell will easily be 10x faster in cluster training (don't look at chip performance alone), and since Big Tech intends to build 10x larger GPU clusters, they will end up with 100x the training compute.
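To be explicit about the arithmetic (both 10x factors are this comment's assumptions, not published benchmark numbers), a back-of-envelope sketch in Python:

    # Back-of-envelope for the "100x" claim above. Both factors are
    # assumptions from this comment, not measured numbers.
    per_cluster_speedup = 10  # assumed Blackwell-vs-Hopper gain in cluster training
    cluster_scale_up = 10     # assumed growth in GPUs per cluster
    print(per_cluster_speedup * cluster_scale_up)  # -> 100x training compute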

Let's see how it turns out.

The real limit on training is time. If you want to build something new and keep improving it, you have to cap training time, because nobody will wait 5-6 months for results anymore.

Years ago it was fine for OpenAI to take months or even years to ship a new frontier model, but expectations are higher today.

There is a reason Blackwell is fully sold out for the year: AI research is starved for compute.

The best part for Nvidia is that while the AI labs compete with each other, they all compete for the same Nvidia hardware.

The age of pre-training is basically over; I think everyone has acknowledged this, and it has nothing to do with not having a big enough cluster. The bull argument on AI is that inference-time scaling will pull us to the next step.

Except the o3 benchmarks are, seemingly, pretty solid evidence that leaving LLMs on for the better part of a day and spending a million dollars gets you... nothing. Passing a basic logic test with brute-force methods, then falling apart on a marginally easier test it just wasn't trained on.

The returns on compute and data seem to be diminishing, with exponential increases in inputs returning only incremental increases in quality. And we're out of quality training data, which makes things much worse even if scaling weren't plateauing.
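To put a number on "diminishing": under the usual power-law picture of scaling, loss falls roughly as compute^(-alpha) for some small alpha. The alpha below is an illustrative value I picked, not a fit to any real model; the shape is the point:

    # Illustrative power-law scaling: loss ~ C^(-alpha).
    # alpha = 0.05 is a made-up value in the ballpark of reported
    # LLM scaling exponents; the exact number isn't the point.
    alpha = 0.05
    prev = 1.0
    for compute in [1, 10, 100, 1000]:
        loss = compute ** -alpha
        print(f"{compute:>4}x compute -> relative loss {loss:.3f}"
              f" (gain {prev - loss:.3f})")
        prev = loss
    # Each extra 10x of compute buys a smaller absolute gain:
    # gains are 0.109, then 0.097, then 0.086.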

All this, and the scale that got us this far seems to have done nothing to produce real intelligence: there's no planning or real reasoning, and that shows every time a model tries something out of distribution, or even in distribution but merely complicated. Even if we get another crank or two out of this, we're still at the bottom of the mountain. We've barely started and we're already out of gas.

Scale doesn't fix this any more than building a mile-tall fence stops the next break-in. If it were going to work, we would have seen it work already. LLMs don't have much juice left in the squeeze, imo.
