That's definitely happening. The US does this through massive government spending on American solutions. The EU is only starting to go that route as well.
Completely agreed. Apple seriously regressed the multi-lingual experience. They probably have a model per language. If you have to mix languages in a sentence, well, good luck!
They add new data to the existing base model via continuous pre-training. You save on the pre-training stage itself (the next-token prediction task) but still have to re-run the mid- and post-training stages: context-length extension, supervised fine-tuning, reinforcement learning, safety alignment ...
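To make that concrete, here's a minimal sketch of the continuous pre-training step using the Hugging Face transformers Trainer. The base checkpoint name, data file, and hyperparameters are placeholder assumptions, and the later stages (context-length extension, SFT, RL, safety alignment) would each be separate runs on top of this.

```python
# Minimal sketch of continuous pre-training: resume next-token-prediction
# training of an existing base model on new data only. Model name, data
# file, and hyperparameters below are placeholders, not a real recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "meta-llama/Llama-3.1-8B"  # hypothetical base checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(BASE)

# Only the new data is needed; the base weights already encode the old corpus.
new_data = load_dataset("text", data_files={"train": "new_corpus.txt"})["train"]
tokenized = new_data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="cpt-out",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=1e-5,  # typically lower than the original pre-training LR
    ),
    train_dataset=tokenized,
    # mlm=False selects the causal (next-token prediction) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# After this, SFT / RL / safety alignment still have to be re-run.
```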
You should read the transcript. He's including 2025 in the age of scaling.
> Maybe here’s another way to put it. Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling—maybe plus or minus, let’s add error bars to those years—because people say, “This is amazing. You’ve got to scale more. Keep scaling.” The one word: scaling.
> But now the scale is so big. Is the belief really, “Oh, it’s so big, but if you had 100x more, everything would be so different?” It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true. So it’s back to the age of research again, just with big computers.
That article is more about feasibility than desirability. There's even a section where they say:
> Settling the question of whether companies or governments will be ready to invest upwards of tens of billions of dollars in large scale training runs is ultimately outside the scope of this article.
Ilya is saying it's unlikely to be desirable, not that it isn't feasible.
That article is from August 2024. A lot has changed since then.
Specifically, the performance of SOTA models has been plateauing on all popular benchmarks, and this was especially evident in 2025. It's why every major model announcement shows comparisons against other models, but not a historical graph of performance over time. Benchmarks are far from a reliable measure of these tools' capabilities, and they will keep being reinvented and gamed, but the asymptote is showing even on the companies' own benchmarks.
We can certainly continue to throw more compute at the problem. But the point is that scaling the current generation of tech will continue to yield diminishing returns.
To make up for this, "AI" companies are now focusing on engineering. 2025 has been the year of MCP, "agents", "skills", etc., which will continue in 2026. This is a good thing, as these tools need better engineering around them, so they can deliver actual value. But the hype train is running out of steam, and unless there is a significant breakthrough soon, I suspect that next year will be a turning point in this hype cycle.
France does not reimburse homeopathic treatments anymore. The NHS in the UK went even further: it funded homeopathic hospitals like the Royal London Homeopathic Hospital and reimbursed homeopathy until 2017.
Yeah that stuff generated embarrassingly wrong scientific 'facts' and citations.
That kind of hallucination is somewhat acceptable for something marketed as a chatbot, less so for an assistant helping you with scientific knowledge and research.
I thought it was weird at the time how much hate Galactica got for its hallucinations compared to hallucinations of competing models. I get your point and it partially explains things. But it's not a fully satisfying explanation.
First, the migration to 2.0 in 2019 to add eager mode support was horribly painful. Then, starting around 2.7, backward compatibility kept being broken. Not being able to load previously trained models with a new version of the library is wildly painful.
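Assuming this is about TensorFlow (the 2.0-in-2019 eager-mode migration matches TF's timeline), one partial workaround is to keep the architecture in code and persist only the weights, so the saved artifact doesn't embed version-specific serialized graphs. A sketch:

```python
# Sketch of a more version-robust save/load pattern for tf.keras:
# rebuild the architecture from code and persist only the weights.
# This dodges some (not all) serialization breaks across versions.
import tensorflow as tf

def build_model() -> tf.keras.Model:
    # Architecture lives in version-controlled code, not in the artifact.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(64,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

model = build_model()
model.save_weights("model.weights.h5")  # placeholder path (HDF5 weights)

restored = build_model()                # rebuilt under the newer TF version
restored.load_weights("model.weights.h5")
```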