2.5.1 then. Semantic versioning works for most scenarios.

JumpCrisscross · 2025-09-26T00:26:09 1758846369

Would that automatically roll over anyone pinging 2.5 via their API?

manquer · 2025-09-26T02:34:12 1758854052

If you want rollover, you could specify ^2.5.0 or 2.5.x; if you want to pin, it would be 2.5.0.
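To make the three forms concrete, here is a rough sketch of how a resolver could treat them. It assumes npm-style semantics (caret allows anything within the same major version, `.x` allows any patch within the same minor), which this thread does not actually specify. The function names and the version list in the usage note are hypothetical, not any vendor's API.

```python
from typing import Optional


def parse(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a caret range, an x-range, or an exact pin."""
    v = parse(version)
    if spec.startswith("^"):
        # Caret (assumed npm-style, major > 0): same major, at or above the base.
        base = parse(spec[1:])
        return v[0] == base[0] and v >= base
    if spec.endswith(".x"):
        # X-range: 2.5.x matches any patch release of 2.5.
        major, minor = spec[:-2].split(".")
        return v[:2] == (int(major), int(minor))
    # Exact pin: only the named version matches.
    return v == parse(spec)


def resolve(available: list, spec: str) -> Optional[str]:
    """Return the highest available version satisfying spec, or None."""
    matches = [a for a in available if satisfies(a, spec)]
    return max(matches, key=parse) if matches else None
```

With a made-up list such as `["2.5.0", "2.5.1", "2.6.0", "3.0.0"]`, `resolve(..., "2.5.x")` picks up a new patch release automatically, while `resolve(..., "2.5.0")` stays put until the caller changes the spec.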

This has all been solved for a long time now; LLM vendors seem to have unlearnt versioning principles.

This is fairly typical: marketing and business want to do different things with version numbers than what version numbering systems are good at.

dgacmu · 2025-09-26T12:04:47 1758888287

I suspect Google doesn't want to have to maintain multiple sub-versions. It's easier to serve one 2x popular model than two models where there's flux between the load on each, since these things have a non-trivial time to load into GPU/TPU memory for serving.

manquer · 2025-09-27T19:44:09 1759002249

Even if switching quickly were a challenge[1], they are using these models in their own products, not just selling them as a service. The first-party applications could quite easily adapt by switching quickly to the available model and freeing up the in-demand one.

This is the entire premise behind the cloud, and the reason Amazon did it first: they had the largest workloads at the time, before Web 2.0 and SaaS were a thing.

Only businesses with large first-party apps succeeded in the cloud provider space. Companies like HP and IBM all failed, and their time to failure correlated strongly with the number of first-party apps they operated. That is, these apps needed to keep a lot of idle capacity for peak demand anyway, capacity they could now monetize and co-mingle in the cloud.

LLMs as a service is no different from S3, which launched 20 years ago.

---

[1] It isn't. At the scale they are operating these models it shouldn't matter at all; it is not individual GPUs or machines that make a difference in load handling. Only a few users are going to explicitly pin a specific patch version; for the rest, they can serve whichever one is available immediately or cheaply.
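
The routing idea in the footnote could look roughly like this. It is only an illustration: `pick_version`, the `loaded` map, and the "spare capacity" numbers are hypothetical stand-ins, not how any provider actually schedules its models.

```python
from typing import Optional


def pick_version(requested: Optional[str], loaded: dict) -> str:
    """Choose which loaded model version serves a request.

    requested: the exact version the caller pinned, or None if unpinned.
    loaded: hypothetical map of loaded version -> spare capacity units.
    """
    if requested is not None:
        # Pinned callers always get exactly what they asked for.
        if requested not in loaded:
            raise LookupError(f"version {requested} is not loaded")
        return requested
    # Unpinned callers get whichever version has the most spare capacity.
    return max(loaded, key=loaded.get)
```

For example, with `loaded = {"2.5.0": 3, "2.5.1": 10}`, a caller pinned to `"2.5.0"` stays on it, while an unpinned caller lands on `"2.5.1"`.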