It's always a question of "compared to what?" Local models are nowhere near cap...

simonw · 2025-08-14T14:28:35 1755181715

Sure, Qwen-3-4B - a 4GB download - is nowhere near as capable as Claude Sonnet 4.

But it is massively more capable than the 4GB models we had last year.

Meanwhile, recent models that are in the same ballpark of capabilities as Claude Sonnet 4 - like GLM 4.5 and Kimi K2 and the largest of the Qwen 3 models - can just about fit on a $10,000 Mac Studio with 512GB of RAM. That's a very notable trend.
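
For a rough sense of what "just about fit" means, here is a back-of-the-envelope sketch of the arithmetic. It only counts the weights (parameter count times bits per weight) and ignores the KV cache, activations, and runtime overhead unless you pass an overhead fraction. The function names, the parameter counts, and the quantization levels in the usage lines are illustrative assumptions, not measurements of any particular model.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(n_params: float, bits_per_weight: float, ram_gb: float,
                overhead_fraction: float = 0.0) -> bool:
    """True if the weights, plus an assumed fractional overhead, fit in ram_gb."""
    needed_gb = weight_footprint_gb(n_params, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= ram_gb


if __name__ == "__main__":
    # Hypothetical 4-billion-parameter model at 8 bits per weight.
    print(weight_footprint_gb(4e9, 8))
    # Hypothetical 1-trillion-parameter model at 4 bits per weight, 512GB machine.
    print(weight_footprint_gb(1e12, 4), fits_in_ram(1e12, 4, 512))
    # The same hypothetical model at 8 bits per weight no longer fits.
    print(fits_in_ram(1e12, 8, 512))
```

Under these assumptions, a trillion-parameter model only squeezes into 512GB at roughly 4 bits per weight, with very little headroom left for context.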

grim_io · 2025-08-14T14:47:35 1755182855

It doesn't feel like the gap is closing at all.

The local models can get 10x as good next year; it won't matter to me if the frontier models are still better.

And even though we can run those models (heavily quantized, and thus less capable), they are unusably slow on that $10k dead-weight hardware.

badsectoracula · 2025-08-14T18:58:57 1755197937

El Capitan being much faster than my desktop doesn't mean that my desktop is useless. Same with LLMs.

I've been using Mistral Small 3.x for a bunch of tasks on my own PC and it has been very useful, especially after I wrote a few custom tools with llama.cpp to make it more "scriptable".

jdjdndndn · 2025-08-14T22:46:05 1755211565

I would be interested in hearing about those custom tools.