Here are my notes on trying this out locally via Ollama and via their API (and t...

atxtechbro · 2025-06-10T17:43:44 1749577424

Hi Simon,

What's the huge difference between the two pelicans riding bicycles? Was one running locally the small version vs the pretty good one running the bigger one thru the API?

Thanks, Morgan

diggan · 2025-06-10T18:03:29 1749578609

Ollama doesn't like proper naming for some reason, so `ollama pull magistral:latest` lands you with the q4_K_M version (currently, subject to change).

Mistral's API defaults to `magistral-medium-2506` right now, which is running with full precision, no quantization.

otabdeveloper4 · 2025-06-11T05:23:25 1749619405

Nobody should be ever using ollama, for any reason.

It literally only makes everything worse and more convoluted with zero benefits.

jeffhuys · 2025-06-11T06:47:38 1749624458

Could you elaborate?

redman25 · 2025-06-11T09:23:16 1749633796

Not the parent but I would say bad defaults or naming. There are countless posts from newbies wondering why a model doesn’t work as well as it should.

It’s usually either because the context size is set very low by default or they didn’t realize that they weren’t running the full model (ollama uses the distilled version in place of the full version but names it after the full version).

There’s also been some controversy over not giving proper credit to llama.cpp which ollama is/was a wrapper around.

kristianp · 2025-06-11T20:51:20 1749675080

> ollama uses the distilled version

I've never used ollama, but perhaps you mean quantized and not distilled? Or do they actually use distilled versions?

cosmojg · 2025-06-11T22:13:55 1749680035

They actually use distilled versions. The most egregious example of this is their misleading reference to all distillations of DeepSeek-R1, which are based on a variety of vastly different base models of varying sizes, as alternative versions of DeepSeek-R1 itself. To this day, many users maintain the mistaken impression that DeepSeek-R1 is overhyped and doesn't perform as well as claimed by those who have been using the actual model with 685B parameters.

otabdeveloper4 · 2025-06-11T11:21:32 1749640892

ollama is just a wrapper for llama.cpp that adds insane defaults.

Just use llama.cpp directly.

samtheprogram · 2025-06-10T22:02:57 1749592977

Not only the quantization, but what’s available via ollama is magistral-small (for local inference), not the -medium variant.

simonw · 2025-06-10T19:46:45 1749584805

Yes, the bad one was Mistral Small running locally, the better one was Mistral Medium via their API.

internet_points · 2025-06-11T08:16:32 1749629792

> I guess this means the reasoning traces are fully visible and not redacted in any way - interesting to see Mistral trying to turn that into a feature that's attractive to the business clients they are most interested in appealing to.

but then someone found that, at least for distilled models,

> correct traces do not necessarily imply that the model outputs the correct final solution. Similarly, we find a low correlation between correct final solutions and intermediate trace correctness

https://arxiv.org/pdf/2505.13792

ie. the conclusion doesn't necessarily follow from the reasoning. So is there still value in seeing the reasoning? There may be useful information in the reasoning, but I'm not sure it can be interpreted by humans as a typical human chain of reasoning, maybe it should be interpreted more as a loud multi-party discussion on the relevant subject which may have informed the conclusion but not necessarily lead to it.

OTOH, considering the effects of automation fatigue vs human oversight, I guess it's unlikely anyone will ever look at the reasoning in practice, except to summarily verify that it's there and tick the boxes on some form.