Heads up, there’s a fair bit of pushback (justified or not) on r/LocalLLaMA about Ollama’s tactics:
Vendor lock-in: AFAIK it now uses a proprietary llama.cpp fork and has built its own registry on ollama.com in a Docker-like way (I heard Docker people are actually behind Ollama). It's also a bit difficult to reuse the model binaries with other inference engines because they are stored under hashed filenames on disk.
Closed-source tweaks: Many llama.cpp improvements haven't been upstreamed or credited, raising GPL concerns. They have since switched to their own inference backend.
Mixed performance: The same models often run slower or give worse outputs than plain llama.cpp. A tradeoff for convenience, I know.
Opaque model naming: Community models are rebranded or filtered without transparency. The biggest fail was labeling the smaller DeepSeek-R1 distills simply "DeepSeek-R1", which added to massive confusion on social media and among "AI content creators" claiming you can run "THE" DeepSeek-R1 on any potato.
Difficult to change the context window default: Using Ollama as a backend, it is difficult to change the default context window size on the fly, leading to hallucinations and endless output loops, especially for agents / thinking models (a workaround sketch follows this list).
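For what it's worth, the context size can at least be overridden per request through Ollama's HTTP API, or baked in with `PARAMETER num_ctx` in a Modelfile. A minimal sketch, assuming the default local endpoint and a placeholder model name:

```python
# pip install requests
import requests

# Per-request context-window override via Ollama's HTTP API.
# Endpoint/port are the defaults; model name and prompt are placeholders.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3",
        "prompt": "Summarize this repo's README in three bullet points.",
        "stream": False,
        "options": {"num_ctx": 8192},  # raise the small default context window
    },
    timeout=600,
)
print(resp.json()["response"])
```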
---
If you want better (and in some cases more open) alternatives:
llama.cpp: Battle-tested C++ engine with minimal dependencies and many performance optimizations; often faster than Ollama (see the client sketch after this list).
ik_llama.cpp: High-perf fork, even faster than default llama.cpp
llama-swap: YAML-driven model swapping for your endpoint.
LM Studio: GUI for any GGUF model (no proprietary formats), with llama.cpp's optimizations exposed in the UI.
Open WebUI: Front-end that plugs into llama.cpp, Ollama, MPT, etc.
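Most of these speak an OpenAI-compatible HTTP API, so switching backends is mostly a matter of changing the base URL. A minimal client sketch, assuming llama.cpp's llama-server is running locally on port 8080 (LM Studio's local server defaults to 1234); the model name is a placeholder:

```python
# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at a local llama-server instance.
# Base URL/port are assumptions; adjust to however you launched the server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3",  # largely cosmetic for a single-model llama-server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```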
“I heard docker people are behind Ollama”: um, yes, it was founded by ex-Docker people and has raised multiple rounds of VC funding. The writing is on the wall - this is not some virtuous community project, it's a profit-driven startup, and at the end of the day that is what they are optimizing for.
“Justified or not” is certainly a useful caveat when extending the same credit to a few people who complain loudly with mostly inauthentic complaints.
> Vendor lock-in
That is probably the most ridiculous of these claims. Ollama is open source, llama.cpp is open source, and the packaged models are just quantized versions of openly available models that can be run with numerous other engines. Their llama.cpp changes are primarily for performance and compatibility. Yes, they run a registry on ollama.com for pre-packed, pre-quantized versions of models that are, again, openly available.
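For what it's worth, the weights Ollama downloads are plain GGUF files stored under content-addressed names, and other engines can be pointed straight at them. A rough sketch of resolving the blob path from the manifest, assuming the on-disk layout of recent versions (paths and media types may differ):

```python
import json
from pathlib import Path

# Assumed layout of Ollama's local store (may vary between versions):
#   ~/.ollama/models/manifests/registry.ollama.ai/library/<name>/<tag>  -> JSON manifest
#   ~/.ollama/models/blobs/sha256-<digest>                              -> GGUF weights
MODELS = Path.home() / ".ollama" / "models"

def gguf_blob_path(name: str, tag: str = "latest") -> Path:
    manifest_file = MODELS / "manifests" / "registry.ollama.ai" / "library" / name / tag
    manifest = json.loads(manifest_file.read_text())
    # The weights layer is the one with the "image.model" media type.
    layer = next(l for l in manifest["layers"] if l["mediaType"].endswith("image.model"))
    return MODELS / "blobs" / layer["digest"].replace(":", "-")

# e.g. feed the same file Ollama downloaded to llama.cpp's llama-server:
#   llama-server -m <printed path>
print(gguf_blob_path("qwen3"))
```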
> Closed-source tweaks
Oh, so many things wrong in such a short sentence. llama.cpp is MIT-licensed, not GPL. A proprietary fork would be perfectly legitimate use. Also... “proprietary”? The source code is literally available, patches included, on GitHub in the ollama/ollama project, in the “llama” folder, with a patch file as recent as yesterday.
> Mixed Performance
Yes, almost anything suffers degraded performance when the goal is usability instead of performance. It is why people use C# instead of Assembly or punch cards. Performance isn’t the only metric, which makes this a useless point.
> Opaque model naming
Sure, their official models have some ambiguities sometimes. I don't know that it is the “problem” people make it out to be, when Ollama is designed for average people to run models, so a decision like “ollama run qwen3” resolving to the option most people can run, rather than the absolute best option possible, makes sense. Do you really think it is advantageous or user-friendly, when Tommy wants to try out “deepseek-r1” on his potato laptop, for a 671B-parameter model too large to fit on almost any consumer computer to be the default, or that the current default is instead meant as a “deception”? That seems... disingenuous. Not to mention, they are clearly listed as such on ollama.com, where it says in black and white that deepseek-r1 by default refers to the Qwen distill, and that the full model is available as deepseek-r1:671b.
> Context Window
Probably the only fair and legitimate criticism in your entire comment.
I’m not an Ollama defender or champion, I couldn’t care less about the company, and I barely use Ollama (mostly just to run qwen3-8b for embeddings). It really is just that most of these complaints you’re sharing from others seem to have TikTok-level fact checking.
Looking at a problem from various perspectives, even posing ideas, is exactly what reasoning models seem to simulate in their thinking CoT to explore the solution space, with optimizations like MCMC etc.
I hand-curate github.com/underlines/awesome-ml, so I read a ton about the latest trends in this space. When I started reading the article, a lot of the information felt weirdly familiar and almost outdated.
The space is moving fast, after all. They just seem to be explaining QLoRA fine-tuning (yes, a great achievement, and all the folks involved are heroes), but for a trending article on HN it felt off.
Turns out I was too dumb to check the date: 2024. And the title mixes up quantized adapter fine-tuning with base-model training. Thanks, lol.
5070 Ti user here: We are 150 people in an SME, and the NDAs on most of our projects for gov & defense clients absolutely forbid us from using any cloud-based IDE tools like GitHub Copilot. Would love for this project to provide BYOK and even bring-your-own-inference-endpoint support. You could still create licensing terms for business clients.
Illegally using Anna's Archive, The Pile, Common Crawl, their own crawl, Books2, LibGen, etc., embedding it into a high-dimensional space, and doing next-token prediction on it.
AFAIK, guessing anything other than 00000 or 11111 at the first step leads to an optimal strategy of 3 steps, because you introduce a possible "right digit at wrong place" as a third state.
Guessing 00000 or 11111 removes that third state and leaves you with simple substitution of the wrong cells, which leads to an optimal 2-step strategy (a quick brute-force check follows below).
but obviously the shortest strategy is just guessing it right on the first try :D lol
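A tiny brute-force check of the 00000 opener, assuming a Wordle-style game over 5 binary digits where the feedback marks each position as correct or not (my reading of the rules):

```python
from itertools import product

def feedback(secret, guess):
    # Per-position feedback: True where the guessed digit is already correct.
    return [s == g for s, g in zip(secret, guess)]

# Strategy from the comment: open with 00000, then flip every position
# that was marked wrong -- the second guess must be the secret.
for secret in product("01", repeat=5):
    first = ("0",) * 5
    marks = feedback(secret, first)
    second = tuple(g if ok else "1" for g, ok in zip(first, marks))
    assert second == secret

print("the 00000 opener solves every 5-digit binary code in at most 2 guesses")
```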
Our family has had a computer since 1990, when I was 4 years old. As a kid in school, we had typing lessons on a typewriter in 2001 (despite having iMacs in the classroom). I specifically tried to type as fast as possible in order to leave typing class early.
It helped my brain to get up to 130 WPM as a kid. I now type at around 100 WPM.