Heads up, there’s a fair bit of pushback (justified or not) on r/LocalLLaMA about Ollama’s tactics:
Vendor lock-in: AFAIK it now uses a proprietary llama.cpp fork and has built its own registry on ollama.com in a Docker-like way (I heard Docker people are actually behind Ollama). It's also a bit difficult to reuse the model binaries with other inference engines because they are stored under hashed filenames on disk.
Closed-source tweaks: Many llama.cpp improvements haven't been upstreamed or credited, raising GPL concerns. They have since switched to their own inference backend.
Mixed performance: The same models often run slower or give worse outputs than plain llama.cpp. A tradeoff for convenience, I know.
Opaque model naming: Community models are rebranded or filtered without transparency. The biggest fail was labeling the smaller DeepSeek-R1 distills simply "DeepSeek-R1", which added to massive confusion on social media and among "AI content creators" claiming you can run "THE" DeepSeek-R1 on any potato.
Difficult to change the context window default: Using Ollama as a backend, it is difficult to change the default context window size on the fly, leading to hallucinations and endless output loops, especially for agents / thinking models (a workaround sketch follows this list).
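For what it's worth, the context size can at least be overridden per request through Ollama's HTTP API, or baked in with `PARAMETER num_ctx` in a Modelfile. A minimal sketch, assuming the default local endpoint and a placeholder model name:

```python
# pip install requests
import requests

# Per-request context-window override via Ollama's HTTP API.
# Endpoint/port are the defaults; model name and prompt are placeholders.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3",
        "prompt": "Summarize this repo's README in three bullet points.",
        "stream": False,
        "options": {"num_ctx": 8192},  # raise the small default context window
    },
    timeout=600,
)
print(resp.json()["response"])
```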
---
If you want better (and in some cases more open) alternatives:
llama.cpp: Battle-tested C++ engine with minimal dependencies and many performance optimizations; often faster than Ollama (see the client sketch after this list).
ik_llama.cpp: High-perf fork, even faster than default llama.cpp
llama-swap: YAML-driven model swapping for your endpoint.
LM Studio: GUI for any GGUF model (no proprietary formats), with llama.cpp's optimizations exposed in the UI.
Open WebUI: Front-end that plugs into llama.cpp, Ollama, MPT, etc.
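Most of these speak an OpenAI-compatible HTTP API, so switching backends is mostly a matter of changing the base URL. A minimal client sketch, assuming llama.cpp's llama-server is running locally on port 8080 (LM Studio's local server defaults to 1234); the model name is a placeholder:

```python
# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at a local llama-server instance.
# Base URL/port are assumptions; adjust to however you launched the server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3",  # largely cosmetic for a single-model llama-server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```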
“I heard docker people are behind Ollama”: um, yes, it was founded by ex-Docker people and has raised multiple rounds of VC funding. The writing is on the wall - this is not some virtuous community project, it's a profit-driven startup, and at the end of the day that is what they are optimizing for.
“Justified or not” is certainly a useful caveat when extending the same credit to a few people who complain loudly with mostly inauthentic complaints.
> Vendor lock-in
That is probably the most ridiculous of these claims. Ollama is open source, llama.cpp is open source, and the packaged models are just quantized versions of openly available models that can be run with numerous other engines. Their llama.cpp changes are primarily for performance and compatibility. Yes, they run a registry on ollama.com for pre-packed, pre-quantized versions of models that are, again, openly available.
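For what it's worth, the weights Ollama downloads are plain GGUF files stored under content-addressed names, and other engines can be pointed straight at them. A rough sketch of resolving the blob path from the manifest, assuming the on-disk layout of recent versions (paths and media types may differ):

```python
import json
from pathlib import Path

# Assumed layout of Ollama's local store (may vary between versions):
#   ~/.ollama/models/manifests/registry.ollama.ai/library/<name>/<tag>  -> JSON manifest
#   ~/.ollama/models/blobs/sha256-<digest>                              -> GGUF weights
MODELS = Path.home() / ".ollama" / "models"

def gguf_blob_path(name: str, tag: str = "latest") -> Path:
    manifest_file = MODELS / "manifests" / "registry.ollama.ai" / "library" / name / tag
    manifest = json.loads(manifest_file.read_text())
    # The weights layer is the one with the "image.model" media type.
    layer = next(l for l in manifest["layers"] if l["mediaType"].endswith("image.model"))
    return MODELS / "blobs" / layer["digest"].replace(":", "-")

# e.g. feed the same file Ollama downloaded to llama.cpp's llama-server:
#   llama-server -m <printed path>
print(gguf_blob_path("qwen3"))
```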
> Closed-source tweaks
Oh, so many things wrong in such a short sentence. llama.cpp is MIT-licensed, not GPL. A proprietary fork would be perfectly legitimate use. Also... “proprietary”? The source code is literally available, patches included, on GitHub in the ollama/ollama project, in the “llama” folder, with a patch file as recent as yesterday.
> Mixed Performance
Yes, almost anything suffers degraded performance when the goal is usability instead of performance. It is why people use C# instead of Assembly or punch cards. Performance isn’t the only metric, which makes this a useless point.
> Opaque model naming
Sure, their official models have some ambiguities sometimes. I don't know that it is the “problem” people make it out to be, when Ollama is designed for average people to run models, so a decision like “ollama run qwen3” resolving to the option most people can run, rather than the absolute best option possible, makes sense. Do you really think it is advantageous or user-friendly, when Tommy wants to try out “deepseek-r1” on his potato laptop, for a 671B-parameter model too large to fit on almost any consumer computer to be the default, or that the current default is instead meant as a “deception”? That seems... disingenuous. Not to mention, they are clearly listed as such on ollama.com, where it says in black and white that deepseek-r1 by default refers to the Qwen distill, and that the full model is available as deepseek-r1:671b.
> Context Window
Probably the only fair and legitimate criticism in your entire comment.
I’m not an Ollama defender or champion, I couldn’t care less about the company, and I barely use Ollama (mostly just to run qwen3-8b for embeddings). It really is just that most of these complaints you’re sharing from others seem to have TikTok-level fact checking.
Looking at a problem from various perspectives, even posing ideas, is exactly what reasoning models seem to simulate in their thinking CoT to explore the solution space, with optimizations like MCMC etc.
I hand-curate github.com/underlines/awesome-ml, so I read a ton about the latest trends in this space. When I started reading the article, a lot of the information felt weirdly familiar and almost outdated.
The space is moving fast, after all. They just seem to be explaining QLoRA fine-tuning (yes, a great achievement, and all the folks involved are heroes), but for a trending article on HN it felt off.
Turns out I was too dumb to check the date: 2024. And the title mixes up quantized adapter fine-tuning with base-model training. Thanks, lol.
5070 Ti user here: We are 150 people in an SME, and the NDAs on most of our projects for gov & defense clients absolutely forbid us from using any cloud-based IDE tools like GitHub Copilot. Would love for this project to provide BYOK and even bring-your-own-inference-endpoint support. You could still create licensing terms for business clients.
Illegally using Anna's Archive, The Pile, Common Crawl, their own crawl, Books2, LibGen, etc., embedding it into a high-dimensional space, and doing next-token prediction on it.
AFAIK, guessing anything other than 00000 or 11111 at the first step leads to an optimal strategy of 3 steps, because you introduce a possible "right digit at wrong place" as a third state.
Guessing 00000 or 11111 removes that third state and leaves you with simple substitution of the wrong cells, which leads to an optimal 2-step strategy (a quick brute-force check follows below).
but obviously the shortest strategy is just guessing it right on the first try :D lol
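A tiny brute-force check of the 00000 opener, assuming a Wordle-style game over 5 binary digits where the feedback marks each position as correct or not (my reading of the rules):

```python
from itertools import product

def feedback(secret, guess):
    # Per-position feedback: True where the guessed digit is already correct.
    return [s == g for s, g in zip(secret, guess)]

# Strategy from the comment: open with 00000, then flip every position
# that was marked wrong -- the second guess must be the secret.
for secret in product("01", repeat=5):
    first = ("0",) * 5
    marks = feedback(secret, first)
    second = tuple(g if ok else "1" for g, ok in zip(first, marks))
    assert second == secret

print("the 00000 opener solves every 5-digit binary code in at most 2 guesses")
```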
Our family has had a computer since 1990, when I was 4 years old. As a kid in school, we had typing lessons on a typewriter in 2001 (despite having iMacs in the classroom). I specifically tried to type as fast as possible in order to leave typing class early.
It helped my brain to get up to 130 WPM as a kid. I now type at around 100 WPM.