Llama.cpp forms the base for both Ollama and Kobold.cpp and probably a bunch of others I'm not familiar with. It's less a question of whether you want to use llama.cpp or one of the others and more of a question of whether you benefit from using one of the wrappers.
I can imagine some use cases where you'd really want to use llama.cpp directly, and there are of course always people who will argue that all wrappers are bad wrappers, but for myself I like the combination of ease of use and flexibility offered by Ollama. I wrap it in Open WebUI for a GUI, but I also have some apps that reach out to Ollama directly.
What is the advantage of, say, Ollama/Open WebUI vs llama-server? I have been using llama.cpp since it came out, I'm used to the syntax, and I don't have problems building it (probably because I use macOS, which it supports better), so am I missing something by not using Ollama?
Llama.cpp may have gotten better, but when I first started comparing the two:
* Ollama provides a very convenient way to download and manage models, which llama.cpp didn't at the time (maybe it does now?).
* Last I checked, with llama.cpp server you pick a model at server startup. Ollama instead lets you specify the model in the API request, and it handles swapping which one is loaded into VRAM automatically (see the sketch after this list).
* The Modelfile abstraction is a more helpful way to keep track of different settings. When I used llama.cpp directly I had to figure out a way to track a bunch of model parameters as bash flags. Modelfiles plus being able to specify the model in the request is a great synergy: clients don't have to think about parameters at all, just which Modelfile to use.
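For illustration, here's a minimal sketch of what that per-request model selection looks like against Ollama's HTTP API. The endpoint is Ollama's default /api/generate; the model names are just examples of tags you might have pulled, so swap in whatever you actually have:

```python
# Minimal sketch: ask two different models for completions in back-to-back
# requests against a local Ollama instance. Ollama loads/unloads the weights
# as needed; the client only names the model per request.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def generate(model: str, prompt: str) -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example model tags (assumptions): a small model that fits in VRAM,
# then a larger one, with no server restart in between.
print(generate("llama3.2:3b", "Summarize why the sky is blue in one line."))
print(generate("llama3.1:70b", "Summarize why the sky is blue in one line."))
```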
I'm leaving off some other reasons why I switched that I know have since improved (like Docker support, which Ollama has long had and llama.cpp didn't always), and some of the points above may have improved with time too. A glance over the docs suggests that you still can't specify a model at request time, though, which, if true, is probably the single biggest improvement Ollama offers. I'm constantly switching between tiny models that fit on my GPU and large models that use RAM, and being able to download a new model with ollama pull and immediately start using it in Open WebUI is a huge plus.
I just use Ollama. It works on my Mac and Windows machines and it's super simple to install and run most open models. And you can have another tool just shell out to it if you want more than the CLI.
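If it helps, here's a minimal sketch of what shelling out to it can look like, assuming ollama is on your PATH and the model tag (an example here) has already been pulled:

```python
# Minimal sketch of another tool "shelling out" to the Ollama CLI.
import subprocess

def ask(model: str, prompt: str) -> str:
    # `ollama run MODEL PROMPT` prints the model's reply to stdout.
    result = subprocess.run(
        ["ollama", "run", model, prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Example model tag (assumption): replace with one you have pulled locally.
print(ask("llama3.2", "Give me one fun fact about octopuses."))
```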
I found it only took me ~20 minutes to get Open WebUI and Ollama going on my machine locally. I don't really know what's happening under the hood, but going from zero to a chat interface was definitely not hard.
If anyone on macOS wants to use llama.cpp with ease, check out https://recurse.chat/. It supports importing ChatGPT history and continuing chats offline using llama.cpp. I built this so I can use local AI as a daily driver.