I recommend trying the Telosnex* app: it uses llama.cpp and abstracts over LLMs, so you can, for example, switch between local models and servers at will.
The important part for you is that it's free, accelerated on macOS, and makes it very easy to use local LLMs (Settings > AI > LLM > On Device, tap Get).
Prepare to be slightly underwhelmed: it's only once you start hitting 3B that models are coherent; anything under that will feel more like a Markov chain than an LLM.
Depending on how geeked out you are about having it running locally, you might have fun with the fact that Telosnex can run local models on every platform, i.e. you can run local models on iOS/Android/web too.
* because it's mine :3 It's quietly released at the moment; I want to get one more major update out before announcing it widely in Jan 2025.
I have no interest in that. I would like small models that I can integrate and run offline in software that I make myself, be it IDEs or games. CLion has a nice predictive model for single-line C++ completion that weighs in at about 400 MB.
Ah, totally possible, but wrapping llama.cpp will likely take a week to spike out and a month to stabilize across models.
The biggest problem with relying on it for local software is that there's currently just too much latency for, say, game use cases (among other UX bugaboos: https://news.ycombinator.com/item?id=42561095).
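To make "wrapping llama.cpp" concrete, here's a rough sketch of the basic load-then-complete loop. It leans on the llama-cpp-python bindings rather than the raw C API purely for brevity (that's my assumption, not how Telosnex does it), and the model filename and parameters are placeholders, not anything from this thread:

    # Minimal offline-completion sketch using llama-cpp-python (an assumption,
    # not the approach described above). Model path is a placeholder GGUF file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/placeholder-3b-q4_k_m.gguf",  # hypothetical local file
        n_ctx=2048,        # context window
        n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon
        verbose=False,
    )

    # One-shot completion, e.g. a single-line code suggestion.
    out = llm(
        "// C++: return the square of x\nint square(int x) {",
        max_tokens=32,
        stop=["\n\n"],
        temperature=0.0,  # greedy decoding for deterministic suggestions
    )
    print(out["choices"][0]["text"])

The sketch hides the real work: model load time, per-token latency, and keeping behavior stable across model families are what eat the week-to-month estimate above.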
If it's a (mostly) CI-able process, I'm totally open to it.

---
I looked into "What should I do besides Snap?" about 4 months ago and got overwhelmed quickly, because I don't have enough knowledge to tell what's fringe vs. common.
Here's a good range of model sizes that run just fine with llama.cpp on Mac: https://huggingface.co/telosnex/fllama/tree/main
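If you want to try one of those directly, here's a small sketch of pulling a file from that repo and handing it to llama.cpp; the filename below is a placeholder, so check the repo's file list for the real names:

    # Hypothetical: download a GGUF from the repo linked above, then feed the
    # returned path to llama.cpp (e.g. llama-cli -m <path>).
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="telosnex/fllama",
        filename="placeholder-model-q4.gguf",  # placeholder, not a real filename
    )
    print(path)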