
Yeah, absolutely -- you'll probably pull 100+ tokens/s.

Here's a good range of model sizes that run just fine with llama.cpp on a Mac: https://huggingface.co/telosnex/fllama/tree/main

I recommend trying the Telosnex* app. It uses llama.cpp and abstracts over LLMs, so you can, e.g., switch between local models and servers at will.

The important part for you is that it's free, accelerated on macOS, and makes local LLMs very easy to use (Settings > AI > LLM > On Device, tap Get).

Prepare to be slightly underwhelmed: it's only when you start hitting 3B that it's coherent; anything under that will feel more like a Markov chain than an LLM.

Depending on how geeked out you'll be to have it running locally, you might have fun with the fact that Telosnex can run local models on every platform, i.e. on iOS/Android/web too.

* because it's mine :3 It's quietly released for now. I want to get one more major update out before announcing it widely in Jan 2025.




I have no interest in that. I would like small models that I can integrate and run offline in software that I make myself, be it IDEs or games. CLion has a nice predictive model for single-line C++ completion that is 400 MB.


Ah, totally possible, but wrapping llama.cpp will likely take a week to spike out and a month to stabilize across models.

The biggest problem with relying on it for local software is that there's currently just too much latency for, e.g., game use cases (among other UX bugaboos): https://news.ycombinator.com/item?id=42561095


Sorry to sidetrack, but a question about Telosnex: would you consider a Linux release with something other than Snap? Maybe Flatpak or AppImage?


If it's a (mostly) CI-able process, I'm totally open to it.

I looked into "What should I do besides Snap?" about 4 months ago and quickly got overwhelmed, because I don't have enough knowledge to understand what's fringe vs. common.

I'll definitely take a look at Flatpak again in the next month. A 30-second Google says it's possible (h/t /u/damiano-ferrari at https://www.reddit.com/r/FlutterDev/comments/z35gdo/can_you_...)

(thanks for your interest btw, I've been working on this for ~a year and this is my first outside feature request :) may there be many more)



