I recommend trying the Telosnex* app: it uses llama.cpp and abstracts over LLMs, so you can, for example, switch between local models and servers at will.
The important part for you is that it's free, accelerated on macOS, and makes it very easy to use local LLMs (Settings > AI > LLM > On Device, tap Get).
Prepare to be slightly underwhelmed: it's only once you start hitting 3B that models are coherent; anything under that will feel more like a Markov chain than an LLM.
Depending on how geeked out you are about having it running locally, you might have fun with the fact that Telosnex can run local models on every platform, i.e. you can run local models on iOS/Android/web too.
* because it's mine :3 It's quietly released at the moment; I want to get one more major update out before announcing it widely in Jan 2025.
I have no interest in that. I would like small models that I can integrate and run offline in software that I make myself, be it IDEs or games. CLion has a nice predictive model for single-line C++ completion that weighs in at about 400 MB.
Ah, totally possible, but wrapping llama.cpp will likely take a week to spike out and a month to stabilize across models.
The biggest problem with relying on it for local software is that there's currently just too much latency for, say, game use cases (among other UX bugaboos: https://news.ycombinator.com/item?id=42561095).
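To make "wrapping llama.cpp" concrete, here's a rough sketch of the basic load-then-complete loop. It leans on the llama-cpp-python bindings rather than the raw C API purely for brevity (that's my assumption, not how Telosnex does it), and the model filename and parameters are placeholders, not anything from this thread:

    # Minimal offline-completion sketch using llama-cpp-python (an assumption,
    # not the approach described above). Model path is a placeholder GGUF file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/placeholder-3b-q4_k_m.gguf",  # hypothetical local file
        n_ctx=2048,        # context window
        n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon
        verbose=False,
    )

    # One-shot completion, e.g. a single-line code suggestion.
    out = llm(
        "// C++: return the square of x\nint square(int x) {",
        max_tokens=32,
        stop=["\n\n"],
        temperature=0.0,  # greedy decoding for deterministic suggestions
    )
    print(out["choices"][0]["text"])

The sketch hides the real work: model load time, per-token latency, and keeping behavior stable across model families are what eat the week-to-month estimate above.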
If it's a (mostly) CI-able process, I'm totally open to it.

---
I looked into "What should I do besides Snap?" about 4 months ago and got overwhelmed quickly, because I don't have enough knowledge to tell what's fringe vs. common.
Here's a good range of model sizes that run just fine with llama.cpp on Mac: https://huggingface.co/telosnex/fllama/tree/main
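If you want to try one of those directly, here's a small sketch of pulling a file from that repo and handing it to llama.cpp; the filename below is a placeholder, so check the repo's file list for the real names:

    # Hypothetical: download a GGUF from the repo linked above, then feed the
    # returned path to llama.cpp (e.g. llama-cli -m <path>).
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="telosnex/fllama",
        filename="placeholder-model-q4.gguf",  # placeholder, not a real filename
    )
    print(path)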