Hacker News

Depends what you’re trying to do. I’m writing a personal assistant app (speech to text) and want to classify the user input according to the actions I currently support (or don’t). The flagship LLMs are pretty great at it if you include the classes in the prompt, and they spit out structured output every time. But, man, they are expensive, and there’s the privacy aspect I’d prefer to respect. I’ve only got 24 GB of RAM, so I can’t run many fancy local models, and things like llama3.1:8b don’t classify very well.

So I’m trying BERT models out :)
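For anyone curious, the basic idea is roughly: embed the user utterance and each supported action, then pick the closest match. Here's a toy sketch of that shape in pure Python — the action names and descriptions are made up, and the bag-of-words "embedding" is just a stand-in for a real BERT encoder:

```python
# Sketch of embedding-based intent classification. A toy bag-of-words
# vector stands in for a real sentence encoder; with an actual BERT
# model you'd swap embed() for the model's pooled output.
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence encoder: word-count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(utterance, actions):
    # Return the supported action whose description best matches.
    vec = embed(utterance)
    return max(actions, key=lambda name: cosine(vec, embed(actions[name])))


# Hypothetical action classes for illustration.
ACTIONS = {
    "set_timer": "set a timer for some minutes",
    "play_music": "play a song or some music",
    "weather": "what is the weather forecast today",
}

print(classify("play some music please", ACTIONS))  # -> play_music
```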

Try some of the Qwen models. They have some that are slightly larger than 8b that will fit in your 24 GB quite nicely. They have been amazing so far.


