Hacker News

gpt-4o-mini might not be the best point of reference for what good LLMs can do with code: https://aider.chat/docs/leaderboards/#aider-polyglot-benchma...

A teeny tiny model like a 1.5B one is really dumb, and not good at interactively generating code in a conversational way, but models at 3B or below can do a good job of suggesting tab completions.

There are larger "open" models (in the 32B - 70B range) that you can run locally that should be much, much better than gpt-4o-mini at just about everything, including writing code. For a few examples, llama3.3-70b-instruct and qwen2.5-coder-32b-instruct are pretty good. If you're really pressed for RAM, qwen2.5-coder-7b-instruct or codegemma-7b-it might be okay for some simple things.

> medium specced macbook pro

"medium specced" doesn't mean much. How much RAM do you have? Each "B" (billion) of parameters is going to require about 1GB of RAM, as a rule of thumb. (Roughly 500MB per billion for really heavily quantized 4-bit models, 2GB per billion for un-quantized fp16 models... but 8-bit quants use about 1GB per billion, and that's usually fine.)
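The rule of thumb above is just parameter count times bytes per parameter. A quick back-of-the-envelope sketch (weights only; it ignores KV cache and runtime overhead):

```python
def est_weight_ram_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough weight-memory estimate: parameters * bytes per parameter, in GB."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

# Examples matching the rule of thumb:
print(est_weight_ram_gb(7, 8))    # 7B model at 8-bit  -> ~7 GB
print(est_weight_ram_gb(32, 8))   # 32B model at 8-bit -> ~32 GB
print(est_weight_ram_gb(70, 4))   # 70B model at 4-bit -> ~35 GB
print(est_weight_ram_gb(70, 16))  # 70B model at fp16  -> ~140 GB
```

So a 32B model at 8-bit quantization wants roughly 32GB free just for the weights, which is why 7B-class models are the practical ceiling on many laptops.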



Also, context size significantly impacts RAM/VRAM usage, and in programming those chats get big quickly.
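The context cost comes from the KV cache, which grows linearly with context length. A rough sketch of the standard formula (the model dimensions below are illustrative placeholders, not any specific model's):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """Estimate KV-cache size in GB.

    The factor of 2 is for the separate key and value tensors
    cached per layer; bytes_per_elem=2 assumes fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# Hypothetical 30B-class model with grouped-query attention:
# 64 layers, 8 KV heads, head_dim 128, at a 32k-token context.
print(kv_cache_gb(64, 8, 128, 32768))  # ~8.6 GB on top of the weights
```

That's why a long coding chat can push a model that "fits" in RAM over the edge; models using grouped-query attention (fewer KV heads) keep this much smaller than older full-attention designs.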


Thanks for your explanation! Very helpful!



