You will simply need a lot of GPU cores/VRAM. On my $4,000 Mac Studio M2 Ultra with 64GB of RAM, I can comfortably run deepseek-r1:32b, but a) load times can be annoying (e.g., if you are switching models for different tasks or letting them idle out), and b) you can certainly tell that it requires tuning the context length, temperature, etc. based on what you need to do.
Compare that with the commercial models, where a lot of that is done at scale for you.
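For what it's worth, here's a minimal sketch of that kind of per-task tuning, assuming the model is served through Ollama's local REST API (the deepseek-r1:32b tag is Ollama's naming); the endpoint port, prompt, and parameter values are illustrative, not a recommendation:

    import requests

    # Minimal sketch: query a locally served model via Ollama's REST API,
    # overriding context length and temperature per request. Values here
    # are placeholders; tune them per task.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:32b",
            "prompt": "Summarize the tradeoffs of running LLMs locally.",
            "stream": False,
            "options": {
                "num_ctx": 8192,     # context window; larger costs more VRAM
                "temperature": 0.6,  # lower for factual tasks, higher for creative ones
            },
        },
    )
    print(resp.json()["response"])

Each request can carry its own options, which helps when one loaded model has to cover several different kinds of tasks.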
Yeah, that makes sense. Once the model is loaded, though, does it work well in comparison to the commercial models? Do you find that the local models hallucinate more, or don't give the same response quality?