
I'm using an M2 64GB MacBook Pro. For the Llama 8B one I would expect 16GB to be enough.
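For a rough back-of-envelope check (my own numbers, not from the thread): at 4-bit quantization an 8B model is on the order of 4-5 GB of weights plus KV cache and runtime overhead, which is why 16 GB is comfortable. A minimal Python sketch of that arithmetic, assuming ~0.5 bytes per weight and a flat overhead guess:

    # Rough estimate of memory needed to run a quantized model.
    # Assumes ~0.5 bytes/weight (4-bit quantization) plus a flat guess
    # for KV cache and runtime buffers -- not an exact figure.
    def estimate_memory_gb(params_billions: float,
                           bytes_per_weight: float = 0.5,
                           overhead_gb: float = 2.0) -> float:
        weights_gb = params_billions * 1e9 * bytes_per_weight / 1e9
        return weights_gb + overhead_gb

    print(estimate_memory_gb(8))   # ~6 GB  -> fits easily in 16 GB
    print(estimate_memory_gb(70))  # ~37 GB -> needs far more memory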

I don't have any experience running models on Windows or Linux, where your GPU VRAM becomes the most important factor.




On Windows or Linux you can run entirely from system RAM or split the model's layers between RAM and VRAM; running fully on the GPU is faster than either of those, but VRAM isn't the hard limit on what you can run at all.
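For what it's worth, here's a minimal sketch of that split using llama-cpp-python (the model path and layer count are placeholders, not anything from this thread): n_gpu_layers controls how many layers are offloaded to VRAM, and the rest stay in system RAM.

    # Sketch: offload part of the model to VRAM, keep the remaining
    # layers in system RAM. Tune n_gpu_layers to what your VRAM holds.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-8b-q4_k_m.gguf",  # hypothetical local file
        n_gpu_layers=20,   # layers offloaded to VRAM; -1 means "all of them"
        n_ctx=4096,        # context window; larger values use more memory
    )

    out = llm("Explain unified memory in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])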


So is it possible to load the ollama deepseek-r1 70b (43 GB) model on my 24 GB VRAM + 32 GB RAM machine? Does this depend on how I load the model, i.e., with ollama instead of other alternatives? AFAIK, ollama is basically a llama.cpp wrapper.

I have tried to deploy one myself with OpenWebUI + ollama, but only for small LLMs. I'm not sure about the bigger ones; I'm worried they might crash my machine somehow. Are there any docs? I'm curious how this works, if anyone has pointers.
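Not an official doc, but a rough sanity check before running a big model is to compare the sizes of your pulled models against free RAM. A sketch assuming Ollama's default API on localhost:11434 and the third-party requests and psutil packages:

    # Sketch: compare locally pulled Ollama model sizes against available RAM
    # before trying to run a large one.
    import requests
    import psutil

    available_gb = psutil.virtual_memory().available / 1e9
    models = requests.get("http://localhost:11434/api/tags").json()["models"]

    for m in models:
        size_gb = m["size"] / 1e9
        verdict = "should fit" if size_gb < available_gb else "likely too big"
        print(f"{m['name']}: {size_gb:.1f} GB ({verdict}, {available_gb:.1f} GB free)")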


Why isn't GPU VRAM a factor on an Apple Silicon Mac?


Because there's no dedicated VRAM. The “regular” RAM on Apple Silicon devices is unified memory, virtually all of which is shared with the GPU.



