
I'm using an M2 64GB MacBook Pro. For the Llama 8B one I would expect 16GB to be enough.
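For a rough back-of-envelope check (my own numbers, not from the thread): at 4-bit quantization an 8B model is on the order of 4-5 GB of weights plus KV cache and runtime overhead, which is why 16 GB is comfortable. A minimal Python sketch of that arithmetic, assuming ~0.5 bytes per weight and a flat overhead guess:

    # Rough estimate of memory needed to run a quantized model.
    # Assumes ~0.5 bytes/weight (4-bit quantization) plus a flat guess
    # for KV cache and runtime buffers -- not an exact figure.
    def estimate_memory_gb(params_billions: float,
                           bytes_per_weight: float = 0.5,
                           overhead_gb: float = 2.0) -> float:
        weights_gb = params_billions * 1e9 * bytes_per_weight / 1e9
        return weights_gb + overhead_gb

    print(estimate_memory_gb(8))   # ~6 GB  -> fits easily in 16 GB
    print(estimate_memory_gb(70))  # ~37 GB -> needs far more memory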

I don't have any experience running models on Windows or Linux, where your GPU VRAM becomes the most important factor.




On Windows or Linux you can run entirely from system RAM or split the model's layers between RAM and VRAM; running fully on the GPU is faster than either of those, but VRAM isn't the hard limit on what you can run at all.
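For what it's worth, here's a minimal sketch of that split using llama-cpp-python (the model path and layer count are placeholders, not anything from this thread): n_gpu_layers controls how many layers are offloaded to VRAM, and the rest stay in system RAM.

    # Sketch: offload part of the model to VRAM, keep the remaining
    # layers in system RAM. Tune n_gpu_layers to what your VRAM holds.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-8b-q4_k_m.gguf",  # hypothetical local file
        n_gpu_layers=20,   # layers offloaded to VRAM; -1 means "all of them"
        n_ctx=4096,        # context window; larger values use more memory
    )

    out = llm("Explain unified memory in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])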


So is it possible to load the ollama deepseek-r1 70b (43 GB) model on my 24 GB VRAM + 32 GB RAM machine? Does this depend on how I load the model, i.e., with ollama instead of other alternatives? AFAIK, ollama is basically a llama.cpp wrapper.

I have tried to deploy one myself with OpenWebUI + ollama, but only for small LLMs. I'm not sure about the bigger ones; I'm worried they might crash my machine somehow. Are there any docs? I'm curious how this works, if anyone has pointers.
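Not an official doc, but a rough sanity check before running a big model is to compare the sizes of your pulled models against free RAM. A sketch assuming Ollama's default API on localhost:11434 and the third-party requests and psutil packages:

    # Sketch: compare locally pulled Ollama model sizes against available RAM
    # before trying to run a large one.
    import requests
    import psutil

    available_gb = psutil.virtual_memory().available / 1e9
    models = requests.get("http://localhost:11434/api/tags").json()["models"]

    for m in models:
        size_gb = m["size"] / 1e9
        verdict = "should fit" if size_gb < available_gb else "likely too big"
        print(f"{m['name']}: {size_gb:.1f} GB ({verdict}, {available_gb:.1f} GB free)")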


Why isn't GPU VRAM a factor on an Apple Silicon Mac?


Because there's no dedicated VRAM. The “regular” RAM on Apple Silicon devices is unified memory, virtually all of which is shared with the GPU.



