
Running it on a MacBook Pro entirely locally is possible via Ollama. Even running the full model (671B) is apparently possible when distributed across multiple M2 Ultras: https://x.com/awnihannun/status/1881412271236346233
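
For reference, a minimal sketch of what the local route might look like with the ollama Python client; the model tag below is an assumption, so check the Ollama library for the tags actually published:

    # sketch: chat with a locally pulled DeepSeek-R1 distill via the ollama Python client
    # assumes `ollama serve` is running and the tag below has already been pulled
    import ollama

    MODEL = "deepseek-r1:70b"  # hypothetical tag; substitute whichever distill/quant you pulled

    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": "Explain KV caching in two sentences."}],
    )
    print(response["message"]["content"])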


That’s a 3-bit quant. I don’t think there’s a theoretical reason you couldn’t run it at fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!
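
Back-of-the-envelope, assuming 671B parameters, 192 GB of unified memory per M2 Ultra, and roughly 75% of that usable for weights (the usable fraction is an assumption, and KV cache/activation overhead is ignored):

    # rough weight-memory estimate for the full 671B model
    params = 671e9                  # total parameters (MoE, all experts resident)
    usable_gb = 192 * 0.75          # assumed usable unified memory per M2 Ultra

    for fmt, bytes_per_param in [("fp16", 2.0), ("3-bit quant", 3 / 8)]:
        total_gb = params * bytes_per_param / 1e9
        machines = -(-total_gb // usable_gb)  # ceiling division
        print(f"{fmt:>12}: ~{total_gb:,.0f} GB of weights -> ~{int(machines)} M2 Ultra(s)")

which comes out to roughly 1.3 TB and ~10 machines at fp16, versus ~250 GB and 2 machines for the 3-bit quant.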


Well, there's the practical reason of the model natively being fp8 ;) That's one of the innovative ideas making it so much less compute-intensive, apparently.
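
Running the same arithmetic at fp8 (1 byte per parameter, same ~144 GB usable per machine assumption) halves the fp16 figure:

    # fp8: 1 byte per parameter -> ~671 GB of weights
    print(f"~{671e9 / 1e9:,.0f} GB -> ~{-(-671 // 144)} M2 Ultra(s)")  # ~671 GB -> ~5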


The 70B distilled version that you can run locally is pretty underwhelming, though.




