
Running it on a MacBook Pro entirely locally is possible via Ollama. Even running the full model (671B) is apparently possible when distributed across multiple M2 Ultras: https://x.com/awnihannun/status/1881412271236346233
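
For reference, a minimal sketch of what the local route might look like with the ollama Python client; the model tag below is an assumption, so check the Ollama library for the tags actually published:

    # sketch: chat with a locally pulled DeepSeek-R1 distill via the ollama Python client
    # assumes `ollama serve` is running and the tag below has already been pulled
    import ollama

    MODEL = "deepseek-r1:70b"  # hypothetical tag; substitute whichever distill/quant you pulled

    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": "Explain KV caching in two sentences."}],
    )
    print(response["message"]["content"])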


That’s a 3-bit quant. I don’t think there’s a theoretical reason you couldn’t run it at fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!
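
Back-of-the-envelope, assuming 671B parameters, 192 GB of unified memory per M2 Ultra, and roughly 75% of that usable for weights (the usable fraction is an assumption, and KV cache/activation overhead is ignored):

    # rough weight-memory estimate for the full 671B model
    params = 671e9                  # total parameters (MoE, all experts resident)
    usable_gb = 192 * 0.75          # assumed usable unified memory per M2 Ultra

    for fmt, bytes_per_param in [("fp16", 2.0), ("3-bit quant", 3 / 8)]:
        total_gb = params * bytes_per_param / 1e9
        machines = -(-total_gb // usable_gb)  # ceiling division
        print(f"{fmt:>12}: ~{total_gb:,.0f} GB of weights -> ~{int(machines)} M2 Ultra(s)")

which comes out to roughly 1.3 TB and ~10 machines at fp16, versus ~250 GB and 2 machines for the 3-bit quant.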


Well, there's the practical reason of the model natively being fp8 ;) That's one of the innovative ideas making it so much less compute-intensive, apparently.
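
Running the same arithmetic at fp8 (1 byte per parameter, same ~144 GB usable per machine assumption) halves the fp16 figure:

    # fp8: 1 byte per parameter -> ~671 GB of weights
    print(f"~{671e9 / 1e9:,.0f} GB -> ~{-(-671 // 144)} M2 Ultra(s)")  # ~671 GB -> ~5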


The 70B distilled version that you can run locally is pretty underwhelming, though.




