The quantized model fits in about 20 GB, so 32 GB would probably be sufficient unless you want to use the full context length (long inputs and/or lots of reasoning). 48 GB should be plenty.
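For a rough sense of the arithmetic (a back-of-the-envelope sketch; the parameter count and overhead figure below are assumptions, not measurements):

    # Rough memory math for a 4-bit quantized model (Python)
    params_b = 32                    # assumed parameter count, in billions
    weights_gb = params_b * 0.5      # ~4 bits/param is ~0.5 bytes/param
    overhead_gb = 4                  # assumed KV cache + runtime overhead
    total_gb = weights_gb + overhead_gb
    print(f"~{total_gb:.0f} GB")     # ~20 GB: tight on 32 GB, easy on 48 GB

Longer contexts grow the KV cache, which is why using the full context length can push past what 32 GB comfortably holds.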


I've tried the very early Q4 mlx release on an M1 Max 32GB (LM Studio @ default settings) and have run into severe issues. On the coding tasks I gave it, it froze before it finished reasoning; I guess I should limit the context size. I do love what I'm seeing, though: the output reads very similar to R1, and I mostly agree with its conclusions. The Q8 version should be even better.
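For what it's worth, here is a minimal sketch of capping generation length with the mlx-lm package (the model path is a placeholder, and I can't say whether LM Studio exposes the same knob in its settings):

    # Minimal mlx-lm sketch; the repo name below is hypothetical.
    from mlx_lm import load, generate

    model, tokenizer = load("some-org/some-model-4bit")
    # Capping max_tokens bounds KV-cache growth during long reasoning runs.
    out = generate(model, tokenizer, prompt="Refactor this Swift function...",
                   max_tokens=512)
    print(out)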


Does the Q8 fit within your 32GB? (I'm also using an M1 with 32GB.)


No, Q4 just barely fits, and with a longer context things sometimes freeze. I definitely have to close Xcode.



