
The full thing, 671B. It loses some intelligence at 1.5-bit quantisation, but it's acceptable. I could actually go to around 3 bits if I maxed out my RAM, but I haven't done that yet.
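For a rough sense of why bit width is the knob that decides whether this fits in RAM at all, here's a minimal back-of-envelope sketch in Python (weights only; real GGUF quants mix types per tensor, so actual file sizes will differ somewhat):

    # Approximate RAM footprint of 671B weights at a given
    # bits-per-weight (bpw). Weights only: KV cache, activations and
    # runtime overhead are ignored, so treat each figure as a floor.
    N_PARAMS = 671e9

    for bpw in (1.5, 3.0, 4.0, 8.0, 16.0):
        gib = N_PARAMS * bpw / 8 / 2**30
        print(f"{bpw:>4} bpw ~= {gib:,.0f} GiB")

At 1.5 bpw the weights alone come to roughly 117 GiB, versus about 1.2 TiB at FP16, which is why aggressive quantisation is what makes running this on a maxed-out RAM box plausible at all.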


I've seen people say the models get more erratic at more aggressive (i.e. lower-bit) quantization levels. What's your experience been?


If you mean clearly, noticeably erratic or incoherent behaviour, then that hasn't been my experience with >=4-bit inference of 32B models, or with my R1 setup. I think the others might have been referring to this happening with smaller models (sub-24B), which suffer much more when quantised below 4 or 5 bits.

My R1 most likely isn't as smart as the output coming from an int8 or FP16 API, but that's just a given. It still holds up pretty well for what I've tried.



