
And slow. They never tell you that quantization of many LLMs slows down your inference, sometimes by orders of magnitude.



It depends on the quantization method, but yes, some of the most commonly used ones are extremely slow.
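To make one plausible cause concrete: a rough sketch, assuming a naive kernel that stores weights as int8 and dequantizes them on the fly inside the dot product. The function names and the symmetric per-tensor scheme are illustrative assumptions, not any particular library's implementation. The point is only that the quantized path does extra work per weight unless the kernel is specifically optimized for it.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative scheme).

    Returns the integer codes and the single float scale needed to
    map them back to approximate float weights.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dot_fp(weights, x):
    """Baseline: one multiply-add per weight."""
    return sum(w * xi for w, xi in zip(weights, x))


def dot_dequant_on_the_fly(codes, scale, x):
    """Naive quantized path: each weight is dequantized (c * scale)
    before the multiply-add, i.e. an extra operation per weight."""
    return sum((c * scale) * xi for c, xi in zip(codes, x))


weights = [0.5, -1.0, 0.25, 0.75]
x = [1.0, 2.0, 3.0, 4.0]

codes, scale = quantize_int8(weights)
exact = dot_fp(weights, x)
approx = dot_dequant_on_the_fly(codes, scale, x)
```

Here `approx` is close to `exact` but not identical, and the quantized path does strictly more arithmetic per element. Real kernels can hide this cost with fused, hardware-specific code, which fits the point that the slowdown depends on the method.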



