

Der_Einzige on June 7, 2023 | on: Bard is getting better at logic and reasoning

And slow. They never tell you that quantization of many LLMs slows down your inference, sometimes by orders of magnitude.

arugulum on June 7, 2023

It depends on the quantization method, but yes, some of the most commonly used ones are extremely slow.
