
It seems the trick here is that they first quantize the model to 1- or 2-bit, and then fine-tune the quantization bias parameters (the parameters that dequantize from 1-2 bit back to 16 bit) via LoRA. Then they have specialized kernels to do the matrix multiplication at the bit level.
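A rough sketch of what that structure could look like (the class name, group size, and shapes here are my own illustrative assumptions, not the actual implementation): the packed low-bit codes stay frozen, while the per-group scale and zero-point used to dequantize them are ordinary trainable parameters, so a LoRA-style fine-tuning pass can adjust them.

    import torch
    import torch.nn as nn

    class DequantLinear(nn.Module):
        """Frozen low-bit codes; only the dequantization parameters get gradients."""
        def __init__(self, q_codes, scale, zero, group_size=64):
            super().__init__()
            self.register_buffer("q_codes", q_codes)  # int codes in {0..3}, frozen
            self.scale = nn.Parameter(scale)          # per-group scale, trainable
            self.zero = nn.Parameter(zero)            # per-group zero-point, trainable
            self.group_size = group_size

        def dequant(self):
            out_f, in_f = self.q_codes.shape
            g = self.q_codes.float().reshape(out_f, -1, self.group_size)
            w = (g - self.zero.unsqueeze(-1)) * self.scale.unsqueeze(-1)
            return w.reshape(out_f, in_f)

        def forward(self, x):
            # A real kernel would multiply against the packed bits directly;
            # this reference version dequantizes to float first.
            return x @ self.dequant().t()

The point is just that the codes themselves never change; only the small set of dequantization parameters (plus any LoRA adapters) does.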

Also, the 2-bit model seems much better than the 1-bit model. They use [-1, 0, 1, 2] as the 2-bit levels, and I wonder whether the '2' is needed in light of the 1.58-bit paper (which claims -1 is definitely needed).
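With four levels that is exactly 2 bits per weight, so four weights pack into one byte. A toy pack/unpack routine (the code-to-level mapping [-1, 0, 1, 2] is taken from the comment above; everything else is just for illustration):

    import numpy as np

    LEVELS = np.array([-1, 0, 1, 2], dtype=np.int8)  # code 0..3 -> weight level

    def pack_2bit(codes):
        # codes: uint8 array of values in {0,1,2,3}, length divisible by 4
        c = codes.reshape(-1, 4)
        return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

    def unpack_2bit(packed):
        shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
        codes = (packed[:, None] >> shifts) & 0b11
        return LEVELS[codes.reshape(-1)]

    codes = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=np.uint8)
    assert (unpack_2bit(pack_2bit(codes)) == LEVELS[codes]).all()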



Interesting, and it kinda makes sense. You quantize, which invariably means you lose some precision, but then you can finetune post-quantization to recover at least some of it. Neat idea.
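That recovery step can be pictured as minimizing each quantized layer's output error against the full-precision original on a bit of calibration data, with gradients flowing only into the dequantization parameters (or LoRA adapters). A toy version of that loop, reusing the hypothetical DequantLinear sketch above (names and hyperparameters are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def recover(layer_fp, layer_q, calib_x, steps=200, lr=1e-3):
        # layer_fp: frozen full-precision nn.Linear used as the reference
        # layer_q:  DequantLinear with frozen codes, trainable scale/zero
        opt = torch.optim.Adam(layer_q.parameters(), lr=lr)
        with torch.no_grad():
            target = layer_fp(calib_x)
        for _ in range(steps):
            loss = F.mse_loss(layer_q(calib_x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return loss.item()  # residual error after calibration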


Which is itself a little counterintuitive, as the arXiv papers they cite say models need to be pretrained from the ground up at 1- or 2-bit (or 1.58-bit) precision. It definitely adds some interesting data points for the open-source community, which is experimenting in every possible direction.



