Hacker News

> the recent (-1,0,1) encoding?

A side point, but this "recent" encoding goes back to a 2017 paper from the Allen Institute. These days a seven-year-old paper is ancient.

They went further and showed you could get away with binary; you don't even need ternary!




Goes back before then. This got popularized by BinaryConnect in 2015, and groups were training binary networks as early as 2011.

You are probably referring to XNOR-Net, and the novel piece there was also using binary activations (which BitNet does not).

So as far as I can tell, BitNet is basically BinaryConnect applied to LLMs.

https://arxiv.org/abs/1511.00363
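A minimal numpy sketch of the BinaryConnect idea (my own illustration, not code from the paper): keep full-precision latent weights, but let the forward pass see only their signs.

```python
import numpy as np

def binarize(w):
    # Deterministic binarization: replace each weight with its sign,
    # so the forward pass only ever uses values in {-1, +1}.
    return np.where(w >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
latent_w = rng.normal(size=(4, 3))   # full-precision "master" weights
x = rng.normal(size=(1, 4))          # a single input row

y = x @ binarize(latent_w)           # forward pass with binary weights
# In training, gradient updates are applied to latent_w (straight-through
# estimator), and the weights are re-binarized on the next forward pass.
```

The key trick is that the real-valued latent weights accumulate small gradient updates that a pure {-1, +1} representation couldn't express.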


Thanks for your informative comment. What HN is for!


The BitNet paper showed worse results than an fp16 transformer with the same parameter count. The shocking result in the 1.58b paper (same group) is no quality loss compared to fp16.





