
Not a stupid question at all.

Early quantization approaches just quantized a high-bit-precision pre-trained model after training, so there was no need to calculate gradients through the quantized weights. BitNet[1] changed the game by training a low-precision model from scratch. It achieves this by keeping high precision in the gradients, the optimizer state, and in "latent weights", which are quantized on the fly during the forward pass. I don't really understand the finer details of how this works, so check out the paper if you're interested.
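
To make the latent-weight idea concrete, here's a minimal PyTorch-style sketch (my own simplification, not BitNet's actual code): the full-precision weight is what the optimizer updates, the forward pass uses a binarized copy, and a straight-through estimator routes the gradient back to the latent weight as if the quantizer were the identity.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BitLinearSketch(nn.Module):
        # High-precision "latent" weights, binarized on the fly each forward pass.
        def __init__(self, in_features, out_features):
            super().__init__()
            # this full-precision tensor is what the optimizer actually updates
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

        def forward(self, x):
            w = self.weight
            alpha = w.abs().mean()                    # per-tensor scale
            w_bin = torch.sign(w - w.mean()) * alpha  # 1-bit weights, values +/- alpha
            # straight-through estimator: forward uses w_bin,
            # backward sends the gradient straight to the latent weight w
            w_q = w + (w_bin - w).detach()
            return F.linear(x, w_q)

    # usage: gradients land on the full-precision latent weights
    layer = BitLinearSketch(16, 8)
    layer(torch.randn(4, 16)).sum().backward()
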

This article's approach is interesting. They start by quantizing a pre-trained high-precision model, and then fine-tune the quantized model with LoRA, which dramatically recovers the quantized model's performance. They don't state the bit depth of the values in the LoRA matrices, so those may well be kept at higher precision.
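
A hedged sketch of what "frozen quantized base plus higher-precision LoRA matrices" could look like; the class, rank, and alpha here are illustrative assumptions, not the article's implementation:

    import torch
    import torch.nn as nn

    class LoRAOnFrozenBase(nn.Module):
        # Frozen (already-quantized) base layer plus small trainable LoRA matrices.
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base                      # assumed quantized elsewhere
            for p in self.base.parameters():
                p.requires_grad = False           # only the adapters get trained
            in_f, out_f = base.in_features, base.out_features
            # LoRA matrices kept in full precision (fp32) in this sketch
            self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
            self.B = nn.Parameter(torch.zeros(out_f, rank))  # zero init: starts as a no-op
            self.scale = alpha / rank

        def forward(self, x):
            # quantized base output plus the low-rank, higher-precision correction
            return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
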

[1] https://arxiv.org/pdf/2310.11453.pdf


