Wouldn’t it be 1GB (billion bytes) per billion parameters when each parameter is... | Hacker News


bionhoward 5 months ago | on: Qwen3: Think deeper, act faster

Wouldn’t it be 1 GB (a billion bytes) per billion parameters when each parameter is 1 byte (FP8)? By the same logic, a 4-bit quantized model would need about half as many gigabytes as it has billions of parameters, because each parameter is half a byte, right?
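For reference, the arithmetic in the question can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate of weight storage: the helper name `weight_gb` is made up for illustration, and it assumes every parameter is stored at exactly the stated bit width. Real quantized formats usually carry extra scale metadata, so treat the 4-bit figure as a lower bound.

```python
def weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper: assumes every parameter uses exactly
    bits_per_param bits, with no quantization scales or other overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # FP16/BF16, FP8, and 4-bit, per billion parameters
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_gb(1, bits)} GB per billion parameters")
```

Under these assumptions FP8 comes out to 1 GB per billion parameters and 4-bit to half that, which matches the reasoning in the question.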

daemonologist 5 months ago

Yes, it's more a rule of thumb than napkin math, I suppose. The difference leaves room for the KV cache, which scales with both model size and context length, plus other bits and bobs like multimodal encoders, which aren't always counted in the nameplate model size.
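To put a rough number on the KV cache part, here is a minimal sketch that assumes a standard transformer storing one key and one value vector per layer, per KV head, per token. The function name and the configuration values are hypothetical and not taken from any particular model; actual usage depends on the architecture and the inference runtime.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # assumes a 16-bit cache (FP16/BF16)
    batch_size: int = 1,
) -> int:
    """Estimate KV cache size in bytes for a standard attention stack.

    The leading factor of 2 covers keys and values.
    """
    return (
        2 * n_layers * n_kv_heads * head_dim
        * context_len * bytes_per_value * batch_size
    )


if __name__ == "__main__":
    # Hypothetical configuration, chosen only for illustration.
    size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32_768)
    print(f"{size / 1e9:.2f} GB of KV cache at 32k context")
```

Layer count and head dimensions grow with model size and the cache grows linearly with context length, so this term can be a sizeable addition on top of the weights at long contexts.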
