
As the model weights (even quantized) would be several hundred GBs, it’s unlikely, unless special inference code is written that loads only a small subset of the weights into memory at a time and processes the model piece by piece. But running it that way would be painfully slow.
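
Very roughly, that "small subset at a time" approach looks like the sketch below: each layer's weights stay on disk until the moment they are needed, then get freed again. The file layout and helper names here are made up purely for illustration.

    # Hypothetical sketch: stream one layer's weights off disk at a time,
    # run it, then free it before touching the next layer.
    import torch

    def streamed_forward(x, layer_files, build_layer):
        # layer_files: per-layer checkpoint paths; build_layer: constructs an
        # empty layer module (both are assumptions for this illustration).
        for path in layer_files:
            state = torch.load(path, map_location="cpu")  # read this layer's weights
            layer = build_layer()
            layer.load_state_dict(state)
            with torch.no_grad():
                x = layer(x)
            del layer, state  # drop the weights before loading the next layer
        return x

Every forward pass re-reads essentially the whole checkpoint from disk, which is why it's painfully slow: disk bandwidth, not compute, becomes the bottleneck.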


The code is already there: DeepSpeed
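
For reference, the relevant piece of DeepSpeed is ZeRO stage 3 with parameter offload ("ZeRO-Inference"), which keeps the weights in CPU RAM or on NVMe and streams each layer to the GPU only while it executes. A hedged sketch, not a tested recipe; config keys can vary between DeepSpeed versions, and the model below is just a stand-in:

    import deepspeed
    import torch
    import torch.nn as nn

    # Stand-in for a model far too large to fit in GPU memory.
    model = nn.Sequential(*[nn.Linear(8192, 8192) for _ in range(4)])

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {
            "stage": 3,                  # partition parameters (ZeRO-3)
            "offload_param": {
                "device": "cpu",         # or "nvme" plus an nvme_path
                "pin_memory": True,
            },
        },
    }

    engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
    engine.module.eval()
    with torch.no_grad():
        out = engine(torch.randn(1, 8192, device=engine.device))

The trade-off is the same as above: the weights have to cross the PCIe (or NVMe) link for every layer on every forward pass, so throughput is far below keeping the model resident on the GPU.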




