
Oh, very good question. Tbh, I'm not sure. A closely related technique is layer offloading: if your network doesn't fit in GPU memory and has layers 1, 2, ..., 32, you offload layers 16 to 32 to RAM, then load them into GPU memory on the fly.

I'd guess the performance hit is similar, although I haven't benchmarked it myself to verify.
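
To make the offloading idea concrete, here's a rough toy sketch in plain Python. It is not how any particular framework does it: `Layer`, `OffloadedModel`, and the `transfers` counter are made-up names, the "GPU" and "RAM" are just dicts, and each layer's "weights" are a single float. It only shows the control flow: resident layers run directly, offloaded layers get copied in right before they run and evicted right after.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    index: int
    scale: float  # stand-in for the layer's weights (assumption: one float)

    def forward(self, x: float) -> float:
        return x * self.scale + 1.0


class OffloadedModel:
    """Toy model: the first `resident` layers stay on the 'GPU', the rest sit in 'RAM'."""

    def __init__(self, layers, resident):
        self.order = [layer.index for layer in layers]
        self.gpu = {layer.index: layer for layer in layers[:resident]}
        self.ram = {layer.index: layer for layer in layers[resident:]}
        self.transfers = 0  # counts simulated RAM -> GPU copies

    def forward(self, x: float) -> float:
        for i in self.order:
            if i in self.gpu:
                # Layer is already resident: just run it.
                x = self.gpu[i].forward(x)
            else:
                # Load the offloaded layer on the fly (simulated copy).
                self.gpu[i] = self.ram[i]
                self.transfers += 1
                x = self.gpu[i].forward(x)
                # Evict it again so the "GPU" footprint stays bounded.
                del self.gpu[i]
        return x


# Layers 1..32; layers 1..15 stay resident, layers 16..32 are offloaded.
layers = [Layer(i, 1.0 + 0.01 * i) for i in range(1, 33)]
model = OffloadedModel(layers, resident=15)
output = model.forward(0.5)
print(model.transfers)  # one simulated copy per offloaded layer per forward pass
```

In a real setup every one of those copies goes over the host-to-device link on each forward pass, which is where the performance hit would come from. The sketch doesn't measure that; it only counts the copies.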


