Hacker News

Impossible? It’s just a bunch of math, you don’t need to keep the entire network in memory the whole time.


Well, any scheme that dynamically loads and unloads weights aggressively enough to fit on a 48GB GPU is so slow that training is basically impractical. Your 70B model would be obsolete by the time the finetuning finished.

Some inference frameworks came up with schemes for exactly this, and they were horrifically slow.
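A toy sketch of the kind of weight streaming being described (plain NumPy, with hypothetical per-layer .npy files standing in for weights parked on disk or in host RAM; real systems stream CUDA tensors over PCIe, which is the bottleneck):

```python
import os
import tempfile
import numpy as np

dim, n_layers = 64, 4
rng = np.random.default_rng(0)

# Pretend each layer's weights live off-device (here: .npy files on disk).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(n_layers):
    w = rng.standard_normal((dim, dim)).astype(np.float32) / np.sqrt(dim)
    path = os.path.join(tmpdir, f"layer{i}.npy")
    np.save(path, w)
    paths.append(path)

def streamed_forward(x):
    """Forward pass with only one layer's weights resident at a time."""
    for path in paths:
        w = np.load(path, mmap_mode="r")  # "load": map this layer in
        x = np.maximum(x @ w, 0.0)        # compute through the layer
        del w                             # "unload" before the next layer
    return x

y = streamed_forward(np.ones((1, dim), dtype=np.float32))
```

For training the picture is even worse than for inference: the backward pass needs each layer's weights a second time, so every optimizer step streams the whole model over the slow link at least twice.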



