
It's possible and has been done, but it's very slow and inefficient, which results in long training times even for small models. To keep the compute occupied, you need to pass gradients between nodes very fast.



This is what piqued my interest in the first place


Yes, but could you break the work up into chunks, each producing its own set of gradients? I know a compute node needs its full chunk to compute a set. Again, these are things I'm exploring, but ultimately it's no different from having the full dataset on disk and scaling out compute nodes that read it in ro (read-only) mode.
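
To make the chunking idea concrete, here is a minimal sketch, assuming a toy one-parameter model y = w * x with a mean-squared-error loss and equal-sized chunks. The names `grad_mse` and `chunked_grad` are made up for illustration, and each loop iteration stands in for one compute node. It only shows that averaging per-chunk gradients reproduces the full-dataset gradient; it says nothing about how fast the gradients can be exchanged, which is the bottleneck raised above.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def chunked_grad(w, xs, ys, num_chunks):
    """Average per-chunk gradients; each chunk stands in for one worker."""
    size = len(xs) // num_chunks
    # Assumption: chunks are equal-sized, so a plain average is exact.
    assert size * num_chunks == len(xs), "sketch assumes equal-sized chunks"
    grads = []
    for i in range(num_chunks):
        lo, hi = i * size, (i + 1) * size
        # A worker only needs read access to its own slice of the data.
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    # In a real setup this average is the step that crosses the network.
    return sum(grads) / num_chunks


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

full = grad_mse(0.5, xs, ys)
split = chunked_grad(0.5, xs, ys, num_chunks=4)
```

With equal-sized chunks, `full` and `split` agree up to floating-point rounding. With unequal chunks the average would need to be weighted by chunk size.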



