Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Training requires a lot more memory to keep gradients + gradient stats for the optimizer, and needs higher precision weights for the optimization. It's also much more parallelizable. But inference is kind of a subroutine of training.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: