
It’s a heuristic argument that critical points are extremely unlikely to be local minima (i.e., to have a positive definite Hessian). Loss surfaces of DNNs do typically have a global minimum (zero, if they fit the training data exactly).
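To make the heuristic concrete, here's a toy numpy experiment (my own sketch, not from the thread): model the Hessian at a generic critical point as a random symmetric matrix and count how often it comes out positive definite. The fraction collapses as the dimension grows, which is the intuition for why nearly every critical point of a high-dimensional surface is a saddle.

    import numpy as np

    rng = np.random.default_rng(0)

    def frac_positive_definite(n, trials=20000):
        # Fraction of random symmetric n x n matrices whose eigenvalues
        # are all positive -- a stand-in for the Hessian at a "generic"
        # critical point (an assumption, not the paper's exact model).
        count = 0
        for _ in range(trials):
            a = rng.standard_normal((n, n))
            h = (a + a.T) / 2  # symmetrize: a toy Hessian
            if np.linalg.eigvalsh(h)[0] > 0:  # smallest eigenvalue > 0?
                count += 1
        return count / trials

    for n in (1, 2, 4, 6):
        print(n, frac_positive_definite(n))
    # Prints roughly 0.5, then rapidly shrinking fractions:
    # being a minimum requires every eigenvalue to be positive at once.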


A DNN seems likely to have many global minima: given the level of (over)parametrization commonly used, a set of parameters that achieves the lowest possible loss won't be unique; there will be huge sets of parameters that give exactly identical results.
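One way to see the non-uniqueness (a minimal sketch with a made-up tiny network): ReLU is positively homogeneous, so scaling a hidden unit's incoming weights by c > 0 and its outgoing weight by 1/c gives a different parameter vector computing exactly the same function, hence exactly the same loss. That's already a continuum of parameter settings per unit.

    import numpy as np

    rng = np.random.default_rng(1)

    def relu(x):
        return np.maximum(x, 0.0)

    def net(x, w1, w2):
        # One-hidden-layer ReLU network (illustrative only).
        return relu(x @ w1) @ w2

    x = rng.standard_normal((5, 3))
    w1 = rng.standard_normal((3, 4))
    w2 = rng.standard_normal((4, 1))

    # Rescale hidden unit 0: incoming weights times c, outgoing
    # weight divided by c. relu(c*z) = c*relu(z) for c > 0, so the
    # network output -- and therefore the loss -- is unchanged.
    c = 3.7
    w1s, w2s = w1.copy(), w2.copy()
    w1s[:, 0] *= c
    w2s[0, :] /= c

    print(np.allclose(net(x, w1, w2), net(x, w1s, w2s)))  # True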


Due to symmetry alone, at least, there are many distinct global minima, all attaining the same minimum value.
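The canonical example is permutation symmetry: relabel the hidden units of a layer (shuffling the next layer's weights to match) and the network computes the identical function, so any minimum comes with h! permuted copies for a layer of h units. A quick check with the same kind of toy network as above:

    import numpy as np

    rng = np.random.default_rng(2)

    def relu(x):
        return np.maximum(x, 0.0)

    def net(x, w1, w2):
        return relu(x @ w1) @ w2

    x = rng.standard_normal((5, 3))
    w1 = rng.standard_normal((3, 4))
    w2 = rng.standard_normal((4, 1))

    # Permute the 4 hidden units: same permutation applied to the
    # columns of w1 and the rows of w2. Output is identical, so each
    # minimum has 4! = 24 copies from this symmetry alone.
    perm = rng.permutation(4)
    print(np.allclose(net(x, w1, w2), net(x, w1[:, perm], w2[perm, :])))  # True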



