
It’s a heuristic argument that critical points are extremely unlikely to be local minima (i.e., to have a positive definite Hessian). Loss surfaces of DNNs do typically have a global minimum (zero, if they fit the training data exactly).
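To make the heuristic concrete, here's a toy numpy experiment (my own sketch, not from the thread): model the Hessian at a generic critical point as a random symmetric matrix and count how often it comes out positive definite. The fraction collapses as the dimension grows, which is the intuition for why nearly every critical point of a high-dimensional surface is a saddle.

    import numpy as np

    rng = np.random.default_rng(0)

    def frac_positive_definite(n, trials=20000):
        # Fraction of random symmetric n x n matrices whose eigenvalues
        # are all positive -- a stand-in for the Hessian at a "generic"
        # critical point (an assumption, not the paper's exact model).
        count = 0
        for _ in range(trials):
            a = rng.standard_normal((n, n))
            h = (a + a.T) / 2  # symmetrize: a toy Hessian
            if np.linalg.eigvalsh(h)[0] > 0:  # smallest eigenvalue > 0?
                count += 1
        return count / trials

    for n in (1, 2, 4, 6):
        print(n, frac_positive_definite(n))
    # Prints roughly 0.5, then rapidly shrinking fractions:
    # being a minimum requires every eigenvalue to be positive at once.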


A DNN seems likely to have many global minima: given the level of (over)parametrization commonly used, a set of parameters that achieves the lowest possible loss won't be unique; there will be huge sets of parameters that give exactly identical results.
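One way to see the non-uniqueness (a minimal sketch with a made-up tiny network): ReLU is positively homogeneous, so scaling a hidden unit's incoming weights by c > 0 and its outgoing weight by 1/c gives a different parameter vector computing exactly the same function, hence exactly the same loss. That's already a continuum of parameter settings per unit.

    import numpy as np

    rng = np.random.default_rng(1)

    def relu(x):
        return np.maximum(x, 0.0)

    def net(x, w1, w2):
        # One-hidden-layer ReLU network (illustrative only).
        return relu(x @ w1) @ w2

    x = rng.standard_normal((5, 3))
    w1 = rng.standard_normal((3, 4))
    w2 = rng.standard_normal((4, 1))

    # Rescale hidden unit 0: incoming weights times c, outgoing
    # weight divided by c. relu(c*z) = c*relu(z) for c > 0, so the
    # network output -- and therefore the loss -- is unchanged.
    c = 3.7
    w1s, w2s = w1.copy(), w2.copy()
    w1s[:, 0] *= c
    w2s[0, :] /= c

    print(np.allclose(net(x, w1, w2), net(x, w1s, w2s)))  # True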


Due to symmetry alone, at least, there are many distinct global minima, all attaining the same minimum value.
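The canonical example is permutation symmetry: relabel the hidden units of a layer (shuffling the next layer's weights to match) and the network computes the identical function, so any minimum comes with h! permuted copies for a layer of h units. A quick check with the same kind of toy network as above:

    import numpy as np

    rng = np.random.default_rng(2)

    def relu(x):
        return np.maximum(x, 0.0)

    def net(x, w1, w2):
        return relu(x @ w1) @ w2

    x = rng.standard_normal((5, 3))
    w1 = rng.standard_normal((3, 4))
    w2 = rng.standard_normal((4, 1))

    # Permute the 4 hidden units: same permutation applied to the
    # columns of w1 and the rows of w2. Output is identical, so each
    # minimum has 4! = 24 copies from this symmetry alone.
    perm = rng.permutation(4)
    print(np.allclose(net(x, w1, w2), net(x, w1[:, perm], w2[perm, :])))  # True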



