
Neural nets virtually always get stuck in local optima. I have no idea what you're talking about.


http://arxiv.org/abs/1412.0233

>The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality measured by the test error. This emphasizes a major difference between large- and small-size networks where for the latter poor quality local minima have non-zero probability of being recovered.


Your link supports my point, not yours. I'm aware of research showing that local minima are close to the global minimum. That does not mean neural nets usually converge to the global minimum, only that the local minima they converge to are close to it.

> Finally, we prove that recovering the global minimum becomes harder as the network size increases
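A minimal sketch of how one might probe this empirically, not taken from the paper: train the same tiny MLP from many random initializations and compare the spread of final training losses. Per the quoted result, one would expect a very narrow hidden layer to leave a wider spread (local minima of varying quality), while a wider layer tends to land in a narrow band of similar losses. The network sizes, learning rate, and toy target below are illustrative assumptions, and plain full-batch gradient descent stands in for SGD.

  import numpy as np

  # Toy regression problem: fit y = sin(3x) on [-1, 1].
  rng_data = np.random.default_rng(0)
  X = rng_data.uniform(-1, 1, size=(200, 1))
  y = np.sin(3 * X)

  def train_mlp(hidden, seed, steps=3000, lr=0.05):
      """Train a one-hidden-layer tanh MLP with full-batch gradient descent
      and return its final mean squared error (hyperparameters are arbitrary)."""
      rng = np.random.default_rng(seed)
      W1 = rng.normal(0, 1.0, size=(1, hidden)); b1 = np.zeros(hidden)
      W2 = rng.normal(0, 1.0, size=(hidden, 1)); b2 = np.zeros(1)
      n = X.shape[0]
      for _ in range(steps):
          h = np.tanh(X @ W1 + b1)          # hidden activations
          err = (h @ W2 + b2) - y           # prediction error
          # Backpropagation for mean squared error.
          dW2 = h.T @ err / n
          db2 = err.mean(axis=0)
          dh = (err @ W2.T) * (1 - h ** 2)
          dW1 = X.T @ dh / n
          db1 = dh.mean(axis=0)
          W1 -= lr * dW1; b1 -= lr * db1
          W2 -= lr * dW2; b2 -= lr * db2
      return float(np.mean(((np.tanh(X @ W1 + b1) @ W2 + b2) - y) ** 2))

  # Compare the spread of final losses across 20 restarts for a narrow
  # versus a wider hidden layer.
  for hidden in (2, 50):
      losses = [train_mlp(hidden, seed) for seed in range(20)]
      print(f"hidden={hidden:3d}  min loss={min(losses):.4f}  max loss={max(losses):.4f}")

Note that this only measures the spread of the minima reached; it says nothing about whether any of them is the global minimum, which is the distinction the comment above is drawing.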



