
"We know why gradient descent and back-propagation works."

I would phrase that differently: we know when gradient descent and back-propagation work, not why they work for so many real-world problems.

For the "when", there are zillions of published mathematical results stating "if a problem has property X, this and this method will find a solution with property Y in time T", in zillions of variations (Y can be the true optimum, a value within some percentage of the true optimum, the true optimum 'most of the time', etc.; T can be 'eventually', 'after O(n^3) iterations', 'always', etc.).
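A representative instance of such a result, supplied here purely as an illustration (it's the standard textbook bound for convex, L-smooth objectives, not something quoted from the thread):

    % Illustrative only: the classical guarantee for gradient descent
    % on a convex, L-smooth objective f, with constant step size 1/L.
    \[
      f(x_k) - f(x^\ast) \;\le\; \frac{L \,\lVert x_0 - x^\ast \rVert^2}{2k}
    \]
    % Here "property X" is convexity plus L-smoothness, "property Y" is
    % being within epsilon of the optimal value, and T is O(1/epsilon)
    % iterations.

Drop the convexity assumption and this particular guarantee no longer applies, which is exactly the sort of gap between "X holds" and "real-world problem" being discussed here.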

However, for most real-world problems we do not know whether they have property X, or even how to go about arguing that they likely have property X, other than the somewhat circular "algorithm A seems to work well on this problem, and we know A works for problems with property X".



Well, we do know why gradient descent works (for smooth objectives), at least for finding a local minimum, because finding a local minimum is basically what it does by construction. Similarly, we certainly know how back-propagation works, because it's simple calculus: a backwards application of the chain rule.
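To make the "by construction" point concrete, here's a minimal sketch (plain NumPy, a made-up one-hidden-layer least-squares problem; every name and number is illustrative, nothing from the thread): back-propagation is just the chain rule applied right-to-left, and gradient descent then steps against the resulting gradient.

    import numpy as np

    # Tiny net: y_hat = W2 @ tanh(W1 @ x), loss = 0.5 * ||y_hat - y||^2
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
    x, y = rng.normal(size=(3, 1)), rng.normal(size=(1, 1))

    lr = 0.1
    for step in range(100):
        # Forward pass
        h_pre = W1 @ x                 # hidden pre-activation
        h = np.tanh(h_pre)             # hidden activation
        y_hat = W2 @ h
        loss = 0.5 * np.sum((y_hat - y) ** 2)

        # Back-propagation: the chain rule, applied backwards
        d_yhat = y_hat - y                        # dL/dy_hat
        d_W2 = d_yhat @ h.T                       # dL/dW2
        d_h = W2.T @ d_yhat                       # dL/dh
        d_hpre = d_h * (1 - np.tanh(h_pre) ** 2)  # dL/dh_pre (tanh' = 1 - tanh^2)
        d_W1 = d_hpre @ x.T                       # dL/dW1

        # Gradient descent: step against the gradient
        W1 -= lr * d_W1
        W2 -= lr * d_W2

With a small enough step size the loss keeps decreasing toward a local minimum, which is the "by construction" behaviour described above; it says nothing about whether that local minimum is any good, which is the question below.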

Perhaps what you're trying to say is that we don't know why finding local minima of these problems is good at solving the problem?



