
"We know why gradient descent and back-propagation works."

I would phrase that differently: we know when gradient descent and back-propagation work, not why they work for so many real-world problems.

For the "when", there are zillions of published mathematical results stating "if a problem has property X, this and this method will find a solution with property Y in time T", in zillions of variations (Y can be the true optimum, a value within some percentage of the true optimum, the true optimum 'most of the time', etc.; T can be 'eventually', 'after O(n^3) iterations', 'always', etc.).
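A representative instance of such a result, supplied here purely as an illustration (it's the standard textbook bound for convex, L-smooth objectives, not something quoted from the thread):

    % Illustrative only: the classical guarantee for gradient descent
    % on a convex, L-smooth objective f, with constant step size 1/L.
    \[
      f(x_k) - f(x^\ast) \;\le\; \frac{L \,\lVert x_0 - x^\ast \rVert^2}{2k}
    \]
    % Here "property X" is convexity plus L-smoothness, "property Y" is
    % being within epsilon of the optimal value, and T is O(1/epsilon)
    % iterations.

Drop the convexity assumption and this particular guarantee no longer applies, which is exactly the sort of gap between "X holds" and "real-world problem" being discussed here.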

However, for most real-world problems we do not know whether they have property X, or even how to go about arguing that they likely have property X, other than the somewhat circular "algorithm A seems to work well on this problem, and we know A works for problems with property X".



Well, we do know why gradient descent works (for smooth objectives), at least for finding a local minimum, because finding a local minimum is basically what it does by construction. Similarly, we certainly know how back-propagation works, because it's simple calculus: a backwards application of the chain rule.
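To make the "by construction" point concrete, here's a minimal sketch (plain NumPy, a made-up one-hidden-layer least-squares problem; every name and number is illustrative, nothing from the thread): back-propagation is just the chain rule applied right-to-left, and gradient descent then steps against the resulting gradient.

    import numpy as np

    # Tiny net: y_hat = W2 @ tanh(W1 @ x), loss = 0.5 * ||y_hat - y||^2
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
    x, y = rng.normal(size=(3, 1)), rng.normal(size=(1, 1))

    lr = 0.1
    for step in range(100):
        # Forward pass
        h_pre = W1 @ x                 # hidden pre-activation
        h = np.tanh(h_pre)             # hidden activation
        y_hat = W2 @ h
        loss = 0.5 * np.sum((y_hat - y) ** 2)

        # Back-propagation: the chain rule, applied backwards
        d_yhat = y_hat - y                        # dL/dy_hat
        d_W2 = d_yhat @ h.T                       # dL/dW2
        d_h = W2.T @ d_yhat                       # dL/dh
        d_hpre = d_h * (1 - np.tanh(h_pre) ** 2)  # dL/dh_pre (tanh' = 1 - tanh^2)
        d_W1 = d_hpre @ x.T                       # dL/dW1

        # Gradient descent: step against the gradient
        W1 -= lr * d_W1
        W2 -= lr * d_W2

With a small enough step size the loss keeps decreasing toward a local minimum, which is the "by construction" behaviour described above; it says nothing about whether that local minimum is any good, which is the question below.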

Perhaps what you're trying to say is that we don't know why finding local minima of these problems is good at solving the problem?



