Thank you for the summary. Does the article give any examples of such arguments, or is it stated as a commonly known fact? Coming from an artificial intelligence background, I am not aware of such opinions. I know that deep neural nets were considered difficult to train until the rediscovery of backpropagation, but not because of anything to do with the shape of the error function.
However, as usual, there is confusion about what "generalisation" means. For example, at a summer school at Oxford a couple of years ago, one of the lecturers made a similar point about the surprising generalisation ability of deep neural nets. I approached them after the lecture and asked what they meant, because as I understood the term, neural nets can't generalise. They explained that they meant the nets generalise surprisingly well on the test set, but not on unseen, out-of-distribution data, i.e. not on any data that was not available to the researcher during training (as training, validation or test set).
In other words, neural nets are great at "generalisation" in the sense of interpolation, but are almost completely incapable of "generalisation" in the form of extrapolation.
I like to quote Francois Chollet of Keras on this:
This stands in sharp contrast with what deep nets do, which I would call "local generalization": the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time. Consider, for instance, the problem of learning the appropriate launch parameters to get a rocket to land on the moon. If you were to use a deep net for this task, whether training using supervised learning or reinforcement learning, you would need to feed it with thousands or even millions of launch trials, i.e. you would need to expose it to a dense sampling of the input space, in order to learn a reliable mapping from input space to output space. By contrast, humans can use their power of abstraction to come up with physical models—rocket science—and derive an exact solution that will get the rocket on the moon in just one or few trials. Similarly, if you developed a deep net controlling a human body, and wanted it to learn to safely navigate a city without getting hit by cars, the net would have to die many thousands of times in various situations until it could infer that cars are dangerous, and develop appropriate avoidance behaviors. Dropped into a new city, the net would have to relearn most of what it knows. On the other hand, humans are able to learn safe behaviors without having to die even once—again, thanks to their power of abstract modeling of hypothetical situations.
https://blog.keras.io/the-limitations-of-deep-learning.html
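To make the interpolation/extrapolation distinction concrete, here is a minimal sketch of my own (not from the article; it assumes numpy and scikit-learn are installed, and the target function and ranges are arbitrary illustrations). A small net fitted to sin(x) on a dense sample of [-3, 3] predicts well on unseen points inside that range, but its error blows up a few units outside it:

    # Minimal sketch (my illustration): a small net interpolates sin(x) well
    # inside its training range but fails to extrapolate beyond it.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Dense sampling of sin(x) on [-3, 3] for training
    X_train = rng.uniform(-3, 3, size=(2000, 1))
    y_train = np.sin(X_train).ravel()

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
    net.fit(X_train, y_train)

    # Interpolation: new points drawn from the same range as the training data
    X_in = rng.uniform(-3, 3, size=(500, 1))
    mse_in = np.mean((net.predict(X_in) - np.sin(X_in).ravel()) ** 2)

    # Extrapolation: points from outside the training range
    X_out = rng.uniform(6, 9, size=(500, 1))
    mse_out = np.mean((net.predict(X_out) - np.sin(X_out).ravel()) ** 2)

    print("MSE inside training range: ", mse_in)   # small
    print("MSE outside training range:", mse_out)  # typically orders of magnitude larger

The in-range error is tiny, while the out-of-range error is typically orders of magnitude larger: the net has learned a dense local mapping over the sampled region, not the underlying function.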
In short, if the point of the article is that neural networks "work" because they generalise in the sense of extrapolation, then that's not right.