The paper says: "Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data"
What would generalization of a random labeling even mean? The network can't become psychic, so is this just saying "garbage fits fine by these measures"?
Note that "fit" here means training accuracy, not generalization: the network memorizes the random labels rather than predicting them on new data. It learns to map each input into an intermediate representation, and then memorizes an essentially arbitrary (hash-like) output for each of those intermediate values. So it has to learn two things, whereas when the intermediate representation actually corresponds to the task, the output mapping comes for free.
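This memorization effect is easy to reproduce at toy scale. The sketch below (not from the paper; a minimal numpy illustration under assumed sizes and hyperparameters) trains a small two-layer ReLU network on inputs with purely random binary labels. There is no signal to generalize from, yet an over-parameterized net driven by plain gradient descent fits the training set:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 10, 100                       # few samples, wide hidden layer
X = rng.standard_normal((n, d))             # random inputs
y = rng.integers(0, 2, n).astype(float)     # random labels: zero signal

# Small two-layer ReLU network with a sigmoid output.
W1 = rng.standard_normal((d, h)) * 0.5
b1 = np.zeros(h)
W2 = rng.standard_normal(h) * 0.5
b2 = 0.0

def forward(X):
    z = np.maximum(X @ W1 + b1, 0)          # ReLU hidden activations
    p = 1 / (1 + np.exp(-(z @ W2 + b2)))    # sigmoid probability
    return z, p

lr = 0.1
for step in range(5000):
    z, p = forward(X)
    g = (p - y) / n                         # grad of mean cross-entropy wrt logits
    gW2 = z.T @ g
    gb2 = g.sum()
    gz = np.outer(g, W2) * (z > 0)          # backprop through ReLU
    gW1 = X.T @ gz
    gb1 = gz.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, p = forward(X)
acc = ((p > 0.5) == y).mean()
print(f"train accuracy on random labels: {acc:.2f}")
```

Training accuracy ends up near 1.0, but test accuracy on fresh random labels would of course stay at chance — which is exactly the paper's point: fitting the training data perfectly tells you nothing about generalization.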