I don't understand the random-label training part. Presumably you train on randomised labels which have no relationship with the input, but surely it won't generalise well at all, given the small probability of predicting the labels correctly by chance (essentially the setup for a permutation test, am I wrong?).
If you look at Table 1, you see that the models manage to fit the randomized labels almost 100% correctly on the training set, but crucially the corresponding test score is down in the 10% region. This is in stark contrast to the roughly 80-90% test score for the properly labeled data.
So it seems to me that when faced with structured data they manage to generalize from that structure somehow, while when faced with random labels they're powerful enough to simply memorize the training set.
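For concreteness, here's a minimal sketch of that randomized-label setup (PyTorch, CIFAR-10; the architecture and hyperparameters are my own stand-ins, not the paper's):

```python
# Sketch of the randomized-label experiment: replace every training label
# with an independent random class, then train to (near) zero training error.
# Architecture and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def make_loader(randomize_labels: bool):
    ds = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor()
    )
    if randomize_labels:
        # Destroy any relationship between inputs and targets.
        g = torch.Generator().manual_seed(0)
        ds.targets = torch.randint(0, 10, (len(ds.targets),), generator=g).tolist()
    return torch.utils.data.DataLoader(ds, batch_size=128, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

loader = make_loader(randomize_labels=True)
for epoch in range(100):  # enough epochs for a big-enough model to memorize
    correct = total = 0
    for x, y in loader:
        opt.zero_grad()
        out = model(x)
        loss_fn(out, y).backward()
        opt.step()
        correct += (out.argmax(1) == y).sum().item()
        total += y.numel()
    print(f"epoch {epoch}: train acc {correct / total:.3f}")
# Training accuracy climbs toward ~100% even though the labels are random;
# test accuracy against the true labels stays near the 10% chance level.
```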
edit: just to point out, obviously the test score is expected to be bad for random labels; after all, how could you properly test classification of random data?
So the point, as I understand it, isn't that the randomized labels lead to poor test results, but rather that the network trained on non-randomized labels manages to generalize despite being capable of simply memorizing the training set.
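To put a number on that chance-level point, a quick back-of-the-envelope check (plain NumPy, nothing from the paper): a predictor with no real relationship to the true labels agrees with them about 1/10 of the time on a 10-class problem, which is the ~10% control test score mentioned above.

```python
# Sanity check: random predictions on a 10-class problem agree with the
# true labels at roughly the chance rate of 1 / num_classes.
import numpy as np

rng = np.random.default_rng(0)
num_classes, n = 10, 100_000
true_labels = rng.integers(0, num_classes, size=n)
random_preds = rng.integers(0, num_classes, size=n)  # stand-in for a model fit to shuffled labels
print((random_preds == true_labels).mean())  # ~0.10
```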
AFAIK that's right; it would be very unlikely to generalize on random labels, which is why I read the comment as suggesting the network shouldn't be able to reach low training loss in that situation.