So in the article you write that you found 20% errors in the data, but at what point do you conclude whether it's "an error in the data" or "an error in the prediction"?
Is that done manually?
Also, do you have a strategy for finding errors where the model learned to mislabel items in order to increase its score? (E.g., red trucks are labeled as red cars in both train and test.)
There was indeed a manual review of the "potential errors" highlighted by our algorithm to determine if it was an error in the data or an error in the prediction. The 20% corresponds to the proportion of objects that were corrected during this manual review. So it's actually likely that some errors (ones our algorithm did not find) remain in our cleaned version of the dataset.
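For anyone curious what such an algorithm can look like: here is a minimal sketch (not the authors' actual method, and the function name and 0.9 threshold are just illustrative). It flags items where a model confidently disagrees with the annotated label, so a human can then decide whether the label or the prediction is wrong.

```python
import numpy as np

def flag_potential_label_errors(probs, labels, threshold=0.9):
    """Flag items where the model confidently disagrees with the given label.

    probs:  (n_samples, n_classes) predicted class probabilities
    labels: (n_samples,) integer labels as annotated in the dataset
    Returns the indices of items to send to manual review.
    """
    predicted = probs.argmax(axis=1)     # model's predicted class per item
    confidence = probs.max(axis=1)       # model's confidence in that class
    disagrees = predicted != labels      # prediction contradicts the annotation
    confident = confidence >= threshold  # and the model is fairly sure about it
    return np.where(disagrees & confident)[0]

# Toy usage: 3 samples, 2 classes
probs = np.array([[0.95, 0.05],   # confidently class 0, labeled 1 -> flagged
                  [0.55, 0.45],   # low confidence -> not flagged
                  [0.10, 0.90]])  # agrees with the label -> not flagged
labels = np.array([1, 1, 1])
print(flag_potential_label_errors(probs, labels))  # [0]
```

In practice you would get the probabilities from out-of-fold predictions (cross-validation), so the model isn't scored on items it memorized during training.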