Yeah, but it does indirectly reflect the biases of the researchers. A researcher is more likely to notice and correct for training-data problems that conflict with their own biases.
This requires the researcher to be aware of every pattern/“bias” in the data, which they aren’t; that’s the reason we use the algos in the first place.
Come to think of it, isn’t that an interesting avenue for GAN-esque methods to detect which patterns fall into these categories of bias? Or is that a recursive problem? If not, put me in the paper :-)
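To make the idea concrete, here's a minimal sketch of that kind of adversarial setup, assuming you can already label the bias/pattern you care about. All the names (`Predictor`, `Adversary`, `protected`) are illustrative, not from any particular library:

```python
# Sketch of the "GAN-esque" bias-detection idea: an adversary tries to recover
# a labelled protected attribute from the main model's hidden representation.
# If it succeeds, the representation is encoding that bias pattern; the
# predictor is then trained to solve its task while fooling the adversary.

import torch
import torch.nn as nn


class Predictor(nn.Module):
    """Main-task model: features -> hidden representation -> prediction."""
    def __init__(self, n_features, n_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.head = nn.Linear(n_hidden, 1)

    def forward(self, x):
        h = self.encoder(x)
        return self.head(h), h


class Adversary(nn.Module):
    """Tries to predict the protected attribute from the hidden representation."""
    def __init__(self, n_hidden=32):
        super().__init__()
        self.net = nn.Linear(n_hidden, 1)

    def forward(self, h):
        return self.net(h)


def train_step(predictor, adversary, opt_p, opt_a, x, y, protected, lam=1.0):
    bce = nn.BCEWithLogitsLoss()

    # 1) Adversary step: can `protected` be read off the representation?
    with torch.no_grad():
        _, h = predictor(x)
    adv_loss = bce(adversary(h).squeeze(-1), protected)
    opt_a.zero_grad()
    adv_loss.backward()
    opt_a.step()

    # 2) Predictor step: solve the task while maximizing the adversary's loss,
    #    i.e. push the representation towards not encoding `protected`.
    y_hat, h = predictor(x)
    task_loss = bce(y_hat.squeeze(-1), y)
    fool_loss = -bce(adversary(h).squeeze(-1), protected)
    loss = task_loss + lam * fool_loss
    opt_p.zero_grad()
    loss.backward()
    opt_p.step()

    return task_loss.item(), adv_loss.item()
```

The obvious catch, and maybe the recursion in question: the adversary can only detect a bias you can already label, so it doesn't remove the need to know which patterns to look for in the first place.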
> This requires the researcher to be aware of every pattern/“bias” in the data, which they aren’t; that’s the reason we use the algos in the first place.
That's not strictly true. In a lot of cases you start out oblivious to biases in the data, and then when you evaluate the model you notice problems.
But your point about obliviousness to bias is exactly what I'm speaking to. You might be oblivious to bias that aligns with your own biases, but notice bias that conflicts with them.
People doing ML “research” are not the ones applying it to specific data sets day to day. “I pointed a neural net at our sales data” is not research in the normal sense.
Yeah, I look at the semantic problem with calling it “ML research” and just throw up my hands. These discussions aren't generally driven by people who care about semantics.