Because ML models don’t exist in a vacuum. The intentions and biases of the people who build and use them affect which models exist and how they’re used. Creating the perception that a model is an unbiased mathematical oracle simply because its dataset is unbiased can be used to support harmful applications.
ML models don't exist in a vacuum, but they do exist in an empirical reality. And in an empirical reality, there is always the fundamental unbiased measure of success: predicting whatever it is the model is built to predict.
And this, I think, is the knife that separates the different schools of thought on the issue.
People who judge whether an ML model is "good" or "bad" by this criterion necessarily see the accusation of "bias" as a claim that their model is not successfully predicting things. They rightly retort that they would do a better job with an unbiased dataset. To argue that their models are always wrong on their own terms is to argue that there is a Ken Thompson-like hack in their mathematics. [1]
On the other hand, people who judge ML models by criteria like how they might be used or interpreted by laypeople are fundamentally talking about something other than ML models-qua-mathematical models. To the modelers, you might as well be arguing that the theory of nuclear fission is biased against the Japanese. But you are not actually talking about the empirical quality of their model, and so on your own terms you are correct. The models can be used improperly, and researchers should be careful about how their findings are perceived.
Thanks for exactly this example - nobody has stated it yet, but the most charitable, best-faith interpretation I can see of the Gebru angle here is that there really is a Ken Thompson-style hack at play, or at least a high risk of vulnerability to such a hack.
I just don’t know how one would prove it, and, as others have noted, I don’t see what the short-term mitigation should be other than simply stopping the research.
Fundamentally, the choice of training dataset, and the biases that went into its collection.
Also, in the case of classical statistical models, the hand-crafting of the features themselves.
This is also relevant for neural networks, even though they learn their own features, because some amount of "framing" of the raw data often takes place in order to focus the network on the portion of the input the trainer sees as relevant. That framing removes noise, but it also removes context.
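A minimal sketch of what that framing step can look like in practice; the dataset, column names, and the decision about which columns to drop are all hypothetical, purely for illustration:

    import pandas as pd

    # Hypothetical loan-screening data; the columns are illustrative only.
    raw = pd.DataFrame({
        "income":       [42_000, 85_000, 31_000],
        "debt_ratio":   [0.35, 0.10, 0.55],
        "zip_code":     ["60621", "94027", "60621"],
        "years_at_job": [1, 7, 3],
    })

    # The "framing" step: the modeler decides which columns the network
    # ever sees. Dropping zip_code removes an obvious proxy (noise, from
    # the modeler's point of view), but dropping years_at_job also throws
    # away context the model could have used and can never recover.
    features = raw[["income", "debt_ratio"]]
    print(features)

The point is not that any particular choice here is right or wrong; it is that the choice happens before training starts, and whatever was framed out is invisible to the model from then on.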
You asked about the biases of the people building the model, which is what I answered.
You didn't ask about the biases that occur during the requirements specification stage, or the biases that occur during operational implementation of the trained model.
Those are just as important as - and arguably even more important than - the choice of training data and the technical implementation.
The responsibility for the ethics of using ML neither begins nor ends with the ML engineer who builds the machine, and there are serious questions arising from the application of ML in certain domains that cannot simply be addressed by "better training data".