That's not what I got from LeCun's comment. I read it more like:
LeCun - "ML is biased when datasets are biased. It's not the responsibility of researchers to ensure that ML is used responsibly in all cases, but the responsibility of the engineers of a particular implementation who need to use the correct models for the task at hand."
> "...I don't see any ethical obligation to use "unbiased" datasets for pure research or tinkering with models..."
I don't think his comment was addressing the larger ethical discussion at all. I didn't interpret it as a discussion of ethical responsibilities, but rather as a strictly technical, matter-of-fact statement about the nature of ML training.
Please don't interpret my comment as an attack on yours; I was just pointing out that I interpreted his statement differently.
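For what it's worth, here's a minimal toy sketch of that matter-of-fact point (my own illustration, not anything LeCun posted; it assumes NumPy and scikit-learn are available): a model fit on a skewed dataset simply reproduces the skew, and refitting on representative data is the kind of engineering fix he's talking about.

    # Illustrative sketch only: two synthetic "groups" have different
    # feature-to-label relationships; a model fit mostly on group A's data
    # serves group A well and group B poorly.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_group(n, boundary):
        # One feature centred on the group's own decision boundary.
        x = rng.normal(loc=boundary, scale=1.0, size=(n, 1))
        y = (x[:, 0] > boundary).astype(int)
        return x, y

    # Skewed training set: group A vastly over-represented.
    xa, ya = make_group(5000, boundary=0.0)
    xb, yb = make_group(100, boundary=2.0)
    model = LogisticRegression().fit(np.vstack([xa, xb]), np.concatenate([ya, yb]))

    # Balanced held-out sets: accuracy on the under-represented group drops to near chance.
    xa_test, ya_test = make_group(1000, boundary=0.0)
    xb_test, yb_test = make_group(1000, boundary=2.0)
    print("group A accuracy:", model.score(xa_test, ya_test))  # high
    print("group B accuracy:", model.score(xb_test, yb_test))  # near chance

    # The "engineer's fix" in LeCun's framing: refit on a representative sample,
    # and per-group performance roughly evens out.
    xa2, ya2 = make_group(2500, boundary=0.0)
    xb2, yb2 = make_group(2500, boundary=2.0)
    balanced = LogisticRegression().fit(np.vstack([xa2, xb2]), np.concatenate([ya2, yb2]))
    print("balanced, group A accuracy:", balanced.score(xa_test, ya_test))
    print("balanced, group B accuracy:", balanced.score(xb_test, yb_test))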
I guess I see your view and my view as two sides of the same coin - that “research ethics” is different from “application ethics”. I inferred that view from the following exchange:
Twitter user: “ML researchers need to be more careful selecting their data so that they don't encode biases like this.”
YLC: “Not so much ML researchers but ML engineers. The consequences of bias are considerably more dire in a deployed product than in an academic paper.”
Perhaps I’m wrong. That’s the whole problem with Twitter though - you can’t convey much nuance or sophistication in 140 characters.
your summary is not making a lot of sense either to be honest. well, at least the last part...
> I'm open to being convinced if she had made any effort to show/prove that "use of biased datasets in research" is correlated with "biased outcomes in real world production deployments".
what does that mean? is there anything that would auto-magically eliminate bias if it were introduced into research?
> what does that mean? is there anything that would auto-magically eliminate bias if it were introduced into research?
Let me rephrase. Yann is basically saying "bias is the engineer's responsibility, not the researcher's". Gebru (presumably) disagrees.
Now I might agree with Gebru if:
(a) she can show empirically that "researchers releasing biased datasets/models" is correlated with "real-world deployment of said datasets/models that leads to injustice"; and
(b) she can make a convincing argument why one person (a researcher) should be responsible for the actions of another (an engineer).
But she didn't address either of these points on Twitter. She didn't bother to address anything on Twitter. Her whole argument was "You're wrong, I'm tired of explaining, you need to listen to minorities, I'm not going to engage".
That's not reasoned discussion or debate. It's posturing and point-scoring. The Twitter format only serves to encourage this type of interaction, so Yann basically gave up on the whole platform.
okay, so... you seem to understand where the other researcher is coming from and agree with most points. i am also going to assume that you read, or perhaps know, some of the sources cited numerous times on this page.
but because she did not explicitly state those on twitter, or because of the way she brought it up, we need to invalidate her whole argument?
> but because she did not explicitly state those on twitter, or because of the way she brought it up, we need to invalidate her whole argument?
No-one said anything that could be remotely interpreted as "her whole argument is invalid".
I'm sure he'd be more than happy to discuss where he agrees and where he differs with Gebru on his Facebook page or on a conference panel. I think he explicitly said this.
He's just decided that Twitter is not the platform for that kind of reasoned debate. Gebru's attitude in this instance - providing nothing more than "I'm tired of this, you need to listen to marginalized communities" - was the straw that broke the camel's back.
Because the points of disagreement are the reason she's upset, and the reason there is an argument in the first place.
Of course she's right about all the things that everyone agrees on. Everyone in the conversation is right about most points, if you break down their stance into a list of points.
It's not that the points of disagreement invalidate the correct points; it's that having a bunch of correct points doesn't really tell you much about the thesis.
LeCun - "ML is biased when datasets are biased. It's not the responsibility of researchers to ensure that ML is used responsibly in all cases, but the responsibility of the engineers of a particular implementation who need to use the correct models for the task at hand."