LeCun - ML is biased when datasets are biased. But unlike deploying to real world problems, I don't see any ethical obligation to use "unbiased" datasets for pure research or tinkering with models.
Gebru - This is wrong, this is hurtful to marginalized people and you need to listen to them. Watch my tutorial for an explanation.
Headlines from the tutorial (which Gebru didn't even link herself): The CV community is largely homogeneous and has very few black people. There's a bunch of startups that purport to use CV to predict IQ, make hiring decisions, etc.
Marginalized people don't work at these companies, and there's no legal vetting for fairness before these platforms are deployed. Commercial facial analysis (gender classification) is least accurate on dark-skinned women and most accurate on fair-skinned men. Datasets are usually white/male. Most object recognition models are biased towards Western concepts (e.g. marriage). Crash test dummies are modeled on male bodies, so women and children are overrepresented in car crash injuries. Nearest-neighbor image search is unfair because of automation bias and surveillance bias. China is using face detection to surveil ethnic minorities. Amazon's face recognition sold to police had the same biases (higher error rates on black women).
Now, I largely agree with what Gebru said in the tutorial. So does LeCun, who explicitly agreed a number of times that biased datasets/models should never be used for deployed solutions.
But it's a huge leap in logic to then demand that every research dataset be "unbiased". It's like criticizing someone for using exclusively male Lego figures to storyboard a movie shoot, or attacking a Chinese researcher because they trained a generative model only on Chinese faces and none of the outputs looked anything like me.
That being said, I would have been open to being convinced had she made any effort to show/prove that "use of biased datasets in research" is correlated with "biased outcomes in real world production deployments". But she didn't, which is why her criticism of LeCun smacks of cheap point-scoring rather than genuine debate (a criticism I made of Twitter generally the last time this topic came up).
I became convinced it was cheap point-scoring when LeCun, feeling that Twitter wasn't the right medium, invited Gebru out for a chat and got back a link on how to properly apologize. Still, this rundown would have saved me a lot of time when I was trying to figure out wtf happened.
Do you believe that industry uses pre-trained models that researchers release?
Do you believe that industry uses pre-made datasets that researchers promote in their work?
Would a yes to the above two questions be sufficient to show that "use of biased datasets in research" is correlated with "biased outcomes in real world production deployments"?
"What I believe" is completely irrelevant, because it has zero grounding in research or evidence. She's claiming that we should listen to her because she's an expert, but then provides no evidence or explanation between the two for a causal link.
I'm not saying she's wrong, I'm saying we don't know because she defaulted to the argument that "you should listen to minorities", not "here is the evidence".
What's more, every single example of injustice in her tutorial was an image recognition/classification problem - entirely different from the generative model that originally sparked the debate.
I don't think there's an argument that needs to be made; it's pretty clear to me that people use researcher-produced models in production all the time, without regard for whether they're biased, because it's the easiest solution. If you think there needs to be an argument made to justify that, that's fine, but I don't think it's valuable to assume that people come into a discussion without a basic understanding of the software engineering ecosystem.
And the point being made isn't "biased input data isn't responsible for a biased model", it's "you need to look one step further and ask why the input data is biased and how that bias impacts the world."
I want to point out that while I disagree with you, you've already engaged in far more actual debate than Gebru did on Twitter.
Like I said, LeCun (and I) largely agree with most of Gebru's points. But when LeCun went to great lengths to defend/explain his position, Gebru literally responded with "I don't have time for this". Even before that, she didn't bother to link the presentation she referred to (which, again, didn't directly address any of the points Yann was making!).
It's this complete lack of good-faith engagement that prompted LeCun to quit Twitter, not the underlying discussion on ethics itself. LeCun clearly feels that Twitter is not the place for reasonable discussion, and after this episode, I'm inclined to agree.
> If you think there needs to be an argument made to justify that, that's fine, but I don't think it's valuable to assume that people come into a discussion without a basic understanding of the software engineering ecosystem.
I'm not saying that people don't use off-the-shelf models. I'm saying that I don't know if forcing research datasets to be "unbiased" will make any difference to real-world injustice. I don't even know if any of the examples of bias she gave in her tutorial (HireQ/Microsoft/etc) could be ascribed to the use of pretrained models. She could be right. I don't know. You probably don't either.
Going beyond the empirical question, she'd also need to explicitly argue why responsibility should lie at the feet of the researcher, not the engineer. Gebru did neither, which is why I say it's a huge leap of logic.
That, fundamentally, is LeCun's position. He completely agrees that warning labels should be put on these kinds of models, along the lines of "This model has been trained on biased data and is unsuitable for use in real-world applications where racial fairness is expected". In fact, this is exactly what the authors did.
> And the point being made isn't "biased input data isn't responsible for a biased model", it's "you need to look one step further and ask why the input data is biased and how that bias impacts the world."
And I'd argue you need to account for the context in which your model is deployed. If I'm using StyleGAN to synthesize facial textures for a video game, biased datasets and models are desirable, not something to be eliminated. I'll use the appropriately biased model depending on whether I want to generate white faces, Chinese faces, or black faces.
It's the use case that dictates the risk, which is why LeCun (and I) believe it's the engineer's responsibility, not the researcher's.
In fact, they did so after Timnit raised her objection, and they did it by drawing on Timnit's own research (the model card they added is a direct result of research Timnit was involved with: https://arxiv.org/abs/1810.03993).
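For anyone who hasn't looked at that paper, a model card is essentially a structured warning label that ships with the model. Here's a minimal sketch of the idea in Python - the section names roughly follow the paper's template, but the contents are made up for illustration and aren't quoted from any real card:

```python
# Minimal, illustrative sketch of a model card as structured metadata.
# Section names roughly follow Mitchell et al., "Model Cards for Model
# Reporting" (https://arxiv.org/abs/1810.03993); the text is hypothetical.
model_card = {
    "model_details": "Research prototype: GAN-based face synthesis/upsampling.",
    "intended_use": "Academic research and benchmarking only.",
    "out_of_scope_uses": "Any deployed application where demographic fairness "
                         "is expected (identification, hiring, policing, etc.).",
    "training_data": "Public face dataset skewed towards light-skinned subjects.",
    "ethical_considerations": "Outputs inherit the dataset's demographic bias; "
                              "results on under-represented groups are less faithful.",
    "caveats_and_recommendations": "Re-train or re-evaluate on representative data "
                                   "before any real-world use.",
}

if __name__ == "__main__":
    # Print the card as a human-readable "warning label".
    for section, text in model_card.items():
        print(f"{section.replace('_', ' ').title()}: {text}")
```

The point being that the researcher ships the caveats alongside the model, so the engineer deciding whether to deploy it can't claim they didn't know.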
Which is precisely why LeCun - like myself - agrees with 95% of what Gebru has said in the past.
The issue at hand was her lack of good-faith engagement on Twitter and the subsequent pile-on from the mob. LeCun is quitting Twitter, he's not quitting ethical debates.
> Which is precisely why LeCun - like myself - agrees with 95% of what Gebru has said in the past.
Even now, it's not clear to me that this is the case. LeCun still hasn't actually acknowledged any of the broader ethical arguments Gebru made, either on Twitter or in his follow-up posts on Facebook.
In fact, he makes no reference to her research anywhere (beyond the vaguest "I value the research you're doing" in his apology tweet). I found that rather suspicious, and I still do.
Like, having read through the entire conversation, I have no confidence that Yann could explain any of Timnit's research if asked about it, even in broad strokes. That's really, really weird given everything that happened.
> Even now, it's not clear to me that this is the case. LeCun still hasn't actually acknowledged any of the broader ethical arguments Gebru made, either on Twitter or in his follow-up posts on Facebook.
Well, to be fair, she didn't provide Yann with any ethical arguments in this instance.
But being equally fair, you're right, I can't speak for Yann. I can only speak for myself, and I personally agree with most of what I've read of Gebru's work (though not all).
But the issue at hand wasn't the research itself - it's the way the dialogue was conducted on Twitter, and Gebru acquitted herself very poorly.
> Well, to be fair, she didn't provide Yann with any ethical arguments in this instance.
Yes and no. A lot of what I'm saying comes directly from the tutorial she repeatedly suggested he watch. I know that because I took the time to watch it - that's the reasonable thing to do when someone tells you that you aren't fully informed on a subject and points you to a resource to improve your understanding.
A major aspect of crypto research today is crypto UX and making crypto systems that are difficult to misuse. There are academics who actively work on these issues. They aren't the only academics obviously, but they exist.
Building ML systems that are difficult to misuse is underexplored, and Timnit is one of the relatively few researchers actively doing work in this area.
I'd call them both examples of applied cryptography research. I think these projects compare very, very closely to applied ML research:
They come out of industry research labs and are worked on by respected experts, usually with some academics involved, and ultimately you end up with an artifact beyond just a paper - something that is useful and improves on the status quo.
I'm admittedly not a total expert, so I don't know how far down this kind of work goes towards the level of crypto "primitives", but I believe there is some effort to pick primitives that are difficult to "mess up" (think "bad primes"), and I know Tink actively prevents you from making bad choices in the cases where you are forced to make a choice.
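To make that concrete, here's roughly what the misuse-resistant style looks like with Tink's Python AEAD API (going from memory of the docs, so exact names may differ between versions): you pick one of a handful of vetted key templates and get an encrypt/decrypt primitive, with no knobs for nonces, modes, or hand-rolled parameters.

```python
import tink
from tink import aead

# Register Tink's AEAD key managers before use.
aead.register()

# The only real choice is which vetted template to use (AES256_GCM here);
# there is no way to pick your own nonce, IV, block mode, or "bad primes".
keyset_handle = tink.new_keyset_handle(aead.aead_key_templates.AES256_GCM)
primitive = keyset_handle.primitive(aead.Aead)

ciphertext = primitive.encrypt(b"secret message", b"associated data")
assert primitive.decrypt(ciphertext, b"associated data") == b"secret message"
```

The safe path is the default path, which is the property the comments above are saying ML tooling mostly lacks.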
Even more broadly, just consider any tptacek (who, I should clarify, is *not* a researcher, lest he correct me) post on PGP/GPG email, or work from people like Matt Green (e.g. http://mattsmith.de/pdfs/DevelopersAreNotTheEnemy.pdf).
Edit: Some poking around also brought up this person: https://yaseminacar.de/, who has some interesting papers on similar subjects.
Misuse has two meanings: to use something for a bad purpose (criminals using it to do bad things) or to use it incorrectly ("hold it wrong"). I was using misuse in the "hold it wrong" sense, but I agree that there's ambiguity there.
That's not what I got from LeCun's comment. I read it more like:
LeCun - "ML is biased when datasets are biased. It's not the responsibility of researchers to ensure that ML is used responsibly in all cases, but the responsibility of the engineers of a particular implementation who need to use the correct models for the task at hand."
> "...I don't see any ethical obligation to use "unbiased" datasets for pure research or tinkering with models..."
I don't think his comment was addressing the larger ethical discussion at all. I didn't interpret it as a discussion of ethical responsibilities, but rather as a strictly technical, matter-of-fact statement about the nature of ML training.
Please don't interpret my comment as an attack on yours; I was more pointing out that I interpreted his statement differently.
I guess I see your view and my view as two sides of the same coin - that “research ethics” is different from “application ethics”. I inferred that view from the following exchange:
Twitter user: “ML researchers need to be more careful selecting their data so that they don't encode biases like this.”
YLC: “Not so much ML researchers but ML engineers. The consequences of bias are considerably more dire in a deployed product than in an academic paper.”
Perhaps I’m wrong. That’s the whole problem with Twitter though - you can’t convey much nuance or sophistication in 140 characters.
your summary is not making a lot of sense either to be honest. well, at least the last part...
> I would have been open to being convinced had she made any effort to show/prove that "use of biased datasets in research" is correlated with "biased outcomes in real world production deployments".
what does that mean? is there anything that would auto-magically eliminate bias if it were introduced into research?
> what does that mean? is there anything that would auto-magically eliminate bias if it were introduced into research?
Let me rephrase. Yann is basically saying "bias is the engineer's responsibility, not the researcher's". Gebru (presumably) disagrees.
Now I might agree with Gebru if:
(a) she can show empirically that "researchers releasing biased datasets/models" is correlated with "real-world deployment of said datasets/models that leads to injustice"; and
(b) she can make a convincing argument why one person (a researcher) should be responsible for the actions of another (an engineer).
But she didn't address either of these points on Twitter. She didn't actually bother to address anything on Twitter. Her whole argument was "You're wrong, I'm tired of explaining, you need to listen to minorities, I'm not going to engage".
That's not reasoned discussion or debate. It's posturing and point-scoring. The Twitter format only serves to encourage this type of interaction, so Yann basically gave up on the whole platform.
okay, so... you seem to understand where the other researcher is coming from and agree with most points. i am also going to assume that you read, or perhaps know, some of the sources cited numerous times on this page.
but because she did not explicitly state those on twitter, or because of the way she brought it up, we need to invalidate her whole argument?
> but because she did not explicitly state those on twitter, or because of the way she brought it up, we need to invalidate her whole argument?
No-one said anything that could be remotely interpreted as "her whole argument is invalid".
I'm sure he'd be more than happy to discuss with Gebru where he agrees and where he differs on his Facebook page or at a conference panel. I think he explicitly said this.
He's just decided that Twitter is not the platform for that kind of reasoned debate. Gebru's attitude in this instance - providing nothing more than "I'm tired of this, you need to listen to marginalized communities" - was the straw that broke the camel's back.
Because the points of disagreement are the reason she's upset, and the reason there is an argument in the first place.
Of course she's right about all the things that everyone agrees on. Everyone in the conversation is right about most points, if you break down their stance into a list of points.
It's not that the points of disagreement invalidate the correct points, it's that having a bunch of correct points doesn't really tell you much about the thesis.