One very interesting thing that was mentioned in the interview is how much Facebook relies on deep learning right now; specifically, how hate speech detection went from 75% manual to something like 2.5% manual, and how manual detection of false negatives enabled this improvement.
What I'm wondering is about false positive detection, which wasn't mentioned, and how much of this incredible decrease in false negatives came at the expense of an increase in false positives.
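To make the trade-off concrete: the standard lever here is the decision threshold, and lowering it to catch more hate speech (fewer false negatives) mechanically produces more false positives. A minimal sketch with scikit-learn on purely synthetic data (nothing to do with Facebook's actual system, just the generic mechanism):

    # Toy illustration of the threshold trade-off (synthetic data, not FB's system).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    # Synthetic stand-in for "is this post hate speech" labels.
    y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0.8).astype(int)

    clf = LogisticRegression().fit(X, y)
    scores = clf.predict_proba(X)[:, 1]

    for threshold in (0.9, 0.5, 0.1):
        preds = (scores >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y, preds).ravel()
        print(f"threshold={threshold:.1f}  false negatives={fn}  false positives={fp}")
    # As the threshold drops, false negatives fall while false positives rise.

So unless the underlying model got dramatically better, a big drop in false negatives would indeed tend to show up as extra false positives somewhere.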
Anecdotally, FB's hate speech detector is pathetic. I have had lots of friends run afoul of it for trivial things. It seems no more coherent than a bunch of regular expressions.
I had a post in a group get flagged by the algorithm for saying something along the lines of "your politics shouldn't involve saying '[bad word] about [protected group]'".
I suspect the problem is that nothing internally registers as wrong when a system just defaults to that kind of crudeness. So that's what happens.
I was once asked to use machine learning to make a record linkage system for some crappy dataset. I got no requirements, of course, so I set it up to have a reasonable balance of precision and recall. After all, the point of asking for an ML system must be to allow fuzzy matches that a simple exact matching system would miss, right?
But my boss apparently got complaints about bad matches, so he changed it to allow exact matches only.
The machine learning system ended up being a Rube Goldberg machine for linking people based on exact name match.
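For anyone unfamiliar, record linkage of this sort usually boils down to scoring candidate pairs and keeping those above a similarity threshold; push the threshold to 1.0 and the whole "ML" part degenerates into exact matching. A hypothetical sketch (standard-library difflib, made-up records, not the actual system described above):

    # Hypothetical fuzzy record linkage; threshold=1.0 collapses to exact matching.
    from difflib import SequenceMatcher

    def name_similarity(a: str, b: str) -> float:
        """Crude string similarity in [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def link_records(left, right, threshold=0.85):
        """Return pairs of records whose 'name' fields are similar enough."""
        return [(l, r)
                for l in left
                for r in right
                if name_similarity(l["name"], r["name"]) >= threshold]

    left = [{"name": "Jon Smith"}, {"name": "Maria Garcia"}]
    right = [{"name": "John Smith"}, {"name": "M. Garcia"}]

    print(link_records(left, right, threshold=0.85))  # fuzzy: catches Jon/John Smith
    print(link_records(left, right, threshold=1.0))   # exact only: catches nothing here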
Anecdotally, this is pretty typical of the evolution of ML systems.
Some heuristic/standard algorithm works pretty well, but people see cases it didn't catch and think it could be better. An engineer/scientist takes a look and realizes that supporting those extra cases requires a more complex ML algorithm. Years later a model has made it to production, but there are now complaints about overly lenient matching.
In my experience, ML works best in scenarios where there is either such immense data volume that .2% improvement is a real benefit (these are rare) or the very notion of a heuristic method simply wouldn’t work.
A guy I know got banned this week because he posted a picture of a cute kitten that looked like it was suffocating someone, with an icanhazcheezeburger-style caption of "i kil you".
Banned for 24 hours, lol. Anyway. I think that is within bounds.
Hey, I am an editor at The Gradient and the host of this interview! As this is only episode 6, the podcast domain is pretty new to us^, so I would definitely welcome feedback on question choice or my style as an interviewer. We tried focusing much more on research than other interviews out there, such as Lex Fridman's; I'd be curious to hear if you think that worked well.
^(we've existed as a digital publication focused on AI for way longer, see thegradient.pub if you're curious.)
You don't seem to include a transcript here. Seems like a serious flaw (I personally prefer transcripts to audio but it's actually an accessibility issue for some people).
This Twitter AI lobbyist phenomenon is so weird. When did it become normal to give any attention to these "intellectual" types who seemingly don't do AI work but love to rave about it on Twitter with non-technical arguments?
LeCun just got rhetorically beat here because he's not a fucking rhetorician/lobbyist/politician like every "AI ethics" person.
Just FYI, as I said in another comment in this case "many (senior) researchers agreed with Gebru's opposition to LeCun's original point - see tweets by Charles Isbel, yoavgo, Dirk Hovy embedded here https://thegradient.pub/pulse-lessons/ under 'On the Source of Bias in Machine Learning Systems' (warning - it takes a while to load). There was a civil back-and-forth between him and these other researchers as you can see in that post, so it was a point worth discussing. Gebru mostly did not participate in this beyond her initial tweets as far as I remember."
Also, for anyone who thinks Gebru does not have real credentials as an AI researcher, she did get her doctorate at Fei-Fei Li's lab at Stanford (which is now the lab I am part of), and did research at both Microsoft and Google and has co-authored papers with a variety of other well-respected researchers. Just sayin', she's obviously not as senior as LeCun, but let's not caricature her as if she has no expertise in this area.
Thanks for sharing this article. Can someone knowledgeable about this issue explain why this is not a data issue? I have read people claiming that ML researchers may bring their own biases into the models but I haven't seen any concrete example of that. Even in the Twitter exchange in this article, Gebru doesn't explain how this is not just data bias. She just throws a lot of insults at LeCun but anyone can do that, right? I would have loved to see her explanation as she is the expert in this area.
Well, technically, the way you choose the algorithm and set the hyper-parameters can influence accuracy in a non-uniform way over the distribution of data, introducing additional bias.
Bias can be introduced at every step in the ML pipeline, from data collection to labeling, model selection, training, evaluation, deployment and monitoring. Not to mention the complex effects of feedback when you collect data that has been influenced by biases in a prior iteration of the model.
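One cheap way to surface that kind of bias, whichever step introduced it, is to slice the evaluation per group instead of reporting a single aggregate number. A minimal sketch with pandas/scikit-learn and made-up columns:

    # A single aggregate metric can hide a large gap between subgroups (toy data).
    import pandas as pd
    from sklearn.metrics import accuracy_score

    df = pd.DataFrame({
        "group":     ["a", "a", "a", "b", "b", "b"],
        "label":     [1, 0, 1, 1, 0, 0],
        "predicted": [1, 0, 1, 0, 1, 0],
    })

    print("overall accuracy:", accuracy_score(df["label"], df["predicted"]))
    for group, sub in df.groupby("group"):
        print(f"group {group} accuracy:", accuracy_score(sub["label"], sub["predicted"]))
    # Overall looks okay-ish (4/6), but group "b" is only 1/3 correct.

Whether anyone bothers to do that slicing is itself one of those human choices in the pipeline.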
That said, I am disappointed in Gebru, I read her "Gender Shades" paper where she completely ignored the existence of the Asian race. That's a pretty big limitation for a paper named in such general terms. In another place she labels Asians as "white adjacent" [1]. That doesn't seem to be ethical behavior for an ethics researcher. She also refused to debate with Yann LeCun on the topic of bias, instead sent him to reeducate himself [2].
Gebru has been a terrible representative for AI ethics all around, but this was inevitable.
"AI Ethics" is a pop science topic based on fantasies derived from the Terminator movie series.[1] When the foundation of the entire field is shaky it attracts careerist opportunists who have found an easy way to score scientific reputation points using shoddy or non-existent science. Google doubled down on this to build up a facade of an ethical company (guys, lets pretend project Dragonfly doesn't exist) and appointed the loudest person they could find in this pseudo science group.
The inevitable happened - inevitable if you are familiar with Gebru's forced confrontations with LeCun. She publishes a bogus paper about the carbon footprint of training AI models, comparing it with jet exhaust. A paper more focused on rhetoric than substance. Completely ignoring that Google is carbon neutral and AI chips can run on solar power - just like a Tesla car. No reference is made to how these language models enable cross-language communication, eliminate driving around in circles in a foreign country because you don't speak the language and end up saving CO2. Google, by any measure, saves CO2 emissions for the user.
Instead we have a dummy paper pretending that Google uses jet fuel for a one-time training of an AI model. How low can "scientific research" fall?
[1] Certainly, there needs to be work done to remove bias from public datasets; but there is not much to be done beyond this. NLP programs having great models for English and poor ones for Swahili is, frankly speaking, not a problem for ML algorithm researchers to solve. There are one-shot learning approaches but these are not motivated by the desire to eliminate bias.
I think Gebru is serving two masters. She wants societal change based on critical social justice, and at the same time she publishes papers on AI ethics. That's a conflict of interest: you can't be an unbiased scientist while being a die-hard activist at the same time.
CSJ ideology says to avoid debating with people of the "oppressing class" in order to not give them a platform, or because language itself is biased, or because the opponents are believed to act in bad faith, while the scientific method requires unbiased, impersonal, open debate. See the quote about how the "master's tools will never dismantle the master's house" for reference. They also practice cancelling their opponents instead of debating them, which is what Gebru and her cohort tried to do on Twitter.
> That said, I am disappointed in Gebru, I read her "Gender Shades" paper where she completely ignored the existence of the Asian race. That's a pretty big limitation for a paper named in such general terms. In another place she labels Asians as "white adjacent"
You are scratching the surface of a much bigger racial divide in the Bay Area. [0] [1]
> That doesn't seem to be ethical behavior for an ethics researcher. She also refused to debate with Yann LeCun on the topic of bias, instead sent him to reeducate himself
I don't think she had the technical background required to really argue with LeCun.
Thanks for the reply. I’m sorry I still don’t see an example of the bias due to hyperparameter selection or tuning. Or model selection. The bias due to data is pretty clear to me and I see so many examples in the wild.
More concretely, say you train a gender detection model on people of all races. The way you do image preprocessing might influence results, because black people's faces have less contrast. The way you pick your regularisation could influence how well your model works on outliers (L1 reg tends to be less sensitive than L2). The way you set class weights could influence the trade-off between accuracy on different classes.
Other algorithmic choices influence whether the model does well on longer or shorter sequences (transformer vs LSTM). This might not seem socially biased, but if your classes correlate with input size, then the length bias might convert into class bias.
I think math itself is socially unbiased (duh!), but models could amplify preexisting dataset biases when they correlate with some quality of the data such as its volume, resolution, contrast or complexity.
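If it helps to see the class-weight point in code, here's a small sketch (scikit-learn, synthetic imbalanced data, nothing domain-specific) where changing that single hyperparameter shifts accuracy from the majority class to the minority class:

    # class_weight alone redistributes accuracy between classes (synthetic data).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X0 = rng.normal(loc=0.0, size=(950, 2))   # majority class
    X1 = rng.normal(loc=1.0, size=(50, 2))    # minority class
    X = np.vstack([X0, X1])
    y = np.array([0] * 950 + [1] * 50)

    for weights in (None, "balanced"):
        clf = LogisticRegression(class_weight=weights).fit(X, y)
        preds = clf.predict(X)
        acc0 = (preds[y == 0] == 0).mean()
        acc1 = (preds[y == 1] == 1).mean()
        print(f"class_weight={weights}: class 0 acc={acc0:.2f}, class 1 acc={acc1:.2f}")
    # Unweighted, the minority class is mostly misclassified; "balanced" trades
    # some majority-class accuracy to fix that.

If the minority class happens to line up with a demographic group, that tuning decision is exactly the kind of non-data bias being described.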
I don't know how folks can be aware of how the exchange went down and say that it was a "successful" "grievance strategy". LeCun wasn't necessarily in the right here, and it wasn't only Gebru's twitter followers going on the offensive.
Well, LeCun quit Twitter, so it is "one down". That is what I meant by successful. And Gebru's "arguments" weren't even arguments, just "whatever you say is wrong because you are white and don't recognise our special grievances".
I personally agree with his point that there is a difference between a research project and a commercial product. No actual harm was done when the AI completed Obama's image into a white person. You could just laugh about it and move on.
* LeCun did not really quit Twitter; he's still active on there and has been for a while - but I guess he did temporarily when all this happened.
* many researchers agreed with Gebru's opposition to LeCun's original point - see tweets by Charles Isbel, yoavgo, Dirk Hovy embedded here https://thegradient.pub/pulse-lessons/ under 'On the Source of Bias in Machine Learning Systems' (warning - it takes a while to load). There was a civil back-and-forth between him and these other researchers as you can see in that post, so it was a point worth discussing. Gebru mostly did not participate in this beyond her initial tweets as far as I remember.
* LeCun got into more heat when he posted a long set of tweets to Gebru, which to many seemed like he was lecturing her on her own area of expertise, aka 'mansplaining'. I am sure many would see that as nonsense, but afaik the wave of people making that point was the cause of his quitting Twitter.
Thanks for the further background information. I have to say it doesn't really make it better for me. The "angry people" are of course correct that you can also create bias in other ways than data sets. But are they implying that people generally deliberately introduce such biases to uphold discrimination? That seems like a very serious and offensive claim to make, and not very helpful either.
The whole way of thinking about these issues is backwards in my opinion. I would think that usually, when you train some algorithm, you tune and experiment until it roughly does what you want it to do. I don't think anybody starts out by saying "let's use the L2 loss function so that everybody comes out white". They'll start with some loss function, and if the results are not as good as they hope, they'll try another one. In fact the usual approach will lead back to issues with the data set, because that is what people will test and tweak their algorithms against. If the dataset doesn't contain "problematic" cases, they won't be detected.
But overall, such misclassifications are simply "bugs" that should get a ticket and be fixed, not trigger huge debates. I think it is toxic to try to frame everything as an issue of race.
> Thanks for the further background information. I have to say it doesn't really make it better for me. The "angry people" are of course correct that you can also create bias in other ways than data sets. But are they implying that people generally deliberately introduce such biases to uphold discrimination? That seems like a very serious and offensive claim to make, and not very helpful either.
The humans who ultimately validate the model (and who decide on the dataset) are a hyperparameter. Often ignored, yes, but they are still part of the training loop. They decide what the other hyperparams are, when to stop training and publish, etc.
To use a question I've asked on HN before: say you're training a model to detect criminality based on facial structure. This has come up as a real-world example; papers have been published on this topic. What does a "good" dataset look like? Or similarly, for a system that decides on bail or sentence length. Do you use historical data on bail or sentencing? We have very well documented examples of bias in both of those things, even in the ground truth. So how do you decide to mitigate that bias? Or do you choose not to, and to continue enforcing said biases in your model?
> But overall, such misclassifications are simply "bugs" that should get a ticket and be fixed, not trigger huge debates
But when such "bugs" aren't prioritized because people don't think they are bugs, you have to debate whether or not they are bugs at all! The hyperparameter here is "who decides what is or isn't a bug"
"say you're training a model to detect criminality based on facial structure. This has come up as a real world example, papers have been published on this topic. What does a "good" dataset look like?"
I don't think anybody who is respected says "here is this data set of criminals, we have trained the algorithm on it, and therefore it is proven that such and such facial features predict criminality". I mean yeah this mistake has been made over and over again (even before the invention of computers), but it has long been debunked.
Also, presumably "black skin" is a good predictor for criminality - in the current day, the crime rate is higher for black people. The algorithm only detects that, it doesn't interpret it. It is up to the humans who use the algorithm to interpret it. If you interpret it as "black people have a genetic disposition to criminality", you are wrong. But it wouldn't be the fault of the algorithm. What would be insanity - but is basically what the "AI ethics" people demand - is to tweak the algorithms to make them pretend the prevalence of criminality is not higher in certain populations.
"But when such "bugs" aren't prioritized because people don't think they are bugs, you have to debate whether or not they are bugs at all!"
Nobody says they are not bugs. You are creating an imaginary problem here. You really think, say, researchers at Amazon said "let's make it so that women are ranked down by the algorithm"? Likewise I don't think anybody says "the algorithm should rank black people worse for crime".
It is also not a novel idea, delivered to us by the woke crowd, to look out for bias in the algorithms. The whole field is about treating bias - a machine learning algorithm is all about training some bias.
> Nobody says they are not bugs. You are creating an imaginary problem here. You really think, say, researchers at Amazon said "let's make it so that women are ranked down by the algorithm"? Likewise I don't think anybody says "the algorithm should rank black people worse for crime".
They did though, at least until Gebru and those like her came along and forced the issue.
It's really sad to see people say that this was never a concern, as though bias and ethics were taken seriously by the field as a whole more than, say, 5 years ago. They weren't. Idk if you're new to the field or weren't paying attention, but it just wasn't a thing. Like, most of the foundational papers on racial misclassification and such are from 2017 and 2018.[2] That's more recent than... GANs or AlphaZero. Not to mention that there are attempts to publish garbage like this[1] every year!
> You really think, say, researchers at Amazon said "let's make it so that women are ranked down by the algorithm"? Likewise I don't think anybody says "the algorithm should rank black people worse for crime".
No, I already said this. Someone failing to notice a bug isn't malice. But the issue is that no one even considered that these kinds of things were bugs, so they didn't get noticed or researched.
> The whole field is about treating bias - a machine learning algorithm is all about training some bias.
Yes, but thinking about race as a particular category where we should avoid unintended bias (and indeed prefer generalization across categories) was a novel idea when proposed by those ethicists!
> But it wouldn't be the fault of the algorithm. What is insanity, but basically what the "AI ethics" people demand, is to tweak the algorithms to make them pretend the prevalence of criminality is not higher in certain populations.
But...you're making the algorithm. If your goal is to build a model that tries to detect "racial criminality", I'm going to suggest that you probably are doing something racist, because there isn't really a useful, non-racist, reason to train a model that incorrectly classifies people as criminal based on their skin color.
On the other hand, if you're having to do additional interpretation of the model output, why aren't you integrating that additional interpretation into the model? And if you can't, then is the model even adding any value? Probably not.
And that's without even getting into questions like what "prevalence of criminality" means. I think you mean "are arrested more often". We often think that that correlates with criminality, and for some crimes it may, but for e.g. drug crimes we know that it doesn't. The point is, if you don't have at least thoughtful answers to all of those questions and more, you have no business trying to do "criminality" prediction, because your algorithm is not doing whatever you think it's doing.
[2]: Seriously, Gender Shades is 2018, Debiasing word embeddings is 2016 which I think is the earliest you could argue people were taking this stuff seriously, and it cites "Unequal Representation and Gender Stereotypes in Image Search Results for Occupations" from 2015, which is kind of it.
> Lecun got into more heat when he posted a long set of tweets to Gebru which to many seemed like he was lecturing her on her subject of expertise aka 'mansplaining'.
Gebru is not a Woz-level wizard like LeCun, but someone who worked at Apple as an engineer and did a PhD with Fei-Fei Li cannot be dismissed as "not technical."
I had a phone screen shortly before the pandemic where I emphasized that I liked understanding and solving problems for people and didn't care what specific technology I used.
I got feedback from the recruiter that the company passed on me because I was not technical enough. They literally had asked me zero technical questions.
Not too long after, the company was in the news for a massive data breach.
Well, unless you were interviewing for a CTO type job, they were probably looking for an indication of what technologies you’re most proficient in. If you’re equally proficient in 20 programming languages, that proficiency level is, with high probability, pretty low.
Did he say he doesn't believe in the concept of AGI? What does that mean? He doesn't believe machine intelligence will ever surpass human intelligence? That's surprising.
Having been in the industry, I think a lot of the researchers at that level get tired of every layman jumping to discussion of AGI, so I've heard some of them refuse to comment on it (or have a prepped one-liner). Who can blame them?
AGI at some point will become more about the semantics of the term. Does it mean an intelligence that can self-improve? Etc. Generally I think game worlds are likely to have the right conditions for the first emergence of primitive AGI precursors.
He means AGI is not a meaningful term, because 'general' is too ambiguous - i.e., even human intelligence is not truly general. So it's a meaningless term according to him. I think he would not oppose a term such as 'superhuman AI'. He goes into this in some depth in Lex Fridman's interview with him; you can check that out if you're curious.
This is not surprising, because virtually no AI expert (except Kurzweil) thinks AGI is even a remote possibility in this century.
The only "AI experts" who believe that AGI is something to worry about are self appointed AI experts - Elon Musk, Sam Harris, Sam Altman etc. Zuckerberg had actually set up a dinner with Musk and LeCun to explain to him that AGI is not a realistic possibility. This led to Musk stating that Zuckerberg had a very poor understanding of AI. Looks like Musk is the self appointed AI expert in that table of 3!
Oh BTW, Musk, if you are reading this - how is the FSD project coming along?
Note: Scientists working for Open AI will pretend that AGI is a real possibility because the folks funding Open AI have a predilection for AGI.
The notion of anyone being an "expert" on AGI is rather silly considering that we have no AGI and don't even have a plausible path to build one. It's sort of like being an expert on warp drives or time machines.
Do you consider gwern not to be an AI expert? I think this is a reasonably technical explanation of how we may not be as far away from AGI as some choose to believe: https://www.gwern.net/Scaling-hypothesis
Nah, he's more of a hobbyist in AI. I don't necessarily think you need to be explicitly in academia to produce good academic work (independent researchers do exist), but he hasn't really produced anything (in terms of actual theoretical/experimental results) that could be regarded as a substantial contribution to the field. He's made some anime datasets though; maybe those could be useful to some.
I often wonder if being too close to the experimental results and current techniques blinds some people from seeing the bigger picture. Gwern certainly seems good at analyzing large-scale trends in AI research, perhaps broad high-level knowledge of the field is advantageous for this vs deep understanding of specific neural network details.
I trust Hinton and LeCun who have actually created the trends and bigger picture and invented the now "current" techniques over commentators who show up years later and offer shallow non peer reviewed analysis and predictions.
Reminds me of that talk by a failed tech startup's non-technical CEO who apparently was able to see the bigger picture that the scientist nerds with Asperger's weren't able to see.
I am not sure who this person is. How much has he published in ML, and what is his h-index?
As examples: LeCun has an h-index of 127, Hinton 164, Goodfellow 75, etc. None of them considers AGI to be a realistic possibility this century. Kurzweil, the AGI proponent, has an h-index of 22.
I don't recall any such thing. His papers were peer reviewed and published in science journals by the scientific community. Minkowski famously followed up with the concept of spacetime as a consequence of relativity.
This poll covers a group of the "100 Top authors in academic AI research according to Microsoft Academic search" and shows a mean "50% confidence that High level machine intelligence will exist" date of 2074. Clearly at least some AI researchers think it's more than a remote possibility this century.
I am no one. I have greatest respect for all Turing award winners.
But one thing I am wary of is that LeCun - while special and excellent - is, like many others, working at a place where "AI" is already used to "engage people up" - it is just the nature of the business if you are in the engagement business. And your "AI" will gladly help you in all kinds of subtle ways. What is also nice is that it's uncharted territory right now, so you can freely roam - and engage the heck out of your audience.
And LeCun - as a "neutral" scientist - is just doing his part.
IDK, I get it. Grad school sucks. Post-docs suck. Pre-tenure sucks. Post-tenure isn't any better. For that entire period of time you are working on de facto fixed term contracts. Which is extremely uncommon among salaried engineers, and those that take these sorts of contingent employment contracts are typically paid quite well. It's like 10-15 years of low pay, "will I have a job next year?" stress, and moving your family around all the time (or, more commonly, just not starting a family).
And not even for good pay. These days, even after a decade or more of experience, you're making less than your undergrads. Half as much or even less in some cases.
So, your undergrad once-peers start retiring -- or at least thinking about it -- around the time that you're finally transitioning from de facto fixed-term positions to something resembling a normal employment contract, but, again, for a third to a fifth of what you'd be making in industry at that point in your career.
So, yeah, people say fuck it and cash in on influence/engagement/reputation where they can. The only real alternative is the public sector paying researchers better, but that's never going to happen.
> It's like 10-15 years of low pay, "will I have a job next year?" stress, and moving your family around all the time (or, more commonly, just not starting a family).
While true for many/most PhDs going the academic route, I somehow don't think this held true for Yann LeCun. None of this answers the question of "Why FB?" - he could easily make a slightly smaller boatload of money if he chose to go somewhere else.
IMO it's similar to why Hinton works for Google - this gives him huge resources (data, compute, money to pay researchers) to do research with, unlike anything to be found in academia. Perhaps this is a naive view, but this is a guy who spent decades pushing for a direction in AI that was not popular but which he really believed in, so it seems natural he would want to accept resources to further research in that direction. Of course, he's also been public about thinking Facebook does more good than bad for the world, in his view.
Also, TBH I doubt he has much to do with the AI used for engagement optimization; his specialty is in other topics, and he seems to be focused on the work of Facebook AI Research (which openly engages with academia and so on). And to be fair, he is also still a professor at NYU and has students there.
Because Facebook makes the most money, and probably offered him the most.
Same reason why back in the day, a lot of people got electrical engineering degrees but went into software development or finance. The skills were transferable, and the pay was a lot higher.
We hope to produce polished transcripts in the future, but have yet to figure out the best way.
A quick summary is:
* First ~15 minutes is intro + discussion of Yann's early days of research in the 80s
* Minutes 15-~45 cover several notable recent works in self-supervised learning for computer vision (including SimCLR, SwAV, SEER, SimSiam, Barlow Twins).
* Final ~15 minutes are discussion of empirical vs theoretical research in AI, how AI is used at Facebook, and whether there will be another AI winter.
I've heard great things about Descript. It's not free (aside from a limited trial), but apparently it makes it really easy to get good transcripts, and also allows you to clean up the audio as well.
I wonder if you could post the transcript to a git repo and allow corrections via pull request. Auto-captioning is a great first step to get phrases set to time-codes, and then open it up to the community for corrections and translations.
Not as well as I would have expected tbh - see the trint link above, it's pretty good but there are lots of errors, so correcting it is quite time consuming.