That is difficult to do because there are many things that correlate with race. So even if you remove race explicitly, things like neighborhoods, level of education and such end up being proxies. Removing race from the model is a very hard problem.
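To make the proxy problem concrete, here's a minimal sketch with entirely synthetic data and made-up feature names: drop the protected attribute from the inputs, and a plain logistic regression can still recover it from a couple of correlated features.

```python
# Minimal sketch with synthetic data: even after dropping the protected
# attribute, a model can often reconstruct it from correlated features
# (here, made-up "neighborhood" and "education" signals).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000
race = rng.integers(0, 2, n)                   # protected attribute (synthetic)
neighborhood = race * 2 + rng.normal(0, 1, n)  # proxies correlated with it
education = -race + rng.normal(0, 1, n)
X = np.column_stack([neighborhood, education]) # race itself is NOT in X

X_tr, X_te, y_tr, y_te = train_test_split(X, race, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("protected attribute recovered with accuracy:", clf.score(X_te, y_te))
```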
One example: Credit card issuers have been known to take into account places that you have made purchases in assessing risk. If you shop at businesses that are primarily frequented by minorities, odds are you'll get a lower credit limit than if you didn't.
I guess the question is why that would be germane to issuing credit. Is it correlated with something that indicates actual credit risk? If it relies on erroneous correlations, would it not behoove them to eliminate the contamination?
Sure: some of your purchase history reflects behaviors that directly indicate you're a credit risk. Some of your purchase history indicates you're likely black (and if you're black, your credit risk is, empirically, a little higher-- even if it is not fair to deny people credit because they're black).
ML will figure out both, but can't explain the hidden variables: it'll pick up on behaviors that directly indicate higher risk, and on behaviors that indicate you belong to a minority that tends to be higher risk.
By the way, many methods are excellent at explaining their decisions. The favorite is decision trees, which break the decision up into discrete tests on individual features. E.g. a person was put in class 1 because their income was above X, their savings were above Y, etc. Other "shallow" methods like regression are fairly easy to interpret as well.
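For what it's worth, here's a small sketch of what that looks like in practice (synthetic data, made-up feature names): scikit-learn can print a fitted tree as a set of explicit per-feature thresholds.

```python
# Minimal sketch of an interpretable "shallow" model: a decision tree whose
# decision rules can be printed as explicit per-feature thresholds.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Prints rules like "income <= X ... class: 1" (feature names are made up here)
print(export_text(tree, feature_names=["income", "savings", "debt"]))
```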
It is mainly deep learning for which this is difficult. I'm sure that is the method in the Twitter discussion, but that is a different kind of problem, where interpretability doesn't have a clear use.
Basing your entire decision on something as simple as the applicant's income will presumably produce outcomes that differ with race. Is that what you mean by racist? Or do you mean the system identifies some individuals as high credit risk when they aren't and may actually lose money versus a more colorblind algorithm?
Many people would say that identifying people who spend a lot of money at bars or casinos as a credit risk isn't racist, even if it happens to pick up more minorities. There, the behavior and the credit risk seem tightly linked by an obvious mechanism.
Many people would also say that identifying people who spend some money at clothing retailers that market to minorities as a credit risk is racist. Here, the relationship seems like a hidden way to spot minorities, who happen to be more of a credit risk.
When banks drew red lines around all the minority neighborhoods and didn't lend to people there, because they couldn't consider race anymore, most thought this behavior unacceptable, even though it did genuinely reduce credit risk for banks. ML can't explain its rationale, and very often does the exact same thing inside an opaque box.
Presumably black people who are great loan customers would be incorrectly refused by a system based on fashion or addresses alone, so your answer seems to be that yes, racist methods actually lose the bank money.
My tendency is to think that since the system is optimized to maximize profit, the more features about a person we get in the data, the less "racist" the system becomes under this definition. Yes, it's more possible to "hide" racism in a complex method using tons of features, but it becomes increasingly less likely to happen. If we can use income, debt, and whatever else to build either a good classifier that ignores race or an inferior one that sneakily identifies race and bases the decision on that, mathematical optimization of the model should result in the former. ML can be trusted more than humans in this situation, not less.
Of course there's still the issue of the data being biased, which is where it all started.
> Presumably black people who are great loan customers would be incorrectly refused by a system based on fashion or addresses alone,
Strawman. The question is whether predictions are improved by using race (or direct proxies for race), not whether it'd be wise to use proxies for race alone.
Loan default rates are higher for disadvantaged minorities, even after controlling for many, many other variables (income, neighborhood, level of education, etc). Therefore, using race (or inferring race) improves prediction quality, but is ethically dubious.
Can we skip this kind of snippy arguing, please? Anyway, I take it you agree the statement is true but don't want to say so.
So now you're saying those great minority loan customers are impossible to identify? I think you just need to figure out what information is still missing. What's the effect size at this point anyway?
If you don't start off by willfully mischaracterizing your opponent's argument in order to be able to more easily refute it, I think you'll find that they accuse you of this less.
I think you're just willfully missing the point.
Ideally, you'd just lend money to the people who will pay you back. Unfortunately, we can't predict this perfectly. Adding race, or proxies for race, to the things you consider improves your predictions somewhat.
Nope it was a completely honest question and what I considered a constructive line of thought. You should assume good faith. I asked about two competing definitions of what is racist and you seemed to prefer one over the other.
And what have I claimed to "easily refute" exactly? More like I ran with the definition, and considered how to address the problem as stated. I said more features were needed, I didn't say enough features were currently used. You keep pointing to a dichotomy between perfect and flawed, while I was talking about relative improvements. There isn't even necessarily a disagreement there.
I sez, "other variables + race makes better predictions of default risk than other variables alone". U sez, "LOL -only- using race will cost banks money" I sez, "???"
I wonder what a court would rule if you went through your algorithm line by line and demonstrated that it was only designed to maximize profit, and the fact that it was racist was only incidental.
I'm guessing it's illegal to discriminate based on race, gender, or ability no matter how much extra it costs a business, whether it's deliberate or not.
Well, in places like NYC the laws are strict regarding rental housing. Landlords need to establish consistent requirements, then accept the first applicant that fits them. Typically they require that the income-to-rent ratio be greater than three or something like that, which undoubtedly has a disparate effect on people of different races given how high the rents are and how incomes differ. Are you calling this incidental racism?
The issue is that "fashion for minorities" is correlated with being a minority which is correlated with poverty which is correlated with high credit risk.
This makes "buys fashion for minorities" quite an accurate predictor of high credit risk, but also a profoundly racist one.
Is it too entangled to simply say: disregard anything that leads down a path of making inferences based on demographics? Although I suppose age is also tough to avoid in assigning risk.
Almost anything you'd use is correlated to demographics somehow or another. We tend to think certain systems are "OK" and not too racist, because they have a well-articulated rationale independent of demographics. But ML is not very good at explaining rationale for decisions, so we don't have the benefit of that argument and it's difficult to even empirically understand how biased the system is.
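One way to at least start measuring it empirically (a minimal sketch, assuming you can obtain group labels out-of-band for an audit sample): compare the model's approval rates across groups, even though the model itself never saw the group.

```python
# Minimal sketch of one empirical bias audit: compare the model's approval
# rate across groups known only to the auditor, not to the model.
import numpy as np

def approval_rates(approved: np.ndarray, group: np.ndarray) -> dict:
    """Share of applicants approved, broken down by (externally known) group."""
    return {g: approved[group == g].mean() for g in np.unique(group)}

# Hypothetical model outputs and group labels gathered for the audit
approved = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

rates = approval_rates(approved, group)
print(rates)                                   # e.g. {'a': 0.75, 'b': 0.25}
print("disparate impact ratio:", min(rates.values()) / max(rates.values()))
```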
Capital One has admitted doing the same, as well as a number of other lenders.
There have been analyses done since, showing that when you control for all other variables in credit reports, credit issuers issue less credit to black customers; e.g. Cohen-Cole (2011), "Credit Card Redlining," Review of Economics and Statistics.
I said "One example: Credit card issuers have been known to take into account places that you have made purchases in assessing risk. If you shop at businesses that are primarily frequented by minorities, odds are you'll get a lower credit limit than if you didn't"
You asked for a source.
I provided a source of the same. Yes, things like purchase histories are used as proxies for race and used to deny credit to mostly people of color.
Then you go here:
> Credit Card applications do not ask for race.
which was never asserted.
> I suspect it’s more likely based on zip code or similar.
As for the latter, no, it’s not a proxy for race. It might be highly correlated with race, but that is very different. I don’t know why everyone does these gymnastics to find a racist angle to everything, but it’s nonsense and it’s very harmful to spread that narrative.
Nobody is searching for black zipcodes and using that as an input to deny credit. The goal is to increase revenue by giving as much credit as possible with the least risk possible, so it wouldn’t even make sense.
Scoring based on where you shop is still behavioral analysis. I think it’s a bad policy, but it isn’t targeting people of color. Your zip code affects insurance rates too, and that’s based on claims. It doesn’t cost more to insure your car on the South Side of Chicago than in Beverly Hills because there are black people there, it costs more because there are more claims. The same effect is seen in “white” zip codes that are in areas with severe winters.
> I don’t know why everyone does these gymnastics to find a racist angle to everything, but it’s nonsense and it’s very harmful to spread that narrative.
Black people default on loans more, adjusted for income, education status, etc. But we've decided it's socially un-okay to ask people if they're black and adjust how much we're willing to loan based on the answer.
But instead, we measure all kinds of ancillary things, many of which are highly correlated with being black and not obviously correlated with ability to repay, and dump them into a model, where we come up with weights that make basically the same decisions as if we'd asked if they were black. Is this fundamentally better somehow?
Mind you, I don't have a wonderful answer as to what to do: it's important to accurately price credit risk. But it's also important for society to treat minorities equitably, especially when inequitable treatment might reinforce the very problems/reasons why some minorities are worse credit risks.
It might be better to explicitly have the "is African American?" question in the model, because then regulators, policymakers, and the public could perhaps eventually know the exact contribution of this factor... while approaching it obliquely makes the effect far less clear.
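A rough sketch of that idea (synthetic data, hypothetical feature names): with the attribute as an explicit input to a simple linear model, its fitted weight is a single inspectable number, rather than being smeared across dozens of proxies.

```python
# Sketch of the "make it explicit" idea: if the protected attribute is a
# named input, its fitted weight is directly inspectable.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
income = rng.normal(0, 1, n)
protected = rng.integers(0, 2, n)
# Synthetic default outcomes driven by income and, weakly, the attribute
p = 1 / (1 + np.exp(-(-1.0 * income + 0.5 * protected)))
default = (rng.random(n) < p).astype(int)

X = np.column_stack([income, protected])
model = LogisticRegression().fit(X, default)
for name, coef in zip(["income", "protected_attr"], model.coef_[0]):
    print(f"{name}: {coef:+.2f}")  # the attribute's contribution is one number
```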
It's pretty hard to make the system "blind" to race. If you blind a system predicting recidivism to race, but it is able to see a lot of things that are correlates of race... it can end up being racist anyways.
E.g. for a non-ML example -- redlining is theoretically "blind" to race, but makes extremely racist decisions.
It's really, really hard to erase biases that are deeply systemic.
For example, from the outset, would you object to an AI that made decisions on how harshly to sentence someone based on age & number of prior crimes?
What about hiring or admitting people to college based on the results of an IQ test?
Yeah, those end up being badly biased. The first has been studied, and the second is the reason for dropping SAT/ACT scores--every mental ability test correlates with IQ.
The more interesting thing, IMHO, is that the opposite is also helpful. For example, if you help all poor people equally, you help even the playing field by disproportionately helping out all disadvantaged groups.
Application of law should always have room for judicial discretion. Mandatory minimum sentences should not exist, but then you get outrage over some light sentence here and there, and people get all riled up and want to impeach judges and all.
Education is different. IQ shouldn’t be relevant, but competency and aptitude should be, and in addition we should recognize that some people are better off going to vocational school, where they may do better economically during their lifetimes.
The first one was intentionally minimalist in order to see how little it took for racial bias to be reconstructed.
For the second, any mental ability test correlates with IQ, so you end up with a different set of difficulties. There have been attempts to, e.g. correct for cultural bias in the tests, but these actually made the problems worse.
I'm not presently aware of anything that makes that situation better.