This article seems to misrepresent a number of important issues, and as a result significantly overstates its claims. I'll pick just one illustrative (but important) example:
> "For instance, when ProPublica journalists tried to explain what was in the proprietary COMPAS model for recidivism prediction, they seem to have mistakenly assumed that if one could create a linear model that approximated COMPAS and depended on race, age, and criminal history, that COMPAS itself must depend on race. However, when one approximates COMPAS using a nonlinear model, the explicit dependence on race vanishes, leaving dependence on race only through age and criminal history. This is an example of how an incorrect explanation of a black box can spiral out of control."
The concern about the strong relationship between race and COMPAS predictions is not mainly about whether the model depends on race explicitly. The concern is whether there is a relationship at all, explicit or implicit, and in particular whether that relationship results in unfair outcomes. The findings of the ProPublica study (https://www.propublica.org/article/how-we-analyzed-the-compa...) strongly suggested this was the case:
"- Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Our analysis found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent).
- White defendants were often predicted to be less risky than they were. Our analysis found that white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent).
- The analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 45 percent more likely to be assigned higher risk scores than white defendants."
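For anyone unfamiliar with how the first two findings are measured, here is a minimal sketch of per-group false positive and false negative rates. The records below are made-up toy counts, not ProPublica's data, and the function name `error_rates` and the tuple layout are my own assumptions for illustration.

```python
from collections import Counter


def error_rates(records):
    """Per-group error rates.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    Returns {group: {"false_positive_rate": ..., "false_negative_rate": ...}}.
    A rate is None when its denominator is zero.
    """
    counts = Counter(records)
    rates = {}
    for group in sorted({g for g, _, _ in counts}):
        fp = counts[(group, True, False)]   # labelled high risk, did not reoffend
        tn = counts[(group, False, False)]  # labelled low risk, did not reoffend
        fn = counts[(group, False, True)]   # labelled low risk, did reoffend
        tp = counts[(group, True, True)]    # labelled high risk, did reoffend
        rates[group] = {
            "false_positive_rate": fp / (fp + tn) if fp + tn else None,
            "false_negative_rate": fn / (fn + tp) if fn + tp else None,
        }
    return rates


# Toy data (hypothetical): two groups with different error profiles.
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", False, True)] * 3 + [("A", True, True)] * 7
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", False, True)] * 5 + [("B", True, True)] * 5
)
rates = error_rates(toy_records)
```

In this toy set, group A has the higher false positive rate and group B the higher false negative rate, which is the shape of disparity the quoted findings describe. Note that none of this requires looking inside the model.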
I understand the desire of the MIT researchers to promote the value of their work, but in this case they appear to be doing so in a potentially damaging way.
No, they're not misrepresenting this at all. ProPublica's article https://www.propublica.org/article/machine-bias-risk-assessm... was pushing the claim that the COMPAS black box was implicitly deducing defendants' race from the "137 questions" input to it and labelling them likely reoffenders on that basis, independently of the key factors known to affect reoffending rates, such as age and gender. The paper in question seems to demonstrate the exact opposite: ProPublica were inadvertently using race as an imperfect proxy for age at sentencing, which is what the COMPAS algorithm really cared about, because their attempt to control for age at sentencing didn't work. After controlling for the actual age factor, COMPAS results no longer had any relationship to race. (It's not even a weird weighting factor: predicted reoffending risk falls off rapidly with increasing age at first, then more slowly in a smooth fashion. It's just not linear.)
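To illustrate the mechanism (and only the mechanism), here is a toy sketch. The score is assumed to be `120 / age`, a made-up function that depends on age alone, and the five people and their group labels are invented. This is not COMPAS or ProPublica's data or their actual regression, just a demonstration that a linear age control can leave a spurious group coefficient when the true age effect is nonlinear and the groups have different age distributions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical people as (age, group). Group 1 skews younger than group 0.
people = [(20, 0), (40, 0), (60, 0), (20, 1), (30, 1)]
# Assumed score: depends on age only, falling fast at first and then slowly.
scores = [120 / age for age, _ in people]

# Model 1: score ~ intercept + age + group (age entered linearly).
linear_fit = ols([[1.0, age, group] for age, group in people], scores)
# Model 2: score ~ intercept + 1/age + group (age entered in its true form).
nonlinear_fit = ols([[1.0, 1.0 / age, group] for age, group in people], scores)

group_coef_linear = linear_fit[2]
group_coef_nonlinear = nonlinear_fit[2]
```

With these toy numbers the linear-age model reports a group coefficient of -13/51 (about -0.25), even though the score never looks at group, while the model with the correctly shaped age term puts the group coefficient at zero up to rounding. The sign and size of the spurious coefficient depend entirely on the invented ages, so nothing should be read into them.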
Now, it's of course possible to argue that judging reoffending risk based on age is in fact unfair and racist because it has disproportionate impact on certain racial groups, even though it's strongly predictive across all racial groups. That's not the argument ProPublica made, though. Their argument was about the supposed perils of black boxes, and they kind of acknowledged that age probably wasn't a racist criterion - or at least that it would be a lot harder to justify calling it one - by attempting to strip out its effects in the first place. It's also a different kind of argument entirely, one that revolves not around whether the algorithm is somehow treating people differently based on their detected race - because it isn't - but around what it means for a decision like this to be fair in the first place.
Whether a variable is latent or explicit isn't really relevant to the question of algorithmic fairness.
The link I provided gives the actual method and findings, so it is probably the more useful source for the details. The claim that the actual source of the difference is 'age' doesn't really make sense. There isn't enough of a difference in the number of young people between black and white populations to produce the differences found in the analysis.
(I do agree that the actual attempt to control for age was poorly done; it really shouldn't have been done at all, since it had nothing useful to add to the analysis or results.)