This article seems to misrepresent a number of important issues, and as a result...

makomk · on Jan 5, 2020

No, they're not misrepresenting this at all. ProPublica's article https://www.propublica.org/article/machine-bias-risk-assessm... was pushing the claim that somehow, the COMPAS black box was implicitly deducing defendants' race from the "137 questions" input to it and labelling them likely reoffenders based on it in a way that was independent of the key factors known to affect reoffending rates, such as age and gender. The paper in question seems to demonstrate the exact opposite is true: ProPublica were inadvertently using race as an imperfect proxy for age at sentencing, which is what the COMPAS algorithm really cared about, because their attempt to control for age at sentencing didn't work. After controlling for the actual age factor, COMPAS results didn't have any relationship to race anymore. (It's not even a weird weighting factor: predicted reoffending risk falls off rapidly with increasing age at first, then more slowly in a smooth fashion. It's just not linear.)
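To make the proxy mechanism concrete, here is a minimal sketch on made-up data; it is not COMPAS data and not ProPublica's or the paper's actual analysis. It assumes a hypothetical `risk_score` that depends only on age and halves every decade, plus two invented groups, "A" and "B", that differ only in their age mix. It then compares a linear control for age with an exact within-age comparison.

```python
def risk_score(age):
    # Hypothetical score that depends on age only: it halves every decade,
    # so it falls fast at first and then more slowly (nonlinear in age).
    return 8 * 0.5 ** ((age - 20) / 10)

# Made-up head counts per age for two groups; group "A" skews younger.
COUNTS = {
    "A": {20: 50, 30: 30, 40: 15, 50: 5},
    "B": {20: 10, 30: 20, 40: 30, 50: 25, 60: 15},
}

# One row per synthetic person: (group, age, score).
rows = [(group, age, risk_score(age))
        for group, by_age in COUNTS.items()
        for age, n in by_age.items()
        for _ in range(n)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def residualize(y, x):
    # Residuals of the simple least-squares fit y ~ a + b*x.
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

ages = [age for _, age, _ in rows]
scores = [score for _, _, score in rows]
is_a = [1.0 if group == "A" else 0.0 for group, _, _ in rows]

# Unadjusted difference in mean score between the groups.
raw_gap = (mean(s for g, _, s in rows if g == "A")
           - mean(s for g, _, s in rows if g == "B"))

# Group coefficient c in score ~ a + b*age + c*group, computed via the
# Frisch-Waugh-Lovell theorem: regress age out of both the score and the
# group indicator, then regress residual on residual.
score_res = residualize(scores, ages)
group_res = residualize(is_a, ages)
linear_coef = (sum(g * s for g, s in zip(group_res, score_res))
               / sum(g * g for g in group_res))

# Exact control: compare the groups only among people of the same age.
def group_mean(group, age):
    return mean(s for g, a, s in rows if g == group and a == age)

shared_ages = sorted(set(COUNTS["A"]) & set(COUNTS["B"]))
within_age_gap = mean(group_mean("A", a) - group_mean("B", a)
                      for a in shared_ages)

print(f"raw gap between groups:         {raw_gap:.3f}")
print(f"group effect, linear age term:  {linear_coef:.3f}")
print(f"group effect, same-age compare: {within_age_gap:.3f}")
```

In this toy setup the score never looks at group membership, yet the linear age term leaves a nonzero group coefficient, while the same-age comparison shows no gap. How large the leftover effect is depends entirely on the invented curve and age mix, so it says nothing about the size of the effect in the real data.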

Now, it's of course possible to argue that judging reoffending risk based on age is in fact unfair and racist because it has a disproportionate impact on certain racial groups, even though it's strongly predictive across all racial groups. That's not the argument ProPublica made, though. Their argument was about the supposed perils of black boxes, and they kind of acknowledged that age probably wasn't a racist criterion - or at least that it would be a lot harder to justify calling it one - by attempting to strip out its effects in the first place. It's also a different kind of argument entirely, one that revolves not around whether the algorithm is somehow treating people differently based on their detected race - because it isn't - but around what it means for a decision like this to be fair in the first place.

jph00 · on Jan 5, 2020

Whether a variable is latent or explicit isn't really relevant to the question of algorithmic fairness.

The link I provided gives the actual details of the method and findings, so it is probably a more useful source. The claim that the actual source of the difference is 'age' doesn't really make sense. There isn't enough of a difference in the number of young people between black and white populations to result in the differences found in the analysis.

(I do agree that the actual attempt to control for age was poorly done; it really shouldn't have been done at all, since it had nothing useful to add to the analysis or results.)

PS: It's 'COMPAS', not 'COMPASS'.