By Gordon Hull
As a criterion for algorithmic assessment, “fairness” has encountered numerous problems. Many of these emerged in the wake of ProPublica’s argument that COMPAS, the risk-assessment tool used in Broward County’s pretrial detention system, was unfair to black suspects. To recall: in 2016, ProPublica published an investigative piece criticizing Broward County, Florida’s use of a software program called COMPAS in its pretrial detention system. COMPAS produced a recidivism risk score for each suspect, which could then be used in deciding whether someone should be detained prior to trial. ProPublica’s investigation found that, among suspects who were not rearrested prior to trial, black suspects were much more likely than white suspects to have been rated “high risk” for rearrest. Conversely, among suspects who were arrested a second time, white suspects were more likely than black suspects to have been labeled “low risk.” The system thus appeared to be discriminating against black suspects. The story led to an extensive debate (for an accessible summary with cites, see Ben Green’s discussion here) over how fairness should be understood in a machine learning context.
The debate made clear that ProPublica focused on outcomes: it demonstrated that the system failed to achieve separation fairness, which is met when all groups subject to the algorithm’s decisions receive the same false positive and false negative rates. The system failed that test because black suspects labeled “high risk” were much more likely than white suspects to be false positives. In response, the software vendor argued that the system made fair predictions because, among those classified in the same way (high or low risk), both racial groups exhibited the predicted outcome at the same rate. In other words, among those classified as “high risk,” there was no racial difference in how likely they were actually to be rearrested. The algorithm thus satisfied the criterion of sufficiency fairness. In the ensuing debate, computer scientists proved that, except in very limited cases (roughly, when the groups have identical base rates of rearrest or the predictions are perfect), it is impossible to satisfy the separation and sufficiency criteria at the same time.
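To see concretely how the two criteria come apart, here is a minimal sketch with made-up confusion-matrix counts for two hypothetical groups (the numbers are invented for illustration and are not the COMPAS data): the predictive values are identical across the groups, so sufficiency holds, but because the groups’ base rates differ, the false positive and false negative rates cannot also be equal.

```python
# Synthetic confusion-matrix counts for two hypothetical groups (NOT the COMPAS data).
# Group A has a higher base rate of rearrest than group B.
groups = {
    "A": {"tp": 60, "fp": 40, "fn": 10, "tn": 90},
    "B": {"tp": 15, "fp": 10, "fn": 10, "tn": 90},
}

for name, c in groups.items():
    total = sum(c.values())
    base_rate = (c["tp"] + c["fn"]) / total    # fraction actually rearrested
    # Separation conditions on the true outcome:
    fpr = c["fp"] / (c["fp"] + c["tn"])        # false positive rate
    fnr = c["fn"] / (c["fn"] + c["tp"])        # false negative rate
    # Sufficiency conditions on the prediction:
    ppv = c["tp"] / (c["tp"] + c["fp"])        # rearrest rate among "high risk"
    npv = c["tn"] / (c["tn"] + c["fn"])        # non-rearrest rate among "low risk"
    print(f"group {name}: base rate={base_rate:.2f}  "
          f"FPR={fpr:.2f}  FNR={fnr:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")

# Output: both groups have PPV=0.60 and NPV=0.90 (sufficiency holds), yet
# FPR is 0.31 vs 0.10 and FNR is 0.14 vs 0.40 (separation fails).
```

Because the base rates differ in this toy case (0.35 vs. 0.20), equalizing the predictive values forces the error rates apart; that is the intuition behind the impossibility result.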
In the meantime, on the philosophy side, Brian Hedden has argued that a demonstrably fair algorithm can nonetheless violate 11 of 12 proposed statistical fairness criteria. In a response piece, Benjamin Eva used a different test case to show the limits of the twelfth and proposed a new criterion: base rate tracking.
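As a rough illustration of what checking that criterion might involve, here is a minimal sketch. It assumes a reading of base rate tracking on which the gap between groups’ average risk scores should match the gap between their base rates, and all scores and labels below are invented for the example.

```python
# A minimal sketch of a base-rate-tracking check. The reading assumed here
# (that the gap between groups' average risk scores should match the gap
# between their base rates) is a gloss, and the scores/labels are invented.

def base_rate_tracking_gap(scores_a, labels_a, scores_b, labels_b):
    """Return (score gap) minus (base rate gap); roughly zero means the scores track the base rates."""
    score_gap = sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)
    base_rate_gap = sum(labels_a) / len(labels_a) - sum(labels_b) / len(labels_b)
    return score_gap - base_rate_gap

# Toy data: group A is rearrested at a 0.35 rate, group B at 0.20 (labels are 0/1),
# and each person is simply assigned their group's base rate as a risk score.
labels_a = [1] * 35 + [0] * 65
labels_b = [1] * 20 + [0] * 80
scores_a = [0.35] * 100
scores_b = [0.20] * 100

print(base_rate_tracking_gap(scores_a, labels_a, scores_b, labels_b))  # ~0: tracking holds
```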