By Gordon Hull
AI (machine learning) and people reach conclusions in different ways. This basic point has ramifications across the board, as plenty of people have said. I’m increasingly convinced that the gap between how legal reasoning works and how ML works is a good place both to tease out the differences and to think about what’s at stake in them (I’ve made a couple of forays into this, here and here). One reason for this is that legal theory is full of instructive examples, which can function as rules in the old-fashioned sense of paradigms, as described by Lorraine Daston in her fascinating historical account. This use of examples and cases is deeply entwined with legal reasoning; as Daston notes, “common law, casuistry, and the liberal and mechanical arts all came to inhabit the territory staked out by the Roman libri regularum. They all depended on rules that got things done in the world, consistent with but not deduced from higher principles and often in the form of an example extended by analogy” (Rules, 30).
Another is that one of those examples, the blue bus, has been an enduring focal point for the difference between the two kinds of reasoning. The scenario stipulates that an accident has happened involving a bus, during a time and at a location where 80% of the buses present belong to the blue bus company. Do we assign liability to the bus company? Work by David Enoch and Talia Fisher notes that most people prefer a 70%-reliable witness to the information that a bus company owns 70% of the buses in an area, and theorizes that the reason is that the eyewitness comes with an available counterfactual in case she is wrong: there is presumably some process that explains her incorrect identification of the bus, whereas the statistical evidence is what it is, and we simply expect it to fail at a specific rate.
The connection to questions of algorithmic fairness is obvious: if the algorithm says you are “80% likely to default on your loan,” that means it’s put you in a reference class of individuals based on some determination that members of that class are 80% likely to default. Most of us have the intuition that losing a loan on that basis is likely unfair, but of course the theoretical details of why that is the case matter a lot. Here I want to go back to the original presentation of the blue bus case, which derives from a 1971 article by Laurence Tribe on the problems of admitting statistical evidence into court. The blue bus is just one of his examples, and he mounts a number of objections to such a move. I will cite two of them here because they seem particularly relevant.
1. Reference Class. The California Supreme Court said in a 1968 case that “[m]athematics, a veritable sorcerer in our computerized society, while assisting the trier of fact in the search for truth, must not [be allowed to] cast a spell over him” (People v. Collins, 68 Cal. 2d 319, 320; qtd. in Tribe, 1334). The case in question involved an interracial couple convicted of a crime. After identifying a half dozen characteristics that this couple had and arguing that they were statistically independent (so that there is only a 25% chance of having both of two characteristics, each of which is possessed by 50% of the population), the prosecutor arrived at the conclusion that there was a one in 12 million chance that a given couple would match all six attributes of the couple identified by eyewitnesses. Guilty! The California Supreme Court pointed to a number of problems here, such as the undefended claims about the prevalence of the various characteristics identified and the claim that they were independent of one another (suppose most Black men at the time had beards; being Black and being bearded would then not be independent in the relevant sense). But the biggest problem is that:
“The prosecutor erroneously equated the probability that a randomly chosen couple would possess the incriminating characteristics with the probability that any given couple possessing those characteristics would be innocent. After all, if the suspect population contained, for example, twenty-four million couples, and if there were a probability of one in twelve million that a couple chosen at random from the suspect population would possess the six characteristics in question, then one could well expect to find two such couples in the suspect population, and there would be a probability of approximately one in two - not one in twelve million - that any given couple possessing the six characteristics would be innocent” (1336).
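The arithmetic in that passage is easy to check. Here is a crude sketch in Python that uses only the figures Tribe supplies (a suspect population of twenty-four million couples and a one-in-twelve-million match probability); the variable names, and the simplifying assumption that exactly one matching couple is guilty, are mine:

```python
# Back-of-the-envelope version of the Collins arithmetic Tribe describes.
# The figures come from the quoted passage; the assumption that exactly one
# matching couple is guilty is a simplification for illustration.

p_match = 1 / 12_000_000      # chance a random couple shows all six traits
population = 24_000_000       # couples in the suspect population

expected_matches = population * p_match       # ~2 couples expected to match

# If exactly one of the matching couples is guilty, the chance that any
# *given* matching couple is the guilty one is roughly 1 / expected_matches,
# not (1 - 1/12,000,000).
p_guilty_given_match = 1 / expected_matches

print(f"Expected matching couples: {expected_matches:.1f}")            # 2.0
print(f"P(guilty | matches description): {p_guilty_given_match:.2f}")  # ~0.50
```

In other words, the number the prosecutor needed was not the chance of a random match but the chance of guilt among those who match, and the two can differ by orders of magnitude.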
We can recognize this problem in the sorts of big data blacklisting that showed up in popular and legal thinking during the post-9/11 war on terror. All of the hijackers were Arab, which was used to justify a lot of racial profiling. But of course that reasoning asks exactly the wrong question: not how many hijackers were Arab, but how many Arabs were hijackers? The prosecutor makes the same kind of mistake, asking “how many criminals have this attribute” instead of the correct question, “how many people who have this attribute are criminals?”
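The gap between those two questions is easy to see with a toy calculation. The numbers below are entirely hypothetical, chosen only to show how the two conditional probabilities come apart:

```python
# Hypothetical numbers, chosen only to illustrate how P(attribute | criminal)
# and P(criminal | attribute) can come apart; they describe nothing real.

population = 10_000_000        # people under consideration
with_attribute = 1_000_000     # people who share the profiled attribute
criminals = 20                 # actual offenders
criminals_with_attribute = 20  # suppose every offender has the attribute

# The question the profiler answers:
p_attribute_given_criminal = criminals_with_attribute / criminals        # 1.0

# The question that matters for acting on the attribute:
p_criminal_given_attribute = criminals_with_attribute / with_attribute   # 0.00002

print(p_attribute_given_criminal)   # 1.0
print(p_criminal_given_attribute)   # 2e-05
```

Even when the first probability is 1, the second can be vanishingly small.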
2. Absurd results. This one will resonate with anyone who has been denied something they want by the blockheaded application of an algorithm. We know that an 80% accurate prediction will be wrong 20% of the time. How do we use such a prediction in decision-making? A lot of decisions are binary: incarcerate the person, or don’t. Hire them, or don’t. Issue a loan, or don’t. For those decisions, you have to establish some sort of threshold value above which the prediction will be taken as true. The law deals with this sort of thing all the time: “beyond a reasonable doubt” may mean a lot of different things, but it is clearly not met by an 80%-reliable metric. So if all you have is an 80%-reliable eyewitness, you should vote to acquit. Similarly, a lot of civil issues are tried on a “preponderance of the evidence” basis, which can be interpreted to mean a subjective probability of more than .5. Tribe shows why this generates an absurd result when applied to the blue bus case. Suppose the plaintiff introduced the statistical evidence and nothing else:
“Unless there is a satisfactory explanation for the plaintiff's singular failure to do more than present this sort of general statistical evidence, we might well rationally arrive, once the trial is over, at a subjective probability of less than .5 that defendant's bus was really involved in the specific case. And in any event, absent satisfactory explanation, there are compelling reasons of policy to treat the subjective probability as less than .5 or simply as insufficient to support a verdict for plaintiff. To give less force to the plaintiff's evidentiary omission would eliminate any incentive for plaintiffs to do more than establish the background statistics. The upshot would be a regime in which the company owning four-fifths of the blue buses, however careful, would have to pay for five-fifths of all unexplained blue bus accidents – a result as inefficient as it is unfair” (1350).
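A toy calculation brings out the “five-fifths” point. The accident count below is made up; the only inputs taken from the example are the company’s 80% ownership share and a preponderance threshold of .5:

```python
# Illustration of Tribe's "five-fifths" result. The accident count is
# invented; the 80% ownership share and the .5 threshold come from the
# example. The sketch also assumes, as the example does, that accidents
# track ownership share.

accidents = 100          # hypothetical unexplained blue-bus accidents
ownership_share = 0.8    # fraction of the blue buses the company owns
threshold = 0.5          # preponderance standard read as P > .5

# Accidents the company actually causes, if risk tracks ownership:
caused = accidents * ownership_share                         # 80

# Accidents it pays for, if the bare statistic (0.8 > 0.5) decides every case:
pays_for = accidents if ownership_share > threshold else 0   # 100

print(f"Causes roughly {caused:.0f} accidents, pays for {pays_for}")
```

Owning four-fifths of the buses, the company ends up liable for five-fifths of the unexplained accidents, which is exactly the inefficiency and unfairness Tribe flags.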
One obvious takeaway is that acting on data has an inherently normative dimension - where you set that threshold will make an enormous difference in what happens next. As Tribe indicates, these are public policy reasons, broadly construed - which is to say that they're political (again, broadly construed). Dumb reliance on the statistics not only fails to engage the sort of causal reasoning on which law typically relies, it actively discourages the production of that kind of evidence. Not only that, it mechanically translates risk above a certain threshold into liability. Finally, it assumes that all of the buses in the area pose an equal risk - that if 80% of the buses are owned by the same company, the company accounts for 80% of the risk. I’ll have more to say on this last point next time…