By **Gordon Hull**

Last time, I looked at the Laurence Tribe article that was the original source of the blue bus thought experiment. Tribe’s article is notable for its defense of legal reasoning and processes against the introduction of statistical evidence at trial. He particularly emphasizes the need for legal proceedings to advance causal accounts, and the numerous background conditions that legal reasoning attends to (and that statistical evidence tends to ignore). The tendency to fetishize numeric results also tends to generate absurd results.

Here I want to focus on a recent paper by Chad Lee-Stronach that centers on the blue bus argument, and that formally explains what I take to be the intuition behind a lot of Tribe’s examples, even if he directs them differently. Lee-Stronach begins with the “statistical proof paradox,” which he formulates roughly as follows:

- Probability threshold: any legal standard of proof is reducible to some threshold of probability t, such that the defendant should be found liable when the probability that they are liable exceeds t
- Statistical inference: merely statistical evidence can establish this
- Conclusion: a defendant can be found liable on the basis of merely statistical evidence

It’s a paradox because most of us don’t accept the conclusion. Lee-Stronach surveys the literature, and notes that most of it goes after Probability Threshold. Instead, he thinks, we’d be better off going after the second premise. Accordingly, he argues that the statistical inference premise can’t be met.

His discussion is substantial, and highly structured, so you should note that I’m emphasizing only a few parts of it here. Statistical Inference basically establishes a reference class (say, of buses in the area), establishes something about that reference class (that 80% of those buses are owned by Blue Bus, Inc.) and then uses that fact to establish that the threshold for liability has been met. The core problem is that for Statistical Inference to be straightforwardly true, all the members of the reference class would have to be equally likely to have committed the offense or to pose an equal risk of having done so. The problems with this are immediately obvious in the blue bus case: what legitimates the inference from market share to damage? That only makes sense if all the buses on the road at a given time create an equal risk of causing damage.

Whether this is true isn’t something that the evidence provided picks up. Suppose that Blue Bus Inc. has extensive safety training and only hires experienced drivers with impeccable records. We know that there are other buses in the area. Suppose that half of them are owned by Rickety Bus, Inc., whose buses have a history of braking and steering problems, and that the rest of the non-Blue buses are owned by Intoxicated Bus, Inc., a company whose drivers have a history of failing sobriety tests. Now, we may know that most of the buses are owned by Blue Bus, but we most certainly do not know what percentage of the accident risk is posed by Blue Bus.
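To make the point concrete, here is a minimal sketch of why market share and accident-risk share can come apart. The per-bus accident rates below are invented purely for illustration (nothing in the thought experiment fixes them); the point is only that 80% of the buses need not mean 80% of the risk.

```python
# Hypothetical illustration: market share alone does not determine accident-risk share.
# The relative per-bus accident rates are invented numbers for this example.
market_share = {"Blue Bus": 0.80, "Rickety Bus": 0.10, "Intoxicated Bus": 0.10}
relative_accident_rate = {"Blue Bus": 1.0, "Rickety Bus": 10.0, "Intoxicated Bus": 10.0}

# Each company's share of total accident risk = its share of buses,
# weighted by how accident-prone its buses are.
weighted = {co: market_share[co] * relative_accident_rate[co] for co in market_share}
total = sum(weighted.values())
risk_share = {co: w / total for co, w in weighted.items()}

for co, p in risk_share.items():
    print(f"{co}: {p:.0%} of accident risk")
```

On these invented numbers, Blue Bus accounts for only about 29% of the accident risk (0.8/2.8) despite owning 80% of the buses, which is exactly why Tribe’s sub-.5 subjective probability is a live possibility.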

This is why Tribe says, as I noted last time, that “unless there is a satisfactory explanation for the plaintiff's singular failure to do more than present this sort of general statistical evidence, we might well rationally arrive, once the trial is over, at a subjective probability of less than .5 that defendant's bus was really involved in the specific case” (1350). At the very least, the actual probability that Blue Bus is liable can only be determined at trial, with a weighing of factors other than the mere distribution of buses in the area. That Blue Bus owns most of the buses that could have caused the accident is *a* relevant piece of information, but you need to do more to make it useful – you have to know what to do with that information and how it fits into the overall explanation of the accident.

(aside: Lee-Stronach also offers a good reason why we’ll accept the eyewitness over the statistical inference: “the incident is, by assumption, a direct cause of the content of the eye-witness’s testimony. Because there is no intermediate cause (e.g. the eye-witness is not learning about the event by hearsay), the chance that the eye-witness testimony would identify the Blue Bus Company, given the incident, can be determinately estimated” (15). This is again opposed to the statistical evidence, which is derived entirely independently of the accident in question.)

Finally, Lee-Stronach introduces a further thought experiment to show what’s wrong with the stochastic distribution of risk – the assumption that all the individuals in a given situation are equally likely to have committed an offense. His point is that, at least in some situations, that procedure generates absurd results, *even if* the individuals in question are, in fact, equally likely to have committed the offense. Here is the experiment:

> “Prison Yard: One hundred prisoners are in a yard under the supervision of a guard. At some point, ninety-nine of those prisoners collectively kill the guard. Only one prisoner refrains, standing alone in a corner. We know this from a video recording. The video shows that the participation ratio is 99:1, but it does not allow for the identification of the ninety-nine killers. No other evidence is available. After the fact, a prisoner is picked at random and tried. If ninety-nine prisoners, collectively, killed the guard and the defendant on trial is a prisoner, the probability of his guilt is 99%”

As Lee-Stronach notes, “randomization renders the selection of the defendant dependent on their group membership, rather than on their actions” (16) and “the selection procedure implies, without justification, that each prisoner is statistically exchangeable; that each defendant is just as likely as any other to have actually committed the attack” (17).

But even if you assume that the odds of any given prisoner being innocent are the same, that makes justice arbitrary: if you lined them all up and started randomly putting them on trial, the first one is 99% likely to be guilty; the second is 98/99, all the way down to the last one you select, who is 100% likely to be innocent if the earlier people are convicted. Surely the chance of your being innocent shouldn't depend on the order of the trials! At the very least, that contradicts the assumption that all of the prisoners were equally likely to have been involved with the murder.
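The sequential arithmetic can be sketched in a few lines of Python (the setup – 100 prisoners, 99 guilty – is the one stipulated above; the function name is mine). Exact fractions keep the numbers honest:

```python
# Sequential trials: 100 prisoners, 99 guilty. At trial k (1-indexed),
# k-1 prisoners have already been convicted (and assumed guilty), so
# 101-k prisoners remain, of whom 100-k are guilty.
from fractions import Fraction

def p_guilty(k, n=100, guilty=99):
    """Probability that the k-th randomly selected defendant is guilty,
    assuming all earlier defendants were correctly convicted."""
    return Fraction(guilty - (k - 1), n - (k - 1))

print(p_guilty(1))    # 99/100
print(p_guilty(2))    # 98/99
print(p_guilty(100))  # 0 -- the last defendant is certainly innocent
```

Each conviction shifts the probability for everyone still waiting, which is exactly what makes the order of selection do work it shouldn’t.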

It’s easy to spin this into additional absurd results. Let’s suppose that the threshold for “reasonable doubt” is less than 2%. So if I am at least 98% sure that a prisoner is guilty, then I vote to convict (leave aside your skepticism about this number; Tribe also has a lot to say about why setting a number at all is weird to do here. But those are questions about Probability Threshold, which Lee-Stronach is bracketing here). The first prisoner I see is 99% likely to have been involved in the killing, so I vote to convict. The second is 98/99, and so forth. It turns out you can convict a lot of prisoners this way – just over half of them, to be precise: the 51st is 49/50, or exactly 98%, likely to be guilty. Convict! But then we get to the 52nd randomly selected prisoner, who is 48/49 likely to be guilty. My calculator says that’s 97.959% likely to be guilty. Acquit! (Unless you round: 97.959% to the nearest percent is 98%. Guilty! On that rule, the lucky 62nd prisoner is the first to be acquitted, since he's 38/39 likely to be guilty, which is 97.44% and rounds down to 97%. The poor 61st prisoner is 39/40, or 97.5%, which rounds up to the unfortunate 98%.) But the acquittal generates an even more absurd result: the rest of the prisoners are now 100% likely to be guilty, since there's only one innocent one! As Lee-Stronach points out, this sort of thing can’t possibly be right – your innocence or guilt can't depend on when you’re randomly selected from a lineup. And it certainly shouldn't depend on where you round.
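The cutoff arithmetic above can be checked mechanically. A sketch, using exact fractions to sidestep floating-point and banker's-rounding quirks (the 98% threshold is the one stipulated above; the function names are mine):

```python
# Where does conviction stop, with and without rounding? 100 prisoners, 99 guilty.
from fractions import Fraction

def p_guilty(k, n=100, guilty=99):
    """Probability that the k-th defendant is guilty, given k-1 prior convictions."""
    return Fraction(guilty - (k - 1), n - (k - 1))

# Exact arithmetic: convict while the probability is at least 98%.
exact_cutoff = next(k for k in range(1, 101) if p_guilty(k) < Fraction(98, 100))
print(exact_cutoff, p_guilty(exact_cutoff))  # 52 48/49 -- the 52nd defendant is acquitted

def percent_rounded(p):
    """Round 100*p to the nearest whole percent, halves rounding up."""
    return int(100 * p + Fraction(1, 2))

# Rounding first: 39/40 = 97.5% rounds up to 98% (convict),
# while 38/39 ~ 97.44% rounds down to 97% (acquit).
rounded_cutoff = next(k for k in range(1, 101) if percent_rounded(p_guilty(k)) < 98)
print(rounded_cutoff, p_guilty(rounded_cutoff))  # 62 38/39
```

So the first acquittal lands on the 52nd defendant with exact arithmetic, and on the 62nd if you round to the nearest percent first – a ten-defendant swing that hangs entirely on a rounding convention.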

That of course supposes that we have the trials seriatim. Suppose we have them in parallel? Then we're back to Tribe's problem where the statistical evidence would lead to Blue Bus being found liable 100% of the time, despite only owning 80% of the buses. Based on a 98% guilt threshold, and a 99% chance that each prisoner is guilty, we will arrive at 100 convictions, which means that we have convicted the innocent prisoner.
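The parallel case is even quicker to sketch: every defendant is tried against the same prior evidence, so each faces the same 99/100 probability, which clears the (stipulated) 98% threshold every time.

```python
# Parallel trials: each of the 100 defendants independently faces the same
# 99/100 probability of guilt, judged against a 98% conviction threshold.
n, n_guilty = 100, 99
threshold = 0.98

verdicts = ["convict" if n_guilty / n >= threshold else "acquit" for _ in range(n)]
print(verdicts.count("convict"))  # 100 -- everyone convicted, including the innocent prisoner
```

One guaranteed wrongful conviction, built into the evidence before any trial begins.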

Lee-Stronach concludes that “merely statistical evidence is consistent with multiple causal models and causal hypotheses. To nevertheless assign a determinate probability is to go beyond our evidence: it imposes more precision on a causal scenario than is permitted by the evidence” (20). He adds that “we should resist the stipulation made in these cases that no other evidence exists. From an epistemological point of view, this is false” (20). There is always more evidence in the abstract – relying on Statistical Inference alone is basically making a decision to stop looking for it.

So “how do we go about determining which probability values are permitted by the evidence” (21)? As usual – observation, intervention, simulation, etc., including assessments of counterfactuals, normalcy, relative plausibility and relevant alternatives. Statistical Inference is a starting point for juridical reasoning, not an endpoint. Next time I’ll have more to say about some things that emerge in the last three quotes.
