It appears that HUD, as part of the general initiative to stop enforcing the housing laws it’s supposed to enforce, is poised to allow landlords to hide behind computer algorithms as they discriminate against minority tenants. As Andrew Selbst – who co-authored one of the foundational pieces on exactly this sort of problem – describes the proposed rule change in Slate:
“The proposal, while billed as a mere update to bring regulations in line with the Supreme Court’s 2015 decision, creates entirely new rules for landlords using algorithms. There are functionally two separate defenses being created. First, the proposal allows landlords to use an algorithm as long as its inputs are not “substitutes or close proxies” for protected characteristics and as long as it’s predictive of what it purports to predict—or a “neutral third party” certifies that fact. So if a hypothetical landlord decides to predict number of noise complaints as a proxy for difficult tenants, using music streaming data they somehow obtained, they might find a correlation between preferred musical genre and how difficult a tenant is. Of course, musical preference is not a substitute or close proxy for race, but an algorithm that equates a preference for hip-hop with noise complaints is probably picking up on race as a factor in frequency of noise complaints. Unlike under existing law, under this rule, a landlord would be off the hook even where there may be less discriminatory alternatives. Second, the landlord is also immunized from any discrimination claim if he uses a tool developed and maintained by a recognized third party. These safe harbors supposedly ensure that a model is legitimate and the landlord has not himself “caused” the discriminatory outcome.”
A brief discrimination backgrounder: legally, a discrimination claim can rest on disparate treatment (discriminatory intent) or on disparate impact. Disparate treatment is what it sounds like: I discriminate against black tenants if I refuse to rent to them because they are black. As you can imagine, this sort of discrimination is hard to prove, because not many people are dumb enough to advertise it. The more common strategy is to mask discrimination behind proxies that track race (or some other protected category) without actually naming it. Given residential segregation, for example, zip code is a pretty decent proxy for race, and so a landlord might systematically disfavor applicants whose previous address is in a particular zip code. That’s also not ok: although the rule doesn’t (directly) “intend” to discriminate, its impact falls disparately on racial groups, and so it is treated the same as if it did.
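To make “disparate impact” concrete, here is a minimal sketch in Python. Everything in it is invented for illustration – the applicants, the zip codes, and the four-fifths rule of thumb borrowed from employment law – but it shows how a facially neutral policy can still produce racially skewed outcomes:

```python
# Illustrative only: invented applicants and zip codes, plus the
# "four-fifths" rule of thumb borrowed from employment law.

DISFAVORED_ZIPS = {"60620", "60621"}

def screen(applicant):
    # Facially neutral policy: reject anyone whose previous address is in
    # a disfavored zip code. Race is never consulted.
    return applicant["zip"] not in DISFAVORED_ZIPS

applicants = [
    {"race": "black", "zip": "60620"},
    {"race": "black", "zip": "60621"},
    {"race": "black", "zip": "60615"},
    {"race": "white", "zip": "60614"},
    {"race": "white", "zip": "60657"},
    {"race": "white", "zip": "60614"},
]

def approval_rate(group):
    members = [a for a in applicants if a["race"] == group]
    return sum(screen(a) for a in members) / len(members)

black_rate = approval_rate("black")   # 0.33: two of three previous addresses disfavored
white_rate = approval_rate("white")   # 1.00: none disfavored
ratio = black_rate / white_rate

print(f"approval rates: black {black_rate:.2f}, white {white_rate:.2f}")
# One group's rate falling below ~80% of the other's is the usual warning sign.
print(f"ratio {ratio:.2f} -> disparate impact concern: {ratio < 0.8}")
```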
Enter algorithms. Employers, landlords, and everybody else who deals with applicants love algorithmic systems that promise to screen applicants in (or out). What goes into your credit score? You don’t know – but your landlord or employer will use it to decide whether you’re a desirable applicant. Credit score isn’t protected, but race is, and this generates a further problem. You can tell a system not to look at race, but if it’s a machine-learning system, it can settle on a proxy variable – like zip code – that substitutes for race. It’s not actually easy to get these systems to stop doing that, and even well-intentioned programs that try to avoid discrimination can produce troubling outcomes.
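A rough sketch of how that happens, using synthetic data (the segregation pattern and the bias in the historical labels are both made up for the example): the “model” below is trained only on zip code, never on race, yet its scores split along racial lines anyway.

```python
# Illustrative sketch (synthetic data, made-up numbers): a model that never
# sees race can still reconstruct it from zip code, because residential
# segregation makes zip code an effective stand-in.

import random
random.seed(0)

# Assumed segregation pattern: zip A is mostly black, zip B mostly white.
def make_applicant():
    zip_code = random.choice(["A", "B"])
    race = "black" if random.random() < (0.9 if zip_code == "A" else 0.1) else "white"
    # Historically biased outcome data: black tenants were flagged as
    # "difficult" far more often, regardless of actual behavior.
    flagged = random.random() < (0.6 if race == "black" else 0.2)
    return {"zip": zip_code, "race": race, "flagged": flagged}

history = [make_applicant() for _ in range(10_000)]

# "Train" the simplest possible model on zip code alone: the historical
# flag rate per zip. Race is never an input.
flag_rate = {}
for z in ("A", "B"):
    rows = [a for a in history if a["zip"] == z]
    flag_rate[z] = sum(a["flagged"] for a in rows) / len(rows)

# Yet the model's scores split cleanly along racial lines, because the
# racial bias in the training labels flows straight through the proxy.
for race in ("black", "white"):
    rows = [a for a in history if a["race"] == race]
    avg_score = sum(flag_rate[a["zip"]] for a in rows) / len(rows)
    print(race, round(avg_score, 2))
```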
The problem is particularly acute for those who belong to a group that’s suffered from institutionalized discrimination in the past. For example, predictive policing software is designed to send more police to areas where crime is more likely to happen. It sounds like a great idea at first – but it’s a complete mess from a disparate impact standpoint. For the system to work, you have to give it some data so it can look for variables that are associated with markers of crime. You might, for example, look at arrest records in a given area. The system churns, and figures out that concentration of non-white people is predictive of high arrest rates. Alternatively, you could tell it not to look at race, and then it concludes that certain zip codes are predictive of high arrest rates. The system then dutifully tells police departments to send more cops to minority neighborhoods. But there’s a fundamental problem: arrest rates are high in minority neighborhoods not because crime is actually higher there, but because of a history of racist policing tactics that harass minorities (and often arrest them for made-up reasons; “arrest” is not as good a proxy for “crime” as lots of people think). In short, Garbage In, Garbage Out.
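A toy simulation (every number here is invented) makes the point: suppose the true crime rate is identical in two neighborhoods, but the historical arrest data reflects where police were already concentrated. A “predictive” model trained on those arrests just keeps recommending the same skewed deployment, forever.

```python
# Toy model of the feedback loop: identical true crime rates, biased
# starting deployment. All numbers are invented for illustration.

TRUE_CRIME_RATE = 0.05                           # same in both neighborhoods
patrol_share = {"minority": 0.7, "other": 0.3}   # historically skewed policing

def observed_arrests(neighborhood, patrols):
    # Arrests track how many officers are present, not how much crime
    # actually happens -- that is the garbage going in.
    return TRUE_CRIME_RATE * patrols[neighborhood] * 1000

for step in range(5):
    arrests = {n: observed_arrests(n, patrol_share) for n in patrol_share}
    total = sum(arrests.values())
    # The "predictive" model sends patrols wherever past arrests were high.
    patrol_share = {n: arrests[n] / total for n in arrests}
    print(step, {n: round(share, 2) for n, share in patrol_share.items()})

# Output: the 70/30 split never corrects itself, even though crime is
# identical everywhere. Garbage in, garbage out.
```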
The proposed HUD rule makes it a lot harder to root out this sort of problem in housing. Systems will categorize you in all sorts of ways; as John Cheney-Lippold notes, Google thinks his young, female colleague in science is an older man, because reading lots of articles in biology journals is associated with men, not women. But that’s because of well-known problems with women rising to senior faculty positions in the sciences. This is just how systems like this work. As Cheney-Lippold underscores, this isn’t about you; it’s about how ‘you’ (the aspects of you represented in whatever datasets) lead the system to classify you:
“Void of subjective assessment, our ‘gender,’ ‘race,’ and ‘class’ has little resemblance to how we encounter gender, race, and class. This is why the idea of ‘personalization’ – the assumption that you, as a user, are distinctive enough to receive content based on you as a person, with a history and with individual interests – largely does not exist. We are instead communicated to through profilization, the intersections of categorical meaning that allow our data, but not necessarily us, to be ‘gendered,’ ‘raced,’ and ‘classed’” (87)
Since it’s hard to know what counts as a “substitute or close proxy” for race, a lot of mischief can slip in through that door. Music preference is no more a substitute or close proxy for race than reading biology articles is for gender. And yet either one can lead a system to classify you in ways that hurt you.
Even worse, all the landlord has to do is trust some other system (what landlord builds their own big data system?) that can hide behind its trade secrets, and it will be staggeringly hard to mount a disparate impact claim. Even well-intentioned landlords will be freed of any need to even think about the impact of their tenant screening policies, since the certification of the systems they rely on will signal to them that the process is fair. As Dan Burk and Tarleton Gillespie noted of digital rights management systems, defaulting to systems like this treats individuals as incompetent, and relieves them of any responsibility for their decisions.
Margaret Hu warned a couple of years ago that we risk creating a society of “Algorithmic Jim Crow”: everyone will be equally screened and assessed by systems – a seeming reversal of the old regimes of discrimination, which scrutinized only minorities. But precisely this equality in surveillance will enable systems to deeply entrench and mask disparate impact discrimination. HUD’s proposed rule takes us a step in that direction.