By Gordon Hull
A recent paper by Mireille Hildebrandt sent me to a 1994 paper that I’m embarrassed to say I hadn’t read before: Philip Agre’s “Surveillance and Capture.” Agre’s paper has been cited over 300 times, but it’s missing from a lot of the privacy literature I know. After reading it, I’ve decided that’s a mistake, and it’s time to make amends. I’ll begin by saying why I think Hildebrandt is exactly right to bring the paper up in the context of big data.
Agre starts with quotidian examples of tracking, from employee ID cards that let systems know where employees are to UPS package tracking. The core argument of Agre’s paper is that there are two conceptual models of privacy that need to be distinguished in making sense of phenomena like these. The first, the one that we all talk about all the time, is surveillance. The surveillance model grew out of experience with state bureaucracies, particularly in the Soviet bloc, and features visual metaphors (typically Orwell and Bentham); assumes that the watching is nondisruptive and secret; involves territorial metaphors like invasion of space, which then tend to lead to a dichotomy between coercion and consent; involves centralized orchestration; and is identified with the state. This model, though ubiquitous, isn’t the only or even the best one. Agre spends most of his time developing the alternative model, though he notes that “when applied as the sole framework of computing and privacy, the surveillance model can lead to oversimplified analysis” and suggests that it lends itself to caricature and easy dismissals such that “genuinely worrisome developments can be seen as ‘not so bad’ simply for lacking the overt horrors of Orwell’s dystopia” (116). It’s not hard to say that Agre got this one right: whether it’s the well-trodden problems with notice-and-consent or the stubborn persistence of the “I’ve got nothing to hide” deflection, the surveillance model isn’t adequate to privacy worries now.
The second model, which Agre ties to computer science and information systems management, is built around the idea of “capture.” The capture model also has five characteristics, which Agre develops as a contrast with surveillance. Capture: uses linguistic metaphors for human activities; assumes that linguistic parsing of activities is an active intervention; defaults to structural metaphors; is decentralized; and is driven by an effort to reconstruct human activity “through [its] assimilation to a transcendent (‘virtual’) order of mathematical formalism” (107). The core concept here is grammar: the basic mechanism of capture is to pick a set of atomic elements (people, UPS packages) and then develop a grammar that describes their possible movements or changes. This model is then imposed – sometimes coercively – on the activities it purports to describe; mechanisms are then used to track and measure the activity according to the grammar, and the system can then be further calibrated.
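To make the mechanism concrete, here is a minimal sketch of what such a grammar of action might look like in code. Everything in it – the state names, the allowed transitions, the capture_event function – is my own hypothetical illustration of the package example, not anything Agre specifies:

```python
# A toy "grammar of action" for the package example: atomic entities
# (packages) plus an enumeration of the only transitions the system
# will recognize as real activity. All names here are hypothetical.
from datetime import datetime

GRAMMAR = {
    "accepted":         {"in_transit"},
    "in_transit":       {"at_facility", "out_for_delivery"},
    "at_facility":      {"in_transit", "out_for_delivery"},
    "out_for_delivery": {"delivered", "in_transit"},
    "delivered":        set(),
}

def capture_event(log, package_id, new_state):
    """Record an activity only if it parses under the grammar."""
    if not log:
        if new_state != "accepted":
            raise ValueError("every trajectory must start at 'accepted'")
    elif new_state not in GRAMMAR[log[-1][2]]:
        # Activity that doesn't fit the grammar can't be registered at
        # all -- the work must be rearranged until it parses.
        raise ValueError(f"{log[-1][2]} -> {new_state} is not in the grammar")
    log.append((datetime.now(), package_id, new_state))

log = []
capture_event(log, "PKG-42", "accepted")
capture_event(log, "PKG-42", "in_transit")
# capture_event(log, "PKG-42", "accepted")  # rejected: doesn't parse
```

The point of the sketch is the ValueError: once the grammar is in place, activity that doesn’t parse simply doesn’t exist for the system – which is exactly the imposition Agre describes.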
Even a quick survey suggests the applicability to data science, which spends its time discovering models that describe how human processes work, and then using those models to measure those processes, make predictions for the future, and so on. What Agre does very well is unmask the sleight of hand on which this depends – a slippage from descriptive to normative:
“This cycle is normally attended by a kind of mythology according to which the newly constructed grammar has not been ‘invented’ but ‘discovered.’ The activity in question, in other words, is said to have already been organized according to the grammar” (110).
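The slippage is easy to see in miniature in any standard machine learning workflow. In the hypothetical sketch below (my construction, not Agre’s or Hildebrandt’s), a clustering model is fit to logged activity – the “discovery” – and the resulting clusters are then treated as the categories the activity had all along, with new behavior parsed through them:

```python
# A toy illustration of the descriptive-to-normative slippage: a model
# is fit to observed activity, and its invented clusters are then
# treated as categories the activity was "always already" organized by.
# The data and parameters here are made up.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
observed = rng.normal(size=(200, 2))     # logged user activity

model = KMeans(n_clusters=3, n_init=10).fit(observed)   # "discovery"

# The imposition: all subsequent activity is parsed through the
# invented grammar, as if the three "user types" had been there
# waiting to be found.
new_activity = rng.normal(size=(5, 2))
assigned_types = model.predict(new_activity)
```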
Hildebrandt underscores precisely this: “his capture model emphasizes the fact that data-driven systems reconfigure their environment to gain access to more data, turning both our environment and ourselves into data engines” (94) and, since we are components of the system to be captured, “we must be reconfigured in ways that enable the capture of behavioral and other data ‘from’ us” (95).
Agre thoroughly dismantles this mythology, adducing eight different reasons why the supposedly neutral descriptive process of capturing and modeling systems in fact inexorably alters them to conform to the model. I will mention only a few here, with examples from social media, since social media is a primary site of data extraction. First, the representation often suggests rearrangements of the activity, sometimes to facilitate the capture process. For example, Facebook’s initial flattening of social networks into “friend” and “not friend” encouraged the inflation of all acquaintances into the “friend” category – and thereby facilitated data capture by Facebook.
Second, people alter their conduct to anticipate the system. To use another older social networking example, danah boyd documented in detail how teenagers performed “friendship” on MySpace, fully aware that the mediating social network changed the meaning of what they were doing. Many would use dual identities on MySpace, one for their parents and one for their friends. Today, people edit their social network activities to present a version of themselves they don’t mind employers seeing. As Agre puts it, “inasmuch as the captured actions are addressed to an ‘audience’ via computer-mediated representation, they take on a ‘performative’ quality that belies the intendedly objective character of the representational process” (112).
Third, the capture process introduces new forms of politics, especially as people try to game or subvert the system. Consider Kate Crawford and Tarleton Gillespie’s discussion of the impoverished vocabulary of flagging content online. Flagging is a perfect example of an effort at capture: it operationalizes a grammar of being offended to facilitate a data system that makes an online environment more efficient. That is, these flags “not only individualize expressions of concern, they transform them into data points” (3). The obvious problem is that the vocabulary is completely impoverished (the third of Agre’s eight is that grammars “frequently oversimplify”): there is no “that’s offensive but worth it” button, for example. More generally, as Crawford and Gillespie put it (follow along with your Agre notes):
“These data subsume and come to stand in for the users and their objections. Some of the moderation is then handed off to algorithms, or to human-algorithm hybrids, designed to sort flags into manageable expressions of the user base, prioritized by abstract judgments of content egregiousness and the inferred flag motivation, all designed to simulate human judgments of importance and urgency. Human operators tasked with responding to these flags are similarly bound by the categories and sorting procedures, and find they must respond in algorithmically rule-bound ways: approve, deny, or escalate” (3-4).
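The rule-bound sorting Crawford and Gillespie describe is easy to schematize. In the sketch below (my own construction; the reason menu, severity weights, and thresholds are all invented for illustration), objections become data points and get triaged in exactly the approve/deny/escalate pattern they mention:

```python
# A schematic flag-triage pipeline, loosely following Crawford and
# Gillespie's description. The vocabulary is deliberately impoverished:
# a fixed menu of reasons, with no "offensive but worth it" option.
from dataclasses import dataclass

@dataclass
class Flag:
    content_id: str
    reason: str       # must come from the fixed menu below
    flag_count: int   # many users' objections, collapsed into a tally

SEVERITY = {"spam": 1, "nudity": 2, "harassment": 3, "violence": 4}

def triage(flag: Flag) -> str:
    """Sort a flag by an abstract judgment of 'egregiousness'."""
    score = SEVERITY.get(flag.reason, 0) * flag.flag_count
    if score >= 12:
        return "escalate"   # kicked up to a human operator
    if score >= 4:
        return "approve"    # flag upheld, content acted on
    return "deny"           # flag dismissed

print(triage(Flag("post123", "harassment", 5)))   # escalate
```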
To return to my third point, flagging also involves a politics that undermines its credibility as a neutral representation of user sentiment: “Flags are also deployed by users in more social and tactical ways, as part of an ongoing relationship between users” (11). Some “fine people” might flag all anti-racist content as offensive, for example. It follows that “the fact that flagging can be a tactic not only undercuts its value as a ‘genuine’ expression of offense, it fundamentally undercuts its legibility as a sign of the community’s moral temperature” (11).
In short, as Agre puts it, “The picture that emerges is at odds with the mythology of transparent representation” (112). This mythology is of course endemic to data science. Hildebrandt highlights one version of it – the “n=everything” view according to which machine learning encompasses everything and therefore achieves some sort of capacity to transcend limitations of perspective, data, and so on (92). But other iterations of the point are easily derived. They generalize to the thought that understanding data as capture underscores that we are invariably dealing with a process of subjectification in Foucault’s sense – data science is about making people, because when it purports to represent their behaviors, it inexorably alters them.
In Siting Translation, Tejaswini Niranjana talks about how the British used grammar as a tool in their colonial project in India: the very act of going into indigenous communities, studying the languages, and compiling Western-style grammars and dictionaries subtly altered those languages, making the people who used them more legible to the British, and more easily governed. With a nod to Niranjana, then, let’s call this a grammatical colonialism of big data.