It must be summer: Facebook has released a controversial study of its users. Last year, it was the demonstration that emotional contagion does not require direct, face-to-face contact and can in fact spread across social networks (the controversy wasn’t in the result; it was in the fact that FB ran the study by manipulating its users’ News Feeds to present more happy content). This time, Facebook’s research wing published a paper in Science purporting to demonstrate that Facebook wasn’t responsible for whatever online echo-chamber effect its users might demonstrate. Or, at least, that if the site did contribute to an echo chamber, it wasn’t the main contributor. From the FB blog post discussing the paper:
“We found that people have friends who claim an opposing political ideology, and that the content in peoples' News Feeds reflect those diverse views. While News Feed surfaces content that is slightly more aligned with an individual's own ideology (based on that person's actions on Facebook), who they friend and what content they click on are more consequential than the News Feed ranking in terms of how much diverse content they encounter.”
Sorry, Prof. Sunstein, nothing to see here! Move along… It should be noted that this isn’t the first time FB has found that it is either irrelevant to Internet echo chambers or actually works to reduce their impact. This is from a 2012 study:
“Instead, we found that even though people are more likely to consume and share information that comes from close contacts that they interact with frequently (like discussing a photo from last night’s party), the vast majority of information comes from contacts that they interact with infrequently. These distant contacts are also more likely to share novel information, demonstrating that social networks can act as a powerful medium for sharing new ideas, highlighting new products and discussing current events.”
As with the emotional contagion study, these FB studies come with jaw-droppingly large numbers of participants, often in the hundreds of millions. That sort of sample size doesn’t ordinarily happen in social science research, so it carries a certain gravitas.
Of course, that also isn’t the end of the story. In an absolutely essential response piece, Cyborgology’s Nathan Jurgenson dissects some of the ideological problems at work behind the curtain. The problems start at the beginning: the study was limited to FB users who self-identify their politics on the site, but that’s only 9% of users. As Jurgenson points out, even if that’s a big set of people, it’s a self-selected set of people, and we don’t know anything at all about whether those people are representative of FB users in general. So the study’s results hold for fewer than 1 in 10 FB users, and that’s not something the site advertises (nor have I seen it reflected in news coverage of the study, though admittedly I haven’t been reading much of the coverage).
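The sampling worry here is quantitative as well as rhetorical: a self-selected sample can be enormous and still badly unrepresentative, because size does nothing to cure selection bias. A minimal simulation sketch makes the point; all the numbers except the 9% figure are made up purely for illustration (here I pretend self-identifiers are unusually politically engaged):

```python
import random

random.seed(0)

# Toy population of 1,000,000 users. 9% self-identify their politics
# (the figure from the study); in this hypothetical model, those users
# are far more politically engaged than everyone else.
population = []
for _ in range(1_000_000):
    identifies = random.random() < 0.09
    # engagement score: self-identifiers skew high in this toy model
    engagement = random.gauss(0.8 if identifies else 0.3, 0.1)
    population.append((identifies, engagement))

sample = [e for (i, e) in population if i]   # the 9% we can study
everyone = [e for (_, e) in population]      # the whole population

print(f"sample size: {len(sample):,}")       # ~90,000 -- looks like 'big data'
print(f"sample mean engagement:     {sum(sample) / len(sample):.2f}")
print(f"population mean engagement: {sum(everyone) / len(everyone):.2f}")
```

The sample is ninety thousand people strong, which sounds authoritative, yet its mean engagement is nowhere near the population’s. No amount of additional self-selected data would fix that.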
More significantly, the study contributes to an emerging social ideology according to which the results of big data and other sophisticated analytics are somehow “objective” or “true.” I’ve got a paper up with a Foucauldian take on this as an operation of power in the context of online privacy notices. My argument there is that the reigning theory of online privacy, according to which we “consent” to the usage of our data by clicking to accept a privacy notice, needs to be read as a form of subjectification, producing neoliberal subjects who think privacy is something that they should view as an alienable commodity, the value of which is correctly set by individual preferences and utility judgments. Jurgenson is making an analogous but even more fundamental point: we are being taught to believe that something with a lot of numbers, or a big sample size, is “true” or “objective,” which requires studiously ignoring the fact that even super-big data absorbs a lot of the biases of those who code the systems and the societies they inhabit.
Some of this is easy enough to understand: Facebook discovered that a simple “don’t forget to vote” reminder on election day could nudge participation rates up slightly; as Zeynep Tufekci and Jonathan Zittrain pointed out, the increase in voting caused by the reminder was small in the absolute sense but big enough to swing a close election, which means that FB could invisibly swing elections if it wanted to. So “big” is a relative term, depending on whether you mean the size of the dataset, the amount of variation, the impact of the variation, and so forth. More difficult questions have to do with the extent to which big data projects pick up on existing implicit biases. There’s an emerging literature on the point; a good, representative piece by Solon Barocas and Andrew Selbst explores in detail how various coding decisions can write racial biases into the data and then make everything appear unbiased.
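The “small in absolute terms, big in aggregate” point is just arithmetic, and it is worth making explicit. A back-of-the-envelope sketch, with a hypothetical reach and per-user lift (the only real number is the famous 537-vote margin in Florida in 2000):

```python
# Illustrative arithmetic: a tiny per-user nudge, applied at platform
# scale, compared against a real close-election margin.
users_shown_reminder = 60_000_000   # hypothetical reach of the reminder
per_user_lift = 0.004               # hypothetical +0.4% chance of voting

extra_votes = int(users_shown_reminder * per_user_lift)
print(f"extra votes from the nudge: {extra_votes:,}")   # 240,000

# Florida 2000 was decided by 537 votes.
close_election_margin = 537
print(f"nudge as multiple of margin: {extra_votes / close_election_margin:.0f}x")
```

A per-user effect invisible to any individual voter dwarfs the margin of a close election by orders of magnitude, which is exactly Tufekci and Zittrain’s worry.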
A more general point to emphasize is that, partly because it presents itself as “true” and with an enormous sample size, we are inclined to believe that the results of big data are primarily epistemic. That may be accurate in some limited cases. But most of the time, with big corporate sponsors in particular, the goal is what one might call “actionable intelligence.” Degli Esposti puts it succinctly:
“Although the insights analysts derive from data are based on rigorous analytical procedures, they should not be considered neutral or objective realities. A complex mix of hidden intentions, systematic and random errors, partial information or biased visions of the problem, contributes to making this new knowledge as situated and partial as any other type of knowledge” (212).
The goal is to nudge behavior, not to know people as they are. This matters because, as Degli Esposti suggests, all kinds of non-objective material makes its way into the results of analytics. The claim isn’t that data analysts are intentionally skewing their studies, any more than Haraway claimed scientists were when she wrote about situated scientific knowledges. It’s that little things happen, and those things matter. For example, as Martin French notes in the healthcare context, you get “gaps, cracks and blind spots” when understaffed local offices code differently from one another, or when the software does something wrong (like recording the date a patient’s HIV data was updated as the date of initial diagnosis). Or, as Evelyn Ruppert puts it in a study of the British National Health Service, the sheer complexity of the systems involved belies the notion of a single, well-integrated “system” in the first place: “because these databases depend on diverse and complex socio-technical arrangements – of professionals, computers, software, forms, and all of the many actors involved in long chains of relations – their operation is highly variable and contingent, resulting in multiple actually operating systems in practice” (118).
All this means that we need to be very, very careful about the ways that the generation of statistical norms and statistical risk constitutes us as subjects, even when we’re not being overtly and obviously manipulated. That is, not only do we need to think carefully and critically about the specific ways that big data constitutes us, and the various biases that get written into it; we also need to note the fundamental subjectification involved in treating ourselves as describable by norms in the first place. As Mary Beth Mader puts it from a Foucauldian perspective:
“We might say, then, that the normal curve is the compulsory self-reflection of the society or the nation. But it is not a conscious self-reflection; it is reflection displaced onto the objective figure of the line, the curve, the histogram’s alleged indifference, the purity of number. Nor is it a direct relation with others; it is a relation to others that is in its essence a detour through the numerical amalgamation of all – a ligature so ontologically alien to the social world that it fails to qualify as a relation at all” (65).
All of this is an operation of power; in Foucault’s words, power is “a total structure of actions brought to bear upon possible actions; it incites, it induces, it seduces, it makes easier or more difficult; in the extreme it constrains or forbids absolutely.” In particular, one needs to look at “the different modes by which, in our culture, human beings are made subjects” and the various modes of subjection (ibid.), i.e., “the way in which the individual establishes his relation to the rule and recognizes himself as obliged to put it into practice” (The Use of Pleasure, 27).
One of the main problems is that many of the factors big data treats as exogenous – from individual preferences to the coding of the system – turn out to be endogenous, parts of the socio-technical system itself. This makes it very hard to extract and study one variable like “individual choices to look at ideological content in News Feed” as though the individuals making those choices popped up like Hobbesian mushrooms at the moment their preferences or behaviors were measured. Anybody who tends to drive faster in a sports car than in a minivan will see the point immediately. Bruno Latour has a nice essay using the example that “man with gun” is really not the same thing as “man” or “gun,” or the simple addition of the two (which is why the “guns don’t kill people, people kill people” line is so poorly formulated). The problem is that we refuse to admit the complexity of our technical systems into our analysis, and in particular, we refuse to admit that those systems both mediate our actions and constitute us as actors. Thus, “you are different with a gun in hand; the gun is different with you holding it. You are another subject because you hold the gun; the gun is another object because it has entered into a relationship with you” (33). This is a methodological point, because it is a mistake “to start with essences, those of subjects or those of objects. That starting point renders impossible our measurement of the mediating role of techniques. Neither subject nor object (nor their goals) is fixed” (33).
Channeling Latour, Jurgenson puts the problem this way:
“This whole business of conceptually separating the influence of the algorithm versus individual choices willfully misunderstands what algorithms are and what they do. Algorithms are made to capture, analyze, and re-adjust individual behavior in ways that serve particular ends. Individual choice is partly a result of how the algorithm teaches us, and the algorithm itself is dynamic code that reacts to and changes with individual choice. Neither the algorithm or individual choice can be understood without the other.”
We need a lot more Foucault and Latour in our thinking about big data.