By Gordon Hull
Surely one of the more striking features of the rise of data science is how readily it can be incorporated into processes of capitalist valuation, to the point that data may not just be a commodity - it may also be capital. At one level, this sounds intuitive enough: one might suggest that data presents information about the world, and that knowledge is valuable. But this view encounters an immediate limitation in that firms seem to want to accumulate data far beyond their ability to value it. According to this “data imperative,” data is something to be accumulated, even if one has no clear use for it. An additional problem is that the companies that accumulate all of this data are generally not so interested in whether it represents the world accurately. They want it because they want to influence behaviors.
So what is going on, and is there something about data itself that makes it particularly suitable for capitalist accumulation? To answer that question, one needs to have a handle on what data is. Here, a recent paper by Sabina Leonelli is particularly helpful. Leonelli defines data as follows:
“I define ‘data’ as a relational category applied to research outputs that are taken, at specific moments of inquiry, to provide evidence for knowledge claims of interest to the researchers involved. Data thus consist of a specific way of expressing and presenting information, which is produced and/or incorporated in research practices so as to be available as a source of evidence, and whose scientific significance depends on the situation in which it is used. In this view, data do not have truth-value in and of themselves, nor can they be seen as straightforward representations of given phenomena. Rather, data are essentially fungible objects, which are defined by their portability and their prospective usefulness as evidence” (811)
Leonelli is writing from the perspective of the philosophy of science, and her paper is especially useful in that regard because she is able to embed the definition in recent work in the sociology of science and STS, such as Latour. The key takeaway is that data needs to be defined pragmatically – not what it is, but what it does. And the key implication of that is that we need to stop thinking of data as representative. Data is generated for a purpose, and in science, a central purpose is portability:
“Scientists engage in data generation in full awareness that the outputs of that activity need to travel beyond the boundaries of their own investigation. That awareness is built into the choice of instruments used, the recording of procedures and protocols carried out in lab books, and the decisions about how outputs may be stored, shared with peers, fed into models, and integrated with other data sources” (816)
In this sense, data are material artifacts that are designed to circulate. In order to circulate, they have to be standardized in such a way that they can be commensurable with (or at least relate in a predictable way to) other data, and that need for standardization fundamentally drives their production process. In other words, data is always already packaged to the extent that its meaning as data is inseparable from its packaging.
To see this, consider the literature on commodification. As Franck Cochoy argues at length, commodification is a complex process involving both standardization and product differentiation. A good traded at a bazaar is not a commodity. In that archetypical transaction, I buy rice from a local vendor, and he gives me a bag of it, scooped out with an idiosyncratically-sized cup, which I then take home and cook. This process is entirely local, and my willingness to buy rice from that vendor is a product of my prior experiences with him or his ability to catch my eye. He’s offering rice, and I can see it and inspect it – but there is a relatively limited apparatus for comparison, especially in objective terms.
As that rice becomes increasingly a commodity, the local context and my social relation with the vendor are gradually submerged. A critical step occurs when the rice is packaged. If the rice is to be traded outside of the local bazaar, vendors need a way to mark rice as their own, both to enable it to be traded as rice and to differentiate it from the rice of others. Producers accordingly use a strategy of various layers of packaging and marks to create identifiable differences between otherwise indistinguishable goods. One step in this process is the introduction of packaging according to brand, so consumers can shop for “wheat x” as opposed to simply “wheat.” As Cochoy indicates, the packaging functions as a screen in a double sense: on the one hand, it screens/hides direct access to the good; on the other hand, it serves as a projection screen to indicate features of the good’s origin and the qualities the producer wants to associate with it. A brand’s indication of a product’s origin serves to occlude any details of the process by which it is actually produced, such as what it is made of, or where it is made (61-2). The attribution of origin is mythological in the sense that it presents whatever narrative about the origin of the product the producer wants to communicate.
Standardization is necessary in the production process because consumers, particularly non-local ones, need to know that the product they purchase from a given vendor under a given designation today will be the same as the supposedly identical product purchased yesterday or tomorrow. The biggest challenge here is that the production process has to be homogenized enough so that its output is consistent. In order for that to happen, there must be some sort of specification as to which of the indefinitely many possible characteristics of the good are to count as its qualities, and standards for how those qualities are to be measured and evaluated. Some of these qualities, such as weight, are obvious, but others – terms like “pure” or “traditional methods” – can require extensive specification. There is extensive ethnographic research into niche products such as canola oil, “quality” salmon, Burmese teak, and organic vegetables. The salmon case is particularly interesting because it shows that the processes of standardization and specification are bidirectional: in converting Copper River salmon into a more upscale product, its presentation had to resemble that of aquaculture salmon. All of the individual salmon had to be presented in such a way that they became legible as quality salmon, such that supermarkets could classify them as quality salmon and consumers could compare them with other quality salmon.
When used to illuminate the context of data as something already-packaged, this literature helps to underscore two things: first, data is produced with an eye towards its circulation; and second, commodities are also produced with an eye toward their circulation. The fundamental congruence here suggests part of why data is so easily subject to capitalization and commodification: it’s already a lot of the way there. This basic point can be obscured by the fact that data is often taken to be representative of some underlying substrate. But that’s not what data does. It is a truism by now in the data literature that data is action-oriented – and here we see a specific consequence of that. As Mary Beth Mader notes, the difference between individuals and the members of a population is ontological. The process of generating data is, like the process of generating quality salmon, one of producing an item so that it fits into a classificatory schema. This classificatory process, as Marion Fourcade and Kieran Healy argue, is constitutive of big data as it is used in markets under capitalism.
As Leonelli notes, this means that the potential reach of data is nearly infinite; or, rather, its reach extends precisely insofar as it can circulate.
“This does not mean that whoever gathers data already knows how they might be used. Rather, what matters is that observations or measurements are collected with the expectation that they may be used as evidence for claims about the world in the future. Hence, any object can be considered as a datum as long as (1) it is treated as potential evidence for one or more claims about phenomena and (2) it is possible to circulate it among individuals.” (817)
In short, part of why it is so easy to commodify data is that data is already a long way towards being a commodity.