A recent paper by Ermanno Bencivenga in Philosophical Forum argues that it’s “time for philosophy to step into the conversation” (135) about big data, in particular to refute the thesis, which the article identifies in a 2008 piece in Wired, that big data will mean that we no longer need theory: “with enough data, the numbers speak for themselves” (qt. on 135). The paper draws on concerns about spurious correlations: for a correlation to count as legitimate, it “must be shown to manifest a lawlike regularity; there must be a theoretical account of it,” the laws involved have to cohere with one another, and so on (139). In other words, “knowledge is constitutionally dependent on theory” (ibid.). Bencivenga concludes:
“Big Data enthusiasts are (unwittingly) advocating a new definition of what it is to know. Their agenda is (unwittingly) semantical. Except that it is not worked out, and any attempt at developing it in the semantical terms that have been current (and antagonistic) for the past two millennia is hopeless. I will not rule out that a new set of terms might be forthcoming, but the burden is / on those enthusiasts to provide it; simply piling up data and being awed by them will not do. What would be needed, ironically, is a new theory of knowledge, which so far I have not seen. This is the reason why I have made an effort to get clearer about the claims being made, so that we can have a more orderly discussion of them and what it would take to make progress in it” (141-2).
Fair enough, though I do want to note that the paper does not engage with any literature about big data other than the dated piece from Wired; and to hear enthusiastic techno-babble from Wired is not surprising. That’s what they do.
It’s also worth pointing out that these sorts of concerns have been expressed before. Here are danah boyd and Kate Crawford from a widely-cited paper in 2012. After noting that “Big data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and categorization of reality” (665), they caution that:
“Interpretation is at the center of data analysis. Regardless of the size of a data, it is subject to limitation and bias. Without those biases and limitations being understood and outlined, misinterpretation is the result. Data analysis is most effective when researchers take account of the complex methodological processes that underlie the analysis of that data” (668).
These are cherry-picked quotes – arguably, the entire paper is a response to the sort of enthusiasm in the Wired piece, and the focus on our hidden rules for interpretation is clearly directed at the view that somehow data self-interprets. And there are certainly more papers that bring up this and analogous topics; Luciano Floridi raises similar ones here (Floridi’s worries resonate well with Amoore’s, discussed below). That said, I think there’s something to be said for speaking of big data in Kantian terms, though not perhaps for the reasons Bencivenga advances.