A federal judge ruled today that some of the NSA’s broad, warrantless collection of data from American citizens did not violate the Constitution. The ruling particularly concerns so-called ‘metadata,’ which includes routing information for phone calls (which phone numbers have been in contact with each other, and so on; this can be very damaging!). It contradicts an earlier federal court ruling that the data collection was unconstitutional, and the issue seems likely headed to the Supreme Court.
If we set aside the details of this case for the moment, it seems to me that an important problem around big data is emerging. To be succinct: the concept of ‘privacy’ is absolutely no good at all at slowing big data down. I don’t, however, think that the problem is either the frequently announced ‘end of privacy’ or the so-called ‘privacy paradox’ (that people say they value privacy but then act as if they don’t). Rather, I think the problem is more basic.
Today’s opinion correctly reports a fact about privacy law: if I voluntarily disclose information to a third party – any third party – I lose any Fourth Amendment claim to privacy over that information, no matter how many times it changes hands afterwards. That’s a problem, one well captured by Helen Nissenbaum’s work on privacy as ‘contextual integrity’ (see also her original paper), which argues that moving information out of one context and into another can very well change the appropriate norms for sharing it. But the pairing of ‘voluntary’ and ‘information’ also suggests that I have some cognizance of the semantic content of what I am sharing. I may not know why information is valuable to someone else, but I at least know what that information is.
Big data challenges that. We have no idea of the meaning of this material we are providing as we go about our daily lives, or even that it is meaningful: we are providing data, not information. The NSA case is exemplary:
One could certainly object to this example on the grounds that the NSA’s lack of other information doesn’t demote my phone records to mere data. After all, I still voluntarily release my metadata. But there are many examples where it’s not clear that I voluntarily release information, since a good deal of what big data does is mining: it generates new, emergent information from large data streams. If nothing else, there’s a lot of material that is going to be ‘data’ when viewed at one moment or from one point of view, and ‘information’ when viewed from another.
In short, a meaningful notion of information needs to include some sort of semantic content, and probably some de minimis level of significance. Most of what makes big data so powerful is its ability to generate new information from vast data flows. Most of what we give to big data doesn’t meet the bar for information, at least not at the time we give it. It only becomes information in conjunction with other data and after a complex analysis. So privacy is late to the game. Every time.