By Gordon Hull
In the previous two posts (here and here) I’ve developed a political account of authorship (according to which whether we should treat an AI as an author for journal articles and the like is a political question, not one about what the AI is, or whether its output resembles human output), and argued that AIs can’t be properly held accountable. Here I want to argue that AI authorship also raises social justice concerns.
That is, there are social justice reasons to expand human authorship that do not apply to AI. As I mentioned in the original post, researchers like Liboiron are trying to make sure that the humans whose effort makes papers possible get credit for them. In a comment to that post, Michael Muller underlines that authorship interacts with precarity in complex ways. For example, “some academic papers have been written by collectives. Some academic papers have been written by anonymous authors, who fear retribution for what they have said.” Many authors have precarious employment or political circumstances, and sometimes works are sufficiently communal that entire communities are listed as authors. There are thus very good reasons to use authorship strategically when minoritized or precarious people are involved. My reference to Liboiron is meant only to indicate the sort of issue at stake in the strategic use of authorship to protect such individuals, and to gesture to the more complex versions of the problem that Muller points to. The claim I want to make here is that, as a general matter, AI authorship isn’t going to help those minoritized people, and might well make matters worse.
If anything, there’s a plausible case that elevating an AI to author status will make social justice issues worse. There are at least two ways to get to that result, one specific to AI and one more generally applicable to cognitive labor.
(a) AI Specific: The AI-specific reason is that the entire AI industry is, as Kate Crawford explains, an extractive industry based on a variety of forms of exploitation, from the communities where the rare earth metals that compose the hardware are mined, to the people whose data is incorporated into systems without their knowledge, to the Mechanical Turkers who have to label that data. We’ve known for a while that LLMs are bad for the environment and poorly serve the people who will suffer the most from that environmental harm. We’ve also known that content moderation for platform companies depends on human labor that is often terribly paid and traumatizing. One of the great myths of AI and data in general is that it all happens without human intervention.
It should come as no surprise, then, that OpenAI employed Kenyans at less than $2 an hour to make its system less toxic. The model is trained on language scraped from the Internet, which is often toxic. So the designers took a page from Facebook’s playbook and built an AI to detect and remove toxic speech. As Billy Perrigo reports for Time:
“To build that safety system, OpenAI took a leaf out of the playbook of social media companies like Facebook, who had already shown it was possible to build AIs that could detect toxic language like hate speech to help remove it from their platforms. The premise was simple: feed an AI with labeled examples of violence, hate speech, and sexual abuse, and that tool could learn to detect those forms of toxicity in the wild. That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data, and filter it out before it ever reached the user. It could also help scrub toxic text from the training datasets of future AI models.”
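To make the mechanism concrete, here is a minimal sketch of the kind of pipeline Perrigo describes: a classifier trained on human-labeled examples of toxic and acceptable text, then used to screen output before it reaches the user. This is a generic illustration using scikit-learn and invented toy data; OpenAI’s actual system is not public, and nothing here should be read as its implementation.

```python
# Minimal sketch: train a toxicity classifier on human-labeled examples,
# then use it to filter generated text before it reaches the user.
# Toy data and a simple bag-of-words model, for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples -- in practice these labels are produced by
# human workers reading and tagging large volumes of (often disturbing) text.
texts = [
    "have a wonderful day",
    "thanks for your help",
    "I will hurt you",
    "you people are worthless",
]
labels = [0, 0, 1, 1]  # 0 = acceptable, 1 = toxic

# Fit a simple TF-IDF + logistic regression classifier on the labeled data.
toxicity_filter = make_pipeline(TfidfVectorizer(), LogisticRegression())
toxicity_filter.fit(texts, labels)

def filter_output(candidate: str) -> str:
    """Return the candidate text only if the classifier judges it acceptable."""
    if toxicity_filter.predict([candidate])[0] == 1:
        return "[output withheld by safety filter]"
    return candidate

print(filter_output("have a great afternoon"))
print(filter_output("you people are worthless"))
```

The point of the sketch is simply that the filter is only as good as its labeled training data, which is where the human labor comes in.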
Of course, to train that model, you have to give it lots of examples of correctly-labeled toxic speech, and therein lies the problem:
“To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet. Some of it described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest.”
OpenAI outsourced this work to Sama, which claims to be an “ethical” AI company and to have lifted thousands of people out of poverty. However, as Perrigo’s reporting details, Sama also paid people $2 an hour or less to read and label graphic textual descriptions of sexual violence (and the like), while pocketing most of the money that OpenAI paid for the service. The workers were, predictably, often left traumatized.
Recall from Foucault that an author is considered the “origin” point for a text – in other words, we stop our inquiry for most texts when we arrive at the author. We may ask where the author gets a certain claim, and demand a footnote, but the text itself is generally taken to start with the author. Treating an AI as an author is essentially to fetishize its output as its product, which occludes both the content it scraped, and (this is often overlooked) the human labor that cleaned up that content and made it usable for the AI. We know that this labor is exploited and traumatized; calling the AI an “author” makes it even harder to see.
Including the AI as an “author” thus risks the opposite effect of calling a custodial worker or childcare provider an author. In the latter case, you are naming and crediting the workers without whom research couldn’t have happened. In the former case, you are burying those workers further under the AI. There is no reason to do that, and doing so is swimming upstream against a discussion about how to be just in understanding research authorship.
(b) Cognitive Labor: The more general reason is that intellectual labor is produced in common. As Michael Hardt and Antonio Negri emphasized some twenty years ago:
“Immaterial labor tends to take the social form of networks based on communication, collaboration, and affective relationships. Immaterial labor can only be conducted in common, and increasingly immaterial labor invents new, independent networks of cooperation through which it produces” (Multitude, 66).
That is, to produce knowledge, you depend on social and intellectual networks. This is also a point made repeatedly in the early-2000s literature on large-scale social production of knowledge, in books like Yochai Benkler’s Wealth of Networks. This argument interacts with a substantial critique of intellectual property: IP law often depends on a notion of solitary authorship that ultimately dates to literary romanticism. As a theoretical construction of creativity and its credit, in other words, this notion of authorship is a poor fit with knowledge production today (if it ever was a good fit). Indeed, this is why joint and other collective forms of authorship have been tricky to integrate into intellectual property law.
Hardt and Negri ground their argument in a reading of Marx’s so-called “Fragment on Machines” from the Grundrisse. In it, Marx argues that machinery (broadly construed) becomes the repository for social knowledge, as various inventions get built into machines which are then used by workers:
“The accumulation of knowledge and skill, of the general productive forces of the social brain, is thus absorbed into capital, as opposed to labor, and hence appears as an attribute of capital, and more specifically of fixed capital, in so far as it enters into the production process as a means of production proper”
Fixed capital refers in this case to machinery used in production; all the knowledge and skill of the people who invented that machine – the scientists, the engineers, etc. – is represented in the production process in the form of the machine. As a result, this social labor appears as the machine:
“In so far as machinery develops with the accumulation of society’s science, of productive force generally, general social labor presents itself not in labor but in capital. The productive force of society is measured in fixed capital, exists there in its objective form; and, inversely, the productive force of capital grows with this general process, which capital appropriates free of charge” (694-5)
ChatGPT is such a device, and as Marx implies will happen, it is exploited by capital free of charge. If Kenyan content moderators are directly exploited, most of what LLMs depend on is simply scraped up for free from the recently declared biopolitical public domain of data on the Internet. Tiziana Terranova argued in a famous paper in Social Text that this is how the Internet works in general: value gets added to websites because we contribute the content (that is, we do the work), which they appropriate for free. LLMs do exactly that.
Calling an LLM an “author,” then, participates in a straightforward process of commodity fetishism, where all the processes that generated the LLM are submerged below the surface, and the LLM itself is represented as somehow generating the social capital that it has scraped from the Internet. Calling a human an author arguably does the same thing, but there are reasons to do so – accountability, for example – that justify applying the term, especially in the context of research. But there is no comparable advantage for the AI, and so insisting on calling the AI an author just repeats the myth of the romantic author for no obvious offsetting gain.
There may well be cases where the strategic use of authorship to protect precarious individuals is in order. However, those strike me as exceptions to a general rule that would require individualized treatment, rather than the other way around. Next time I’ll try to articulate a more philosophical claim about human-technology interaction.