By Gordon Hull
The diversity of language has been a philosophical problem for a while. Hobbes was willing to bite the bullet and declare that language was arbitrary, but he was an outlier. One common tactic in the seventeenth century was to try to resolve the complexity of linguistic origins with a reference to Biblical Hebrew. Future meanings could be stabilized with reference to Adamite naming. I’ve essentially been arguing (one, two, three, four) that we’re seeing echoes of this fusion of orality, intentionality and origin in the various kinds of implicit normativity that make it into Large Language Models (LLMs) like ChatGPT. In particular, LLMs depend on iterability as articulated by Derrida, but we tend to understand them with models of intentionality that occlude subtle (and not so subtle) normativities that get embedded into them. Last time, I looked at Derrida’s critique of Searle for what it had to say about intentionality. I also suggested that there was a second aspect of Derrida’s critique that is relevant – Derrida accuses Searle of relying too much on standardized speech situations. I want to pursue that thought here.
Let’s start with a joke:
For Derrida, an emblematic problem is that Searle prioritizes “serious literal speech” (qt 67) in setting out his speech act theory and in proposing model speech acts to analyze. The Derridean counter, of course, is “an explicit deconstructive critique of the oppositions ‘serious/non-serious,’ ‘literal/non-literal’ and of the entire system of related oppositions” (67). I will return to this point, but let’s begin by noting the evidence Derrida accumulates to support his thesis: he quotes Searle as saying that he will “deal only with a simple and idealized case” and will be “ignoring marginal, fringe, and partially defective promises.” These enabling conditions for “serious and literal linguistic communication” together imply:
“such things as that the speaker and hearer both know how to speak the language, both are conscious of what they are doing; they have no physical impediments to communication, such as deafness, aphasia, or laryngitis; and they are not acting in a play or telling jokes, etc. It should be noted that this condition excludes both impediments to communication such as deafness and also parasitic forms of communication such as telling jokes or acting in a play” (qt on 68, emphasis Derrida’s).
Derrida’s concern is the normative idealization present in excluding parasitic communication, and he argues that this betrays Searle’s proximity to a phenomenological tradition that Searle disavows. That is, “in this passage I find confirmation not only of the fact that the criterion of intention (responsible, deliberate, self-conscious) is a necessary recourse in order that the ‘serious’ and the ‘literal’ be defined … but also and above all of the fact that this intention must indeed, according to his own arguments, be situated ‘behind’ the phenomenal utterance … no criterion that is simply inherent in the manifest utterance is capable of distinguishing an utterance when it is serious from the same utterance when it is not” (68). To anticipate things a bit, Searle is hanging his theoretical hat on intentionality, when he should be hanging it on iterability.
Searle’s proximity to phenomenology, broadly conceived as dealing with a reduction to typical acts of perception (etc.), is evident in his reply, where he argues that:
“Austin’s exclusion of these parasitic forms from consideration in his preliminary discussion is a matter of research strategy; he is, in his words, excluding them ‘at present’; but it is not a metaphysical exclusion: he is not casting them into a ditch or perdition, to use Derrida’s words. Derrida seems to think that Austin’s exclusion is a matter of great moment, a source of deep metaphysical difficulties, and that the analysis of parasitic discourse might create some insuperable difficulties for the theory of speech acts. But the history of the subject has proved otherwise. Once one has a general theory of speech acts … it is one of the relatively simpler problems to analyze the status of parasitic discourse …. But the terms in which this question can be intelligibly posed and answered already presuppose a general theory of speech acts” (“Reiterating the Differences,” 205).
From a Derridean point of view, this is pretty clearly question-begging: Derrida wants to know why we should elevate a “typical” speech act into the model for the theory, and Searle responds that, having elevated the typical speech act into the model, we can now explain deviations from it.
It is worth noting a few things about Searle’s procedure in the context of LLMs.
First, the model is fundamentally oral in the sense that Derrida means it in the grammatology writings. That is, it presupposes that the eidos of the communicative act is one of two people speaking together, face to face. There is an implicit demotion of written text, along the lines of Plato’s account in the Phaedrus. Language models create a scene that imitates speaking subjects, even though the language model itself has ingested text that may or may not be on an oral model.
At the same time, the implementation of the language model is, at its core, computational. Katherine Hayles noted some fundamental differences between code – she is thinking of the sort of code one might write to generate a “hello world” message, the kind of thing you create in a computer science class – and language:
“Whatever messages on screen may say or imply, they are themselves generated through a machine dynamics that has little tolerance for ambiguity, floating signifiers, or signifiers without corresponding signifieds. Although the computational worldview is similar to grammatology in not presuming the transcendental signified that Derrida detects in Saussure’s speech system [here: Searle’s as well], it also does not tolerate the slippages Derrida sees as intrinsic to grammatology. Nor does code allow the infinite iterability and citation that Derrida associates with inscriptions, whereby any phrase, sentence, or paragraph can be lifted from one context and embedded in another” (My Mother was a Computer, 47-8).
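To make Hayles’s contrast concrete, here is a purely illustrative toy (mine, not hers, and not any particular system): a human reader shrugs off slippage between signifiers – “greet,” “greeting,” “say hi” all drift toward the same sense – whereas the code below treats an unassigned signifier as a simple error.

```python
# A toy "command language" with Hayles's intolerance for ambiguity: every token
# must resolve to exactly one meaning, or execution stops.

COMMANDS = {"greet": lambda: print("hello world")}

def run(token: str) -> None:
    # Natural language tolerates near-synonyms and misspellings; this does not.
    if token not in COMMANDS:
        raise ValueError(f"no meaning assigned to {token!r}")
    COMMANDS[token]()

run("greet")        # prints "hello world"
# run("say hi")     # would raise ValueError: the slippage a reader tolerates, code cannot
```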
Now this is about code. What about when code produces language?
“Somewhat paradoxically, then, the more data that are stored in computer memory, all of which are ordered according to specified addresses and called by executable commands, the more ambiguities are possible. Flexibility and the resulting mobilization of narrative ambiguities at a high level depend upon rigidity and precision at a low level” (ibid., 53).
Hayles was writing well before LLMs were a thing (the book is from 2005), but I think her point nonetheless underscores the need to be very careful in thinking about how LLMs produce language. The underlying architecture involves code, which tolerates no ambiguity. It incorporates data that has been curated, both in the abstract sense of being made legible as text and, more specifically, to prune out certain undesirable elements. This data is then used to produce language through a system of statistical predictions, and that system generates output which produces precisely the narrative ambiguities Hayles describes at the high level (to the point that the working assumption has been that more data means better model performance). The series of predictions produces something that appears to imitate a language system of the sort analyzed by Searle, where there are “normal” uses – except that the sense of normal is statistical, not transcendental. More on this in a minute.
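A rough picture of what “statistical normal” means here, in a deliberately toy sketch (the candidate words and numbers are invented; real models operate over tens of thousands of tokens and billions of parameters): at the bottom there is only rigid, unambiguous arithmetic turning scores into probabilities, and the “normal” continuation is simply whichever candidate carries the most probability mass.

```python
import math

# Hypothetical scores (logits) a model might assign to candidate next tokens
# after the prompt "I promise to" -- invented numbers, for illustration only.
logits = {"pay": 4.1, "be": 3.7, "return": 2.9, "obfuscate": -1.2}

def softmax(scores):
    """Rigid, unambiguous arithmetic at the low level: scores -> probabilities."""
    z = sum(math.exp(v) for v in scores.values())
    return {tok: math.exp(v) / z for tok, v in scores.items()}

probs = softmax(logits)
# The "normal" continuation is not defined by any speaker's intention; it is
# whatever token happens to carry the most probability mass.
print(max(probs, key=probs.get))   # -> "pay"
```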
The second point is that there are a number of normative assumptions shadowing Searle that are relevant in the context of LLMs. One of them is the frank ableism: in prioritizing an idealized speaking scenario, Searle also prioritizes one in which all the participants are able to use their voices to communicate. On the one hand, this indicates the way that the prioritization of the speech situation (over, say, text) is itself a normative priority. On the other hand, it indicates the extent to which the communicative situation is quietly idealized beyond the move to orality and intention – it is a particular oral situation, one in which spoken words alone suffice for language, absent gestures or other forms of “non-verbal” communication.
Third, as I alluded to above, the dismissal of “parasitic” speech calls to mind the curation efforts that go into establishing the corpora for AI models. One level of this is explicit: removing toxicity, though of course (as I suggested last time) there are a lot of things that get built into that, such as the over-filtering of LGBTQ speech as sexually explicit when it is not. Another level is what is included and not filtered – the strange and often disturbing morass of materials Birhane et al found in the LAION dataset. But it is important to note that LLMs are probability models; low-frequency tokens can drop out of the model entirely. Not only that: the model will, by design, default to versions of what is most common. This is the sense in which ChatGPT returns a blurry jpeg of the Internet. This is an idealization that’s different from the one Searle makes, of course, since ChatGPT is probably not committed to the metaphysics of presence. But it’s an implicit normativity. Adding RLHF will just make all of this fuzzier, since it involves aggregating the inputs of unknown people, working with snippets out of context, to determine what the “better” or “more typical” responses are.
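One way to see how the low-frequency tail falls away – only one mechanism among several, and the distribution below is again invented – is the truncation step in standard decoding schemes such as top-p (“nucleus”) sampling: anything below the cutoff is simply never generated, so the output keeps circling back to versions of the most common thing.

```python
import random

# Invented distribution over candidate continuations (standing in for next-token
# probabilities); the long tail stands in for rare or "parasitic" uses of language.
probs = {"joke_a": 0.45, "joke_b": 0.30, "joke_c": 0.15,
         "weird_pun": 0.06, "genuinely_new_joke": 0.04}

def nucleus_sample(probs, top_p=0.9):
    """Standard top-p truncation: keep the most probable candidates until their
    cumulative mass reaches top_p, then sample only from those. Everything in
    the tail is discarded before a token is ever drawn."""
    kept, total = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        total += p
        if total >= top_p:
            break
    toks, weights = zip(*kept.items())
    return random.choices(toks, weights=weights)[0]

# "genuinely_new_joke" sits below the cutoff: it drops out entirely, and the
# output cycles among the same few high-probability candidates.
print(nucleus_sample(probs))
```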
Evidence that LLMs are internalizing the Searle model of language, where some kinds of speech acts are considered “typical” and others “parasitic,” can be seen in their performance in cases of parasitic speech. Consider jokes. Recent research suggests that ChatGPT is not very good at humor; it can tell jokes, but they’re not new, not particularly funny, and it seems to recycle the same 25 or so jokes. When it tells jokes that make no sense, it can’t explain them. Another recent paper by Nima Zargham et al starts to delve into the reasons why, suggesting that “humor might be, given the relatively higher importance of the social dimension in its operation, one of the most challenging human qualities to imbue AI with” (2). The core problem is that humor is irreducibly social; as Zargham et al put it, “For humor to be delivered at the right moment and in an appropriate situation, the agent must possess adequate prior knowledge (e.g. about the user and their environment), emotional aptitude, situational understanding, and cultural sensitivity. This requires agents to be proactive at times in order to get the timing right” (3). This is precisely the sort of thing language models are not good at: because they interact only with the text submitted to them, or ingested as part of their training data, they possess at best accidental situational awareness. Even if one were to describe a situation in (painful) detail to the model, it could only ever produce a generalized image of that situation, a blurry jpeg of it. In this sense, the difficulty LLMs have with humor can be seen as an aspect of the vector grounding problem, that is, their lack of referentiality.
I mean, it tries!
But meh.
The model’s explanation of the “long-term relationship” line is disclosive, not because it makes the joke funny, but because it shows how the model tries to emulate humor by putting two different things together, and how that either doesn’t work or shows us something else: the eternal sunshine of the spotless language model can’t explain why the length of a relationship has anything to do with going on a date!
I feel like it’s time to drop the subject.
Zargham et al conclude that the social embeddedness of humor means that LLMs and other conversational assistants are unlikely to develop the ability to use humor, and not just for technical reasons. At the very least, surmounting those technical reasons is likely to involve unacceptable intrusions into privacy, since such intrusions are the only way models could gain adequate situational awareness. Where humor is most likely to happen is incidentally – when the model does something that is incongruous or incorrect in a weird or funny way.
But it has real trouble generating language that deviates from the norm.