I’ve been exploring some Derridean implications of the distributional understanding of meaning in language models (one, two, three, four), following a couple of papers by Lydia Liu that situate an important strand of LLM development in Wittgenstein.  From there, I’ve argued that a good Derridean contribution is in seeing the politics behind the non-Wittgensteinian view – that “Platonism” is the project of assigning metaphysical labels to a preference for voice over writing, even as that preference is a political decision that cannot be metaphysically justified.  Thus for Derrida the relevant Platonic move, the preference for voice as a representation of the eidos over writing as a bad pharmakon, is really a distinction between two forms of writing and a preference for the former.  Here I’ll say some more about what I take it Derrida is doing and then get back to language models.

As Derrida explains it, the distinction between speech as good writing and writing as bad writing (one can see Plato’s problem!) amounts to a distinction between dialectics and grammar, which I’m sorry to report needs to be quoted at length:

“What distinguishes dialectics from grammar appears twofold: on the one hand, the linguistic units it is concerned with are larger than the word (Cratylus, 385a-393d); on the other, dialectics is always guided by an intention of truth. It can only be satisfied by the presence of the eidos, which is here both the signified and the referent: the thing itself. The distinction between grammar and dialectics can thus only in all rigor be established at the point where truth is fully present and fills the logos. But what the parricide in the Sophist establishes is not only that any full, absolute presence of what is (of the being-present that most truly "is": the good or the sun that can't be looked in the face) is impossible; not only that any full intuition of truth, any truth-filled intuition, is impossible; but that the very condition of discourse–true or false–is the diacritical principle of the sumploki. If truth is the presence of the eidos, it must always, on pain of mortal blinding by the sun's fires, come to terms with relation, nonpresence, and thus nontruth. It then follows that the absolute precondition for a rigorous difference between grammar and dialectics (or ontology) cannot in principle be fulfilled. Or at least, it can perhaps be fulfilled at the root of the principle, at the point of arche-being or arche-truth, but that point has been crossed out by the necessity of parricide. Which means, by the very necessity of logos. And that is the difference that prevents there being in fact any difference between grammar and ontology” (Dissemination, 166).

Again, a few comments to help bring out what I think Derrida is getting at:

(1) The parricide is an elaborate metaphor and refers to some comments made by the Stranger in the Sophist.  The act of parricide refers to the establishment, against Parmenides, that “what is not, in some respect has a being, and conversely that what is, in a way is not” (241d, quoted p. 164).  In other words, presence and absence are not pure categories, and it will therefore turn out to be impossible to establish an ontological priority of one over the other: talking, at least, means that presence and absence are co-constituted as conditions of discourse itself.  This is what Derrida means by differance, “the disappearance of any originary presence, [which] is at once the condition of possibility and the condition of impossibility of truth.” That is, “what is is not what it is, identical and identical to itself, unique, unless it adds to itself the possibility of being repeated as such. And its identity is hollowed out by that addition, withdraws itself in the supplement that presents it.”  So, “the true and the untrue are both species of repetition” (168). The key to the passage is thus the sumploki (συμπλοκή) – which refers to mingling and mixture, intertwining – and the basic point that everything is always already mixed up.

(2) The invocation of the withdrawal of originary presence, and the possibility of repetition that lies behind it, recalls Derrida’s use of “iterability” to describe the repeatability of a word, and the consequent inability to reduce a word’s meaning to its context.  I’ve talked a lot about this in the context of LLMs (first round here, here and here), so here I just want to add that this is the same kind of point Liu sees Masterson developing out of Wittgenstein: if you want to understand how word meaning works, at least in order to get a computer to do it, you need to drop the OED model of “word plus variant definitions” and pick up a “thesaurus” model where words cluster together and the meaning exists at that cluster, such that no particular definition exhausts all possible uses of a word.  It could always be iterated anew. 

Citing Hegel’s account of writing as the “practical exterior activity” that “comes to the aid” of spoken language, Derrida remarks that “this classical motif carries along with it the condemnation of all mnemotechniques, all language machines, all the supplementary repetitions which cause the life of the spirit, living speech, to emerge from its interior [de toutes les mnémotechniques, de toutes les machines à langage, de toutes les répétitions supplémentaires qui font sortir de son dedans la vie de l'esprit, la parole vivante].”  He adds that “such a condemnation paraphrases Plato” (Margins, 94 n23 / Marges 110 n13).  As I’ll talk about in a future installment, the connection between the condemnation of writing machines and a perceived, unified interiority of a speaking subject is important.  To put it differently, to the extent that LLMs are speaking machines, they show that language can be produced without such a unified interiority, because language can be produced from other language.

(3) On Plato’s account, the distinction between dialectics (good) and grammar (bad) is signaled by an attitude toward truth.  To import a modern idiom, dialectics is not bullshit.  The dialectician/philosopher cares about truth and philosophy.  Writing, not so much.  Language models don’t have attitudes or intentions, so the thought that language models are bullshitting (as an explanation of their tendency toward confabulation/hallucination) is profoundly correct in this sense.  The critique of language models on this point is, however, necessarily political: the problem is not that the model says something predictively plausible but unhinged from reality.  The problem is that it’s not guided by an intention toward truth or representation in a context where we judge that intention to matter.

The qualifying clause is important, as recent work highlights a couple of points.  First, as Daniel Tigard points out, Harry Frankfurt’s original discussion of bullshit also notes that the bullshitter is motivated to hide their indifference to truth.  In other words, they have a political objective.  It’s not clear how to apply this to an LLM, even as it clearly matters.  Second, insofar as bullshitting is confabulation, it’s a narrative strategy that we all use to fill in plausible but unknown details in stories, and to make sense of our world.  This may well have benefits in the case of LLMs, at least sometimes.  The point – as Foucault said about the death of the author – is the politics.  All of that said, the various post hoc efforts to deal with referentiality in language models – ranging from filtering out toxic input data to RLHF – are designed to nudge their performance toward at least the appearance of caring about truth.  That’s why these efforts are also political; more on this later.

(4) When Derrida says there is in fact no difference between grammar and ontology, he is indicating the Platonism I’m referring to in the title of this series of posts: the effort to define an ontology independent of grammar. Deconstructing this Platonism means calling it out for what it is: an effort to use the ontology/grammar or truth/appearance dichotomy to define and police a good versus a bad grammar.  Ontology in this sense is like looking for the mouse under the gray rags and dust.

So much for the basic setup of Platonism. But why do we care?  In a recent paper advocating for the use of deconstruction – specifically Derrida – in understanding current AI, Mark Coeckelbergh and David Gunkel point to why LLMs generate such a shock to our systems:

“The fundamental challenge (or the opportunity) with LLMs, like ChatGPT or Google’s Bard, is that these algorithms write without speaking, i.e. without having access to (the) logos and without a living voice. In response to this seemingly monstrous problem, contemporary critiques proceed from and reassert logocentric metaphysics with little or no critical hesitation whatsoever” (2226).

A couple of pages later, and citing Derrida’s Limited, Inc., they add:

“A text—whether it is written by a human writer or artificially generated by an LLM like ChatGPT (with the help of a human prompt)—comes to have meaning not by referring and deferring to some transcendental signified (what Aristotle would call thoughts or the things to which thoughts ultimately refer). It comes to enact and perform meaning by way of interrelationships to other texts and contexts in which it is already situated and from which it draws its discursive resources. It is for this reason that we can say, following Ludwig Wittgenstein (1995, 5.6) that for these technologies the limit of their language (model) mean the limits of their world.” (2228; the Wittgenstein reference is to the earlier Tractatus).

This sort of awareness is critical for having an actual ethico-political discussion:

“Once we recognize this and affirm the primacy of ethics and politics, we can then proceed with a critical and normative analysis of technologies. When we use technologies such as ChatGPT, we need to make sure that the performances, processes, and texts are morally and politically responsive. And fortunately, we can do this without (absolutist) metaphysics …. We can try to make good chatbot technologies and large language models without relying on such a metaphysics, without appealing to transcendental truth or meaning. We can consider what would be good for us and for others, for humans and for non-humans, without relying on a Platonic Idea of the Good. Once we recognize that the ethics and politics of technologies such as ChatGPT is primary, we can and must develop a critical relation to these technologies that does not rely on prefabricated metaphysical prejudices like that which divides the real from appearances.” (2229)

As I suggested in my paper on Cartesianism in AI discussions, we’ve always used language as a way to separate humans from artifacts, and we’ve tended to attach the attribute of humanness to intelligence and subjectivity.  Language models generate language without any of the subjective interiority that we’ve always assumed the presence of language to be both a necessary and a sufficient sign of.  Derrida becomes relevant because of his interventions into debates in the late 1960s and 1970s about the status of text and language and writing.  The specific issue of Platonism emerges as a political problem: it’s up to us to sort out how we deal with different kinds of language users and uses, but the Platonic move covers over these inevitably political decisions with the calm veneer of ontology.  That ontology then gives you a preferred solution set, and it disables further inquiries by marking them as theoretically off-limits because of the diktat of metaphysics.  That’s why we need to move from a Cartesian to what I called a Hobbesian view of language models.

Next time I want to look at language models and the idea of a unified speaking subject.
