By Gordon Hull
I’ve been using (part 1, part 2) a new paper by Fabian Offert, Paul Kim, and Qiaoyu Cai to think more about Derrida’s use of iterability as a way into thinking about transformer-based large language models (LLMs) like ChatGPT. Here I want to wind that up with some thoughts on Derrida and Searle.
Near the end of the paper, Offert, Kim and Cai summarize:
“The transformer does exactly more and less than language. It removes almost all non-operationalizable sense-making dimensions (think, for instance of interior paratextuality, of performativity and contingency, or of anything metaphorical that is more complex than a simple analogy) – but it also adds new sense-making dimensions through subword tokenization, word embedding, and positional encoding. Importantly, these new sense-making dimensions are exactly not replacing missing information, but they are adding new, continuous information” (15).
This returns us to the Derridean concerns I’ve been articulating. Recall that in his polemic against Searle, Derrida accuses Searle’s version of speech act theory of too closely modeling phenomenology, both in assuming an intentional agent behind speech acts and in taking “typical” speech situations as central, as opposed to “parasitic” ones like humor. By taking certain language situations as normal, Searle makes the account of speech acts normative before it even gets started; in his own defense, Searle argues that the reduction to typical speech acts is for convenience only, a way of keeping the model tractable. I suggested that there is good evidence that various efforts to impose normative structures on language models – RLHF, detoxification, etc. – push them to perform in ways that call to mind Derrida’s critique of Searle: these techniques similarly reduce the range of the models’ output. One symptom of this is that LLMs lack the contextual richness to have a sense of humor, no matter how otherwise sophisticated their output.
Offert, Kim and Cai’s paper lets one add that this reduction runs much deeper. The very process of tokenization, for example, is designed to reduce the number of possible tokens to a tractable number. In this respect, the move is analogous to Searle’s, and it is defensible for the same reason: it lets you get to a generalizable model. But it is vulnerable to the Derridean critique, also for the same reason. The model rests on a number of assumptions, and what it does is not the same as what language does, so there is a certain sloppiness in talking as though it were an accurate representation of language. All models abstract; that’s not the point. With subword encoding, the point is that the abstraction isn’t choosing to ignore certain aspects of reality in order to produce a model; it is changing the nature of what it models. That’s fine – but it also means that although the transformer model produces something that looks like language, the process by which it gets there is definitively not linguistic.
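To make that point concrete, here is a minimal, self-contained sketch of greedy longest-match subword segmentation, loosely in the spirit of BPE- or WordPiece-style tokenizers. The toy vocabulary and function name are hypothetical illustrations, not any actual tokenizer’s API; the point is only that the pieces the model receives are frequency-driven fragments of text, not words or morphemes.

```python
# Hypothetical toy vocabulary: in real tokenizers these pieces are learned
# from corpus frequency statistics, not from linguistic analysis.
TOY_VOCAB = {"trans", "form", "er", "iter", "ab", "ility",
             "a", "b", "e", "f", "i", "l", "m", "n", "o", "r", "s", "t", "y"}

def subword_tokenize(word, vocab=TOY_VOCAB):
    """Greedily split a word into the longest vocabulary pieces available."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a single-character piece.
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_tokenize("transformer"))  # ['trans', 'form', 'er']
print(subword_tokenize("iterability"))  # ['iter', 'ab', 'ility']
```

Nothing in the output depends on “transformer” or “iterability” being meaningful units for a speaker; the segmentation is driven entirely by which character strings happen to be in the vocabulary. That is the sense in which the abstraction changes what it models rather than merely simplifying it.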