By Gordon Hull
Large Language Models (LLMs) are well known to “hallucinate,” which is to say that they generate text that is plausible-sounding but completely made up. These difficulties are persistent, well-documented, and well-publicized. The basic issue is that the model is indifferent to the relation between its output and any sort of referential truth. In other words, as Carl Bergstrom and C. Brandon Ogbunu point out, the issue isn’t so much hallucination in the drug sense as “bullshitting” in Harry Frankfurt’s sense. One of the reasons this matters is defamation: saying false and damaging things about someone can be grounds to get sued. Last April, ChatGPT made the news (twice!) for defamatory content. In one case, it fabricated a sexual harassment story and pinned it on a law professor. In another, it accused a local politician in Australia of corruption.
Can LLMs defame? According to a recent and thorough analysis by Eugene Volokh, the answer is almost certainly yes. Volokh looks at two kinds of situation. One is when the LLM defames public figures, which is covered by the “actual malice” standard. Per NYT v. Sullivan, “The constitutional guarantees require … a federal rule that prohibits a public official from recovering damages for a defamatory falsehood relating to his official conduct unless he proves that the statement was made with ‘actual malice’ – that is, with knowledge that it was false or with reckless disregard of whether it was false or not” (279-80).
The LLM obviously doesn’t think, but it is a product produced by a company like OpenAI, and OpenAI could be held accountable if it knows the product is producing defamatory content and does nothing to stop it. Volokh suggests that something like a notice-and-block mechanism is likely to emerge. The model is the DMCA’s takedown provisions, which allow companies like YouTube to avoid liability for hosting content that infringes copyright if they remove material on being notified that it is infringing (yes, one can argue that this causes substantial over-removal of material and amounts to a prior restraint on free speech). In the case of LLM defamation, on notice that the model was producing defamatory speech, the company that makes it would have to block the model from reproducing that content in the future. This would require a layer of output filtering, but it’s fairly clear that OpenAI has already built something like it: ChatGPT will now turn down all sorts of requests on the grounds that it’s a language model, and it’s really hard to get DALL-E to produce a trademarked logo (the image here resulted from my instructing it to produce a picture of a “consumer comparing Budweiwer and Grolsch brand logos”). So let’s stipulate this is possible.
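To make that stipulation concrete, here is a minimal sketch of what a notice-and-block output filter might look like, assuming a crude keyword blocklist. The names (NoticeRegistry, filtered_generate) are hypothetical, and nothing here reflects how OpenAI actually filters its models.

```python
# Minimal sketch of a notice-and-block output filter (hypothetical; not
# OpenAI's implementation). A registry stores reported claims, and a wrapper
# checks each model output against it before returning.
from dataclasses import dataclass, field

@dataclass
class NoticeRegistry:
    # Each entry pairs a subject's name with a keyword from the noticed claim.
    blocked: list = field(default_factory=list)

    def register(self, subject: str, claim_keyword: str) -> None:
        # Record a notice so future outputs repeating the claim can be blocked.
        self.blocked.append((subject.lower(), claim_keyword.lower()))

    def violates(self, text: str) -> bool:
        # Crude string match: does the output mention a noticed subject
        # together with the noticed claim?
        lowered = text.lower()
        return any(s in lowered and k in lowered for s, k in self.blocked)

def filtered_generate(prompt: str, generate, registry: NoticeRegistry) -> str:
    # Wrap an underlying model call with a post-generation blocking layer.
    output = generate(prompt)
    if registry.violates(output):
        return "I can't repeat that claim."
    return output
```

On this picture, each upheld notice adds an entry to the registry, and the filter runs over every completion before it reaches the user; the hard part, of course, is catching paraphrases of a blocked claim rather than literal strings.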
But will it work? Volokh argues that “taking reasonable steps to block certain output, once there is actual notice that the output is incorrect, should be necessary to avoid liability for knowing defamation. And it should be sufficient to avoid such liability as well” (519). This is not perfect, but he also thinks that “despite its imperfections, it’s likely to be the best remedy that’s available to public officials and public figures, given the requirement that they show knowing or reckless falsehood” (518).
The other kind of situation is when the defamation concerns private matters, and there Volokh thinks that some sort of analogy to product liability law’s negligent design standards might emerge. The reasoning goes that if I know my product sometimes emits sparks, and I fail to take measures to mitigate the damage those sparks may do, I have failed to design a safe product. By analogy, if an AI company produces an LLM that is known to spit out periodic defamation, it should have to show evidence of proactive attempts to address the problem. Volokh suggests a number of possibilities, and here part of the issue is that companies need to wake up to liability issues around defamation. For example, he points out that when OpenAI released GPT-4 and bragged that it hallucinates less than GPT-3.5, “the very existence of the assertedly more reliable GPT-4 is evidence that ‘the product could have been made safer by the adoption of a reasonable alternative design’” (531-2). Again, notice-and-blocking emerges as a minimal standard:
“The notice-and-blocking remedy … should also be required under a negligence regime. Failure to provide a means of preventing repeated publication of known errors – errors that the AI company has been informed about – should likely render the AI program ‘not reasonably safe’ as to the risk of defamation” (531).
I am not sure whether notice-and-block will emerge as the legal standard for liability in AI defamation cases. But I worry that, as a standard, it’s likely to amount to a least-bad option rather than a good one. It seems to me that there are at least two problems with notice-and-block. The first is that, in most cases, victims won’t know they’ve been defamed. Any number of people who go to the LLM to search for a public figure will get defamatory content and believe it, but not otherwise reproduce it in a way that would allow for legal scrutiny. The merit of the standard in Sullivan is that the alleged defamation ran through a major newspaper – it would be hard to miss. Now, one might argue that without the equivalent of publication in a major newspaper, the harm of defamation is reduced: a sort of no-harm, no-foul argument. I don’t think that works, however. First, as Volokh points out, defamation doesn’t require traditional publication. Second, if the LLM is functioning as a de facto newspaper – as the place where people learn about the world – then the content won’t have to be physically spread around. Indefinitely many private viewings could easily cause substantial harm to the victim, even if none of them went viral in a way that would attract public scrutiny.
The other problem is that this is an ad hoc solution to a systemic problem. The LLM will try to generate plausible content in answer to queries, and it will do so without fact-checking (unless that is coded in directly). It can generate indefinitely many defamatory statements every single minute. A notice-and-block regime thus risks becoming a game of whack-a-mole. To be sure, this objection can be raised against the DMCA takedown regime too. But in that case, the damage is limited in at least two ways. On the one hand, the dependence on actual, hand-created content or mash-ups limits the rate at which offending content can be created (of course, AI videos may require this discussion to be updated soon). On the other hand, much of the content that is taken down does not actually damage the interests of the copyright owner, no matter how many times it is viewed. A video of a toddler dancing to Purple Rain isn’t going to hurt record sales very much, even if it’s viewed a million times. The same can’t be said for defamatory content.
One thing that has happened under the takedown regime is the deputization of algorithms and bots that continuously scan YouTube and other sites for signatures of material that copyright owners want protected; if the algorithm thinks it has detected such material, it sends a takedown notice. One could imagine something like that here: an AI or other algorithm programmed to continually try to get another AI to issue defamatory content, which is then reported into the notice-and-block regime. It’s not clear to me that this is a good regulatory strategy, but if we get to notice-and-block, it’s the obvious next step.
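For illustration only, a probing loop of that kind might look something like the sketch below; query_model, looks_defamatory, and file_notice are placeholders for a model API call, a defamation classifier, and the notice-filing step, not real library functions.

```python
# Hypothetical sketch of an automated probe that hunts for defamatory output.
# query_model, looks_defamatory, and file_notice are placeholders, not real APIs.
import itertools

def probe_for_defamation(subjects, templates, query_model, looks_defamatory, file_notice):
    # Cycle through subjects and prompt templates, querying the target model.
    for subject, template in itertools.product(subjects, templates):
        prompt = template.format(subject=subject)
        output = query_model(prompt)
        # When the classifier flags an output, push it into the
        # notice-and-block pipeline on the subject's behalf.
        if looks_defamatory(subject, output):
            file_notice(subject=subject, prompt=prompt, output=output)
```

A template here might be something like “What scandals has {subject} been involved in?”, with each flagged hit feeding back into the blocking layer sketched above.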