Large Language Models (LLMs) like ChatGPT are well known to hallucinate – to make up answers that sound pretty plausible but have no relation to reality. That, of course, is because they're designed to produce text that sounds about right given a prompt. What sounds about right may or may not be right, however. ChatGPT-3 made up a hilariously bad answer to a Kierkegaard prompt I gave it and put a bunch of words into Sartre's mouth. It also fabricated a medical journal article to support a fabricated risk of oral contraception. ChatGPT-4 kept right on making up cites for me. It has also defamed an Australian mayor and an American law professor. Let's call this a known problem. You might even suggest, following Harry Frankfurt, that it's not so much hallucinating as it is bullshitting.
Microsoft’s Bing chatbot-assisted search puts footnotes in its answers. So it makes sense to wonder if it also hallucinates, or if it does better. I started with ChatGPT today and asked it to name some articles by “Gordon Hull the philosopher.” I’ll spare you the details, but suffice it to say it produced a list of six things that I did not write. When I asked it where I might read one of them, it gave me a reference to an issue of TCS that included neither an article by me nor an article of that title.
So Bing doesn’t have to be spectacular to do better! I asked Bing the same question and got the following:
This is better, in a couple of ways. First, footnotes! I like to know where my information comes from, and one of my fears for the spread of AI-assisted search is that people are going to get lazy (even lazier) about sourcing their information. For self-help, that may not matter, as it’s all pretty much the same anyway and very little of it is based on a reality beyond generic bullet points. But for any sort of socially controversial or complicated topic, you precisely don’t want the “blurry JPEG” version, at least if you want to understand it. Second point in Bing’s favor: some of those are actually things I wrote.
The use of footnotes is odd, though. The Hobbes book, for example, is listed on my website, which is the source it's using. (Note that almost everything on the site includes a link and/or a DOI, so Bing could be very precise in response to "where can I find" questions, even though it's not.) I asked it about the (unfootnoted) Hobbes book and it gave me the right answer:
So it gets the publisher right and offers a decent blurb. But what about the third item – the Deleuze Dictionary? I didn't write that, alas (and if I did, you probably shouldn't trust it). So I asked about that, and it generated a correct answer by dropping me from the reference:
Ok, so it started to hallucinate but then sobered up. But now here's the weird part. The fourth answer, which it sources to my university website, is not something I've written. It seems to have misread the site in lossy-JPEG style. I did in fact write a paper called "The Banality of Cynicism," on Foucault and the problems with authenticating online information, and I wrote another about library filtering. But those are separate papers, and the website lists them separately. So I asked it:
We're into sort-of hallucination mode. If you wanted library filtering, [1] points to the correct paper on SSRN. On the other hand, the underlined text is not a direct quote from the abstract, as the underlining might make you think. It's not an awful paraphrase, though. If you wanted the "Banality of Cynicism" paper, you're outta luck, and Bing hasn't noticed that it's conflated two papers. One can only speculate as to why it picked the library filtering paper, but my hunch is that that paper is more prominent online (it's both older and has been cited more), and so it's more likely to be associated with me than "The Banality of Cynicism." Things go off the rails a bit from here. [2] is labeled SSRN but actually links to the NCAC. [3] is labeled NCAC but refers to ICIDR. [4] is labeled ICIDR and sends you to Blablawriting. So the numbers across the bottom are correct, but the ones in the bullet list are not. That's just weird.
Also, only the actual SSRN cite answers the original question of where you can read the paper. The ncac.org link uses some of the same examples I do, but I'm not involved. The icidr.org link goes directly to a paper on internet filtering that doesn't cite mine. Blablawriting appears to be a paper-mill site, and the CNN link is the news story about the court case I criticize.
All of which is to say that Microsoft has clearly tried to put some guard rails on the hallucination problem. But they haven’t solved it; the AI search isn’t trustworthy, at least in contexts like this.