By Gordon Hull
There’s been a lot of concern about the role of language models in research. I had some initial thoughts on some of that based around Foucault and authorial responsibility (part 1, part 2, part 3). A lot of those concerns have to do with the role of ChatGPT or other LLM-based products in the writing process and how to handle that. The emerging consensus among journal editorial policies is that AI cannot be an author, and my posts largely agreed with that.
Now there’s news of a whole other angle on these questions: a research letter in JAMA Ophthalmology reports that the authors were able to use ChatGPT-4’s Advanced Data Analysis (ADA) capabilities to produce a fake dataset validating their preferred research results. Specifically:
“The LLM was asked to fabricate data for 300 eyes belonging to 250 patients with keratoconus who underwent deep anterior lamellar keratoplasty (DALK) or penetrating keratoplasty (PK). For categorical variables, target percentages were predetermined for the distribution of each category. For continuous variables, target mean and range were defined. Additionally, ADA was instructed to fabricate data that would result in a statistically significant difference between preoperative and postoperative values of best spectacle-corrected visual acuity (BSCVA) and topographic cylinder. ADA was programmed to yield significantly better visual and topographic results for DALK compared with PK”
This is a very technical request! It took a bit of tweaking, but soon “the LLM created a seemingly authentic database, showing better results for DALK than PK,” P < .001.
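For a sense of how little statistical machinery such a request actually requires, here is a minimal sketch of the generic technique involved: generating records around predetermined target statistics so that a between-group comparison comes out significant. This is not the authors’ procedure or their actual targets; the group sizes, means, and categorical split below are invented for illustration. The research letter’s point is that ChatGPT’s Advanced Data Analysis will produce the equivalent, as a plausible patient-level dataset, from a plain-English prompt.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical group sizes (not from the paper).
n_dalk, n_pk = 150, 150

# Continuous outcome: draw postoperative visual-acuity-like values around
# predetermined means (logMAR-style, lower = better), with the DALK group
# deliberately set to look better. All targets here are invented.
dalk_post = rng.normal(loc=0.15, scale=0.05, size=n_dalk)
pk_post = rng.normal(loc=0.25, scale=0.05, size=n_pk)

# Categorical variable: hit a predetermined distribution.
sex = rng.choice(["F", "M"], size=n_dalk + n_pk, p=[0.4, 0.6])

# With the target means set this far apart, the t-test is all but
# guaranteed to report a "significant" difference.
t, p = stats.ttest_ind(dalk_post, pk_post)
print(f"t = {t:.2f}, p = {p:.2e}")  # p comes out far below .001
```

The fabrication itself is trivial once the targets are fixed; what the LLM adds is a conversational interface and a dataset realistic-looking enough to pass a casual read.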
The authors suggest some possible strategies to manage this, but suffice it to say, it is terrifying. There is already a longstanding, huge problem with fabricated, doctored, or otherwise bogus scientific research out there. One report suggests that 70,000 “paper mill” (= almost completely faked) papers were published in the last year alone. In real papers, references are often inaccurate. Publishers are already having to grapple with lots of problematic doctored images, and Pharma has long tilted the entire scientific enterprise toward producing results favorable to its products. At the end of last year, Stanford’s president was forced out over research misconduct in his labs. In an initial report on the Stanford investigation, STAT News cited data from Retraction Watch to the effect that a paper is retracted, on average, every other day for image manipulation. Retraction Watch had, at that time (Dec. 2022), 37,000 papers in its database. The top 5 most-retracted authors have at least 100 retracted papers each.
Into that mess, enter the ability to generate bespoke data on demand.