
Good practices

Study shows potential for ChatGPT to fabricate and defraud clinical trial results

Italian researchers have shown that the AI technology behind ChatGPT is capable of generating false clinical trial data to support the conclusions of fraudulent scientific articles. In a paper published in the journal JAMA Ophthalmology on November 9, a group led by ophthalmologist Giuseppe Giannaccare of the University of Cagliari, Italy, used GPT-4, the most recent version of the ChatGPT language model, combined with Advanced Data Analysis (ADA), a feature that performs statistical analysis and data visualization. The researchers used the AI tools to fabricate clinical trial data comparing two corneal transplant approaches used to treat a disease called keratoconus.

In response to specific prompts, the model generated fabricated data showing a statistically significant difference between the two procedures in post-operative exam results. The simulated clinical trial included 160 male and 140 female participants and concluded that one surgery is more effective than the other, although this is not true. A real clinical trial, carried out in 2010 with 77 participants, showed that the two methods produce similar results for up to two years after surgery.

Giannaccare told Nature that the aim of the study was to show that in a matter of minutes, it is possible for the AI model to produce scientific results that appear convincing but are not supported by real information and may even be contrary to the evidence. “If you don’t look closely, it is difficult to identify that it is not of human origin,” said the surgeon.

“It seems like it’s quite easy to create data sets that are at least superficially plausible,” said Jack Wilkinson, a biostatistician at the University of Manchester, UK, who analyzed the fake trials at Nature’s request. He and colleague Zewen Lu were only able to find inconsistencies in the results after carrying out a thorough examination. There were discrepancies, for example, between the names assigned to patients and the sex that would be expected for these names. The volunteers’ ages were also clustered in a way that would be unlikely in a genuine experiment, with a disproportionate number of participants whose ages ended with the numbers 7 or 8. Another problem was the lack of correlation between the patients’ pre- and post-operative eye exam results. As part of a collaborative project, Wilkinson is developing AI tools to detect this type of problematic study.
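The kinds of checks Wilkinson and Lu describe can be approximated with routine data-screening code. The Python sketch below is purely illustrative and uses hypothetical column names (age, sex, first_name, preop_score, postop_score); it is not the tool Wilkinson's group is developing, and the actual dataset from the JAMA Ophthalmology study uses its own variables. It shows three of the tests mentioned in the article: clustering in the last digit of participant ages, correlation between pre- and post-operative measurements, and consistency between names and recorded sex.

```python
import pandas as pd
from scipy import stats


def check_age_last_digits(df):
    """Test whether the last digits of ages are roughly uniform.

    In a genuine trial, last digits should show no strong pattern; a
    chi-squared test against a uniform distribution flags clustering such
    as the excess of ages ending in 7 or 8 described in the article.
    """
    digits = df["age"].astype(int) % 10
    observed = digits.value_counts().reindex(range(10), fill_value=0)
    chi2, p_value = stats.chisquare(observed)
    return p_value  # a very small p-value is a warning sign


def check_pre_post_correlation(df):
    """Check that pre- and post-operative scores are correlated.

    Repeated measurements on the same patients are usually positively
    correlated; a near-zero correlation suggests the values were simulated
    independently of one another.
    """
    r, _ = stats.pearsonr(df["preop_score"], df["postop_score"])
    return r


def check_name_sex_consistency(df, name_to_sex):
    """Count mismatches between recorded sex and the sex typically
    associated with each first name.

    name_to_sex is a lookup table the analyst supplies (for example,
    built from census name statistics).
    """
    expected = df["first_name"].str.lower().map(name_to_sex)
    return int((expected.notna() & (expected != df["sex"])).sum())
```

None of these checks proves fabrication on its own; they are screening heuristics that point a reviewer toward the kind of close examination Wilkinson and Lu performed.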
