
GOOD PRACTICES

The poison that is also an antidote

Artificial intelligence, which poses unprecedented challenges to scientific integrity, is also behind many of the tools used to defend it

Angel Octavio Burguette Morales / Getty Images

The first annual statement by the UK Committee on Research Integrity (UKCORI), released in July, highlighted that advances in artificial intelligence (AI) have generated new challenges, such as how to identify academic work written by ChatGPT, but are also creating opportunities to increase scientific productivity and combat misconduct. “Tools that use AI can enhance research processes,” the committee members wrote in the report, referring to the growing use of AI by scientific journal editors to speed up the analysis and selection of articles and to detect subtle signs of image manipulation or attempts to deceive anti-plagiarism software.

The UKCORI statement highlights that AI can also help organizations that promote research integrity gain access to data of interest that would otherwise be difficult to obtain — the report itself includes a table of open science indicators, such as the growing extent to which researchers share data and code in public repositories, which was produced by the PLOS journal collection using AI. The UK Committee on Research Integrity is an independent body created in 2022 to promote good scientific practice in the United Kingdom, with ties to UK Research and Innovation (UKRI), the country’s biggest science funding agency.

Large language models, used in programs like ChatGPT to identify patterns in how humans connect words, numbers, and symbols, can also be useful for tracking signs of misconduct. At the end of May, a team led by Dmitry Kobak, a data scientist at the University of Tübingen in Germany, shared on the bioRxiv preprint platform an atlas of all the biomedical literature published worldwide between 1970 and 2021. To create the enormous circular map, which vaguely resembles a Petri dish colonized by bacteria, Kobak first had to download the abstracts of 20.6 million articles from PubMed, the search engine used to access the MEDLINE database of biomedical literature.

The group used an AI language model called PubMedBERT to aggregate articles with similar characteristics or terms. Groups of papers with convergent content were named “neighborhoods,” which can be seen in detail by zooming in on the atlas.
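The atlas itself relies on PubMedBERT embeddings projected onto a two-dimensional map. As a rough illustration of the underlying idea only, the sketch below groups abstracts into “neighborhoods” using plain bag-of-words cosine similarity in standard Python; the greedy grouping rule and the 0.3 threshold are illustrative assumptions, not the study’s actual method.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased abstract."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def neighborhoods(abstracts, threshold=0.3):
    """Greedy grouping: each abstract joins the first group whose
    seed abstract is similar enough, otherwise it starts a new one."""
    groups, seeds = [], []
    for i, vec in enumerate(vectorize(t) for t in abstracts):
        for g, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                groups[g].append(i)
                break
        else:
            seeds.append(vec)
            groups.append([i])
    return groups

abstracts = [
    "microRNA expression regulates tumor growth in gastric cancer",
    "microRNA markers predict tumor progression in gastric cancer",
    "soil bacteria diversity in agricultural field experiments",
]
print(neighborhoods(abstracts))  # → [[0, 1], [2]]
```

The two cancer/microRNA abstracts share enough vocabulary to land in one neighborhood, while the ecology abstract starts its own — the same intuition, at toy scale, behind clustering 20.6 million real abstracts.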

The map allows users to analyze all manner of trends in the literature, such as the genders and origins of authors in each group, but it also has the potential to detect misconduct in a more efficient way than other currently available methods. The researchers analyzed 11,756 articles from the atlas that were retracted due to the discovery of errors, fraud, or plagiarism that compromised the veracity of their content. The abstracts were flagged as retracted by PubMed to warn readers that they are not considered valid scientific literature.

Although the retracted articles were spread across the entire map, they were often concentrated in the same neighborhoods, forming what the authors called islands of retracted articles with shared themes, such as research into cancer drugs, marker genes, and microRNA functions. These topics are often the subject of fraudulent work produced by paper mills—illegal services that produce manuscripts using falsified data or images, sell authorship to interested researchers, and even help submit them for publication on behalf of their clients.

The team checked other non-retracted articles from the same islands and found 25 that may also have been produced by paper mills but had not yet attracted any attention. They exhibited similar characteristics to the fraudulent articles, such as titles with an identical pattern or author affiliations with hospitals in China. Analyzing these islands could help journal editors and universities investigate studies that may have escaped their scrutiny. “But clusters of similar papers would need further screening to avoid wrongly flagging genuine papers,” Jennifer Byrne, a professor of molecular oncology at the University of Sydney in Australia and an expert on scientific integrity, told the journal Science.
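The island analysis can be thought of as a density check: once each paper is assigned to a neighborhood, non-retracted papers sitting in retraction-heavy neighborhoods become candidates for closer scrutiny. A minimal sketch of that idea, with made-up paper IDs, neighborhood labels, and thresholds (the study itself did not publish fixed cutoffs):

```python
from collections import defaultdict

def flag_suspects(papers, min_fraction=0.5, min_size=3):
    """papers: list of (paper_id, neighborhood_id, is_retracted).
    Flags non-retracted papers in neighborhoods of at least min_size
    where at least min_fraction of the members are already retracted."""
    by_hood = defaultdict(list)
    for pid, hood, retracted in papers:
        by_hood[hood].append((pid, retracted))
    suspects = []
    for members in by_hood.values():
        if len(members) < min_size:
            continue
        frac = sum(r for _, r in members) / len(members)
        if frac >= min_fraction:
            suspects += [pid for pid, r in members if not r]
    return suspects

papers = [
    ("p1", "miRNA-island", True), ("p2", "miRNA-island", True),
    ("p3", "miRNA-island", False), ("p4", "miRNA-island", True),
    ("p5", "ecology", False), ("p6", "ecology", False),
]
print(flag_suspects(papers))  # → ['p3']
```

As Byrne’s caveat suggests, a flag like this is only a starting point for human screening, not a verdict.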

Another promising front opened by AI is the identification of predatory journals, meaning those that publish articles in exchange for money, without carrying out a rigorous peer review. A group of computer scientists from National Yang Ming Chiao Tung University in Taiwan has developed a journal verification system called AJPC that is based on machine-learning technology. Data were collected from 883 titles identified as predatory journals on two lists available online and from another 1,213 journals deemed trustworthy according to a compilation by the Berlin Institute of Health — the latter is used to advise authors seeking suitable journals in which to publish their articles.

The group extracted information that helped them identify words and terms characteristic of predatory journal websites. They found, for example, that the websites of these journals tend to overemphasize terms such as “peer review” and “indexing,” while legitimate titles mention these standard publication concepts more sparingly. They then tested eight different machine-learning algorithms with the potential to distinguish between predatory and legitimate journals. The “random forest” algorithm performed best, analyzing a sample of 167 websites with the highest success rate and only two false negatives. “Results from performance tests suggest that our system works as well as or better than those currently being used to identify suspect publishers and publications,” the Taiwanese researchers wrote in an article describing the AJPC system in the journal Scientific Reports. They stressed that the conclusions are merely indicative, and human verification is needed to complement the AI analysis. One of the next steps will be to train the system to identify fraud in conference proceedings, which have different characteristics from predatory journals. The system is already being used at universities in Taiwan.
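The AJPC system learns its features and thresholds with machine-learning classifiers (random forest performing best); the sketch below captures only the term-overemphasis intuition described above, in plain Python, with a hypothetical fixed repeat threshold standing in for what the real system learns from data.

```python
# Watch terms the study found overemphasized on predatory journal
# sites; the repeat threshold here is a hypothetical stand-in for
# thresholds that AJPC learns with a random-forest classifier.
WATCH_TERMS = ("peer review", "indexing")

def term_counts(page_text, terms=WATCH_TERMS):
    """How often each watch term appears in the page text."""
    text = page_text.lower()
    return {t: text.count(t) for t in terms}

def looks_overemphasized(page_text, repeat_threshold=3):
    """Crude screen: flag pages that hammer any watch term repeatedly.
    A real system would also normalize by page length."""
    return any(c >= repeat_threshold
               for c in term_counts(page_text).values())

predatory_like = ("Fast peer review! Rigorous peer review guaranteed. "
                  "Indexing in all databases. Indexing worldwide. ") * 3
legitimate_like = ("This journal publishes original research in chemistry. "
                   "Submissions undergo peer review by two referees.")

print(looks_overemphasized(predatory_like))   # → True
print(looks_overemphasized(legitimate_like))  # → False
```

A single-term screen like this would misfire constantly on its own, which is why the researchers combined many such signals in a trained classifier and still insist on human verification.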
