
GOOD PRACTICES

Blind dates

Nonstandard scientific collaborations tracked to detect articles sold by paper mills

FPG / Hulton Archive / Getty Images

A study described in the journal Scientific Reports presented a method for identifying articles produced by paper mills, the fraudulent services that sell scientific manuscripts, often generated by artificial intelligence and based on fabricated data, and submit them to journals on behalf of their clients. Instead of looking for evidence of plagiarism or manipulated images, the most common traits of fake articles, the model focuses primarily on the relationships between authors, in which atypical patterns may indicate a spurious origin.

The main premise is that when researchers pay to have their names included in falsified studies, the result is a set of unlikely partnerships that differ markedly from those formed in the real world, such as ties between young researchers and their former advisors, or long-standing collaborations. The new approach tracks a series of unusual or suspicious author attributes: whether they are very young yet highly productive (more than 20 papers published in a year); whether they lack connections to senior researchers and tend to collaborate only with other young researchers; and whether they participate in networks that form at random and quickly disband. Fabricated articles also tend to have longer author lists than the average for their subject area, because paper mills make more money by selling the same manuscript to multiple clients.
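The author-level signals described above can be sketched as a simple rule-based check. The thresholds, field names, and signal labels below are illustrative assumptions, not the authors' actual model:

```python
# Hypothetical sketch of the atypical-author signals the article describes.
# All thresholds and names are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class AuthorProfile:
    years_active: int            # career age in years
    papers_last_year: int        # publication count in the most recent year
    has_senior_coauthors: bool   # any ties to established researchers?
    network_is_ephemeral: bool   # collaboration network formed and disbanded quickly?

def suspicion_signals(author: AuthorProfile) -> list:
    """Return the atypical-profile signals this author triggers."""
    signals = []
    # Very young yet unusually productive (more than 20 papers in a year)
    if author.years_active <= 3 and author.papers_last_year > 20:
        signals.append("young-and-prolific")
    # No connections to senior researchers
    if not author.has_senior_coauthors:
        signals.append("no-senior-ties")
    # Member of a randomly formed, short-lived network
    if author.network_is_ephemeral:
        signals.append("ephemeral-network")
    return signals
```

A profile that triggers all three signals, such as `AuthorProfile(2, 25, False, True)`, would be the kind of anomaly the model flags for closer scrutiny.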

Further signs of fraud can be found in bibliographies (because fake articles contain low-quality or repeated content, they tend to cite other fake articles instead of established literature) and in the profile of the journals chosen for publication (journals with a robust body of reviewers or that carry out open peer review, in which the review is public, are more difficult to deceive or co-opt).

The results obtained by the model, which analyzed papers indexed in the Dimensions database, were compared with those of other methods for detecting paper mill activity. The new method identified researchers with suspicious profiles in 7.43% of the 1,858 articles indexed between 2020 and 2022 in the Retraction Watch database, a website that compiles papers retracted for various reasons, including for having been produced by a paper mill.

The model also flagged suspicious author networks in 37% of the articles identified as fraudulent by the Problematic Paper Screener, a tool launched in 2022 that detects so-called tortured phrases: poorly translated expressions suggesting that artificial intelligence was used to repeatedly translate the text, reducing its similarity to existing content in an effort to circumvent plagiarism detection systems. The problem is that these texts can become incomprehensible: the term big data, for example, can end up as the meaningless expression colossal information. The tool was created by a group led by French computer scientist Guillaume Cabanac of the University of Toulouse (see Pesquisa FAPESP issue nº 317).

The convergence between the results of the new model and the Problematic Paper Screener varied from country to country. Of the 345 papers by Saudi Arabian researchers that contained tortured phrases, 317 (or 92%) were also flagged as anomalous by the new tool. The match rate was 74% for articles by researchers from Iran, 44% for the USA, and 25% for China. India stood out in absolute numbers, with 773 studies linked to anomalous author networks, equivalent to 46% of the 1,666 articles containing tortured phrases identified by Cabanac.
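The country-level match rates quoted above follow directly from the raw counts reported in the study:

```python
# Overlap between the new model and the Problematic Paper Screener:
# (articles flagged by both, articles containing tortured phrases)
overlap = {
    "Saudi Arabia": (317, 345),
    "India": (773, 1666),
}

for country, (both, total) in overlap.items():
    print(f"{country}: {both}/{total} = {both / total:.0%}")
# Saudi Arabia: 317/345 = 92%
# India: 773/1666 = 46%
```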

In 2022, the Committee on Publication Ethics (COPE) estimated that 2% of all manuscripts submitted for publication are generated by paper mills. According to the new model's creators, information scientist Simon Porter and epidemiologist Leslie McIntosh, some publishers are more exposed to fabricated papers than others. Hindawi, a publisher owned by Wiley whose journals issued some 8,000 retractions due to peer review fraud in 2023, had the highest risk profile: in 2022, 4% of the articles it published were associated with paper mills. Since the scandal, Wiley has dropped the Hindawi brand and absorbed its titles into its own portfolio. The publisher MDPI is also significantly affected, with a fabricated-article rate of 3%.

Porter and McIntosh are vice presidents of the technology company Digital Science, which is linked to the Springer Nature group. According to them, the problem appears to be growing more prevalent. “From 2018 there is transition in behavior that doubles the relative occurrence of unusual researchers over a four-year period,” they wrote. Paper mills have ways of trying to hide signs of fraud. Some have begun adding the names of renowned researchers as authors of the articles, without their knowledge, to create the illusion of a respectable network of collaborators—but this comes with its own risk that the scam will be discovered, revealing that the paper is fabricated. Another trick paper mills use, according to the pair of researchers, is to financially co-opt scientists with modest careers to sign the fabricated articles, making them appear less suspicious. This, however, increases the costs of the fraudulent services.

The rise of artificial intelligence is sure to make fraud more sophisticated and difficult to detect based solely on textual and graphical analysis. This makes the new tool even more important, since it can help to track a broader class of articles with atypical profiles. “By understanding the business of paper mills—the technological approaches that they adopt, as well as the social structures that they require to operate—the research community can be empowered to develop strategies that make it harder, or ideally impossible, for them to operate,” the authors wrote.
