Study evaluates the output of computer scientists who publish far more articles than their average colleague : Revista Pesquisa Fapesp

A group of computer scientists from the Federal University of Minas Gerais (UFMG) has developed a methodology for identifying atypical and questionable behavior among prolific researchers, those who publish far more scientific articles than the average of their peers. In a study published in February in the journal Scientometrics, the team reviewed articles published between 2010 and 2020, stored in a computer science repository, the DBLP (Digital Bibliography & Library Project), and found a few hundred authors who wrote more than 19 articles per year. This performance is significantly above average: of all researchers with work filed in the DBLP repository, 99% produced fewer than 10 articles per year.

The publication patterns of the prolific authors were then mapped. In general, their scientific output grew steadily, was evenly distributed among several journals, and involved a limited group of contributors. But there were a few outliers. These were extremely productive authors (one of them published 127 articles in 2020) who shared certain characteristics: they suddenly doubled or tripled their output within a short period of two to five years, concentrated many articles in a few journals, and/or formed wide networks of contributors, some of which included almost 1,000 coauthors.
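The signals described above can be illustrated with a minimal sketch in Python. It is not the UFMG group's actual method or set of metrics. Only the threshold of 19 articles per year, the doubling of output, and the two-to-five-year window come from the description above. The record layout (a list of dictionaries with `year`, `venue`, and `coauthors`) and all function names are assumptions made for illustration.

```python
from collections import Counter

# Threshold reported in the article: more than 19 articles in a year.
PROLIFIC_THRESHOLD = 19


def yearly_counts(papers):
    """Count one author's papers per year (hypothetical record layout)."""
    return Counter(p["year"] for p in papers)


def is_prolific(papers, threshold=PROLIFIC_THRESHOLD):
    """True if the author exceeded the threshold in at least one year."""
    return any(n > threshold for n in yearly_counts(papers).values())


def has_burst(papers, factor=2.0, min_gap=2, max_gap=5):
    """True if output grew by at least `factor` between two years that are
    min_gap to max_gap years apart. Years with no papers are ignored."""
    counts = yearly_counts(papers)
    years = sorted(counts)
    for i, start in enumerate(years):
        for end in years[i + 1:]:
            gap = end - start
            if min_gap <= gap <= max_gap and counts[end] >= factor * counts[start]:
                return True
    return False


def top_venue_share(papers):
    """Fraction of the author's papers that appeared in their most used venue."""
    if not papers:
        return 0.0
    venues = Counter(p["venue"] for p in papers)
    return max(venues.values()) / len(papers)


def coauthor_count(papers):
    """Number of distinct coauthors across all of the author's papers."""
    return len({name for p in papers for name in p["coauthors"]})


if __name__ == "__main__":
    # Made-up example data, not taken from the study.
    papers = (
        [{"year": 2015, "venue": "A", "coauthors": ["x", "y"]}] * 5
        + [{"year": 2018, "venue": "B", "coauthors": ["y", "z"]}] * 20
    )
    print(is_prolific(papers), has_burst(papers))
    print(top_venue_share(papers), coauthor_count(papers))
```

In this sketch each signal is computed separately, so an author can be examined on any one of them. How the signals would be combined or weighted is left open, since the description above does not specify it.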

The authors of the study acknowledge that it is not possible, based on their observations to date, to assert that the abnormal behaviors constitute any type of misconduct; they intend to evaluate, in future work, whether scientific integrity may have been compromised by these authors. However, they highlight that the discrepancies observed are considerable, and they want to know whether the abnormal behaviors they found might distinguish scientists who publish above the average in an authentic manner from those who artificially boost their output. “We have come up with a set of metrics that assess the phenomenon, and we believe they may apply to several disciplines,” stated Edré Quintão Moreira, a computer science PhD student at UFMG and lead author of the study.

One of the possibilities raised by the study is that some of the authors use tricks to boost their performance, such as simulating collaboration networks that do not actually exist. “One possibility is that there may be a certain level of collusion between the researchers, adding contributors who did not actually contribute to their papers, so as to expand the performance of the group as a whole,” says Alberto Laender, retired faculty member of the Department of Computer Science (DCC) at UFMG, and one of the authors of the study. The concentration of publications in certain journals also raises a red flag. “One of the most prolific authors published more than 140 articles in a single journal. A question that naturally arises is whether these authors are benefiting from publisher policies that are less stringent for certain works, and relying on the encouragement or negligence of publishers to increase their productivity,” observes Laender. The interest of the UFMG group in this topic is not accidental. “I was a member of the Computer Science Advisory Committee of the CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education). We observed several cases of hyperprolific authors that seemed suspicious, but we were not able to verify if the information they presented was solid,” says Wagner Meira Junior, also a member of the DCC group that signed the study.

The behavior of hyperprolific authors has perplexed researchers for a long time. In a study published in 2018, John Ioannidis, an epidemiologist from Stanford University, searched the Scopus repository for authors who had published at least 72 articles in a given year between 2010 and 2016. The number reached 9,000 individuals who had signed at least one article every five days. Ioannidis found only anecdotal evidence of misconduct, such as the case of the Japanese materials scientist Akihisa Inoue, former dean of Tohoku University, who published 2,566 articles, seven of which were considered contentious due to duplicate content. In 86% of the cases, the prolific authors were physicists who were members of large international collaborations, whose articles are signed by hundreds of contributors, sometimes thousands. The examples associated with these large networks were disregarded by Ioannidis, who submitted questionnaires to the remaining 269 names and received 85 replies. Based on his findings, there are indeed researchers who are able to write a large number of articles without committing ethical breaches, although there is no guarantee that such production is relevant. He noticed, however, that in certain cases, productivity was associated with lower standards for authorship attribution in some disciplines.

According to Sigmar de Mello Rode, a researcher from São Paulo State University (UNESP) and chairman of the Brazilian Association of Scientific Editors (ABEC Brazil), it is still common practice in some areas to include, among the authors of an article, people who do not qualify for authorship, which constitutes misconduct. “It happens in all the areas of knowledge: two researchers sign each other’s articles to expand their scientific production. The publisher of a journal does not have many tools available to detect this type of collusion, much less when it occurs at a large scale and involves many researchers,” he states. In the case of the study of computer scientists, Rode also raises suspicions about so-called “salami science,” where the results of a research project are sliced into several less significant findings to give rise to multiple papers. “This type of trick is easier to identify, for example, using software that searches for similarities among papers,” he says.

In computer science, the proliferation of highly productive authors is a recent phenomenon. According to the Scientometrics study, in 2010, only 38 researchers, equivalent to 1% of the researchers registered in the DBLP, published more than 19 articles. In 2020, on the other hand, this performance was seen among 540 authors, 6% of those registered in the repository. While the most prolific computer scientist in 2010 produced 37 articles during that year, one of his peers in 2020 published an astonishing 127 articles. “Around 2016 and 2017, there was a sudden increase in these numbers,” states Marcos André Gonçalves, DCC faculty member and also one of the authors of the study. “It is very unusual to publish two articles in the same week. Such output usually does not lead to a quality result. I can barely read two articles in a week.”

Despite identifying parameters to define abnormal behaviors, the UFMG group admits that other dimensions of the issue may arise. “A characteristic of this work is that it is lively and dynamic. We must be mindful of new factors that may drive this phenomenon. I would bet, for example, that the arrival of ChatGPT will lead to a new wave of hyperprolific authors,” says Meira Junior, speaking of the famous artificial intelligence program that is also being used to support scientific writing. For the researcher, there is a risk that such practices become widespread. “If this actually happens, young investigators could conclude that, instead of following the traditional path based on work and effort, there is more value in using tricks to boost one’s individual performance, even though this corrupts the system.”

Scientific article
MOREIRA, E. et al. The rise of hyperprolific authors in computer science: Characterization and implications. Scientometrics. Vol. 128(5), pp. 2945–74. Mar. 15, 2023.
