The calculating cat : Revista Pesquisa Fapesp

Fake research papers attributed to a tabby cat named Larry demonstrated the possibility—and ease—of manipulating certain scientific productivity indicators. The cat in question belongs to the grandmother of computational biologist Reese Richardson, a PhD student at Northwestern University, USA, who led an experiment that named the pet as the author of several scientific papers, highlighting a strategy used by fraudsters to boost academic metrics by exploiting flaws and negligence on ResearchGate and Google Scholar.

Working with British scientific misconduct expert Nick Wise, the biologist created a ResearchGate account in the name of the family pet, Larry Richardson, posing as an early-career mathematician. “Anyone can make a ResearchGate profile, and if you use an academic email address, no additional verification is required,” Reese Richardson explained on his personal blog. He created an email address for the cat on the Northwestern University server so as not to arouse suspicion.

The pair then uploaded 12 manuscripts to the animal’s account with Larry as the sole author—the content was nonsense, and the titles referenced topics such as complex algebra and the structure of mathematical objects. At the same time, they created another 12 fake papers—attributed to fictitious researchers—that cited the cat’s 12 articles in their bibliographies. Since ResearchGate only allows researchers to publish their own articles on their profiles, Larry was also included as a coauthor of these other papers.

Google’s crawler bots visited the ResearchGate profile and counted the citations within just two weeks. A page was then automatically created for the mathematician Larry Richardson on Google Scholar, displaying 11 articles and 132 citations (one of the 12 papers was missed by the scan, for reasons that are unclear), giving the cat an h-index of 11. The h-index is an established metric that reflects both the volume of a researcher’s output and how often that output has been cited. An h-index of 11 means that an author has published at least 11 articles that have each been cited at least 11 times in other papers.
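The arithmetic behind those figures can be checked with a short sketch. It assumes, as the totals suggest but the article does not state outright, that each of the 11 indexed papers was cited once by each of the 12 citing manuscripts. The function name `h_index` and the list `larry_citations` are illustrative and not taken from Richardson's experiment.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Assumption: 11 indexed papers, each cited by all 12 citing manuscripts.
larry_citations = [12] * 11

print(sum(larry_citations))      # total citation count
print(h_index(larry_citations))  # h-index
```

Under that assumption the total is 132 citations and the h-index is 11. The index is capped by the number of papers, which is why 12 citations per paper still yields 11 rather than 12.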

Richardson had the idea for the experiment when Nick Wise told him about Facebook ads for services promising to increase citation counts and h-indexes on Google Scholar. One such ad showed screenshots of the Google Scholar pages of 18 customers “before and after” purchasing the service. The vast majority were mathematicians from India, but there was also one from Oman and one from the USA. Each citation cost US$10, and the customers on display had ordered between 50 and 500 citations each.

In some situations, the citations were made in articles from a suspected predatory journal—the hypothesis being that the company that sells the citations colluded with the journal to publish papers with fake references. Most cases, however, adopted an openly fraudulent strategy. The manuscripts citing the customers were authored by names such as the Greek philosopher and mathematician Pythagoras (570–495 BC) or the Russian mathematician Andrei Kolmogorov (1903–1987). Although the titles and abstracts seemed coherent, the rest of the articles were gibberish. Richardson and Wise realized that the texts were produced by a computer program called MathGen, which combines sequences of words and formulas extracted from genuine articles to compose nonsensical texts.

The fake papers had been published exclusively on ResearchGate profiles, where they are not subject to any form of peer review, and were never shared on preprint repositories, where they would have been scrutinized by other experts. Once Google Scholar had screened and counted the articles and citations on ResearchGate, the manuscripts were removed from the platform to erase any evidence. Just a few were found in Google’s cache. “Despite the conspicuous vulnerabilities of Google Scholar and ResearchGate, the quantitative metrics calculated by these services are routinely used to evaluate scientists,” Richardson said.

This type of fraud only works for metrics on Google Scholar, which exhaustively tracks academic literature online, including research papers published on personal profiles. Databases such as Clarivate’s Web of Science and Elsevier’s Scopus only consider articles from indexed scientific journals, so they are not vulnerable to this method.

Once the experiment was over, Richardson wrote about it on his blog in a post titled “Engineering the world’s highest cited cat, Larry.” The idea of using a feline as an author was not random. Richardson was inspired by theoretical physicist Jack Hetherington, who published two articles and a book chapter in partnership with a certain F. D. C. Willard in 1975. The initials stood for “Felis Domesticus, Chester,” the scientist’s Siamese cat. Hetherington and his pet were cited 107 times—Richardson’s goal was for Larry to surpass that number. But the tabby cat’s stunt only lasted a week. After finding out about the fraud, Google Scholar deleted the articles, although Larry Richardson’s profile remains.

Other researchers have identified signs of this type of fraud in the past. In February, computer scientists Talal Rahwan and Yasir Zaki analyzed more than one million Google Scholar profiles and found that 114 of them featured anomalous citation patterns. “The vast majority had at least some of their dubious citations from ResearchGate,” Zaki told the journal Science.

Ijad Madisch, CEO of ResearchGate, said the company was “aware of the growing research integrity issues in the global research community” and assured Science that it was reviewing its processes. According to Madisch, evidence that fraudulent content is deleted after being indexed by Google will help the social network improve its misconduct monitoring systems.
