The slow path to self-correction in science : Revista Pesquisa Fapesp

In a paper posted on MetaArXiv in February, a group of researchers from the USA, UK, Germany, and the Netherlands followed what happened to four influential psychology articles after their conclusions were challenged by new experiments. Despite the damage to their credibility, the papers continued to be cited at a similar rate in other manuscripts; in most cases, the fact that the results could not be repeated was simply ignored. There was, however, a slight drop in positive citations of the studies and a slight increase in negative citations. Of the papers that referenced the articles and acknowledged that the results had been questioned, only half presented arguments or evidence to defend the original findings.

The analysis suggested that replication studies, which are conducted to confirm discoveries and are considered essential to revealing errors, may not immediately trigger science’s self-correction mechanisms. The manuscript shared on MetaArXiv is a preprint, meaning it has not yet undergone peer review. The study was led by epidemiologist John Ioannidis of Stanford University, USA, an expert in scientific integrity.

The decision to look at psychology papers in particular was deliberate. In the past decade, a significant number of scientific articles in the field have fallen into disrepute because scientists have been unable to repeat their results in subsequent experiments. This led to what has been called the “replication crisis in psychology.” Among the initiatives established to tackle the problem, there was an effort to submit high-impact papers, such as the four analyzed by Ioannidis, to more rigorous scrutiny.

One of them was published in 1988 by Fritz Strack, now professor emeritus at the University of Würzburg in Germany. The results seemed to corroborate a hypothesis put forward by American philosopher and psychologist William James (1842–1910), which postulated that a person’s facial expression directly affects their emotional state. Strack asked study participants to hold a pen in their mouths to force them to make one of two different types of expression: a smile or a frown. Then he made the subjects watch cartoons. He concluded that smiling participants found the cartoons funnier than those who had to frown.

In 2016, the study was reappraised by the Association for Psychological Science’s Registered Replication Reports initiative, which used the same cartoons, but was unable to reproduce Strack’s results. The German psychologist has defended his research and argues that in the replication study, the volunteers knew that they were being observed and filmed. This, according to him, may have changed their behavior.

Another article on the list was published in 1998 by American social psychologist Roy Baumeister, who identified limits to our capacity for self-control. Baumeister and his colleagues intended to show that people found it more difficult to complete complex tasks, such as doing a jigsaw puzzle, soon after resisting the temptation to eat chocolate. Similarly, individuals found it more difficult to complete the tasks shortly after having to give speeches defending ideas that go against their beliefs. The findings reinforced the idea that “ego depletion” occurs when a person has to exercise high levels of self-control. This effect was reappraised in a 2016 study with 2,141 participants, which found no evidence of ego depletion. But according to the analysis by Ioannidis, this had no impact on citations of the original paper.

The other two articles were related to how exposure to certain situations influenced participant responses to subsequent stimuli. In 2013, Eugene Caruso of the UCLA Anderson School of Management, USA, published a paper theorizing that exposing people to money impacted the extent to which they defended free market principles. Similarly, Travis Carter of the University of Chicago concluded that showing people images of the US flag made them more politically conservative. Both studies were reappraised by an initiative called Many Labs, using the original methodology with a larger number of participants. The results were not reproduced.

Ioannidis suggests that some of the authors may have cited the studies simply because they did not know that the results had been disputed. This has even occurred with papers that have been retracted due to errors or fraud, yet still end up being cited by unsuspecting researchers. The disconnect between original results and replication studies is not a new problem. In 2012, Harvard University psychologists Joshua Hartshorne and Adena Schachner published an article in the journal Frontiers in Computational Neuroscience proposing the creation of databases to link original studies with research aimed at replicating their findings.

Olavo Amaral, a physician and professor at the Institute of Medical Biochemistry of the Federal University of Rio de Janeiro (UFRJ), highlights another aspect: the possibility of researchers suffering from confirmation bias, selecting arguments and evidence that corroborate their beliefs. “It’s not uncommon for people to cite papers that interest them to prove a point instead of doing a thorough review of the existing evidence. Perhaps that’s why scientific consensus is so often divided,” says Amaral. He heads the Brazilian Replication Initiative, a project funded by the Serrapilheira Institute that plans to repeat a hundred experiments from Brazilian biomedical articles to verify how many of the published results it is able to replicate (see Pesquisa FAPESP issue no. 267).

A 2007 study that examined this issue in medicine was led by Ioannidis himself and published in the Journal of the American Medical Association (JAMA). The researchers analyzed references to observational studies that identified benefits of vitamin E for the heart, beta-carotene against cancer, and estrogen against Alzheimer’s disease—all cited after the results had been refuted by randomized clinical trials. They found that after the observational study results were contested, there was a very slow fall in the citation rate, while many researchers continued to positively reference the rejected results.
