The self-correcting nature of science : Revista Pesquisa Fapesp

A report released last month by the National Academies of Sciences, Engineering, and Medicine in the United States outlined a number of ways to enhance the credibility of scientific research, with a focus on reducing the number of studies whose results are never reproduced or replicated. Titled “Reproducibility and Replicability in Science,” the 196-page document was commissioned by the National Science Foundation, the leading research funding agency in the USA, and is based on a year and a half of discussions held by a multidisciplinary committee of 13 researchers.

According to physician Harvey Fineberg, former dean of the Harvard School of Public Health and chair of the committee that wrote the report, confirming the results of a previous study facilitates the self-correcting nature of good science and is of utmost importance. “However, factors such as lack of transparency of reporting, lack of appropriate training, and methodological errors can prevent researchers from being able to reproduce or replicate a study,” he said. “Research funders, journals, academic institutions, policymakers, and scientists themselves each have a role to play in improving reproducibility and replicability by ensuring that scientists adhere to the highest standards of practice, understand and express the uncertainty inherent in their conclusions, and continue to strengthen the interconnected web of scientific knowledge—the principal driver of progress in the modern world,” said Fineberg when the report was announced.

Before offering its recommendations to researchers, institutions, and funding agencies, the committee members felt they must first precisely define the terms “reproducibility” and “replicability,” which are not always fully understood and are often used interchangeably. Reproducibility, according to the report, means achieving the same results using the same premises and input data as the original study. The concept is strictly computational—researchers attempting to reproduce the results simply need access to the data set and details on how it was stored and analyzed.

The main recommendation in this regard is to make transparent and available not only the underlying data of the initial study, but also the methods, code, models, algorithms, and software used to reach the result. Even knowing the operating system and hardware architecture used in the study can be helpful. The document makes additional suggestions, such as ensuring researchers are properly trained in computational research practices, providing ways for large data sets to be stored and made available for subsequent studies, and investing in research and development of computational tools and methods that can improve the rigor and reproducibility of scientific papers. It also recommends that scientific journals reinforce policies and actions that facilitate the reproducibility of the experiments described in the articles they publish. “Journals could employ a reproducibility editor to oversee this effort,” said aerospace engineer Lorena Barba, from George Washington University in Washington, DC, who was part of the multidisciplinary committee.
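
The kind of record described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the report: the function names (run_analysis, build_manifest, reproduces) and the toy bootstrap analysis are hypothetical. The sketch stores a fingerprint of the input data, the random seed, and basic details of the software and hardware environment alongside the result, so that a second person can check whether rerunning the same computation on the same data gives the same answer.

```python
import hashlib
import json
import platform
import random
import statistics
import sys


def run_analysis(data, seed):
    """Hypothetical analysis: a seeded bootstrap estimate of the mean."""
    rng = random.Random(seed)  # fixed seed makes the resampling repeatable
    resampled_means = [
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(200)
    ]
    return round(statistics.fmean(resampled_means), 6)


def fingerprint(data):
    """SHA-256 digest of the data, used to confirm the inputs are identical."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()


def build_manifest(data, seed):
    """Record what someone else would need to rerun the computation."""
    return {
        "data_sha256": fingerprint(data),
        "seed": seed,
        "python_version": sys.version.split()[0],
        "operating_system": platform.system(),
        "machine": platform.machine(),
        "result": run_analysis(data, seed),
    }


def reproduces(manifest, data):
    """True if the data match the manifest and the rerun gives the same result."""
    same_data = fingerprint(data) == manifest["data_sha256"]
    return same_data and run_analysis(data, manifest["seed"]) == manifest["result"]
```

Calling build_manifest(data, seed) once and publishing the returned dictionary with the data would let a reader call reproduces(manifest, data) later. A real study would record far more (library versions, configuration files, the full code), but the principle is the same.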

The concept of replicability is more nuanced, defined as obtaining consistent results across more than one study aimed at answering the same scientific question, but based on different input data. A study may not be replicable for a variety of reasons. When the cause is fraud or bias, there has usually been some form of scientific misconduct involved. But a lack of replicability can also be caused by uncertainties inherent to the research, and in these situations, it is not possible to say for sure that the original study was wrong. “Because of the intrinsic variability of nature and limitations of measurement devices, results are assessed probabilistically, with the scientific discovery process unable to deliver absolute truth or certainty,” says the report, going on to highlight one of the main functions of replicability studies: “Scientific claims earn a higher or lower likelihood of being true depending on the results of confirmatory research.”

Useful conclusions
According to the document, even a finding that an experiment is not replicable can be helpful to science, for example if previously unknown effects or sources of variability are identified. Although some nonreplicability is unavoidable, reducing its frequency is important to avoid wasting time and resources. The report thus suggests that researchers disclose their findings fully and meticulously. “They should take care to estimate and explain the uncertainty inherent in their results, to make proper use of statistical methods, and to describe their methods and data in a clear, accurate, and complete way,” says the document, which also warns against exaggerating when disclosing research results, so as to avoid generating false expectations.

The writers of the report criticize the overuse of the so-called p-value, a measure of the probability of obtaining an effect at least as large as the one observed if chance alone were at work and the factors being studied had no influence. A p-value of less than or equal to 0.05 is frequently used as an indicator of statistical significance and is often taken to suggest that the results are robust. According to the committee, a whole set of parameters, including proportions, standard deviations, and distributions of observations, should be measured to evaluate the rigor of the data and any uncertainties they contain.
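
The difference between reporting a single p-value and reporting a fuller description of the data can be illustrated with a short Python sketch. It is an assumption-laden example, not a method prescribed by the committee: the function names are hypothetical, and a permutation test is used here simply because it makes the meaning of a p-value easy to see. The test shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the one observed.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference between group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def summarize(group_a, group_b):
    """Report effect size and spread alongside the p-value."""
    return {
        "mean_a": statistics.fmean(group_a),
        "mean_b": statistics.fmean(group_b),
        "sd_a": statistics.stdev(group_a),
        "sd_b": statistics.stdev(group_b),
        "mean_difference": statistics.fmean(group_a) - statistics.fmean(group_b),
        "p_value": permutation_p_value(group_a, group_b),
    }
```

Two data sets can yield similar p-values while differing greatly in the size of the effect and in the spread of the observations, which is why the summary returns the means, standard deviations, and mean difference together with the p-value.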

The report also suggests that confidence in research results should be increased by evaluating cumulative evidence from a number of scientific papers (helping identify the extent to which the findings can be generalized) rather than from single studies. Similarly, it recommends that authorities and policymakers be wary of discrediting or dismissing a conclusion corroborated by multiple papers because of new contrary evidence from a single study. Harvey Fineberg believes the idea that science is experiencing a credibility crisis due to an increasing number of nonreproducible or nonreplicable studies is an exaggeration. “There is no crisis, but there is also no time for complacency,” he said.
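
One common way to weigh cumulative evidence is a fixed-effect meta-analysis with inverse-variance weighting, sketched below in Python. The report does not prescribe this technique; it is used here only as an illustration, and the function name pool_estimates is hypothetical. Each study contributes an effect estimate and a standard error, and more precise studies receive more weight.

```python
import math


def pool_estimates(estimates, standard_errors):
    """Fixed-effect pooled estimate using inverse-variance weights.

    Returns the pooled effect and its standard error.
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

With five equally precise studies each estimating an effect of 1.0 and a sixth equally precise study estimating -1.0, the pooled estimate is about 0.67: the contrary result lowers the estimate but does not overturn the conclusion on its own. The fixed-effect model assumes all studies measure the same underlying effect, which is a strong assumption that may not hold in practice.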
