The scientific community in the Netherlands is in the midst of a dispute that could have major repercussions on how research quality and researcher performance are evaluated all over the world. The controversy began in June, when Utrecht University, the country’s oldest and top-ranked higher education institution, announced that it was reforming its rules on hiring and promotions, abandoning the use of bibliometric indicators such as the impact factor (IF) in assessments of faculty members’ scientific output. Calculated from the number of times a journal’s recent papers are cited by other articles, the IF is used, for example, to gauge the prestige of a scientific journal—the company Clarivate Analytics publishes its Journal Citation Reports annually, which estimate the average impact factor of more than 10,000 journals. In many disciplines, the IF is considered a good indicator of the repercussions an article has had among specialists in its field. It has even become the basis for other indicators, such as the h index—which combines the number of articles published by an author with the number of times their work has been cited.
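The two indicators mentioned above are simple to state precisely. As an illustration only (not any agency's or Clarivate's official implementation), the sketch below computes a journal impact factor for a given year and an author's h index from a list of per-paper citation counts:

```python
def impact_factor(citations_received, citable_items):
    """Impact factor for year Y: citations received in Y to items the
    journal published in the two preceding years, divided by the number
    of citable items it published in those two years."""
    return citations_received / citable_items

def h_index(citation_counts):
    """h index: the largest h such that the author has at least
    h papers, each cited at least h times."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still clears the threshold
        else:
            break
    return h

# Example: 200 citations to 100 citable items gives an IF of 2.0;
# an author with papers cited [10, 8, 5, 4, 3] times has h = 4.
print(impact_factor(200, 100))
print(h_index([10, 8, 5, 4, 3]))
```

As critics of the metric note, both numbers depend heavily on field-specific citation habits, which is part of what the debate described here is about.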
In the new model proposed by Utrecht University, scholars will be evaluated not by counting the number and influence of their papers, but based on the quality of their teaching, their commitment to teamwork, and their willingness to share research data. Each department must develop its own performance appraisal strategies, taking the impact on the economy and society into account, as well as the principles of “open science,” meaning practices that promote transparency and collaboration. “We have a strong belief that something has to change, and abandoning the impact factor is one of those changes,” Paul Boselie, a professor at the university’s School of Governance who helped develop the new system, told the journal Nature. According to him, indiscriminate use of the IF has undesired consequences, such as excessive concern with publishing articles and the continuous search for hot topics likely to receive many citations, to the detriment of other important scientific objectives.
The decision caused shock waves in other Dutch institutions, resulting in an open letter signed by 170 researchers opposed to the change, who fear that it will be adopted by other universities—one of the signatories was Bernard Feringa of the University of Groningen, who won the Nobel Prize in Chemistry in 2016. The counterargument is that in the absence of objective metrics, hiring and promotion processes will be governed by potentially arbitrary criteria. “The impact factor is an imperfect but nevertheless useful metric,” Raymond Poot, a cell biologist at Erasmus University Medical Center in Rotterdam and co-author of the open letter, told the journal Nature Index.
Jacques Marcovitch, dean of the University of São Paulo (USP) between 1997 and 2001, believes the debate in the Netherlands highlights the advantages and limitations of both approaches. “Bibliometric indicators are rational and objective, but they are known to cause behavioral changes and are incapable of capturing dimensions such as teaching quality in the classroom,” he says. Detailed analysis of a researcher’s scientific and academic contribution, however, is significantly more laborious and poses complex challenges. “Naturally, this is a much longer and more difficult process,” says Marcovitch, who leads a project funded by FAPESP that is developing new metrics to assess the scientific, economic, and cultural performance of public universities in São Paulo.
The Dutch dispute is symbolic because it has established a break from traditional metrics, the overuse of which has long been criticized as reductionist. In recent years, a series of manifestos has proposed different ways of carrying out more comprehensive assessments that have gained supporters around the world. Chief among them is the 2012 San Francisco Declaration on Research Assessment (DORA), endorsed by more than 20,000 researchers and 2,000 institutions in 148 countries, which recommends abandoning the sole use of the journal impact factor in assessments related to funding, promotions, and hiring. Another highly regarded document is the set of guidelines created at the 6th World Conference on Research Integrity, held in Hong Kong in 2019, designed to more broadly assess researcher performance and establish career rewards for those who adopt practices that promote scientific integrity (see Pesquisa FAPESP issue no. 303).
Institutions from several countries have been reducing the weight of bibliometric indicators and increasing the importance of qualitative parameters in pursuit of what is known as “responsible metrics.” The University of Glasgow in the UK recently began evaluating “collegiality” among faculty members: high-level promotions are only offered to researchers able to demonstrate that they have supported the careers of their colleagues and assistants by sharing data and coauthoring articles. In the system used in the UK to classify universities and distribute funding, bibliometrics are combined with peer reviews of a selection of each institution’s most significant research—the weight of each element is adjusted every assessment cycle.
China’s approach is also going through changes. The emphasis on the number of studies published is being reduced: researchers must now select their best contributions for analysis by expert panels. The Chinese have also announced their intention to develop their own bibliometric indicators that take the regional impact of their research into account.
In November last year, several initiatives related to more complete research assessment were presented at a virtual conference of the Global Research Council, created in 2012 to encourage the exchange of management practices between funding agencies. In some countries, scientists are being urged to provide a structured narrative of their career, expressing their individual contribution rather than listing the number of articles they have published and citations they have received. The Swiss National Science Foundation is testing such a résumé, known as the SciCV, which is easy to complete and update. The UK’s Royal Society has developed a résumé that is divided into four modules: contribution to the generation of knowledge, contribution to the development of individuals, contribution to the research community, and contribution to broader society.
Despite the changes, bibliometric indicators are still a commonly used tool in scientific evaluation. A study published in the journal eLife in 2019 found that 40% of research universities in the US and Canada mention impact factors or related terms in documents referring to job stability, appraisals, and promotions. A recent case involving the University of Liverpool, UK, highlights the difficulties of attempting to change the culture. The institution is a DORA signatory and claims to be striving to adopt responsible metrics, such as peer reviewing of the work of its researchers. But it has recently come under fire for using financial metrics to choose which 32 professors from its Faculty of Health and Life Sciences to make redundant. To keep their jobs, they must demonstrate that they have been able to attract funding for their research at similar levels to the 24 research-intensive universities in the Russell Group, of which Liverpool is one. A recent editorial in the journal Nature highlighted the controversy at Liverpool as a crossroads for the movement sparked by DORA.
The search for and adoption of responsible metrics is also advancing in Brazil. In an article recently published in the journal Anais da Academia Brasileira de Ciências, a trio of biochemists launched the manifesto “Responsible Scientific Assessment: Minimizing indices, increasing quality,” which underlines the importance of peer review in identifying the contribution of a study. One of its recommendations is to create reward mechanisms for good reviewers with an in-depth knowledge of a topic who focus their suggestions on how to improve the quality of a peer’s manuscript or research project.
The document also suggests that bibliometric indicators be used sparingly and their limitations taken into account. “Researchers working on the frontier of knowledge cannot be evaluated quantitatively. Quality can only be assessed through a peer review by people with experience,” says Alicia Kowaltowski, a researcher at USP’s Institute of Chemistry and advisor to FAPESP’s Scientific Board, who wrote the manifesto together with Ariel Silber, also from USP, and Marcus Oliveira, from the Federal University of Rio de Janeiro (UFRJ). Responsible metrics, Kowaltowski points out, require a contextual analysis. “The number of citations varies depending on the field and is influenced by other factors—review articles, for example, do not provide any original data but are usually more cited. The context is important,” she emphasizes.
Biochemist Jorge Guimarães, who was president of the Brazilian Federal Agency for Support and Evaluation of Graduate Education (CAPES) from 2004 to 2015 and currently heads the Brazilian Agency for Industrial Research and Innovation (EMBRAPII), is wary of the dichotomy between quantitative and qualitative indicators. “There is a lot of talk about switching to more qualitative forms of assessment, but nobody really knows which ones should be used,” he says. He rejects the idea that bibliometrics are purely quantitative. “The impact factor measures quality. It shows that someone has read your article and used it as a reference, usually because of your contribution.”
In the 2000s, CAPES created Qualis, a scientific journal classification system used to evaluate graduate programs in Brazil. The system, currently under revision, is often criticized for determining the importance of an article not based on the number of citations it actually received, but by an indirect parameter: the average impact factor of the journal that published it. The approach has been condemned in manifestos in favor of responsible metrics, but Guimarães explains that there is a reason behind it. “Program assessments take into account the scientific output of the previous four years. In this short period of time, most articles do not get cited many times, so this would not serve as a good measure,” he says. He points out that the weight attributed to each journal was the object of in-depth discussions among the representatives of the assessment committees for each field of knowledge.
Guimarães maintains that indicators must always be carefully interpreted. He observes, for example, that Brazilian science only has a small influence on tropical agriculture, since the results are of regional interest. “But no one has any doubts about its economic and social importance in Brazil. This needs to be considered in the evaluation.” Brazilian chemistry researchers, says Guimarães, produce high quality work. “But there is little transposition between the research conducted in universities and the industrial sector.”
Jacques Marcovitch believes one of the biggest challenges is identifying the types of impact that universities are capable of generating in different fields. “Some metrics are suitable for some disciplines but don’t make any sense for others. While an indicator on patents could be useful in engineering, peer recognition is the primary objective in philosophy,” he explains. Society, on the other hand, expects a different kind of impact from universities. “Society expects results every year, in the form of new students enrolling and well-trained professionals graduating, as well as further education and high-quality research. During the public health crisis, this pressure generated enormous stress and universities did their utmost to respond,” he says.
The assessment of graduate programs by CAPES is currently changing to include more qualitative aspects. Some metrics, such as the number of professors and the number of master’s and doctoral students, have lost their significance in the four-year assessment ending this year and will no longer affect the scores awarded to each course. They will be considered only as an indicator for maintaining a minimum number of staff on the program. The articles published by professors and students from each program will now be analyzed at three different levels and only the first, which measures total output, will be quantitative in nature. The others will include peer analysis to evaluate a select set of papers by each professor, as well as the program’s best intellectual output—in addition to scientific articles, this can also include technical and artistic work.

Information scientist Rogério Mugnaini, from USP’s School of Communication and Arts, believes the changes are interesting. “The idea of making professors and course leaders highlight the most relevant work is good and reduces the weight of publication volume in the program assessment, discouraging the drive to publish for the sake of it,” he says. According to him, it is still too early to estimate the outcome of the changes. “Ideally, these models should be tested in one assessment cycle and then implemented in the next, to see how well they work,” he says.

In the future, CAPES plans to profoundly change its evaluation system, analyzing programs in five different dimensions (see Pesquisa FAPESP issue no. 286). According to Mugnaini, the ideal combination of quantity and quality indicators is yet to be tested, but he does not believe that output metrics will be totally abandoned. “Publishing papers is an essential part of being a scientist and I don’t think it’s possible for an assessment model to dispense with that entirely. But it is undoubtedly important to look beyond output alone, to encourage the development of consistent and lasting projects and participation in collaborative networks.”
FAPESP has been refining its research evaluation process to ensure the analysis is based on merit and quality. The main change relates to the terms used in research proposal forms, which serve to reinforce the Foundation’s expectations both for applicants and reviewers. Instead of asking for the applicant’s most impactful articles, books, or patents, the focus is now on their most important scientific results and how the new project could amplify this contribution and expand its reach. “The objective is to put the quality of the research proposal at the center of the process and ensure that what is being evaluated first and foremost is its potential contribution,” says Cristóvão de Albuquerque, FAPESP’s head of research collaboration.
Efforts are also being made to improve the peer review process. The Foundation created a video to guide reviewers and help them produce a constructive report. “The aim is for their opinion to help improve the proposal so that, if it is rejected, it can be refined and resubmitted later,” says Albuquerque. FAPESP, he notes, also allows applicants to include details of their personal circumstances that may help reviewers understand their contribution. “This is useful, for example, for researchers with young children who have taken periods of time off work,” he concludes.
Performance indicators at state universities in São Paulo, 2022 (nº 19/10963-7); Grant Mechanism Public Policy Research; Principal Investigator Jacques Marcovitch (USP); Investment R$614,583.66.