Watch out for the tide

Report proposes limits on use of indicators to evaluate science in Great Britain

In July 2015, the debate on the reliability of quantitative methods to measure the impact of scientific and academic output opened a new chapter with the publication of a report commissioned by the Higher Education Funding Council for England (HEFCE), the agency responsible for financing and evaluating the university research system in England. The outcome of 15 months of work by an independent interdisciplinary team, the document entitled The Metric Tide deals with both the usefulness and the abuse of indicators in evaluating the merit of universities and research groups. In view of the spread of parameters like impact indicators and university rankings, the group suggests they should be used more judiciously. “Metrics need to be chosen carefully, and they should always add to and support the assessment of specialists, not replace it,” says Richard Jones, Dean of Research and Innovation at the University of Sheffield and a member of the panel that wrote the document.

The group presented the notion of “responsible metrics” based on five points. The first point is humility, defined as the recognition that peer review, although imperfect and subject to error, can judge the quality of scientific output as a whole, which is something that isolated indicators are still not able to do. The second point is robustness, a requirement that excludes from an evaluation process any data taken out of context or not sufficiently representative. According to the report, the emphasis on “narrow and poorly designed” parameters produces negative consequences. One example not to follow is relying on a scientific journal’s impact factor as a measure of the quality of the work it publishes or the merit of the researchers it features. The reason is that this index merely reflects the average number of citations received by articles published in prior issues. The document also warns against using article citations as a universal criterion of quality, without taking into consideration the different realities of each discipline.

The third point is transparency, ensuring that the collection and analysis of data remains open and understandable to researchers and laymen. The report criticizes the widespread use of university rankings, arguing that there is a lack of transparency related to the choice of ranking indicators. The fourth point is diversity, meaning the effort to adopt a set of indicators that can encompass the different kinds of contributions that researchers make. Finally, the fifth point is reflexivity, which is understood to mean a concern with quickly identifying undesirable effects of using indicators and the willingness to address them.

“The attraction to metrics is likely only to grow,” wrote James Wilsdon, a professor at the University of Sussex and leader of the panel that wrote the report, in the journal Nature. He says that the demand for assessment of public spending on research and higher education is increasing at the same time as the amount of data on scientific output and the capacity to analyze it have grown. “Institutions need to manage their research strategies while at the same time competing for prestige, students and resources.”

This is an especially sensitive topic in Great Britain because every five years its universities and research groups are subject to a comprehensive assessment process that determines how public money will be disbursed over the following five years. In December 2014, HEFCE published the most recent assessment, the Research Excellence Framework (REF 2014). For the REF, 154 universities submitted 1,911 items in 36 areas of knowledge. Each item submitted represented a set of scientific works, case studies, patents, ongoing research projects, information on researcher performance and bibliometric indicators linked to a research department or group, and it was evaluated by a panel of specialists. The quality of the scientific research counted for 65% of the evaluation; the impact of the research outside of the university setting (a new feature in the REF 2014) counted for 20%, and the research environment for 15%. The REF concluded that 30% of the research submitted is world-leading, 46% is internationally excellent, 20% is recognized internationally, and 3% is recognized nationally.

Discrepancies
The team in charge of the report analyzed the REF 2014 data and concluded that individual indicators and peer review do not always produce results that agree. Big discrepancies were observed, for example, in the performance of researchers at the beginning of their careers. Similarly, the coverage of indicators was uneven across the different areas of knowledge, and the panel on Arts and Humanities in particular had specific problems in this respect. The report recommended maintaining the REF’s current model, based on a qualitative assessment done by specialists who can use carefully selected indicators. It also suggested more investment in “research on research” to deepen understanding of the use of indicators. The group also instituted a “reverse” prize, the Bad Metric Prize, to call attention to the inappropriate use of quantitative indicators. The first winners will be chosen in 2016.

While the report was being prepared, the British scientific community was shocked by a tragedy related to the pressure that metrics place on researchers. Stefan Grimm, who for ten years was a professor of toxicology at the Faculty of Medicine at Imperial College, committed suicide at the age of 51. He was depressed by the announcement of his firing and left an e-mail in which he described a series of threats from his supervisor about what would happen if he did not obtain a certain level of financing for his laboratory, a level he did not attain. Imperial College announced that it had revised its assessment criteria after his suicide, which it mentioned when the report was made public.

The document commissioned by HEFCE is in line with recent works that defend similar ideas, such as the Leiden Manifesto on research metrics, published in September 2014 at the 19th International Conference on Science and Technology Indicators held in Leiden in the Netherlands. The Manifesto’s ten principles largely mirror the recommendations of the British group. For example, they refer to the necessity of transparency in data analysis and propose taking into consideration the differences in publishing and citation practices among disciplines. Another reference is DORA, an acronym for the San Francisco Declaration on Research Assessment, published in December 2012 at a meeting of the American Society for Cell Biology, which makes 18 recommendations for researchers, institutions, funding agencies and scientific publishers. The main recommendation proposes eliminating the use of a journal’s impact factor as an indicator of an article’s quality. Almost 600 scientific institutions and 12,500 researchers have already signed the declaration. The HEFCE report suggests that institutions and agencies also sign it so that the public is made aware of their assessment practices.

Great Britain’s adoption of this approach is noteworthy. “While most countries are still early in their discussion of assessment metrics, Great Britain is several steps ahead, almost through adolescence,” observes Sergio Salles-Filho, a professor at the University of Campinas (Unicamp) and coordinator of the Study Group on Organization of Research and Innovation (Geopi), which evaluated FAPESP programs. He adds that the inclusion of new parameters to evaluate scientific output is also motivated by the need to refine the evaluation process, measuring different aspects related to the impact on society. “In certain areas, the most important task is not to publish articles but rather produce manuals used in industry, promote changes in public policy or change economic policy guidelines. Assessment processes are being transformed, and in twenty years will be very different.”

Complete cycle
Brazil has advanced in broadening its assessment criteria. “Funding agencies are no longer satisfied with knowing merely the specific impact of a scientific article, and they are seeking to evaluate the complete cycle, which gathers information on researchers’ work and the results of programs to measure its contribution over the long term,” says Salles-Filho, referring to the effort of FAPESP and more recently the Brazilian Innovation Agency (FINEP) to systematize an ongoing process of data collection that brings together information on the results of research over time.

Rogério Mugnaini, professor at the School of Communications and Arts (ECA) at the University of São Paulo (USP), is studying the range of assessment criteria for the graduate-level programs run by the Brazilian Federal Agency for the Support and Evaluation of Graduate Education (Capes), using all of the documents proposed by the areas of knowledge since 1998. He has already noted that the areas are increasingly resorting to impact indicators, even if these parameters are not valued in the culture of the specific discipline. “Some areas, like geography, are adopting an assessment model used by the hard sciences,” says Mugnaini. “Since the volume of titles to be evaluated is very extensive, there is a tendency to adopt indicators without knowing their limitations.” Salles-Filho believes there are additional aspects that have to be considered in the evaluation of graduate-level programs. “In Brazil, we graduate 15,000 students with doctorates every year, but we don’t know where they are and what they are doing with the knowledge and experience they acquired during their doctoral studies – whether they are thesis advisers or working in the public or private sector. We should have some idea of the impact on society of our graduate programs,” he says.