
Evaluation

Experts analyze the risks and benefits of using artificial intelligence to measure researcher performance

Study used machine learning to analyze résumés and predict who would receive a CNPq grant

Alexandre Affonso

Artificial intelligence (AI) could be used to perform scientific evaluations that are currently entrusted only to human reviewers, suggests a study published in the Journal of Informetrics in November. The objective of the article, written by researchers from the Federal University of Rio Grande do Sul (UFRGS), was to identify criteria and attributes capable of determining whether a researcher should be awarded a Research Productivity (RP) grant from the Brazilian National Council for Scientific and Technological Development (CNPq). Currently distributed to around 15,000 researchers, these grants supplement the remuneration that institutions pay researchers for their work and for supervising students.

The team analyzed the CVs of 133,000 researchers, 14,138 of whom were awarded RP grants between 2005 and 2022. The authors used a series of machine learning techniques to examine the résumés of grant candidates and were able to predict, with a reasonable degree of accuracy, which researchers would be awarded a grant. The method was 80% accurate for one of the grant categories, the RP-2 grant, which is aimed at younger researchers and awarded based primarily on the number of published articles and student supervisions. “For other grant levels, the tool performed well but with less accuracy, since the decision is based on a more qualitative analysis,” said Denis Borenstein of the UFRGS’s School of Business Administration, one of the authors.
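
The paper’s exact features and best-performing algorithm are not listed in this article, so the sketch below is only illustrative: it reproduces the general setup of training a supervised classifier on CV-derived attributes to predict grant outcomes, using synthetic data and scikit-learn. The feature names, numbers, and model choice are assumptions, not figures from the study.

```python
# Illustrative sketch only: synthetic CV-style features and an assumed
# gradient boosting classifier, standing in for the general approach the
# article describes (not the study's actual features or models).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 5_000  # hypothetical number of candidate CVs

# Hypothetical résumé attributes for each candidate.
X = np.column_stack([
    rng.poisson(8, n),        # articles published in the evaluation window
    rng.poisson(3, n),        # master's and PhD supervisions
    rng.integers(1, 30, n),   # years since PhD
    rng.poisson(2, n),        # funded projects led
])

# Synthetic label standing in for "was awarded an RP-2 grant":
# more output raises the chance of a grant, plus noise.
score = 0.4 * X[:, 0] + 0.8 * X[:, 1] + 0.05 * X[:, 2] + 0.5 * X[:, 3]
y = (score + rng.normal(0, 2, n) > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```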

According to Borenstein, it was only possible to create the model because of the large volume of data on researchers’ work available on the Lattes CV platform, which was used to train the machine learning algorithms. He believes the tool could be useful at least for screening candidates and making the work of human reviewers easier. “Reviewers could then analyze a smaller volume of proposals more calmly and carefully,” says Borenstein, an expert in applied operations research, an interdisciplinary field that uses algorithms and mathematical and statistical methods to aid decision-making.

Olival Freire, scientific director of the CNPq, agrees that AI will be useful in the agency’s evaluation procedures, but warns that it needs to be adopted gradually and cautiously. “Poorly trained or misused AI systems can have terrible results. You need to carefully curate the algorithms to make sure the analysis is consistent,” he says. Freire notes that the CNPq already uses AI in tasks such as the selection of reviewers for grant and scholarship applications. “The system runs through the list of 15,000 research productivity grant recipients and identifies a small range of experts on the topic in question. They are then contacted by CNPq technicians,” he says.
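
The CNPq has not published the internals of this reviewer-matching system, so the sketch below is just one rough way such a shortlist could be produced: comparing the text of a proposal with keyword profiles of grant holders using TF-IDF vectors and cosine similarity. The reviewer names and profiles are invented for illustration.

```python
# Illustrative sketch only: a simple text-similarity shortlist of potential
# reviewers, not a description of CNPq's actual system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical reviewer profiles: keywords drawn from their publications.
reviewers = {
    "Reviewer A": "machine learning text mining scientometrics citation analysis",
    "Reviewer B": "polymer chemistry corrosion coatings advanced materials",
    "Reviewer C": "operations research optimization logistics decision support",
}
proposal = "predicting research grant outcomes with machine learning on CV data"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(reviewers.values()) + [proposal])
proposal_vec = matrix[len(reviewers):]   # last row: the proposal
reviewer_vecs = matrix[:len(reviewers)]  # one row per reviewer profile
similarities = cosine_similarity(proposal_vec, reviewer_vecs).ravel()

# Rank reviewers by similarity to the proposal and keep the closest matches.
ranked = sorted(zip(reviewers, similarities), key=lambda item: -item[1])
print(ranked[:2])
```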

This strategy, according to Freire, prevents bias in the reviewer selection process, such as repeatedly inviting those who are quick to accept the task or to submit their reviews. People applying for RP grants are now also allowed to use generative AI to help write their proposals, as long as they declare that they did so. The CNPq director states, however, that the widespread use of AI could conflict with the more qualitative approach that the agency aims to take in its evaluation process. “Some CNPq disciplinary committees are judging requests for productivity grants in two stages. In the first, more quantitative in nature, they analyze data on scientific production, citations, and number of supervisions. This can be supported by algorithms. In the second, candidates are invited to highlight not numbers but their main achievements, which could be influential articles, patents, or works of art, and these are judged by peers. This could not be done properly by artificial intelligence,” he explains.

Artificial intelligence is already being used to organize and analyze large volumes of research data and even to identify or model protein structures that could lead to new drugs. Physicist Osvaldo Novais de Oliveira Júnior, current director of the São Carlos Physics Institute at USP, showed that AI is highly successful at predicting whether a scientific article will receive a large number of citations. Together with colleagues from USP and Indiana University, USA, he uploaded a paper on the subject—yet to be peer-reviewed—to the arXiv repository. In the study, the group used AI to examine the abstracts of 40,000 articles published in the American Chemical Society’s journal ACS Applied Materials & Interfaces between 2012 and 2022. The method was able to indicate, with 80% accuracy, which abstracts were among the 20% most cited, based only on the words used and the topics covered, without considering the authors or the institutions with which they were affiliated. Novais, a scholar of computational linguistics, claims that the mastery of human language by computers will enable them to perform all kinds of intellectual activities and that their enormous data processing capacity will soon lead them to surpass human intelligence. “It is likely that in the not-too-distant future, around 2027, we will reach the technological singularity, at which point AI will surpass human capacity,” he says.
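
The article describes only the input (abstract text) and the result (80% accuracy at identifying papers among the 20% most cited), not the model used in the preprint. A minimal, purely illustrative sketch of that kind of text classification, with made-up abstracts and labels, could look like this:

```python
# Purely illustrative: made-up abstracts and labels, with an assumed
# TF-IDF plus logistic regression pipeline (not the preprint's actual model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: 1 = among the 20% most cited, 0 = not.
abstracts = [
    "Graphene electrodes enable flexible, high-capacity supercapacitors.",
    "Routine characterization of a known ceramic composite is reported.",
    "Self-healing hydrogels show promise for wearable biosensors.",
    "A standard sol-gel coating is applied to steel samples.",
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(abstracts, labels)

# Predict whether a new abstract would fall into the highly cited group.
print(model.predict(["Flexible graphene supercapacitors with record capacity"]))
```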

Marcio de Castro Silva Filho, scientific director at FAPESP, believes the trend of using AI in the review process is irreversible. “Scientific publishers already use the technology to screen and analyze submitted scientific manuscripts. Funding agencies are also moving in this direction, developing tools to support their reviewers,” he says. “At FAPESP, we are discussing how algorithms could allow us to extract information from proposals and help the people reviewing them.” It is essential, according to Silva, to be transparent about the use of AI when it is incorporated into tools like these, in addition to being clear about what criteria they analyze.

However, it may be a while before algorithms can feasibly be used for more complex tasks in the review process, according to physician Rita Barradas Barata, who was director of evaluation at the Brazilian Federal Agency for Support and Evaluation of Graduate Education (CAPES) between 2016 and 2018. “It’s one thing to use algorithms to process large volumes of data, but interpreting the data in a way that captures all the nuances needed to review a proposal is something else entirely.” The researcher, who was responsible for completing the 2017 version of the postgraduate program assessment released every four years by CAPES (see Pesquisa FAPESP issue nº 260), says the evaluation has to consider multiple dimensions of the performance of master’s and PhD programs, such as the context in which they operate and their regional vocations, all of which any algorithm developed to assist would need to take into account.

Jacques Marcovitch, who was dean of USP between 1997 and 2001, sees risks in the predictive use of AI. One is that it inhibits the adoption of the principles of “responsible evaluation,” which aim to introduce qualitative parameters, based on peer review, into the analysis of scientific results. “Algorithms are capable of reading a large amount of content, but they always look to the past, to data accumulated over time. In an era of disruption, this would limit the identification and recognition of the science that will shape our future,” he says. Marcovitch is head of the Métricas Project, an effort that brings together researchers from multiple institutions with the aim of developing comprehensive ways to measure the impact of universities on society.

He emphasizes that this does not mean that AI cannot be useful. The most important thing, he says, is that the results are used by trained people who understand the limitations and know how to interpret them. Justin Axel-Berg, who is also involved in the Métricas Project, warns about the lack of transparency regarding the parameters adopted by generative AI algorithms. “It would be a great risk to use these programs to determine who receives public funding for research and scholarships. What would you say to a candidate who was unhappy with the assessment result? That it was the algorithm that said no?” he asks.

Multidimensional analysis
Qualitative criteria adopted for chemical engineering

In the field of chemical engineering, the CNPq is awarding Research Productivity (RP) grants based on criteria different from those adopted for other disciplines. Instead of being limited to traditional quantitative indicators, such as the number of articles, citations, and students supervised, the focus is on measuring the applicant’s contribution more broadly, evaluating everything from the academic impact of their scientific work to their role in training others, their efforts to establish collaborations, and their leadership in scientific and innovation projects. The rules for the next three years were developed by the CNPq’s advisory committee for chemical engineering over the past four years and began to be implemented in October, after being discussed with the community. “The idea is to consider the qualitative nature of the research, with the aim of discouraging predatory publishing practices that artificially inflate scientific production,” says Claudio Dariva, a researcher at Tiradentes University in Aracaju, Sergipe, and chair of the committee.

The search for new criteria is partly motivated by the fierce competition for RP grants in chemical engineering. Based on the average of the last three RP grant evaluations (2021, 2022, and 2023), only 35 of every 100 researchers who apply in the area are successful, the lowest rate among all engineering fields at the CNPq. To give an idea of the asymmetry: the average success rate for applicants across all engineering fields in the period was around 45%, with some disciplines reaching 60%. “If we use merely quantitative criteria to award grants, many researchers who have made important contributions to society would not be able to compete. Our guiding question has always been: what does it mean to be a productive researcher?” says Maria Alice Zarur Coelho, a researcher from the Federal University of Rio de Janeiro (UFRJ) who led the advisory committee during part of the period in which the new criteria were formulated. “The objective is to perform a multidimensional analysis of the applicant, which serves as a guide and prioritizes the impact of their research more comprehensively,” says Marisa Beppu, a researcher from the University of Campinas (UNICAMP) who also led the CNPq’s chemical engineering advisory committee.

The story above was published with the title “Measured by algorithms” in issue 346 of December/2024.
