

Valuable records

Data compiled from the Lattes Platform provide fuel for studies on science in Brazil and reveal trends

The Lattes Platform, which contains more than 4 million academic CVs, has become a source of information for a growing number of researchers seeking data on Brazilian science in order to study related phenomena and trends. Established in 1999 by the National Council for Scientific and Technological Development (CNPq), the CV directory records the backgrounds and contributions of each student, technician and researcher in Brazil and provides the government and funding agencies with information on scientific production, participation in projects, students advised and researchers supervised, among other data. Its usefulness, however, has long extended beyond the sphere of management, helping researchers produce original knowledge. “Few countries have a platform containing data on the activities of their scientific community as a whole,” says Rogério Mugnaini, professor at the University of São Paulo (USP) School of Communications and Arts (ECA).

Mugnaini is taking part in a research project supported by CNPq and USP that seeks to develop tools to collate information from the Lattes Platform with data on scientific publications from national and international sources. In June 2015 Mugnaini and his colleagues published an article in the journal PLOS One analyzing the relationship between senior researchers in the exact and earth sciences and the graduate students they advise, looking at bibliographic production recorded in the Lattes Platform for the period 1981–2010. They noticed that the longer advisor and student work together, the greater the productivity of the young researcher. “We noted that some of the advisors stopped pursuing their own line of research and began to publish articles only with their students,” he says. Another of the group’s studies analyzed how up to date Lattes information is compared with the CVs obtained from graduate program reports. The conclusion was that up to 20% of articles published in the preceding three years had not been entered into the system. The fields of engineering and agricultural sciences are the ones that suffer most from this problem.

Mugnaini believes that the development of specific indicators based on the Lattes platform is viable. “In the bibliographical references in Lattes CVs, there is information on scientific production in periodicals not indexed in the databases of traditional journals, in addition to theses, books and other documents. The Lattes database represents a complete view of Brazilian scientific production and could be the basis for a more faithful assessment system,” he says.

The advent of a tool to help extract and organize the large quantities of Lattes data available on the Internet was critical to the researchers’ work. That tool, scriptLattes, has been available since 2005. It was developed by Professor Roberto M. Cesar Jr. of the USP Institute of Mathematics and Statistics (IME) and his doctoral student at the time, Jesús Mena-Chalco, who is now a professor at the Federal University of the ABC (UFABC). Mena-Chalco is also a member of Mugnaini’s research group, which is coordinated by Luciano Digiampietri, a professor at the USP School of Arts, Sciences and Humanities (EACH). “Our intention was to create an in-house tool for use at IME, but it ended up being useful for many others,” says Mena-Chalco. “Before scriptLattes, Lattes data were normally collected manually.” The free software scriptLattes automatically downloads Lattes CVs for a group of people of interest, compiles lists of their output, removes duplicate data, and generates reports on, for example, articles published, advising relationships and coauthor networks. “The tool does not create additional data, but it can gather and organize information extracted from large data sets automatically. This information can be used as input for knowledge discovery.”
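To make the workflow described above concrete, here is a minimal Python sketch of two of its steps: removing duplicate entries that appear in several CVs and counting coauthor links. It is an illustration only, not scriptLattes code. The sample records, the title-normalization rule and the function names are all assumptions made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample: (CV owner, title, year, authors as listed in that CV).
# The same paper shows up in each coauthor's CV with slightly different typing.
records = [
    ("Ana", "Graph Mining in Academic Networks", 2012, ["Ana", "Bruno"]),
    ("Bruno", "Graph mining in academic networks.", 2012, ["Ana", "Bruno"]),
    ("Bruno", "Text Analysis of CVs", 2013, ["Bruno", "Carla"]),
    ("Carla", "Text analysis of CVs", 2013, ["Bruno", "Carla"]),
    ("Carla", "A Solo Study", 2014, ["Carla"]),
]


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep one entry per (normalized title, year), merging the author lists."""
    unique = {}
    for _owner, title, year, authors in records:
        key = (normalize_title(title), year)
        unique.setdefault(key, set()).update(authors)
    return unique


def coauthor_edges(publications):
    """Count how many distinct publications each pair of authors shares."""
    edges = Counter()
    for authors in publications.values():
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return edges


publications = deduplicate(records)
edges = coauthor_edges(publications)
print(len(publications), "unique publications")
print(dict(edges))
```

Matching on a normalized title plus the year is a deliberately crude rule chosen for brevity. Real CV data would likely need fuzzier comparison, since coauthors often type the same title differently.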

The scriptLattes program is currently being used by more than 50 institutions and research groups in Brazil. Mena-Chalco is using the tool for two projects. One of them seeks to map coauthor networks for scientific articles, books and book chapters. An article published in January 2014 in the Journal of the Association for Information Science and Technology showed that the number of collaborations between Brazilian researchers has increased noticeably over the last two decades, principally in the fields of health sciences and agricultural sciences (see Pesquisa FAPESP Issue No. 218). Another project seeks to build family trees of scientists, analyzing advising and supervision relationships. The researcher and his collaborators plan to develop a national platform that will allow them to identify contributions by each researcher to the training of other researchers. “There have already been initiatives to develop family trees in fields of knowledge such as mathematics, physics and neuroscience, but not for an entire country’s scientific community,” says the researcher. “The analysis of information on academic advising relationships is an attempt to measure the importance of a researcher based on his impact on other generations.”
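The idea of measuring a researcher's influence through academic family trees can be sketched in a few lines of Python. The example below is a hypothetical illustration, not the method used by Mena-Chalco's group: the names, the advising pairs and the choice of "number of academic descendants" as the measure are all assumptions.

```python
# Hypothetical advising relationships: (advisor, student).
advising = [
    ("Silva", "Costa"),
    ("Silva", "Lima"),
    ("Costa", "Rocha"),
    ("Costa", "Melo"),
    ("Lima", "Pires"),
]


def build_tree(advising):
    """Map each advisor to the list of students they advised directly."""
    children = {}
    for advisor, student in advising:
        children.setdefault(advisor, []).append(student)
    return children


def descendants(children, researcher):
    """Return everyone trained directly or indirectly by the given researcher."""
    seen = set()
    stack = list(children.get(researcher, []))
    while stack:
        person = stack.pop()
        if person not in seen:
            seen.add(person)
            stack.extend(children.get(person, []))
    return seen


tree = build_tree(advising)
for name in ("Silva", "Costa", "Pires"):
    print(name, len(descendants(tree, name)))
```

Tracking already visited names keeps the traversal safe even when a student has more than one advisor, which is common where co-advising exists.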

Luciano Digiampietri, a professor in the EACH-USP bachelor’s degree program in information systems who has published more than a dozen articles based on Lattes data, also uses information from the platform, in his case to develop algorithms designed to anticipate trends. One of his studies, carried out with a master’s degree student, William Maruyama, seeks to predict a researcher’s future collaborations. The study used data on co-authorship of scientific papers by researchers in the field of computer science recorded in the Lattes Platform between 1970 and 2010. The data from 1970 to 2000 served to represent past patterns. The information from 2001 to 2005 was used to represent the present. The algorithm compared the two time intervals and tried to predict with whom the researchers would collaborate in the future. To validate the algorithm, the results were compared with data recorded between 2006 and 2010. The algorithm’s picture of the future and what actually happened between 2006 and 2010 showed 97% agreement.
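The article does not describe the algorithm itself, so the following Python sketch substitutes a standard textbook heuristic, common neighbors, to show how a prediction of this kind can be validated against a held-out period. The co-authorship records are invented, and for simplicity the sketch merges the past and present windows into a single known network instead of comparing them.

```python
from itertools import combinations

# Hypothetical co-authorship records: (year, author_a, author_b).
papers = [
    (1995, "A", "B"), (1998, "B", "C"),
    (2002, "A", "D"), (2003, "B", "D"), (2004, "C", "D"),
    (2007, "A", "C"), (2009, "A", "B"),
]


def edges_between(papers, start, end):
    """Set of coauthor pairs with at least one paper in [start, end]."""
    return {frozenset((a, b)) for year, a, b in papers if start <= year <= end}


def neighbors(edges):
    """Adjacency map: each author's set of coauthors."""
    adj = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def predict(adj, known, min_common=2):
    """Predict a new link for unlinked pairs sharing enough coauthors."""
    predicted = set()
    for a, b in combinations(sorted(adj), 2):
        pair = frozenset((a, b))
        if pair not in known and len(adj[a] & adj[b]) >= min_common:
            predicted.add(pair)
    return predicted


past = edges_between(papers, 1970, 2000)
present = edges_between(papers, 2001, 2005)
future = edges_between(papers, 2006, 2010)

known = past | present
predicted = predict(neighbors(known), known)
new_links = future - known  # collaborations that first appear in 2006-2010
hits = predicted & new_links
precision = len(hits) / len(predicted) if predicted else 0.0
print(sorted(map(sorted, predicted)), precision)
```

The key design point is the temporal split: the model only sees data up to 2005, and its output is scored against links that first appear afterward, mirroring the validation described in the text.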

Another line of research is the analysis of trends in certain fields of knowledge. Using keywords taken from the titles of scientific publications by computer science researchers recorded in Lattes over time, the objective was to predict which topics would be the most studied in the near future. In 2012 the term that stood out according to the methodology was “web services,” but the prediction for both 2015 and 2020 is that neural networks will be the hottest field. The EACH-USP researcher also plans to develop a tool capable of suggesting which recently published articles might interest a particular researcher, based on the subjects on which she focuses. “I started working on these topics five years ago. They had nothing to do with my doctoral work, but they fascinate me,” says Digiampietri.
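A simple way to picture keyword-based trend analysis is to count how often each term appears in titles per year and then look at the direction of the trend. The Python sketch below does this with a least-squares slope. The titles are invented and the method is an assumption chosen for illustration, since the article does not detail the methodology actually used.

```python
# Hypothetical publication titles grouped by year.
titles_by_year = {
    2009: ["Web services composition", "Neural networks for vision"],
    2010: ["Secure web services", "Web services discovery", "Neural networks in NLP"],
    2011: ["Neural networks for speech", "Deep neural networks", "Web services testing"],
    2012: ["Neural networks at scale", "Training neural networks", "Neural networks survey"],
}
terms = ["web services", "neural networks"]


def yearly_counts(titles_by_year, terms):
    """For each term, count the titles containing it in each year."""
    counts = {term: {} for term in terms}
    for year, titles in titles_by_year.items():
        for term in terms:
            counts[term][year] = sum(term in title.lower() for title in titles)
    return counts


def slope(series):
    """Least-squares slope of yearly count versus year."""
    years = sorted(series)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(series[y] for y in years) / n
    num = sum((y - mean_x) * (series[y] - mean_y) for y in years)
    den = sum((y - mean_x) ** 2 for y in years)
    return num / den


counts = yearly_counts(titles_by_year, terms)
trends = {term: slope(series) for term, series in counts.items()}
rising = max(trends, key=trends.get)
print(trends, "fastest-growing term:", rising)
```

A straight-line trend is the simplest possible forecast. A real study would need far more data and a way to handle terms that rise and then fade.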

Studies based on data from the Lattes Platform are uncovering little known phenomena or trends that have not yet appeared in official indicators. Fabio Mascarenhas, a professor in the Department of Information Science at the Federal University of Pernambuco (UFPE), just advised student Guilherme Alves de Santana on his master’s thesis. Amid the data compiled on scientific collaboration, Santana uncovered a peculiar characteristic in the formation of research groups in Brazil. “Scientific production within research groups was analyzed. We expected to find more articles co-authored by members of the same group. But we found more collaboration with researchers from outside groups than within them,” he said. Santana will investigate the topic in depth in his PhD dissertation.

Another interesting fact, this time involving scientific literature on tropical medicine by Brazilian authors, was noted in the master’s thesis of Natanael Vitor Sobral, also defended this year. Based on data from the UFPE Graduate Program in Tropical Medicine, he found that a significant proportion of articles published in the field do not focus directly on endemic diseases in Brazil, such as dengue, malaria or schistosomiasis, but rather on diseases that international scientific journals find more interesting, such as AIDS. “The explanation is that major scientific journals have little space for the diseases of poor countries if they are not linked to more universal themes,” says Mascarenhas.

Alberto Laender, a professor in the Department of Computer Science of the Federal University of Minas Gerais (UFMG) and member of the Brazilian Academy of Sciences, collected data from more than four million CVs. With the help of Thiago Magela Rodrigues Dias, a doctoral student at the Federal Center of Technological Education of Minas Gerais whom he is co-advising, he is analyzing the scientific production of three groups: over 220,000 doctoral-level researchers who have input their CVs into the platform, 64,000 professors in graduate programs, and 15,000 CNPq productivity grant recipients, the researchers considered most productive by the federal development agency. “We are going to analyze the evolution of production in these three groups and in seven broad fields of knowledge,” says Laender, who is a member of the National Institute of Science and Technology for the Web (InWeb). Preliminary data suggest that, in general, the total number of publications by the Brazilian scientific community began to drop in 2012. There may be two reasons for this. The first is the increase in the teaching workload of PhDs hired as professors at federal universities. “Perhaps they are not managing to publish at the same rate as during their doctoral studies,” he says. A second is the increase in the number of posts for professors at federal universities, where PhDs have little time for research. The drop in scientific production, however, is seen neither among professors teaching graduate classes nor among those receiving productivity grants.

Comparing production
Laender and his group have published many articles based on data from the Lattes system, including papers on the profile of researchers in computer science and their scientific production compared to colleagues in North America and Europe. They have also used Lattes data to develop an Internet portal where the production of the National Institutes of Science and Technology (INCTs) can be viewed.

Researchers who work with Lattes frequently complain about the difficulty in obtaining raw data directly from the platform. In April 2015, CNPq introduced a confirmation code for each query, making it even harder to extract Lattes information via the Internet. The aim was to curb the growing number of cases of commercial sites sharing platform data. CNPq’s policy is to provide each institution with consolidated data on its researchers, professors and students, but the entire dataset for all individuals is not easily obtained. “Full release of Lattes data would be important for the scientific community. I do not mean individual CVs, but rather the dataset and updates to it so that we can work with it more easily,” says Alberto Laender of UFMG.

Mônica Ramalho, a CNPq science, technology and innovation analyst engaged in assisting with planning and coordination of statistics and indicators at the agency, claims that there is an institutional procedure that one can follow to obtain this information. “To get more direct access, you just have to send a request to CNPq explaining the reason why the researcher needs Lattes data,” she says.