Nobel Prize winner data used in studies on trends and advances in science : Revista Pesquisa Fapesp

The 120-year history of the Nobel Prize, which recognizes extraordinary contributions in Physics, Chemistry, and Physiology or Medicine, as well as Literature, Economic Sciences, and Peace, has become a treasure trove of data for analyzing top scientists’ work and careers. Information on the performance and profiles of laureates has long served as a benchmark for detecting common patterns among researchers at the frontier of science, providing clues about how leading-edge research works. In recent years these studies have become more sophisticated through the use of computational tools and big data.

In 2019 a group of data scientists at Indiana University Bloomington and Northwestern University, in Evanston, developed a database of publication records covering laureates across the three science categories of the Nobel Prize: a total of 93,000 papers from 545 prizewinners. They cross-referenced the publication and citation records in Microsoft Academic Graph with data collected from laureates’ personal curricula vitae (CVs), university websites, and Wikipedia. Algorithms were used to purge ambiguous or redundant records.
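The cleaning step described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each record is a dictionary with `title`, `year`, and `source` fields. The field names, the sample records, and the matching rule (normalized title plus year) are hypothetical and are not the method used by the Indiana and Northwestern group.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) pair."""
    seen = {}
    for record in records:
        key = (normalize_title(record["title"]), record["year"])
        seen.setdefault(key, record)  # later duplicates are ignored
    return list(seen.values())


# Hypothetical records collected from different sources.
records = [
    {"title": "Spin Waves in Thin Films", "year": 1998, "source": "bibliographic database"},
    {"title": "spin-waves in thin films.", "year": 1998, "source": "CV"},
    {"title": "A Different Paper", "year": 2001, "source": "university website"},
]
unique_records = deduplicate(records)
```

A real pipeline would also need to disambiguate author names, which a title-and-year key alone cannot do.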

The database has yielded several research papers so far. An article in Nature Reviews Physics, for example, reviewed the peculiarities often ascribed to Nobel laureates and found that some remain valid while others do not. The review confirmed that laureates tend to produce prize-winning works early in their careers, and far earlier than their colleagues do. But the paper dispels the notion that Nobel laureates are lone geniuses making solo contributions. Instead, they increasingly publish as part of larger groups: the average team size for laureates is 4.04 versus 3.25 for non-laureates, indicating that they are engaged in a highly cooperative science ecosystem.
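A team-size comparison of this kind comes down to averaging a per-paper count for each group. The sketch below assumes that team size means the number of authors listed on a paper, which the article does not state. The author counts are invented, so the results will not match the 4.04 and 3.25 figures reported in the review.

```python
from statistics import mean


def average_team_size(author_counts):
    """Return the mean number of authors per paper for one group of papers."""
    return mean(author_counts)


# Hypothetical author counts, one entry per paper.
laureate_papers = [2, 5, 4, 6]
other_papers = [1, 3, 4, 2]

laureate_average = average_team_size(laureate_papers)
other_average = average_team_size(other_papers)
```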

An analysis by Physics Today of the database developed at Indiana and Northwestern found that publishing and citation patterns among Nobel laureates have shifted in the last 20 years: contemporary prizewinners have published more papers than laureates in the previous century, and their papers cite more works in their references. Santo Fortunato, the Italian director of the Indiana University Network Science Institute, says the gap between a discovery and its recognition with a scientific Nobel Prize has also grown over time, and now averages more than 30 years. Major discoveries, however, tend to be recognized much sooner. For example, the detection of gravitational waves, announced in 2016, led to a Nobel Prize in Physics in 2017. “I have no doubt that COVID-19 vaccines will be awarded a Nobel prize really shortly,” Fortunato told Nature.

Exploring data about Nobel laureates to gain an understanding of how science works is not a novel concept. In the 1970s, sociologist Harriet Zuckerman, a researcher at Columbia University, became known for her studies on the role of scientific leaders in the advancement of science in the US. Using data about Nobel laureates, her book Scientific elite: Nobel laureates in the United States analyzes the careers of scientists who won the prize between 1907 and 1972, and shows how the development of a vigorous academic culture played a crucial role in the country’s rise to scientific and technological preeminence. But new computational tools are now drawing fresh insights from the Nobel Prize data. Researchers are scouring the data on laureates to detect research patterns characterizing high-impact careers, distinctive signs of creativity, and even factors underlying the origin of discoveries.

The Lattes platform has provided a wealth of data for studies on science trends in Brazil

Last year, a paper in PLOS ONE explored a collection of articles published by Nobel laureates to try to understand how each of their contributions unfolded. The authors, data scientist Yakub Sebastian, at Charles Darwin University, in Australia, and Chaomei Chen, at Drexel University, in the US, found that these papers shared “boundary-spanning traits,” with their authors having “exceptional abilities to connect disparate and topically diverse clusters of research papers.” The two researchers are currently testing metrics to identify these traits.

The answers, of course, will not be found in simple bibliometrics. A 2020 study by Marek Kosmulski, a Polish chemist at Warsaw Technological University, shows that future Nobel laureates are difficult to predict from international lists of the most prolific researchers. Kosmulski analyzed data on 97 prizewinners in Chemistry, Economics, Medicine, and Physics between 2010 and 2019. He noted that only 17 of them were on the list of 6,000 highly cited researchers published by Web of Science. Data analytics firm Clarivate Analytics, in contrast, finds that laureates can be predicted based on their research careers. The firm has created a list of “citation laureates” that has successfully predicted 64 Nobel prizewinners since 2002. Five laureates last year were on the list. Clarivate has not disclosed details of its methodology, but says it starts with a quantitative analysis of papers with over 2,000 citations, and its team of experts then adds a layer of qualitative analysis.

Ho Fai Chan, an economist at Queensland University of Technology, in Australia, specializes in research that uses Nobel laureates as a benchmark. His previous papers in Scientometrics have shown, for example, that Nobel prizewinners slow their pace of collaboration after winning the prize but remain loyal to the collaborators they worked with most often before recognition. Chan’s most recent paper, published this year, analyzed data for 387 Nobel laureates across the three scientific categories from 1901 to 2000, and found that receiving the Nobel Prize at a younger age is related to a longer expected lifespan. “Any resulting increases in status and social standing could also result in a healthier and longer life,” Chan told Chemistry World. Rasmus Bjørk, a physicist at the Technical University of Denmark, published a study in 2019 on the ages of Nobel laureates, and concurs with Chan’s findings. “No university in the world would fire a Nobel laureate, who in general can obtain abundant research funding over the rest of their career,” says Bjørk.

Jacques Marcovitch, who served as rector of the University of São Paulo (USP) from 1997 to 2001, notes that assessment metrics for physics, chemistry, and medicine need to take into account the distinctive nature of each field, as the fields can differ significantly in both researcher performance and team size. “The average age of Nobel laureates, for example, varies significantly between theoretical and experimental physics. An experimental innovator will typically make their major breakthrough later in life, as their discoveries require them to first develop a basis of theory; theoretical physicists tend to make their discoveries earlier, because they use known abstract principles to develop their theories,” says Marcovitch, who is leading a FAPESP-funded project to develop metrics for assessing the scientific, economic, and cultural performance of public universities.

Other science databases are also being exploited. In Brazil, the Lattes platform, a collection of more than 4 million academic CVs, has been used as a source of information by a growing number of researchers looking for trends and patterns in Brazilian science. This has been made possible by a newly launched tool to extract and collate CV data. “Computational tools are crucial in scientometric research, especially for collecting internet data that is useful for different applications,” says Fábio Mascarenhas e Silva, a researcher in the Department of Information Science at the Federal University of Pernambuco (UFPE), who has previously done extensive research using Lattes datasets (see Pesquisa FAPESP issue no. 233). He recently published a paper about the performance of Nobel-winning articles. He recalls how, in a debate with his graduate students about current approaches to recognizing researcher merit (now largely based on publishing output and impact), the discussion wandered to the Nobel Prize. “We were discussing factors that might result in a researcher receiving more citations, and ended up talking about what happens when a researcher wins a Nobel Prize,” says Mascarenhas.

This discussion inspired librarian Jailiny Stanford’s master’s thesis under Mascarenhas, which she defended in 2017. Stanford analyzed the citation impact of papers by winners of the Nobel Prize in Physics and Chemistry from 2005 to 2015, based on data from Web of Science and Scopus. “Immediately after winning the prize, Physics laureates see a notable surge in citations, although the increase is less pronounced for their most notable papers, which were already highly influential before the prize,” says Mascarenhas. “In contrast, there is no perceptible change among Chemistry laureates, who typically see their citations peak before winning the prize.” The dynamics of each field, says the researcher, affect the level of recognition that even the most high-profile scientists receive.
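One simple way to picture a before-and-after comparison of this kind is to average a paper's yearly citations over a window on each side of the prize year. The sketch below is only an illustration under stated assumptions: the three-year window, the function names, and the citation counts are all hypothetical, and it is not the method used in Stanford's thesis.

```python
def mean_citations_per_year(citations_by_year, start, end):
    """Average yearly citations over the inclusive range start..end.

    Years missing from the dictionary count as zero citations.
    """
    years = range(start, end + 1)
    return sum(citations_by_year.get(year, 0) for year in years) / len(years)


def prize_effect(citations_by_year, prize_year, window=3):
    """Return mean yearly citations before and after the prize year (prize year excluded)."""
    before = mean_citations_per_year(citations_by_year, prize_year - window, prize_year - 1)
    after = mean_citations_per_year(citations_by_year, prize_year + 1, prize_year + window)
    return before, after


# Hypothetical yearly citation counts for one paper; the prize year is assumed to be 2010.
citations = {2007: 40, 2008: 50, 2009: 60, 2010: 70, 2011: 90, 2012: 100, 2013: 110}
before, after = prize_effect(citations, prize_year=2010)
```

In this invented example the paper averages 50 citations a year before the prize and 100 after it. A comparison across fields would repeat the calculation for every laureate's papers and would need to account for citation trends that were already rising before the award.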

Brazilian researchers nominated for the Nobel prize

Although data about Nobel laureates can be useful, it also has its drawbacks when used as a basis for analyzing science. The Prize, founded by Alfred Nobel (1833–1896), the Swedish inventor of dynamite, is known to be somewhat biased in its assessment criteria. This includes having a disproportionate number of researchers from developed countries (and Scandinavians in particular) invited to nominate candidates and select winners. Nominators typically interact among themselves, and winners are not selected in a neutral environment. This in itself, however, is not necessarily a disadvantage in research about the history of science. José Eymard Homem Pittella, a physician and retired professor at the School of Medicine at the Federal University of Minas Gerais (UFMG), published an article in 2018 in the journal História, Ciências, Saúde – Manguinhos about the extent to which Brazilian science became internationalized from 1901 to 1966. His primary source was the Nomination Archive, a database with information about the identity and distribution of nominators, nominees, and laureates in the Medicine, Physics, Chemistry, Literature, and Peace categories of the Nobel Prize. The Royal Swedish Academy of Sciences opened access to the archive in 1974, although data on the five most recent decades has been restricted.

In his paper, Pittella shows how, in the first 66 years of the Nobel Prize, Brazil’s limited presence on the international science scene undermined its chances of producing Nobel laureates. No Brazilian has ever won a Prize. Several Brazilian researchers have been invited to nominate candidates in the fields of Physiology or Medicine, Physics, Peace, and Literature. In the study period, the candidate most nominated by Brazilian nominators in a scientific category was American physiologist Walter Bradford Cannon, who never won a prize. Carlos Chagas (1879–1934) was nominated once in Physiology or Medicine in 1913, and a second time in 1921. “Carlos Chagas was a prodigious researcher. He discovered a disease, its vector and how it is transmitted. But he was only nominated by Brazilian researchers, and was unable to garner wider support,” says Pittella.

Brazilian physician Antonio Cardoso Fontes (1879–1943) was nominated in 1934, and infectious disease specialist Adolfo Lutz (1855–1940) four years later. Manoel Dias Abreu (1891–1962), a physician who invented photofluorography, received four nominations in 1946; that year, a record five researchers from the University of Brazil were invited as nominators. All nominated Abreu except physician Agenor Guimarães Porto, who instead chose Bernardo Houssay, the winner in 1947. In the Physics category, Cesar Lattes was nominated five times from 1949 to 1954, by seven different researchers, none of them Brazilian. The only nominator with whom he had not worked previously was Leopold Ruzicka, who won a Nobel Prize in Chemistry in 1939 and nominated Lattes on three occasions.
