

Big data and new materials

Scientists turn to digital mining techniques and huge databases of compounds in efforts to accelerate the discovery of new structures

Researchers from São Carlos test algorithms that facilitate the search for new types of glass

Léo Ramos Chaves

Big data could accelerate the search for new materials and reduce the empirical nature of a process historically marked by a combination of trial and error, accidents, keen observation, and last but not least, luck. Researchers from various fields of physics, chemistry, and materials engineering are hoping to shorten the path to the discovery of new compounds by using increasingly powerful computers, artificial intelligence, and growing databases on the properties and structures of theoretical materials never made in a lab, or real ones already obtained in experiments. For now, the potential is more exciting than the reality, but the approach is still in its infancy, both in Brazil and abroad.

A team led by materials engineer Edgar Dutra Zanotto, from the Federal University of São Carlos (UFSCar), is making progress in the search for new glass compositions and structures thanks to the use of artificial intelligence (AI). In an article published in the scientific journal Acta Materialia at the end of January, Zanotto and his colleagues compared the efficiency of six algorithms at correlating the chemical composition of 43,240 oxide glasses selected from a database with one of the material’s fundamental properties: glass transition temperature (Tg), which indicates the temperature above which an amorphous material leaves its rigid and brittle phase and starts to exhibit a more viscous and malleable state. “We compared the efficiency of the algorithms at predicting the Tg of known compounds and concluded that two in particular stood out,” explains Zanotto, coordinator of the Center for Research, Education, and Innovation in Glass (CERTEV), one of the Research, Innovation, and Dissemination Centers (RIDCs) funded by FAPESP. The margin of error in the projections provided by the best algorithm, Random Forest, was 7.5% at most, an excellent performance on a par with the level of uncertainty of measurements obtained through experiments.
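A comparison of this kind can be pictured with a small sketch. It assumes the "margin of error" is the largest relative deviation between predicted and measured Tg, which the article does not specify. The temperatures and model names are invented for illustration and are not data from the study.

```python
# Illustrative only: the Tg values and model names below are invented,
# not data or algorithms from the study.

def relative_errors(measured, predicted):
    """Return the relative deviation |predicted - measured| / measured per glass."""
    return [abs(p - m) / m for m, p in zip(measured, predicted)]


def worst_case_error(measured, predicted):
    """Largest relative deviation over all glasses, as a fraction."""
    return max(relative_errors(measured, predicted))


# Hypothetical measured glass transition temperatures, in kelvin.
measured_tg = [800.0, 650.0, 1000.0, 720.0]

# Hypothetical predictions from two placeholder models.
predictions = {
    "model_a": [820.0, 640.0, 950.0, 730.0],
    "model_b": [880.0, 600.0, 1100.0, 700.0],
}

# Score each model by its worst-case error and keep the lowest score.
scores = {
    name: worst_case_error(measured_tg, predicted)
    for name, predicted in predictions.items()
}
best_model = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: worst-case error {score:.1%}")
print("best:", best_model)
```

Ranking by the worst case rather than the average is a design choice made here for simplicity; other error measures could rank the same models differently.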

This was not the first study of this kind by the team. Two years ago, Zanotto, PhD student Daniel Cassar, and André Carlos Ponce de Leon, a professor from the Institute of Mathematical and Computer Sciences at the University of São Paulo (ICMC-USP), tested the performance of one algorithm at predicting glass properties. Their next study will measure how efficiently the three best algorithms can correlate the chemical composition of these oxide glasses with five other properties important to glass applications and development, such as refractive index and thermal expansion coefficients. “When the algorithms become so refined that they can accurately predict the characteristics of a material based solely on its composition, we will no longer need to spend so much time and money testing the countless possibilities to discover new glasses in a lab,” comments Zanotto. Formulations not deemed promising can be discarded early and researchers will be able to concentrate their efforts on the compounds most likely to succeed.

Surface of a topological insulator seen under a tunneling microscope

Yazdani Lab / Princeton University

Big data and AI could change the way materials science is practiced. Historically, researchers first discover novel compounds or structures by chance or after tireless searches and modifications, and then try to measure their properties to see if they may be useful. Now, with access to huge databases of materials, scientists can simply search for compounds with specific properties or characteristics (see sidebar). Researchers from the Federal University of ABC (UFABC) and the Brazilian National Nanotechnology Laboratory (LNNANO) at the Brazilian Center for Energy and Materials Research (CNPEM) did just that. They consulted the Automatic – Flow for Materials Discovery database (AFLOW) in search of three-dimensional compounds with a quantum property associated with a certain type of electron spin, known as the Zeeman effect.

To date, this effect, which alters the energy levels of the atoms, has only been experimentally verified by subjecting two-dimensional materials formed by a single layer of atoms, such as graphene, to a magnetic field. “From a list of approximately 59,000 compounds on AFLOW, we found 20 that demonstrate the effect the way we wanted, without the need for a magnetic field,” comments physicist Gustavo Dalpian, from UFABC, one of the authors of an article published in the journal npj Quantum Materials in August last year. Theoretically, this characteristic gives the materials an advantage for manufacturing spintronic devices, the electronics of which are based on spin states rather than electron charge. “A device made of three-dimensional materials that exhibit the Zeeman effect would not need a magnet to generate a magnetic field. This would make building the device less complex,” explains Dalpian.
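Searching a database for compounds with a target property amounts to filtering records against a set of criteria. The sketch below shows the idea on a handful of made-up entries. The field names, values, and threshold are hypothetical; this is neither the AFLOW schema or interface nor the selection criteria used by the authors.

```python
# Illustrative only: these records and field names are invented and do not
# reflect the AFLOW database schema or any real compound.

catalog = [
    {"label": "compound_1", "dimensions": 3, "magnetic": False, "splitting_mev": 45.0},
    {"label": "compound_2", "dimensions": 2, "magnetic": False, "splitting_mev": 80.0},
    {"label": "compound_3", "dimensions": 3, "magnetic": True, "splitting_mev": 60.0},
    {"label": "compound_4", "dimensions": 3, "magnetic": False, "splitting_mev": 5.0},
    {"label": "compound_5", "dimensions": 3, "magnetic": False, "splitting_mev": 30.0},
]


def screen(records, min_splitting_mev):
    """Keep three-dimensional, nonmagnetic records whose (hypothetical)
    spin splitting is at least min_splitting_mev."""
    return [
        record
        for record in records
        if record["dimensions"] == 3
        and not record["magnetic"]
        and record["splitting_mev"] >= min_splitting_mev
    ]


# The 20 meV threshold is an arbitrary value chosen for the example.
hits = screen(catalog, min_splitting_mev=20.0)
hit_labels = [record["label"] for record in hits]
print(hit_labels)
```

In a real screening the candidate list would then be checked for stability and manufacturability, since a compound that passes a filter on paper may still prove impractical.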

Physicist Adalberto Fazzio, director of LNNANO and the author of various papers on the use of data mining in the search for new materials, believes computational approaches are useful and important, but they must be refined and utilized in a realistic manner. “The algorithms still need to be taught how to find mathematical expressions that actually represent physical principles,” he says. One of the drawbacks of using computational models and tools is that they can return untrue results, which at first appear to be a shortcut to a new discovery, only to lead to a dead-end. Simulations can identify promising compounds that end up being unstable or cannot be manufactured. In such a situation, it is only scientific knowledge, embedded in increasingly sophisticated mining algorithms or backed up by scientific literature, that tells the researcher they are on the wrong track.

Judging by the geographical origin of the largest databases available today, the race for new materials is led by the Americans and Europeans. “China’s work in the area is not yet earning great attention,” says physicist Osvaldo Novais de Oliveira Junior, from USP’s São Carlos Institute of Physics (IFSC). “They only publish their results in Chinese or they create materials about which they don’t publish anything at all.” According to the researcher, AI techniques are good for classifying data (be it images, words, or properties of materials) but not for interpreting it. Together with Dalpian, Oliveira Junior edited a collection of articles on the use of big data in the search for new materials for the journal ACS Applied Materials & Interfaces in July 2019.

If this approach produces the expected results, stories such as that of Scottish physician Alexander Fleming (1881–1955), who in 1928 discovered penicillin by chance, will become increasingly rare. Upon returning from a two-week vacation, Fleming, who had a reputation for carelessness, noticed a white mold had formed on a culture plate in his lab. The fungus was preventing the growth of bacteria that had inadvertently contaminated the petri dish. Thus, the world’s first natural antibiotic was discovered.

Six databases
Some collect data on specific types of materials, while others are more wide-ranging

The Materials Project
Launched in 2011 by the US Department of Energy, the project is managed by the Lawrence Berkeley National Laboratory in California. Its database holds information on the chemistry, structure, and properties of 124,000 inorganic compounds and 530,000 nanoporous materials. It provides tools that allow the user to simulate the characteristics of materials before testing them in the lab.

Automatic – Flow for Materials Discovery (AFLOW)
A consortium of 16 universities (most from the US and some from Europe and Asia) whose database stores information on 3.2 million compounds, for which it has calculated more than half a billion quantum, thermal, structural, and elastic properties, among others. The platform, which is also funded by the US Department of Energy, enables users to construct materials virtually using high-throughput computational techniques.

The Novel Materials Discovery (NOMAD) Laboratory
European repository helmed by Germany’s Max Planck Society, created in late 2015. It unites eight materials research centers and four supercomputing centers, and provides virtual tools to search and cross-reference the properties, structures, and other parameters of millions of compounds. It also features an online encyclopedia of materials, with data on a fraction of the materials found in its online database.

Database specializing in hybrid semiconductor crystals, formed at the molecular or nanometer level by an organic and an inorganic compound. It is run by higher education and research institutions in the USA, under the leadership of Duke University. Its main focus is the search for materials with a so-called perovskite crystalline structure, similar to that of calcium titanate (CaTiO3), which can be a cheap and efficient alternative for manufacturing solar cells.

Computational 2D Materials Database (C2DB)
Stores information on 4,000 two-dimensional materials, including their structures, elasticity, and thermodynamic, electronic, optical, and magnetic properties. The materials are formed by a single layer of atoms. The best-known example is graphene, whose famous hexagonal structure is just one carbon atom thick. The database is run by researchers from the Technical University of Denmark.

Created by the Ecole Polytechnique Fédérale de Lausanne, Switzerland, this project focuses on one of the most exotic classes of materials: topological insulators, which conduct electricity on their surface, but, as the name suggests, behave as insulators in their interior. The project began in 2012 and the database contains information on more than 13,500 materials. Topological insulators could potentially be used to produce new forms of electronic devices.

CERTEV – Center for Research, Education, and Innovation in Glass (No. 13/07793-6); Grant Mechanism Research, Innovation, and Dissemination Centers (RIDCs) Program; Principal Investigator Edgar Dutra Zanotto (UFSCar); Investment R$34,665,855.27.

Scientific articles
ALCOBAÇA, E. et al. Explainable machine learning algorithms to predict glass transition temperature. Acta Materialia. Jan. 30, 2020.
ACOSTA, C. M. et al. Zeeman-type spin splitting in nonmagnetic three-dimensional compounds. npj Quantum Materials. Aug. 7, 2019.