

Modern mining

Trained to interpret huge volumes of information, data scientists have job opportunities in a wide range of sectors

Andrés Sandoval

Highlighted as a rising research field that combines knowledge of computing, artificial intelligence, mathematics, and statistics, data science involves the analysis of complex volumes of information generated by various platforms. In Brazil, higher education institutions have been investing in new undergraduate and graduate courses to meet the growing demand for data scientists. The objective is to give students the skills to interpret, structure, and analyze data that can reach the order of petabytes, a storage unit that represents 1,024 terabytes.
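
As a rough illustration of the scale mentioned above, the short Python sketch below converts one petabyte into smaller units. It assumes the binary convention the text uses (1 petabyte = 1,024 terabytes, with each further step down again a factor of 1,024). The helper name `petabytes_to` is purely illustrative.

```python
# Binary storage units, each 1,024 times larger than the previous one.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes", "petabytes"]

def petabytes_to(unit: str, petabytes: float = 1.0) -> float:
    """Convert petabytes to the given unit using factors of 1,024 (illustrative helper)."""
    steps = UNITS.index("petabytes") - UNITS.index(unit)
    return petabytes * 1024 ** steps

print(petabytes_to("terabytes"))  # 1024.0
print(petabytes_to("gigabytes"))  # 1048576.0
```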

“In addition to knowledge of computing, mathematics, and statistics, data scientists need to be curious and enjoy solving problems,” says André Ponce de Leon Ferreira Carvalho, vice-director of the Institute of Mathematical and Computer Sciences (ICMC) at the University of São Paulo (USP) in São Carlos, who also works at the Center for Mathematical Sciences Applied to Industry (CEMEAI), one of the Research, Innovation, and Dissemination Centers (RIDC) supported by FAPESP. Their professional education also involves learning how to identify valuable information in enormous databases, an area known as big data. The institution, which already had a graduate program for data scientists, has just started offering a new undergraduate degree in the field. “Professionals currently working in data science generally come from related fields, such as computer science, physics, mathematics, engineering, and statistics,” notes Carvalho. “The increased demand has led to more specific courses.”

Since 2018, the curricula of all courses at ICMC have placed an emphasis on data science, highlighting its relevance in multidisciplinary studies. In 2020, its undergraduate degree in statistics, for example, changed its title to statistics and data science. “This is a worldwide trend in both statistics and computer science courses, which are placing more and more importance on data science,” adds Carvalho. The disciplines that comprise the new course’s curriculum include software engineering, artificial intelligence, high-performance computing, computer networks, and large database mining.

While the volume of data produced every day by smartphones, virtual assistants, electronic locks, watches, refrigerators, vacuum cleaners, air conditioners, and televisions is almost immeasurable, these data can also be used to improve products and services or simply make day-to-day life easier. “The data produced by the various sensors in vehicles, such as reversing cameras, thermometers, and speedometers, are stored in databases and can be used for preventive maintenance, as well as to notify drivers when they are in areas subject to flooding or where accidents frequently occur,” says Carvalho, offering an example.

Large databases that allow for systematic analysis by scientists are usually divided between structured and unstructured data. The former are those that are already organized, such as the number of times a particular website or application is visited, the number of users, most consumed products, and places frequented by the greatest number of people. Unstructured data, meanwhile, include texts and images published on social networks and sounds captured from microphones installed in the ocean, forests, or urban environments, for example. Once systematized, they can help to predict climatic phenomena such as hurricanes, to identify fires, or to determine the occurrence of assaults in a given region.
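
To make the distinction concrete, here is a minimal Python sketch using made-up data. The structured records already have named fields and can be aggregated directly, while the unstructured posts must first be systematized (here, with a naive word count) before they can be analyzed. All names and values are hypothetical, and real projects would rely on far more robust text processing.

```python
from collections import Counter
import re

# Structured data: every record has the same named fields (values are invented).
visits = [
    {"page": "home", "users": 120},
    {"page": "store", "users": 45},
    {"page": "home", "users": 80},
]

def users_per_page(records):
    """Sum the 'users' field for each page."""
    totals = Counter()
    for record in records:
        totals[record["page"]] += record["users"]
    return dict(totals)

# Unstructured data: free text with no predefined fields (invented examples).
posts = [
    "Heavy rain and flooding downtown",
    "Flooding again near the river",
]

def word_counts(texts):
    """Systematize free text by counting lowercase words (a naive approach)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

print(users_per_page(visits))
print(word_counts(posts).most_common(1))
```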

Requirements to be a data scientist

1. An affinity for the exact sciences, especially mathematics and statistics
2. Familiarity with programming languages
3. Ability to solve problems and identify opportunities for innovation
4. Willingness to work in multidisciplinary teams
5. Good communication skills

Growing demand
While computers were initially seen as a threat to jobs, with people worrying that automation would lead to rising unemployment, many new related professions have emerged as a result, including in the information and communication technologies (ICT) sector. Data released by management consulting firm Bain & Company, based in Boston, USA, estimate that as of 2020, approximately one million new data scientists have graduated with undergraduate and graduate degrees worldwide. Data from the Brazilian Association of Information and Communication Technology Companies (Brasscom) indicate that in Brazil, the number of technology roles is on the rise, and the sector may actually suffer from a lack of qualified professionals. By 2024, an estimated 70,000 new positions will be available annually.

“As well as companies that are undergoing digital transformations or those that were founded in the sector, there are opportunities in several other areas of science, which increasingly need data scientists to interpret large quantities of data,” says Bianca Zadrozny, a researcher and senior manager of spatio-temporal modeling at IBM Brasil’s Research Laboratory. The company estimates that in the USA, the number of positions available will continue to grow by 5% per year, with 60,000 jobs created in 2020 alone. “Data scientists tend to be one of the most prominent roles at companies due to their ability to suggest hypotheses, design experiments, evaluate results, and present them in an understandable way,” notes Zadrozny.

Luis Gustavo Nonato, coordinator of ICMC’s new course at USP, says the same trend can be seen in Brazil. “There is a huge labor deficit here. Companies are constantly seeking to hire data scientists, and even with the growing number of courses in the area, there is still a shortage of professionals,” he says. According to Nonato, demand is growing across the country, but the majority of positions are available in the South and Southeast. “Governments and public agencies are also placing increasing importance on regional data management for formulating public policies.”

One example of this is the methodologies and models that have been used to increase the efficiency of legal processes and to systematize data generated in the legal sector. In September 2020, ICMC signed an agreement with the São Paulo State Court (TJSP) to develop artificial intelligence tools that will be used to create an analytical database of the information contained in court documents. “The objective is to analyze the content of the texts and identify the most common subjects, as well as to point out similarities between different cases,” explains Carvalho, from ICMC.
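
As a loose illustration only (the article does not describe the tools ICMC is building), one very simple way to compare two texts is the Jaccard similarity of their word sets: the number of shared words divided by the number of distinct words overall. The sample sentences and the function name `jaccard_similarity` below are invented for this sketch.

```python
import re

def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    words_a = set(re.findall(r"\w+", text_a.lower()))
    words_b = set(re.findall(r"\w+", text_b.lower()))
    if not words_a and not words_b:
        return 0.0  # two empty texts: treat as having nothing in common
    return len(words_a & words_b) / len(words_a | words_b)

# Invented one-line summaries standing in for court documents.
case_1 = "dispute over unpaid rent"
case_2 = "dispute over unpaid wages"
print(jaccard_similarity(case_1, case_2))  # 0.6
```

A score near 1 means the two texts use almost the same vocabulary. Production systems would typically rely on richer language models than raw word overlap.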

The institute has also recently signed a contract with the House of Representatives in Brasília to develop software that will use machine learning and natural language processing to analyze content from its channels for public participation. “With this software, it will be possible to identify public sentiment regarding legislative proposals, determining whether the majority are for or against a given bill.”

Aware of the growing demand, the Pontifical Catholic University of Campinas (PUC-Campinas) also plans to start offering a new undergraduate course in the field this year. The degree, entitled data science and artificial intelligence, will be taught in the mornings for the first three years and then in the evenings for the fourth and final year of study. “This will allow students to take internships at the various technology companies located in the region toward the end of the course,” explains Daniele Maia Rodrigues, director of the School of Computer Engineering at PUC-Campinas. Rodrigues highlights that the program aims to equip students with both technical and interpersonal skills, allowing them to combine knowledge of algorithms and computer systems with the ability to work in multidisciplinary teams. To establish these skills, students need to understand computer systems development, programming, computer networks, cloud computing, systems infrastructure, artificial intelligence, machine learning, and natural language processing (through which a computer interprets written or spoken human language). “It is important to note that data scientists are likely to be a part of teams related to other fields, relating to specific business areas,” adds Rodrigues.

Graduate studies
As well as a chance to expand their knowledge, many professionals, especially from the exact sciences, see data science as an opportunity to work in a new field. Since degrees in data science have not been offered for long, companies have in the past hired engineers, mathematicians, business administrators, economists, and physicists seeking to complement their qualifications with experience in the new field.

“A knowledge of programming languages, such as Python and R, is essential for data scientists,” explains Eduardo Barbosa, head of the data science and decisions graduate course at the Institute of Education and Research (INSPER). Suited to analyzing robust data libraries, Python and R are widely used in application programming. The course is designed to prepare data scientists to support corporate decision-making and focuses on statistical models and machine learning, programming, and design thinking, which involves developing the ability to understand problems and propose solutions. “In the entrance exam, we try to identify whether the candidate meets the basic requirements for admission to the program. The objective is to avoid frustrating people with no affinity for the exact sciences,” says Barbosa. The institution also offers a 20-hour training course over three days for executives looking to utilize the concepts of data science to improve their business.

The specialist diploma in data science and big data at the Federal University of Bahia (UFBA) began in 2018 and is divided into three modules that last four months each, comprising subjects such as applied statistics, matrix algebra, numerical methods, programming with R and Python, machine learning, fundamentals of big data, artificial intelligence, and pattern recognition in image, sound, and video. “As well as statisticians, mathematicians, computer scientists, and engineers, our students also come from the media, the legal industry, business administration, and other fields,” says Jalmar Manuel Farfan Carrasco, coordinator of the course offered by the Statistics Department at UFBA’s Institute of Mathematics and Statistics. To bring together students from such a wide range of fields and help them understand such specific concepts, the graduate program invests in group activities, seeking whenever possible to unite students from statistics and computing with professionals from other areas, such as law, communication, and psychology. “By observing the same problem from different perspectives, these interactions allow each member to contribute in a unique way to the search for a solution,” concludes Carrasco.