Science in a haystack

eScience program tries to extract new knowledge in the midst of huge volumes of data

FAPESP published a call for proposals to inaugurate its eScience Research Program. The expression ‘eScience’ is used to describe the challenge of research undertaken in computing, together with other fields of knowledge, to organize, classify and ensure access to the huge volume of data constantly generated in all fields of research in order to extract new knowledge and carry out comprehensive and original analyses. The main objective of the program is to integrate groups involved in research on interfaces, algorithms, computational modeling and data infrastructure with scientists in areas where eScience applications are especially needed in Brazil, from agriculture to the social sciences. “The intention is to bring these two types of researchers together to generate new knowledge in both computer science and applications in these disciplines,” says Roberto Marcondes César Júnior, professor at the Institute of Mathematics and Statistics, University of São Paulo, and deputy coordinator of FAPESP’s Exact Sciences and Engineering area.

The first call will accept proposals for Thematic or Regular Project grants through April 28, 2014. The Foundation hopes that universities will supply human resources, hiring programmers and database analysts, among others.

The sum of R$4 million will be made available to support projects involving mathematical models, digital repositories and data management, new hardware, software, protocols, tools and services aimed at meeting the demands of research in the areas of agricultural sciences, arts, the humanities and social sciences, engineering and physics, climate and earth sciences, and eScience practices and education.

The initial ideas came from two workshops organized by FAPESP, attended by researchers in computing and other areas (from the exact sciences to the humanities).  “The discussion led to the conclusion that an eScience initiative would be ideal,” says Claudia Bauzer Medeiros, professor at the Institute of Computer Science, Unicamp and Coordinator of the FAPESP Engineering and Computer Science area.  “The details matured over time and culminated in a program proposal drafted by the FAPESP Computer Science Coordinators.”

The announcement is pioneering in Brazil, as it requires researchers to submit a Data Management Plan that describes how the team intends to manage, protect, preserve, and disclose its data.  “Many countries, such as the United States, Germany, the UK and Canada, are discussing how to ensure these plans in every research project, as all concur that one of the most valuable parts of any research project is the data generated, and that needs to be preserved and made public in an appropriate manner,” notes Claudia Bauzer Medeiros.

“FAPESP believes that advances in eScience are critical because it has programs that generate enormous quantities of data of scientific interest that need proper treatment and should be shared,” says César, referring to programs such as BIOTA, identifying biodiversity in São Paulo, BIOEN, related to research in bioenergy, the FAPESP Program on Global Climate Change Research, or CinAPCe, on brain research. The experience of the 46 Microsoft Research-FAPESP Information Technology Institute research projects, 32 of which are complete, indicates that there is a community of researchers eager to participate in the new program, which seeks to develop information technology applications with a social reach.

The advent of eScience relates to changes in the way we do science.  “Until a few years ago, a doctorate in biology, for example, was based on a set of experiments in the first two or three years, with the results compiled in a spreadsheet and analyzed in the last year. But today, in the first year the student often has access to hundreds of spreadsheets on a particular experiment available on the Internet, and her challenge is to discover new knowledge in the midst of those reams of data,” explains the professor.  “The research problem has been transformed. Now one must extract knowledge from a large quantity of information, usually containing heterogeneous data.”

Several international initiatives are addressing the challenges of eScience, such as the Institute for Data Sciences and Engineering at Columbia University. Organized into interdisciplinary themes, it is researching ways to extract information from on-line media, to monitor and improve the use of urban infrastructure, and to improve the health care system using patient data and public health records. Another example is the eScience Institute at the University of Washington, which supports research ranging from astronomy to marine biology. There are also virtual telescopes, like NASA’s SkyView, which allow millions of people access to astronomical data. There is a global virtual astronomical observatory initiative for processing astronomical data worldwide, allowing millions of people access to the universe. Another example is the Large Hadron Collider, the largest particle accelerator in the world, whose results are processed worldwide.

The expertise that scientists need in order to extract information from large quantities of data already exists in the private sector, says Gilberto Câmara, a researcher in geoinformatics and environmental modeling at the National Institute for Space Research (INPE) and director of the institution from 2005 to 2012.  “Banks and eCommerce websites have developed a gigantic database management structure in order to obtain information about customers and their habits and provide services, but the way scientific files are organized has changed very little,” he says, citing as an example the 500,000 satellite images provided by INPE at no cost.  “I can download them one by one, but anyone who needs to manage more than 200 personal photos knows how difficult this is. One challenge of eScience is to make all of them analyzable without my having to download them,” he says.

Câmara cites an article in the journal Science, published in November 2013, with the research results from a group that mapped the changes in forest cover worldwide between 2000 and 2010 using 650,000 Landsat satellite images. “Google, which has a powerful data storage infrastructure and is leading this field, gave the research group access to the images. There is a revolution in scientific data processing underway. It will allow us to do scientific research in a different way: the researcher will go to the data, rather than the data being transferred to the researcher.”

Internet research 

On December 18, 2013, FAPESP and the Ministries of Science, Technology and Innovation (MCTI) and Communications signed a cooperation agreement for R$98 million to support scientific and technological research that contributes to the development of the Internet in Brazil. The sum corresponds to the funds remaining from the period between 1998 and December 2005, during which FAPESP, by delegation of the Brazilian Internet Steering Committee (CGI.br), managed domain registration and IP address allocation, after which this task was transferred to the .BR Information and Coordination Nucleus (NIC.br). “In 1998, there were 27,000 Internet domains in Brazil. Today there are more than 3 million,” said FAPESP’s president, Celso Lafer. “FAPESP offered to help the CGI at a time when NIC.br did not yet exist, and this support was critical,” said Minister Marco Antonio Raupp, of the MCTI. The funds will be distributed among projects submitted by researchers all over Brazil, in proportion to the number of domain registry requests received from each state during that period. São Paulo will receive 47% of the R$98 million to support projects under the agreement, which involves the two ministries and FAPESP.