Programs for an integrated biology

New software examines the role of gene networks in triggering diseases

Improvements in genome sequencing techniques are enabling the generation of increasingly large amounts of data. Developing new ways to analyze this large body of information is a challenge for computational biology. That is the rationale behind NetDecoder, software that integrates data and extracts relevant information from it. The program, developed in 2015 by Brazilian computer scientist and molecular biologist Edroaldo Lummertz da Rocha while he was a postdoctoral researcher at the Mayo Clinic in the United States, not only analyzes gene expression (in other words, how each gene is activated and can individually contribute to the development of a disease) but also identifies which gene interaction networks – sets of interconnected genes – most strongly affect a particular condition.

“The difference with NetDecoder is that it allows for comparison between these phenotype-specific genetic interaction networks,” says Rocha, who is currently a postdoctoral researcher at Harvard University.

In determining the networks associated with a disease, the program points to which of the body’s regulatory or signaling pathways are altered in a population with a genetic disease when compared with a group of healthy people. “It’s also possible to compare different stages of development of a single disease and see whether the gene interaction network remains unchanged over time,” he says. The software starts by identifying genes that are differentially expressed (activated) in people with genetic diseases. Using an algorithm designed to incorporate database information, the software then determines whether a gene linked to the disease is likely to interact with other genes, forming a network with the potential to affect a particular signaling pathway.
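The two steps described above can be illustrated with a minimal Python sketch. It is not NetDecoder's actual algorithm, which the article does not detail: the gene names, expression values, threshold and interaction list are all made up for illustration, and the "database" is just a hard-coded list of gene pairs.

```python
from statistics import mean

def differentially_expressed(disease, healthy, threshold=1.0):
    """Return genes whose mean expression differs between the two groups
    by more than `threshold` (an arbitrary, illustrative cutoff)."""
    hits = {}
    for gene in disease.keys() & healthy.keys():
        diff = mean(disease[gene]) - mean(healthy[gene])
        if abs(diff) > threshold:
            hits[gene] = diff
    return hits

def build_network(seed_genes, interactions):
    """Keep the known interactions that involve at least one seed gene."""
    seeds = set(seed_genes)
    return {frozenset(pair) for pair in interactions if seeds & set(pair)}

# Hypothetical expression levels per gene (one value per person).
disease = {"GENE_A": [5.0, 6.0], "GENE_B": [2.0, 2.2], "GENE_C": [1.0, 1.0]}
healthy = {"GENE_A": [2.0, 3.0], "GENE_B": [2.1, 2.1], "GENE_C": [3.5, 3.5]}

# Hypothetical stand-in for a public interaction database.
interactions = [("GENE_A", "GENE_D"), ("GENE_B", "GENE_E"), ("GENE_C", "GENE_A")]

hits = differentially_expressed(disease, healthy)
network = build_network(hits, interactions)
```

In this toy data, GENE_A and GENE_C stand out as differentially expressed, and only the interactions that touch them are kept as the candidate disease network.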

To validate the computational tool, the researcher and his team analyzed transcriptomes – the set of messenger RNAs produced in a tissue – of people with breast cancer, Alzheimer’s disease and dyslipidemia (abnormal levels of lipids in the blood). After identifying the altered pathways, they compared their results to the literature and found them to be consistent. For breast cancer, for example, NetDecoder identified the signaling pathway of BRAF, already known to be related to the disease.  In Alzheimer’s patients, they found alterations in pathways related to the cellular cytoskeleton, common to neurodegenerative diseases, and in the case of dyslipidemia, they found alterations in metabolic pathways. The findings appeared in an article published in March 2016 in the journal Nucleic Acids Research.

Rocha says that the computational tool can be used in studies of any type of human disease. Other software also analyzes gene networks, but according to Rocha, a key feature of NetDecoder is its ability to compare the interaction networks at work in patients with two distinct phenotypes, or with different clinical manifestations of a single disease. This feature makes it possible to discover new relationships between genes, signaling pathways (the system of communication that coordinates the cell’s basic actions) and the diseases studied. “In this way, we can generate new and better targets for treatments,” Rocha says. “It’s easier to produce medications that act on a pathway rather than on a specific gene.”
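As a rough illustration of what comparing two phenotype-specific networks can mean, the sketch below treats each network as a set of gene pairs and separates the interactions unique to each phenotype from those they share. This is an assumption made for illustration only, not a description of how NetDecoder performs the comparison; the function name and gene names are hypothetical.

```python
def compare_networks(net_a, net_b):
    """Split two interaction networks into edges unique to each and edges shared.
    Each network is an iterable of gene pairs; pair order is ignored."""
    a = {frozenset(edge) for edge in net_a}
    b = {frozenset(edge) for edge in net_b}
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}

# Hypothetical networks for two phenotypes of the same disease.
phenotype_1 = [("GENE_A", "GENE_D"), ("GENE_A", "GENE_C")]
phenotype_2 = [("GENE_C", "GENE_A"), ("GENE_C", "GENE_F")]

result = compare_networks(phenotype_1, phenotype_2)
```

Interactions that appear in only one phenotype are the ones a researcher might examine first when looking for pathway-level differences between the two groups.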

Exponential growth
Computational biology, although relatively new, has exhibited exponential growth in recent years. The field’s dizzying pace of expansion can be seen in the international collaborative project known as Bioconductor, led by researchers from the Fred Hutchinson Cancer Research Center in the United States. The initiative, which began in 2001, collects and provides access to a wide range of open source software for the analysis of genomic data. In 2002, it had a mere 20 registered tools, but by 2016 that number had risen to more than 1,200.

Since its establishment in 2000, the bioinformatics division of the Genomics and Expression Laboratory (LGE) of the University of Campinas (Unicamp) has developed seven computational tools that can be used in several areas such as genomics, transcriptomics, structural biology and the identification of enzymes.  Notable among them is the Integrated Interactome System (IIS), devised in 2014. “The incentive for developing this tool was to integrate data on genomics, proteomics, transcriptomics and metabolomics that were produced and analyzed independently,” says physicist Marcelo Falsarella Carazzolle, who heads up the unit. “With a broader and more integrated vision of how protein and protein-metabolite interactions occur, it is possible to observe connections that were previously unclear and arrive at new findings.” The tool allows integration of the genomic data on humans, animals or plants that is stored in public databases.  It now has 247 users.

According to Mauro Castro, a researcher at the Bioinformatics and Systems Biology Laboratory of the Federal University of Paraná (UFPR), the field of computational biology represents a major opportunity for Brazilian laboratories to produce research at the international level. “Generating data can require significant financial resources, but analyzing it and extracting important information from it can be done competitively in laboratories, on a modest budget, using qualified personnel,” says Castro. He recently developed the RTN software for the purpose of mapping multiple genetic risk factors for breast cancer. Instead of focusing on the role of a single gene, the program performs a combined analysis of multiple genetic variants that could influence the functioning of tumor regulators. “Each of these regulators can become a potential target for the development of new markers or medications,” he explains.

Like most of the field’s computational tools, RTN can be used to study a range of genetic diseases. The UFPR professor says that a major obstacle in the development of bioinformatics is the lack of specialized workers. “It is rare to find professionals who are trained to handle the enormous amount of biological data,” Castro says. “Perhaps the biggest challenge lies in training professionals who can understand and analyze this data.”

Scientific article
DA ROCHA, E. L. et al. NetDecoder: A network biology platform that decodes context-specific biological networks and gene activities. Nucleic Acids Research. 14 March 2016.