Explaining nature’s phenomena with mathematical equations is a routine task in physics, chemistry and mathematics itself. Biology has less of a tradition in this respect. Several study groups in Europe and the United States are pursuing this relationship, seeking a link between the genomes of living things and mathematical structures in order to better understand the formation of life on the planet. However, such a link was first made by a group of researchers from the University of Campinas (Unicamp) and the University of São Paulo (USP), who found a mathematical relationship between a numerical code and the sequence of DNA, the deoxyribonucleic acid that carries the genes within cells. Other researchers had already suggested such a relationship, but had been unable to prove it. The Brazilians found that the nitrogenous bases thymine (T), guanine (G), cytosine (C) and adenine (A) are organized according to a numerical logic. “The distribution of these bases has a mathematical code that prevailed throughout the evolution of living beings,” says Professor Márcio de Castro Silva Filho, from USP’s Luiz de Queiroz College of Agriculture. “We discovered that when a protein loses its biological function due to a mutation, for example, it ceases to be represented by a mathematical structure,” says Silva Filho, one of the coordinators of the group.
The researchers have not developed a new code to explain the DNA sequence. They found that a relationship exists between certain DNA sequences and error-correcting codes (ECC), mathematical schemes used throughout digital communication and telecommunication systems, computer memories and the flash memories of pen drives to correct noise or defects that arise in transmission. The code, known by the letters BCH, the initials of its discoverers, the Indians Raj Chandra Bose and Dwijendra Kumar Ray-Chaudhuri and the Frenchman Alexis Hocquenghem, not only identifies the error but also corrects it. The idea of associating error-correcting codes with DNA sequences is nothing new. It has been the subject of research since the 1980s, and one of its leading scholars is Professor Hubert Yockey, who worked at the University of California in Berkeley and published two books, “Information Theory and Molecular Biology” in 1992 and “Information Theory, Evolution and the Origin of Life” in 2005, both with Cambridge University Press. Another researcher in the area is Gérard Battail, a retired professor from the National Superior School of Telecommunications, in France, who has written articles proposing a relationship between error-correcting codes and the genome. They described the process and raised hypotheses, but did not present the mathematical relationships with DNA. The Brazilians were able to establish this relationship in the messenger ribonucleic acid (mRNA) sequences that generate proteins. “In finding out the mathematical structure of the protein it is possible to alter the order of the bases and also to correct the mutations or errors that may happen, in order to return to the normal condition of a protein,” says the professor.
In the future, the ability to correct a mutation or cellular error could, for example, offer a mathematical approach to the lack of insulin production by pancreatic cells, by correcting errors in a specific gene. “It would be possible to identify the mathematical structure of mutations and where they occurred and maybe correct this molecular problem so that the organism returns to producing insulin, by reversing the previous structures. Another possibility would be to manufacture proteins from the mathematical code or even find unknown proteins that exist in the cells,” says Professor Reginaldo Palazzo Jr., from the College of Electrical and Computational Engineering (FEEC) at Unicamp, another group coordinator. “The correction, or way of reversing the error in the cells, happens in the same way as on a hard disk (HD) that has a damaged sector, where the ECC reconstitutes the information.”
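To make the idea of a code that both locates and corrects an error concrete, here is a minimal sketch in Python using the Hamming(7,4) code. Hamming codes are commonly described as the single-error-correcting members of the binary BCH family (up to a reordering of bit positions). This is an illustration only: it works on binary digits, not on the four-letter code over GF(4) studied by the group, and the function names `encode`, `syndrome`, `correct` and `decode` are chosen here for the example.

```python
# Hamming(7,4): 4 data bits are protected by 3 parity bits.
# Bit positions are numbered 1..7; positions 1, 2 and 4 hold the parity bits.

def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def syndrome(word):
    """Return 0 if every parity check passes, otherwise the
    1-based position of the single flipped bit."""
    s1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    s2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    s4 = word[3] ^ word[4] ^ word[5] ^ word[6]
    return s1 + 2 * s2 + 4 * s4

def correct(word):
    """Return a copy of the word with a single-bit error repaired."""
    fixed = list(word)
    position = syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1  # flip the faulty bit back
    return fixed

def decode(word):
    """Repair the word if needed and extract the 4 data bits."""
    fixed = correct(word)
    return [fixed[2], fixed[4], fixed[5], fixed[6]]

message = [1, 0, 1, 1]
sent = encode(message)       # [0, 1, 1, 0, 0, 1, 1]
received = list(sent)
received[4] ^= 1             # noise flips the bit at position 5
print(syndrome(received))    # 5: the code locates the error
print(decode(received))      # [1, 0, 1, 1]: the message is recovered
```

The syndrome plays the role described in the text: a value of zero means the word fits the mathematical structure of the code, while a nonzero value both signals a fault and says where it is. This sketch handles only one flipped bit per word; codes with more parity symbols can correct more errors.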
With so many possible uses in industry, in addition to the scientific significance of the discovery, the researchers decided, before publishing the news in scientific journals, to file an international patent application under the Patent Cooperation Treaty (PCT), covering various countries, and another in the United States, with funding from FAPESP and management by Unicamp’s and USP’s Innovation Agencies. If they license the patent, laboratories worldwide will be able to use the mathematical structures discovered by the group, possibly in the form of software for testing proteins in a wide range of products. “This information is important for developing vaccines, drugs or proteins for making cheese and fabric softeners, for example,” says Professor Silva Filho. Today, an alteration is made in the DNA sequence that codes for a protein and laboratory tests are then carried out to check the effectiveness of the reaction, in a process of trial and error. With the mathematical equations, it will be possible to test the affinity and stability of the protein in preliminary work in order to screen mutations, and then test them in laboratory experiments to confirm whether the mutation in the DNA sequence gives the expected result. “If the mathematical structure is not maintained, the change will not be effected and will not produce the expected results.”
The discovery of the existence of a mathematical code that transcribes the DNA sequence happened almost by chance and began with Professor Palazzo, who set a challenging objective for two of his PhD students at FEEC, Luzinete Cristina Bonani Faria and Andrea Santos Leite da Rocha, both graduates of the Catholic University of Campinas (Puccamp) with Master’s degrees from Unicamp. They were to look for the information that is to be found in a cell. “Within the mitochondrion, an organelle responsible for cell respiration, there are DNA molecules for synthesizing certain substances, but it does not have all proteins and needs to request extra proteins produced by genes located in the nucleus in order to perform the functions in the organelle. In this case, for mathematicians, the protein is information and there is a standard code for transmitting it,” explains Professor Palazzo. The model presented by the Brazilian researchers fits any sequence of DNA that produces proteins within the cell.
Palazzo is a specialist in the so-called mathematical theory of communication, a field of study that investigates the transmission of all kinds of information and its codes. Also called the theory of codes, it analyzes the forms of transmission regardless of the meaning. What matters is therefore not the word being transmitted but how it is sent from emitter A to receiver B, within a mathematical context. “This theory was presented by Claude Shannon [an American mathematician and electrical engineer] in 1948,” recalls Palazzo. For Andrea and Luzinete’s study, Palazzo suggested they approach one of the professors at Unicamp’s Faculty of Medical Sciences (FCM), initially to identify the biological components and go into the subject in depth. After a lot of searching, Professor Anibal Vercesi from FCM suggested that they talk to Professor Márcio de Castro Silva Filho at Esalq, USP’s Luiz de Queiroz College of Agriculture. “We went to talk to him and we established a marriage of interests,” says Palazzo. “We started a dialogue with mathematicians, an electrical engineer and myself on one side, and a geneticist specializing in the transport of proteins on the other,” Silva Filho recalls. The first sample of DNA studied by the researchers from Unicamp was from Arabidopsis thaliana, a plant from the mustard family that serves as a model for genomic studies. Starting from that, they spent six months working. “They began to test various mathematical elements to try to find some systematization in relation to the genome,” explains Palazzo, who also relied on the collaboration of computer engineer João Henrique Kleinschmidt, a former PhD student and currently a professor at the Federal University of ABC in Santo André, in the São Paulo Metropolitan Region. “One day they called me at Unicamp and showed me the results. When I realized what it was I was speechless. I thought it was a coincidence and we began to repeat the work using other genomes from humans, bacteria, fungi and plants. We discovered that this is a universal process,” says Silva Filho.
Understanding the language
At the end of 2009, they submitted an article to the journals Nature and Science, but both turned it down, saying it was something very specific. “I don’t think they understood the mathematical language of the paper,” says Silva Filho. “This is part of the difficulty of the conversation between biologists, engineers, doctors etc.,” says Palazzo. Then they decided to send it to the journal Electronics Letters, which accepted the work in just three weeks, selected it as the best article of February 2010 and put it on the cover of that month’s issue. They began showing the study at international congresses on information theory and are expected to present new results, with more detailed information and other mathematical tools. In the article in Electronics Letters, “DNA sequences generated by BCH codes over GF(4)”, they presented part of the work using the mathematical structure called a Galois field, while the new results use the Galois ring structure. Simplifying things, we could say that in a field the product of two numbers other than zero is always a number other than zero, while in a ring the product can be zero. For mathematicians this makes a lot of difference in the presentation of results. So far, they have only presented the results over the field.
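The difference between the two structures can be checked directly on two systems with four elements each. The sketch below, in Python, compares multiplication in the Galois field GF(4) with multiplication in the integers modulo 4, which is the simplest four-element Galois ring. It is an illustration under stated assumptions: the encoding of GF(4) elements as the integers 0 to 3 and the helper names `gf4_mul`, `z4_mul` and `zero_divisors` are choices made here, and the text does not say which specific ring the group uses.

```python
def gf4_mul(a, b):
    """Multiply two elements of GF(4).

    Elements 0..3 encode polynomials over GF(2) (bit 0 = constant term,
    bit 1 = coefficient of x), multiplied modulo x^2 + x + 1.
    """
    product = 0
    for shift in range(2):
        if (b >> shift) & 1:
            product ^= a << shift   # carry-less (XOR) multiplication
    if product & 0b100:             # reduce x^2 using x^2 = x + 1
        product ^= 0b111
    return product

def z4_mul(a, b):
    """Multiply two elements of the ring of integers modulo 4."""
    return (a * b) % 4

def zero_divisors(mul):
    """List pairs of nonzero elements whose product is zero."""
    return [(a, b) for a in range(1, 4) for b in range(1, 4)
            if mul(a, b) == 0]

print(zero_divisors(gf4_mul))  # []: in the field, no such pair exists
print(zero_divisors(z4_mul))   # [(2, 2)]: in the ring, 2 * 2 = 0
```

Both structures have exactly four elements, the same number as the nitrogenous bases, but only in the field can every nonzero element be divided out, which is why results stated over one do not carry over automatically to the other.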
The achievement of the Brazilian researchers presents an important solution and something new to biology. It starts a new phase in which the phenomena studied begin to be analyzed using quantitative methods. “In 1999 the Royal Academy of Sweden indicated that one of the advances of science in the new century would be the incorporation of mathematics into the study of biology,” Silva Filho remembers. However, for this to happen, the Brazilian researchers, like Battail and Yockey, agree that greater dialogue is needed between biologists, mathematicians and electronic engineers. “As an engineer, I’m convinced that information theory is an appropriate tool for an interchange with molecular biology,” Battail wrote in the introduction to Yockey’s book in 2006. “We are still far from an interdisciplinary approach that allows conversation with other areas on projects of this type. However, we’ve taken a good step forward,” says Professor Palazzo.
Mathematical code for the generation and decoding of the DNA sequence and proteins: used in the identification of ligands and receptors (nº 2008/04992-0); Type Papi – Support Program for Intellectual Property; Coordinator Márcio de Castro Silva Filho – USP; Investment R$ 13,200.00 and US$20,000.00 (FAPESP)
FARIA, L.C.B., et al. DNA sequences generated by BCH codes over GF(4). Electronics Letters. v. 46, n. 3, p. 202-03. Feb. 2010.