{"id":207464,"date":"2015-09-15T12:31:16","date_gmt":"2015-09-15T15:31:16","guid":{"rendered":"http:\/\/revistapesquisa.fapesp.br\/?p=207464"},"modified":"2015-12-28T12:44:15","modified_gmt":"2015-12-28T14:44:15","slug":"the-mathematical-structure-of-dna","status":"publish","type":"post","link":"https:\/\/revistapesquisa.fapesp.br\/en\/the-mathematical-structure-of-dna\/","title":{"rendered":"The mathematical structure of DNA"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-207466 size-full\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao1_final.jpg\" alt=\"castelli_equacao1_final\" width=\"290\" height=\"191\" srcset=\"https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao1_final.jpg 290w, https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao1_final-120x79.jpg 120w, https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao1_final-250x165.jpg 250w\" sizes=\"auto, (max-width: 290px) 100vw, 290px\" \/><span class=\"media-credits-inline\">Sandro Castelli<\/span>Scientific articles by a group of Brazilian researchers from the University of Campinas (Unicamp) and the University of S\u00e3o Paulo (USP) show that genetic sequences can have the same mathematical structure as the error-correcting codes (ECCs) used in both broadcast and digital recording systems.\u00a0 ECCs are a set of commands built into the software installed in computer chips, telecommunications equipment, televisions and smartphones to correct digital information defects in such processes as telephone conversations or the storage of data on a computer\u2019s hard disk.<\/p>\n<p>The same mathematical logic, say the researchers, is found in the formation of DNA, the deoxyribonucleic acid that carries the genes and all the instructions for the development and survival of living beings.\u00a0 In the study, they compared algebraic 
equations of error-correcting codes with certain DNA sequences, attributing a numerical logic to the nucleotides that make up the genome: thymine (T), guanine (G), cytosine (C) and adenine (A).\u00a0 In doing so, they discovered that there are patterns that link the nucleotide to a number.\u00a0 Thus, depending on the type of sequence, A is represented by 0, C is 2, G is 1 and T is 3.\u00a0 In digital language, which consists of bits, the information is translated into 0s and 1s. \u201cWe have shown that DNA has sequences that follow the same mathematical structures and rules as digital communication,\u201d says M\u00e1rcio de Castro Silva Filho, from the Genetics Department of the Luiz de Queiroz School of Agriculture (ESALQ) at USP.\u00a0 \u201cThe DNA sequence is not random; it follows a pattern,\u201d he says.<\/p>\n<p>The group\u2019s most recent study was published in the journal <em>Scientific Reports<\/em>, from the publishers of <em>Nature<\/em>, in July 2015. The introduction states that the biological and digital communication systems have similar procedures for transmitting information from one point to another.\u00a0 According to the researchers, the information contained in DNA is copied (transcribed) as RNA that will use mathematical logic to direct the sorting of amino acids in the proteins required for cell function.\u00a0 In the study, the researchers presented a computational tool to better understand the evolutionary path of the genetic code by analyzing, for example, <em>Arabidopsis thaliana,<\/em> a plant widely used as a model organism in genetic studies, and the formation of nucleotides in groupings of three letters called codons.\u00a0 In rare cases this biological grouping \u2013 TGA, for example \u2013 presented differences that did not match the results presented by the ECC.<\/p>\n<div id=\"attachment_207468\" style=\"max-width: 300px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-207468 
size-full\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao2_final_0002.jpg\" alt=\"castelli_equacao2_final_0002\" width=\"290\" height=\"330\" srcset=\"https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao2_final_0002.jpg 290w, https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao2_final_0002-120x137.jpg 120w, https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2015\/12\/castelli_equacao2_final_0002-250x284.jpg 250w\" sizes=\"auto, (max-width: 290px) 100vw, 290px\" \/><p class=\"wp-caption-text\"><span class=\"media-credits-inline\">Sandro Castelli<\/span>The letters and numbers in red indicate mutations in the genetic sequence<span class=\"media-credits\">Sandro Castelli<\/span><\/p><\/div>\n<p>In presenting the problem at the Brazilian Conference of Genetics in 2011, Silva Filho fielded a question from biologist Everaldo Barros of the Catholic University of Bras\u00edlia that helped him find a way forward. Barros wanted to know if the alteration in a DNA codon of a sweet potato (<em>Ipomoea batatas<\/em>) referred to an ancestral code.\u00a0 Silva Filho and electronics engineer Reginaldo Palazzo J\u00fanior, of the School of Electrical Engineering and Computer Sciences (FEEC) at Unicamp, another group coordinator, set out to find an answer.\u00a0 Working together with doctoral candidates Luzinete Cristina Bonani Faria and Andr\u00e9a Santos Leite da Rocha, they showed that the difference detected between the sequence derived from the error code and the biological sequence is a mutation that does not match the mathematical equations of the primordial genome of the sweet potato found in sequences of older organisms such as <em>prymnesophytes<\/em> algae or ancestral mitochondrial variants of the genetic code.\u00a0 Mitochondria are cell organelles that show traces of more remote genetic material. 
Therefore, only the oldest DNA is part of the equation.<\/p>\n<p>\u201cThe gene sequence that encodes the delta subunit of F1-ATPase protein of the sweet potato presents the TGG codon that encodes the amino acid tryptophan.\u00a0 However, the sequence generated by the mathematical code for the tryptophan codon was TGA, which would introduce a stop in the protein synthesis, impairing its function. Initially, the alteration generated by the mathematical code would be incorrect,\u201d says Silva Filho.\u00a0 \u201cWhen we determined that the amino acid tryptophan is ancestral and encoded by the TGA codon, everything came together and we were then able to understand that a mutation had occurred,\u201d says Palazzo J\u00fanior. This type of mutation had already been recognized through the biochemical process, but had never before been identified through a mathematical process.<\/p>\n<p>The researchers are now working on a phylogenetic study to learn more about the evolution of species from the mathematical and biological standpoint.\u00a0 They are analyzing genetic sequences to determine whether the mutations found present characteristics in individuals that are important for functionality of the species. Current studies are being carried out on plant and animal genomes to confirm whether in fact the mathematical model is closely related to the biological model.<\/p>\n<p>The discovery has led the group to file for an international patent on the utility model of the system they developed, already patented in the United States. 
\u201cThis mathematical structure may be important in the field of protein engineering for developing genetically modified organisms, new drugs, vaccines and altering the DNA sequence in future gene therapy systems, or even producing and discovering new proteins from the mathematical code,\u201d explains Silva Filho, an agronomist who holds master\u2019s and doctoral degrees in genetics and molecular biology and specializes in protein transport.<\/p>\n<p>It would also be possible, in a treatment for diabetes, for instance, to study the genes linked to the disease through a mathematical structure and correct the genes to eliminate the problem.\u00a0 Silva Filho predicts that the pharmaceutical industry will benefit greatly from this new way of envisioning DNA because use of the mathematical code will facilitate both the understanding of the disease and the formulation of drugs that are capable of more specifically targeting it.<\/p>\n<p><strong>Alterations in sequence<\/strong><br \/>\nMathematicians and computer scientists recognize the Brazilian researchers\u2019 code by the letters BCH, which are the initials of the Indian-born mathematicians Raj Chandra Bose and Dwijendra Kumar Ray-Chaudhuri and the French mathematician Alexis Hocquenghem who invented the code in 1959 and 1960. BCH is only one of several existing error-correcting codes.\u00a0 By using this code, biologists, biochemists and pharmacists, perhaps in collaboration with mathematicians, could conduct preliminary analyses using computer sequences to test the alteration of amino acids, proteins and mutations and then go to the laboratory to determine if the results are correct. 
\u201cThe existence of a mathematical structure in DNA sequences implies an enormous albeit feasible computational complexity in carrying out analyses and predicting mutations,\u201d says Palazzo J\u00fanior, who is an electronics engineer and works in the fields of information and coding theory.\u00a0 Today, this alteration process to produce a genetically modified organism or a medication is carried out through extensive laboratory tests.\u00a0 The function of the mathematical code in the biotechnological processes will be to minimize the occurrence of errors in the cell nucleus after genetic transcription of DNA to RNA, the ribonucleic acid that directs protein synthesis in ribosomes.<\/p>\n<p>The potential association between error-correcting codes and DNA sequences is not entirely new.\u00a0 One of the first scholars on the subject was Professor Hubert Yockey, who has been working in the field since the 1980s at the University of California, Berkeley.\u00a0 Another researcher in the field is G\u00e9rard Battail, a retired professor from France\u2019s National Superior School of Telecommunications who has published several articles proposing the relationship between error-correcting codes and genomes. These scientists have demonstrated the process and proposed hypotheses but have not yet presented actual mathematical relationships with the DNA. The Brazilians have been able to establish this relationship in the protein-producing genetic sequences. \u201cBy understanding the mathematical structure of the protein-encoding gene, we can alter the order of the bases as well as correct any mutations or errors that could appear for it to revert to its original protein condition,\u201d says Silva Filho.<\/p>\n<p>The initial study came about in 2008 when Palazzo J\u00fanior challenged the two previously mentioned doctoral candidates to model the transmission of information, in this case, proteins, between the cell nucleus and the mitochondria. 
In order to do this, Faria and Leite da Rocha sought out M\u00e1rcio de Castro Silva Filho at ESALQ. They established a dialogue and the two began testing some of the mathematical models of communications systems in order to find the one best suited to the biological model.\u00a0 After several months, they revealed their findings to Silva Filho.\u00a0 At first, he thought that there was just a coincidence between the sequences generated by the ECC and the biological model with regard to the amino acids.\u00a0 As the research progressed, more DNA sequences were obtained from different living things and the results stood, independent of the species.\u00a0 Assisting in the discovery were doctoral candidate Jo\u00e3o Henrique Kleinschmidt, a computer engineer and now professor at the Federal University of the ABC (UFABC), and more recently, biologist Larissa Spoladore, a doctoral candidate at ESALQ, and biologist Marcelo Brand\u00e3o, a professor at Unicamp.<\/p>\n<p>In 2009, Silva Filho, Palazzo J\u00fanior, Faria and Leite da Rocha submitted an article to the journal <em>Electronics Letters<\/em>, which was published in the February 2010 issue (<a href=\"http:\/\/revistapesquisa.fapesp.br\/en\/2010\/12\/01\/lifes-equations\/?\" target=\"_blank\">see <em>Pesquisa FAPESP<\/em> Issue n\u00ba<em>\u00a0178<\/em><\/a>). \u201cNow, with the publication in <em>Scientific Reports<\/em>, we think the global biological sciences community will become more interested,\u201d says Silva Filho.\u00a0 \u201cAs far as we know from the literature available, no other group is conducting research on this, although there might be someone in the pharmaceutical industry developing something like this privately.\u201d<\/p>\n<p>\u201cAs in the case of many other scientific discoveries, there is probably a long road ahead before this is accepted and used. 
Clearly, they have made a huge leap and shifted the paradigm,\u201d says biologist Rog\u00e9rio Margis, a professor in the Biotechnology Center at the Federal University of Rio Grande do Sul (UFRGS). \u201cNew challenges will likely appear with the discovery of this pattern, which transcends the linear sequence of the bases and adds another layer of complexity and code pattern to the DNA molecule.\u00a0 Expanding this type of analysis will require extensive computational infrastructure,\u201d Margis notes.\u00a0 \u201cUp to now, the studies have not had the impact and repercussions the researchers had expected them to have within the scientific community.\u00a0 One problem is that the study, while unique, encompasses separate fields such as biology and mathematics that do not typically work together,\u201d he says.<\/p>\n<p>\u201cI\u2019ve presented the studies at events abroad, but I think there is a certain level of distrust for a number of reasons.\u00a0 The subject is extremely complex, few people are able to go back and forth between the fields of genetics and error-correcting codes, the group is made up of Brazilians and the 2010 study was published in a journal in the field of electrical engineering,\u201d Silva Filho explains.\u00a0 He thinks that increased interest in the studies has to come from people involved in molecular biology and biotechnology. 
On the mathematics side, interest would have to come from groups involved in information theory and communication.\u00a0 But this will only take place if multidisciplinary integration occurs as it did in the initial discovery.<\/p>\n<p><strong>Projects<br \/>\n1.<\/strong> Mathematical code to generate and decode DNA sequence and proteins: its use in the identification of ligands and receptors (<a href=\"http:\/\/www.bv.fapesp.br\/pt\/auxilios\/31139\/codigo-matematico-de-geracao-e-decodificacao-de-sequencias-de-dna-e-proteinas-utilizacao-na-identif\/\" target=\"_blank\">n\u00ba 2008\/04992-0<\/a>); <strong>Grant Mechanism\u00a0<\/strong>Program of Support of Intellectual Property Rights (PAPI); <strong>Principal Investigator<\/strong>\u00a0M\u00e1rcio de Castro Silva Filho (USP); <strong>Investment<\/strong>\u00a0R$ 13,200.00 and US$ 20,000.00.<br \/>\n<strong>2.<\/strong> Herbivory and intracellular transport of proteins (<a href=\"http:\/\/www.bv.fapesp.br\/pt\/auxilios\/6493\/herbivoria-e-o-transporte-intracelular-de-proteinas\/\" target=\"_blank\">n\u00ba 2008\/52067-3<\/a>); <strong>Grant Mechanism<\/strong>\u00a0Thematic Project; <strong>Principal Investigator<\/strong>\u00a0M\u00e1rcio de Castro Silva Filho (USP); <strong>Investment\u00a0<\/strong>R$ 1,392,217.77 and US$ 169,187.06.<br \/>\n<strong>3.<\/strong> System biology techniques applied to the agriculture: transcriptomes and interactomes analyses (<a href=\"http:\/\/www.bv.fapesp.br\/pt\/auxilios\/45170\/biologia-de-sistemas-aplicada-a-agricultura-analise-de-transcriptomas-e-interactomas\/\" target=\"_blank\">n\u00ba 2011\/00417-3<\/a>); <strong>Grant Mechanism <\/strong>Young Investigators in Emerging Institutions grant; <strong>Principal Investigator<\/strong>\u00a0Marcelo Mendes Brand\u00e3o (Unicamp); <strong>Investment<\/strong>\u00a0R$ 199,169.39 and US$ 3,846.15.<\/p>\n<p><em>Scientific articles<\/em><br \/>\n<span style=\"line-height: 1.5;\">BRAND\u00c3O, M. M., <em>et al<\/em>. 
<a href=\"http:\/\/www.nature.com\/articles\/srep12051\" target=\"_blank\">Ancient DNA sequence revealed by error-correcting codes. <\/a><\/span><strong style=\"line-height: 1.5;\">Scientific Reports<\/strong><span style=\"line-height: 1.5;\">. V. 5, No. 12051. July 2015.<br \/>\n<\/span>FARIA, L. C. B., <em>et al<\/em>. <a href=\"http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0022519314003233\" target=\"_blank\">Transmission of intra-cellular genetic information: A system proposal.<\/a> <strong>Journal of Theoretical Biology<\/strong>. V. 358, p. 208-31. Oct. 2014.<br \/>\n<span style=\"line-height: 1.5;\">FARIA, L. C. B., <em>et al<\/em>. <a href=\"http:\/\/journals.plos.org\/plosone\/article?id=10.1371\/journal.pone.0036644\" target=\"_blank\">Is a Genome a Codeword of an Error-Correcting Code?<\/a>\u00a0<\/span><strong style=\"line-height: 1.5;\">PLOS ONE<\/strong><span style=\"line-height: 1.5;\">. V. 7, No. 5, e36644. May 2012.<br \/>\n<\/span><span style=\"line-height: 1.5;\">FARIA, L. C. B., <em>et al<\/em>. <a href=\"http:\/\/ieeexplore.ieee.org\/stamp\/stamp.jsp?arnumber=5410653&amp;tag=1\" target=\"_blank\">DNA sequences generated by BCH codes over GF(4)<\/a>. <\/span><strong style=\"line-height: 1.5;\">Electronics Letters<\/strong><span style=\"line-height: 1.5;\">. V. 46, No. 3, p. 202-3. Feb. 
2010.<\/span><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"Equations show similarities between the genetic code and digital systems","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[159],"tags":[209,237,246],"coauthors":[97],"class_list":["post-207464","post","type-post","status-publish","format-standard","hentry","category-science","tag-biology","tag-genetics","tag-mathematics"],"acf":[],"_links":{"self":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/207464","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/comments?post=207464"}],"version-history":[{"count":0,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/207464\/revisions"}],"wp:attachment":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/media?parent=207464"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/categories?post=207464"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/tags?post=207464"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/coauthors?post=207464"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}