The dance of the genes : Revista Pesquisa Fapesp

HÉLIO DE ALMEIDA

Bioinformatics, a branch of computer science dedicated to creating software and mathematical tools for biology, has produced a new theorem that may be useful for studying the evolution of genomes. Researchers João Meidanis and Zanoni Dias, from the State University of Campinas (Unicamp), and Maria Walter, from the University of Brasilia (UnB), have calculated the maximum number of times that two basic rearrangements can occur inside a genome: movements of blocks of its genes and reversals in the sequences of base pairs (chemical units) that make up these genes.

According to the authors of the study, published at the end of last year in the Journal of Computational Biology, the answer is half the number of genes in the genome in question, plus two. That is to say, in a genome with 100 genes, at most 52 rearrangements of the kinds described above may take place: 100 ÷ 2 + 2 = 52. If the number of genes is odd, the division leaves a fraction, and the result should be rounded down. “This equation, the main one in our work, is valid for genomes with three or more genes”, says Meidanis, the leader of the group of bioinformaticians carrying out the study. “It is a refinement of theorems proposed by other authors”.
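The arithmetic above can be written as a short function. This is only a sketch of the bound as the article states it (half the number of genes, rounded down, plus two, for genomes with at least three genes); the function name `max_rearrangements` is made up for illustration and does not come from the published paper.

```python
def max_rearrangements(num_genes: int) -> int:
    """Upper bound on rearrangements as stated in the article.

    Hypothetical helper: half the number of genes, rounded down, plus two.
    The article says the equation holds for three or more genes.
    """
    if num_genes < 3:
        raise ValueError("the bound is stated only for genomes with 3 or more genes")
    # Integer division rounds down, which handles odd gene counts.
    return num_genes // 2 + 2


print(max_rearrangements(100))  # 52, the article's own example
print(max_rearrangements(101))  # 52, the fraction is rounded down
```

For an odd count such as 101 genes, 101 ÷ 2 = 50.5 is rounded down to 50 before the two is added, so the bound is again 52.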

The theorem may be useful in the study of the evolution of genomes because it makes it possible to compare them and so to see what they have in common and where they differ. When it undergoes rearrangements like those mentioned, a genome is transformed into another one, different from the original. From the evolutionary point of view, the distance between two genomes can be regarded as directly proportional to the number of rearrangements that have occurred: the more rearrangement operations, the greater the evolutionary distance between them. There are other ways of measuring evolutionary closeness between genomes, but this is the one the researchers used as a parameter in this work.

What Meidanis’s team did, therefore, was to calculate the maximum number of rearrangements, equivalent to the largest possible distance in evolutionary terms, that may separate two genomes that show a certain similarity. A genome that differs by ten rearrangements from the base genome from which it was derived is closer to that mother sequence than a third genome that differs by 15 rearrangements. “Our contribution was to show that the maximum number of possible rearrangements between two similar genomes is smaller than previously thought”, Meidanis explains. The equations proposed by other authors to deal with these issues invariably arrive at results that are numerically higher than those obtained with the theorem demonstrated by the computer scientists from Unicamp and UnB.

In practical terms, the equation proposed by Meidanis’s team can only be applied to comparisons of genomes with very specific characteristics. The first condition: it only works when comparing genomes in pairs, one set of genes against another set of genes. If there are ten genomes to be compared, they have to be analyzed two by two. Another limitation: it only makes sense to apply the equation to two very similar genomes. Both should have the same genes (at least three, for the theorem to be valid) and in the same quantity. Normally, genomes compared in the laboratory do not meet these ideal conditions for the theorem to be employed, but this is no cause for concern.

“The theorem is part of a broader theoretical model that still needs to be refined”, Meidanis explains. So the equation can’t be effectively tested? “Yes, it can. We are going to use it shortly to compare genomes of viruses, which have many similarities amongst themselves”. Another possibility is to use the equation to analyze the evolution of the genomes of chloroplasts (cellular structures responsible for photosynthesis in plants), mitochondria (organelles responsible for the production of energy), and perhaps a few bacteria. The rearrangements that the theorem covers are given the technical names of transposition and reversal.

An example helps to show what these two operations consist of. Imagine two genomes, called X and Y, both with the same number of genes, five. Each of these genes, all different from each other and labeled with the numbers 1, 2, 3, 4 and 5, appears just once in each genome. In X, the genome taken as a reference, the sequence of the genes is 1, 2, 3, 4 and 5. In Y, because of an internal rearrangement, the order has been altered to 1, 4, 5, 2 and 3. In the second genome, the block 4, 5 has moved to sit between gene 1 and gene 2. Technically, this kind of rearrangement in the order of the genes in a genome is a transposition.
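The move from X to Y can be reproduced in a few lines. This is a minimal sketch that treats a genome as a list of gene labels and a transposition as the exchange of two adjacent blocks; the function name `transpose` and its index parameters are assumptions chosen for illustration, not notation from the study.

```python
def transpose(genome, i, j, k):
    """Exchange the adjacent blocks genome[i:j] and genome[j:k].

    Hypothetical helper; indices are 0-based and must satisfy
    0 <= i < j < k <= len(genome).
    """
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    # Keep the prefix, put the second block first, then the first block,
    # then the untouched suffix.
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


x = [1, 2, 3, 4, 5]
y = transpose(x, 1, 3, 5)  # exchange the block 2, 3 with the block 4, 5
print(y)  # [1, 4, 5, 2, 3]
```

Moving the block 4, 5 in front of gene 2 is the same as exchanging it with the block 2, 3, which is why a single call is enough to turn X into Y.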

Then there is reversal, which is an alteration in the order of the base pairs that make up a gene. The bases are adenine, thymine, cytosine and guanine, or just A, T, C and G. To go back to the above example, suppose that the sequence of base pairs of gene 1 in X, the reference genome, is ATCG. After undergoing a reversal, a process that is complicated for those not initiated in genomics, the resulting sequence in Y will be CGAT. “There are other kinds of rearrangements in genomes, but we did not take them into consideration in the creation of the theorem, which applies to simpler situations”, comments Meidanis, who founded Scylla, a bioinformatics company in Campinas that develops software for genomics.
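The article does not spell out how CGAT is obtained from ATCG. One reading that reproduces the example, offered here only as an assumption, is that the segment is reversed and read from the complementary DNA strand, where A pairs with T and C pairs with G (the "reverse complement"). The sketch below uses that reading; the names `COMPLEMENT` and `reverse_complement` are illustrative and not taken from the study.

```python
# Standard DNA base pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Read the sequence backwards, swapping each base for its partner.

    Assumed interpretation of the article's example, not a method
    confirmed by the source.
    """
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


print(reverse_complement("ATCG"))  # CGAT, as in the article's example
```

Simply writing ATCG backwards would give GCTA instead, so the plain reversal of the letters does not match the sequence quoted in the text.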
