At the heart of the genes : Revista Pesquisa Fapesp

On March 28, 2000, a group of São Paulo researchers published an article in the American scientific journal Proceedings of the National Academy of Sciences (PNAS) telling their peers abroad that they had developed a methodology capable of identifying fragments of expressed (or active) genes. It was an alternative, complementary to the conventional technique for obtaining ESTs (expressed sequence tags, or bits of active genes), which had been created in 1991 in the United States. Conceived by two scientists who were then working at the São Paulo branch of the Ludwig Institute for Cancer Research, Andrew Simpson and Emmanuel Dias Neto, the methodology was given the name Orestes (Open Reading Frame Expressed Sequence Tags). Since then, because it reveals the central region of genes, Orestes has become a useful tool, above all in the search for genes that are active in only a few types of tissue, in Brazilian and international projects that study the genomes of organisms or the expression of genes involved in various diseases. It was also used in the recently concluded study of the genome of the worm Schistosoma mansoni, which causes schistosomiasis.

In its print issue of last November 11, PNAS published an article that takes stock of the joint use of Orestes and the more traditional technique for generating ESTs in the study of genes connected with one of the most challenging human diseases, cancer. The work was written by about 140 researchers from Brazil, the United States, Europe and South Africa who took part in two major projects that analyze expressed sequences in tumors: the Human Cancer Genome Program, financed by FAPESP and the Ludwig Institute, and the Cancer Genome Anatomy Project (CGAP), funded by the National Cancer Institute of the United States.

In the six-page article, the multinational team of scientists sums up the results achieved by the Brazilian and American projects, which, using samples of healthy cells and of tumor cells, have produced detailed information about the set of genes that are activated in tissues removed from seven parts of the human body: lungs, breasts, brain, head and neck, bowel, uterus and kidneys. From the cell samples of each of these regions, at least 100,000 expressed sequences were generated. “We showed that, in this set of tissues, there is a surprising variability in the use of genes, which led to the discovery of rare genes (of limited expression) that may be important from the therapeutic point of view for treating some forms of cancer”, comments Simpson, one of the authors of the study, who was the coordinator of the Human Cancer Genome Program, concluded in June this year, and who now works at the international headquarters of the Ludwig Institute, in New York. On a smaller scale, expressed sequences were also obtained from other kinds of tissue, above all from the prostate and the ovaries.

To understand the role of genes in the development of the main types of cancer, and in their respective normal tissues, both projects generated a large number of ESTs. To be more precise, the Human Cancer Genome Program produced 823,000 expressed sequences, and the CGAP 1.2 million gene fragments. Together, the two projects, which entered into a partnership a few years ago, have generated more than 2 million ESTs. “This number is equivalent to more or less 40% of all the expressed sequences derived from human tissues deposited in public databases”, says Simpson. It is estimated that these 2 million expressed sequences, extracted from tumors and from the corresponding healthy cells, are related to 23,500 human genes, about three quarters of all the known genes of Homo sapiens. Among the tissues studied, lung cells showed the greatest number of active genes, 13,390. “In isolation, no type of tissue expressed more than 57% of the genes that were represented in our ESTs”, explains Dias Neto. Breast cells used the lowest number of genes, 10,380. In the other tissues, the number of genes expressed was between 10,000 and 13,000.
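The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the article; the variable and function names are illustrative, and the implied size of the public databases is a rough back-of-the-envelope estimate derived from Simpson's "more or less 40%", not a figure reported by the researchers.

```python
# Figures quoted in the article (variable names are illustrative).
hcg_ests = 823_000        # Human Cancer Genome Program
cgap_ests = 1_200_000     # Cancer Genome Anatomy Project
total_ests = hcg_ests + cgap_ests

genes_represented = 23_500  # genes covered by the combined ESTs
lung_genes = 13_390         # tissue with the most active genes
breast_genes = 10_380       # tissue with the fewest active genes


def percent_of(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


lung_share = percent_of(lung_genes, genes_represented)
breast_share = percent_of(breast_genes, genes_represented)

# Rough estimate only: if about 2 million ESTs are about 40% of the public
# human ESTs, the public total would be on the order of 5 million.
implied_public_ests = total_ests / 0.40

print(f"Combined ESTs: {total_ests:,}")
print(f"Lung share of represented genes: {lung_share}%")
print(f"Breast share of represented genes: {breast_share}%")
print(f"Implied public human ESTs (approx.): {implied_public_ests:,.0f}")
```

The lung figure works out to roughly 57% of the 23,500 genes, which matches the ceiling mentioned by Dias Neto.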

So that the reader does not get lost amid so many figures, a few explanations are needed about expressed (active) genes and the methodologies used for obtaining ESTs. A person's DNA is identical in every cell that has a nucleus. Accordingly, every tissue has the same genome, the same set of genes. According to the most recent estimates, the human species has about 29,000 genes. But the fact that all types of human tissue have the same genes does not mean that all cells use these genes in the same way. When activated, a gene dispatches, with the help of another molecule (messenger RNA), a chemical recipe for producing the protein specifically associated with it. If the gene is not active, the protein is not synthesized.

As the PNAS article shows, in some kinds of tissue, as is the case with the lungs, a larger number of genes goes into operation. In others, such as breast cells, a smaller number of genes is used or, in other words, expressed or activated. All tissues, then, have the same genes, in the same quantity, but each type of cell expresses or activates a particular subset of the total. This particular subset of active genes is called the tissue's transcriptome. The number of genes activated in a cell also varies over time. A tissue may express fewer genes at a given stage of development and more genes at another. “The more a gene is expressed, the easier it is to find its fragments (ESTs)”, explains Marco Antonio Zago, from the Ribeirão Preto School of Medicine at USP, the coordinator of the Clinical Cancer Genome Program, funded by FAPESP.
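Zago's remark can be made concrete with a simple probability sketch. It assumes, as a deliberate simplification that is not taken from the study, that each EST is an independent random draw from a tissue's pool of messenger RNA, so that a gene accounting for a fraction f of that pool is seen at least once in n ESTs with probability 1 - (1 - f)^n. The function name and the example abundances are hypothetical.

```python
def detection_probability(abundance: float, n_ests: int) -> float:
    """Chance of seeing a gene at least once among n_ests sequences.

    Assumes each EST is an independent random draw and that the gene
    makes up the given fraction (abundance) of the messenger RNA pool.
    """
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must be between 0 and 1")
    if n_ests < 0:
        raise ValueError("n_ests must be non-negative")
    return 1.0 - (1.0 - abundance) ** n_ests


# Hypothetical abundances: a highly expressed gene and a rare one.
common_gene = 1e-3
rare_gene = 1e-6

for n in (1_000, 100_000):
    p_common = detection_probability(common_gene, n)
    p_rare = detection_probability(rare_gene, n)
    print(f"{n:>7} ESTs: common gene {p_common:.4f}, rare gene {p_rare:.4f}")
```

Under this toy model a rare gene stays hard to find until a very large number of sequences has been generated, which is one way to read the emphasis on producing at least 100,000 ESTs per tissue.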

One of the main challenges for researchers studying the genetic basis of tumors is to understand how the profile of gene expression changes between healthy tissues and cancer cells. But what is the difference between the Orestes method and the conventional technique for generating ESTs? The Brazilian technique makes it possible to obtain information on the central part of genes, where the coding regions tend to be concentrated: these are the stretches of the gene that supply the chemical recipe needed for producing proteins. In a typical human gene, it is estimated that a little more than 50% of the sequence of nucleotides (the basic chemical units) is part of the coding region. The rest of the sequence, although important, is secondary. The traditional technique for obtaining ESTs, adopted by the CGAP, instead gives priority to seeking data at the extremities of genes. Because they focus on different points of the DNA, one on the middle and the other on the ends of genes, the two methodologies have become complementary and ended up stimulating the partnership between Brazilians and Americans in the area of cancer. “With Orestes, we have managed to get more rarely expressed genes (those that are little used by a tissue) than with the traditional methodology”, says Dias Neto.

Besides being used to map active genes in tissues, the techniques that work with expressed sequences lend themselves to other purposes. In the PNAS article, the researchers show that ESTs can also be a useful tool for discovering mutations in genes apparently related to the development of tumors. After employing just the Orestes technique to analyze the sequences that make up 1,127 genes suspected of being involved in the genesis of other types of cancer, they were able, for example, to identify 30 probable new SNPs, a specific kind of mutation. “Orestes is not an excellent method for this purpose, but it can, without a doubt, be used to look for SNPs”, Zago comments. Short for single nucleotide polymorphism, SNP designates the various forms that a nucleotide can take at a given position. In practice, these possibilities are limited to the four nitrogenous bases that form DNA: adenine (A), cytosine (C), guanine (G) and thymine (T). Hence, when scientists announce that they have discovered an SNP related to a given disease, such as cancer, they are saying that they have found a variation of just one base in a stretch of a gene that can increase the chance of this disease occurring.
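The idea of a single-base variation can be illustrated with a toy comparison of two sequences. This is only a minimal sketch, not the procedure used in the study: it assumes two already aligned sequences of equal length, and the function name and example sequences are invented for illustration. Real SNP discovery also has to deal with alignment, sequencing errors and confirmation across many independent reads.

```python
from typing import List, Tuple

DNA_BASES = set("ACGT")


def single_base_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List positions where two aligned DNA sequences differ by one base.

    Returns (position, reference_base, sample_base) tuples with
    0-based positions. Both sequences must have the same length.
    """
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    if not (set(reference) <= DNA_BASES and set(sample) <= DNA_BASES):
        raise ValueError("sequences may contain only A, C, G and T")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Invented example: the two sequences differ only at position 4 (A versus T).
candidates = single_base_differences("ACGTACGT", "ACGTTCGT")
print(candidates)
```

A position reported here would be no more than a candidate; it becomes a probable SNP only when the same variation shows up consistently in sequences from different individuals.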

Another point explored in the PNAS article (the fourth one arising from the Human Cancer Genome Program to be published in that journal) is the use of expressed sequences to determine the occurrence of a phenomenon known as alternative splicing. When it uses its information in a non-traditional way, a gene sends the cell a chemical recipe that is slightly different from the usual one. As a result, the cell produces a protein different from the one that would originally have been synthesized. Like mutations, certain forms of alternative splicing may be related to the emergence of diseases.

In the PNAS study, the scientists used two distinct analytical methodologies to estimate the level of this phenomenon in the group of about 1,200 genes suspected of being involved in the genesis of tumors. The calculations suggest that between 21% and 47.5% of these genes show alternative splicing. The challenge now is to find out which of these alterations may be pathogenic and which are innocuous. “The Human Cancer Genome and the CGAP have made a great deal of data about genes expressed in cancer public”, explains Simpson. “It is now up to the researchers in these areas to work in depth on the data generated by these projects.”

The Project
Human Cancer Genome Program; Coordinator: Andrew Simpson, Ludwig Institute; Investment: US$ 10 million (FAPESP) and US$ 10 million (Ludwig Institute)
