For a moment man felt small. On the very same day, the 12th of February, two groups that had been competing for years to be the first to finish sequencing the complete human genome – the Public International Consortium and the private North American company Celera Genomics – broke, separately and in different publications, the same surprising news. Having mapped close to 95% of the human genetic code, they estimate that man has close to 30,000 genes, a third to a quarter of what was previously imagined.
Amid the tangle of data, analysis and opinion that filled out the two initial blueprints of our genome, printed in the pages of the British magazine Nature (Public Consortium) and of the North American Science (Celera), this number caught everyone’s eye. Only 30,000 genes! The species that dominates the planet, that has been capable of planting its flag on the moon and journeying back to earth, houses in each cell a little more than double the number of genes of worms and flies. The reaction of society was immediate.
There was general shock, and jokes comparing Homo sapiens to small winged and crawling beings. So is this the main conclusion of one of the most talked-about and costly scientific programs ever carried out by humanity? Beyond this discovery, the genome data show that the genes are distributed irregularly across the 23 pairs of chromosomes which make up the human genome. There are chromosomes with a higher concentration of genes and others with very few.
The predominance of repeated sequences called junk DNA, whose function is still unknown, was also noticed. For the time being, it is interpreted as evidence that our genetic code has incorporated sequences from other species (bacteria, for example) and has still not gotten rid of this material of doubtful use. The completion of a great piece of scientific work can often produce more doubts than certainties. This is what happened with the human genome, which produced an effect apparently – only apparently – opposite to what was initially expected. As soon as one analyzes the two sketches of our DNA, a series of questions, new and old, come into play. Some of them follow below.
The end of the beginning is not yet finished
One needs to be crystal clear: the sequencing map, although it made the pages of Nature and Science, is still only a sketch (the second) of our DNA, albeit one already showing contours very close to the final form. It is as if humanity had received an enormous library with a very precarious catalogue, which does not let us know how many books there are on all of the shelves, nor separate the important works from the mediocre and those of little value. None of the results reported by the studies published in the two magazines is definitive and unquestionable. The numbers are provisional and need to be calculated much better, the analyses still need to be refined and there is a series of open questions. “Generated by computer programs, the current configuration of the genome, with this reduced number of genes, is an excellent hypothesis of how our DNA might be, but it is still a hypothesis,” ponders Dr. Marcelo Briones, professor of biology and molecular evolution at the Federal University of São Paulo (Unifesp).
These caveats are necessary for a simple reason. Bluntly speaking, the researchers of the two groups have not reached the end of the gigantic task they set themselves: to establish the correct order of 3.2 billion nitrogenous bases – adenine, cytosine, guanine and thymine, represented respectively by the letters A, C, G and T – dispersed throughout the chromosomes. In June of last year, the conclusion of more than 80% of the sequencing of the human genome was announced, but on that occasion they did not put their findings on paper. Now they have taken a step forward. They perfected the initial sketch and wrote pages and pages in the two most influential scientific magazines on the planet about what they had found.
The Public Consortium states that it has worked out around 94% of the sequence of nitrogenous bases, one percentage point less than Celera. In other words, there are still considerable holes in our genome – holes which could make the difference, even more so when one knows that only 2% of the genetic material of man is different from the DNA of the chimpanzee. Furthermore, a little more than a third of the genes identified have an unknown function. The end of the beginning – a play on words used by many scientists to say that the sequencing of the human genome is the first stage, not the last, in the search to decipher our DNA – is definitely not completed. So much so that the Public Consortium admits in the article in Nature that only in 2003 might there be a sequence of the genome with fewer holes.
As a matter of fact, 2003 was the forecast date for the publication of the complete mapping of the genome. However, as Celera moved ahead and decided to publish its work in Science, even though it was not yet complete, the Public Consortium also decided to disclose its material ahead of schedule in Nature, the main rival of the North American magazine. “We couldn’t allow a private company to claim the credit for work which could not have been carried out without relying on the information of the public effort,” says Dr. Jean Weissenbach, Director of the National Sequencing Center of France, justifying the attitude of the Public Consortium. The data of the Consortium – a network officially formed by laboratories of six countries (the United States, England, France, Germany, Japan and China), but in practice supplied by more nations, such as Brazil – could be used, and in fact were, by Celera.
After all, how many genes do we have?
The two articles, one in Nature and the other in Science, put the number of genes of Homo sapiens at between 26,000 and 40,000. The much-discussed total of 30,000 human genes is a kind of consensus average that seems to have pleased both the scientists of the Public Consortium and those of Celera. Before the publication it was estimated that our species had close to 100,000 genes. Some estimates went as far as 120,000, even 140,000 genes.
However, there are those who say that the present forecast of 30,000 genes is as crystalline and indisputable as the result of the last presidential election in the United States. They are betting that, sooner or later, there will be a recount. Scientists who have participated in the genomic projects in Brazil believe that there may be more genes as yet undetected by the computerized mathematical models of Celera and the Public Consortium. The total, they say, could get as high as 50,000 if one counts the so-called transcriptor genes, which give rise to the molecules of ribonucleic acid (RNA), the basis for the synthesis of proteins.
Dr. Andrew Simpson, the Coordinator of the Human Cancer Genome (GHC), a project financed by FAPESP and by the Ludwig Institute, is one of those who support this projection. Last year, his team applied to chromosome 22, one of the smallest and one of the first to be deciphered, the technique used by the GHC: ESTs, or expressed sequence tags, which indicate only the active parts of the molecule of deoxyribonucleic acid (DNA). The group from São Paulo found 219 new transcribed regions, which seem to correspond to close to 100 genes that had not been described at that time. The discovery was relevant enough to find its way into the 7th of November 2000 issue of the Proceedings of the National Academy of Sciences of the United States. “Our work, if it continues to be well done, may contribute to determining, with precision, the number of genes in the human genome,” comments Dr. Simpson.
Another indication that the number of human genes could be subject to adjustment comes from another team from São Paulo, which worked on the recently concluded sugar cane genome project; it partially mapped the DNA of this plant and tracked close to 80,000 genes (see Pesquisa Fapesp issue 59). By crossing the information obtained in the sequencing of sugar cane with that available in GenBank, a data bank of all the genomes concluded or nearing conclusion, the researchers linked to the FAPESP project found, surprisingly, between 200 and 1,000 as yet unidentified genes in Arabidopsis thaliana, the first plant to be entirely sequenced, at the end of last year. Even if the figures are not precise, the result of the comparison could increase by as much as 5% the total number of genes (25,000) forecast for Arabidopsis thaliana. If new genes were found for this plant, why could the same thing not happen for the human genome?
Genetic determination versus environmental factors
The hypothesis that the human being has only 30,000 genes reactivates this old debate. The critics of genetic determinism, generally lay people and scholars in the humanities, but also notable biologists such as Dr. Richard Lewontin, of Harvard University in the United States, gained some ground with the news of the supposed shortage of genes in Homo sapiens. With so few genes, how can we believe that all we are – physical appearance, tendency to illness and personal preferences – depends merely on DNA, and relegate the role of the environment to the background? More incisive critics decree the death of the concept of the gene, as did the Folha de S. Paulo (a leading Brazilian newspaper) in an editorial shortly after the publication of the data of the Public Consortium and Celera.
In what appears to be a change in attitude, Craig Venter, owner and chief scientist of Celera, certainly one of the men who dreamt most of making money from the study of genes, has begun, since the publication of his company’s article, to make a series of sober declarations about the weight of the sequence of As, Cs, Gs and Ts in our existence. “The assembling of the sequencing of the human genome is only the first and hesitant step of a long and exciting journey in the direction of understanding the role of the genome in human biology,” he has said. Or: “Two mistakes should be avoided: determinism, the idea that all the characteristics of a person are dictated by the genome; and reductionism, (to believe that) now that the human sequence is completely known, it is only a question of time before we know the functions and interactions of the genes that give a complete description of the cause of human variety.” (See the interview below).
Dr. Francis Collins, the main coordinator of the work of the Public Consortium on the human genome, did not follow Dr. Venter in his moderate discourse. Days after the publication of the article by his team, Dr. Collins took part in the annual meeting of the American Association for the Advancement of Science in San Francisco, in the United States, and insisted on propagating the clichéd idea that DNA is the “book of life”, a metaphor that Venter, also present at this scientific meeting, made a point of rebuffing. The two, who have always had so many differences of opinion – regarding patenting and the methods of sequencing genes – managed to add one more item to their list of differences.
An area which comes out on top after the publication of the sequences present in our DNA, even though its analogies tend to throw human anthropocentrism down the drain, is comparative genomics. It was already known that the size of the genome – the number of pairs of bases – bears no relation to the evolutionary status of the organism. A protozoan, Amoeba dubia, has 670 billion pairs of bases in its genome – 220 times more than the human being. Nor is it the only creature which beats Homo sapiens. Even the dog, our best friend, leaves man behind: Canis familiaris must have about 100 million pairs more than its owners.
Our quantity of genes, after the new projections lowered it to 30,000, is also no longer, by itself, a source of pride for the species. The popular fruit fly (Drosophila melanogaster) has more than 13,000 genes; the worm Caenorhabditis elegans has 19,000; and Arabidopsis has 25,000, not to mention sugar cane, partially mapped in São Paulo, with close to 80,000 genes and still counting. For those who believe that man is a unique species, the bad news springs up from all sides. According to the Public Consortium, the human chromosomes and those of the mouse show many similarities: there are at least 200 segments with at least two common genes in the same order. “The comparisons with the mouse are going to help to identify new human genes,” says Dr. Sandro José de Souza, the coordinator of bioinformatics of the Human Cancer Genome.
In genomic comparison, the DNA of the human is placed side by side with that of other organisms. The conclusions can be surprising. “Genetically speaking, we have more in common with plants than with fungi, unlike the phylogeny which puts the plants and the fungi together,” commented Dr. Carlos Frederico Martins Menck, of the Institute of Biomedical Sciences of the University of São Paulo (USP). After analyzing 120 DNA repair genes, the guardians of the genome, which fix the damage that occurs in this molecule, he and his team found notable similarities and, at the same time, distinct differences between the genomes of man, animals, yeasts (fungi), bacteria and plants (sugar cane and Arabidopsis).
The same gene can be found in diverse species, but plants, for example, might have genes otherwise found only in man or in certain types of bacteria, or might lack genes indispensable to other organisms. “The worst thing is when we find nothing in these comparisons of the genomes. In these cases, we have to look again until we are certain that there really isn’t anything in common,” comments Valéria Rodrigues de Oliveira, of the Menck team. At the end of last year, she found for the first time a bacterial repair gene in humans – the same one that was afterwards seen working in chloroplasts, the part of the plant cell in which photosynthesis takes place. “The exchange of genes between organisms is much more intense than we had thought,” affirmed Dr. Menck. “Genomes are mixtures of genomes.”
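The size comparison above is simple division, and it can be checked with a short sketch. The figures are the ones quoted in this article; the function and variable names are illustrative only. The "220 times" figure appears to correspond to the rounded human genome size of about 3 billion base pairs, rather than the 3.2 billion quoted elsewhere in the text.

```python
# Genome sizes in base pairs, as quoted in this article.
AMOEBA_DUBIA_BP = 670_000_000_000

# The article gives two roundings for the human genome.
HUMAN_BP_ESTIMATES = {
    "3.2 billion": 3_200_000_000,
    "3 billion": 3_000_000_000,
}


def size_ratio(larger_bp, smaller_bp):
    """Return how many times larger one genome is than another."""
    return larger_bp / smaller_bp


for label, human_bp in HUMAN_BP_ESTIMATES.items():
    ratio = size_ratio(AMOEBA_DUBIA_BP, human_bp)
    print(f"Human genome of {label} bases: amoeba is about {ratio:.0f} times larger")
```

With either rounding the amoeba genome comes out roughly two hundred times larger than ours, which is the point the comparison is making.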
The question of race
The two almost complete sequences of the human genome brought with them new evidence that, at least from the genetic point of view, there are no significant differences that justify using the notion of race to classify human beings. The genome of one person is 99.99% identical to the DNA of any other individual on the face of the earth – white, black, yellow or of indigenous origin. Celera estimates that the differences between the genomes of two people can be summed up in 1,250 pairs of bases with distinct “letters”. The article by the company affirms that the genetic differences between people who belong to the same ethnic group could be even larger than those between two individuals of distinct races.
This news, obviously, is good and will contribute, one hopes, to diminishing racial prejudice. However, it must not be misinterpreted. Yes, we are all extremely alike in the interior of our DNA, but this is not to say that each ethnic group does not have specific genetic predispositions, which could increase or decrease the occurrence of certain diseases in these populations. Sickle cell anemia, caused by a modification in hemoglobin resulting from an alteration in a gene, is more frequent, for example, in the Black population.
Another type of anemia, thalassaemia, also caused by genetic alterations, shows a greater incidence in individuals of Mediterranean origin. “In some cases, people from different ethnic groups can have distinct responses to the same drug. This has to be taken into account at the moment of developing a medicine,” says Dr. Mayana Zatz, the Coordinator of the Center of Studies of the Human Genome and a researcher at the Institute of Biosciences of USP.
Raw material for science
Although they are not the final version of the human genome, the two sketches of our DNA contain enough data, in quantity and quality, to push forward research in the most varied areas for years and years. “Now we can make the medicines for the 21st century,” says Dr. Sérgio Danilo Pena, Professor of Immunology at the Federal University of Minas Gerais (UFMG) and Director of the Center of Genome Analysis and Classification of the A.C. Camargo Cancer Hospital, in São Paulo. It is abundant raw material for engineers, specialists in computing, mathematicians and philosophers. “Over the next few months, we shall begin to feel the impact of the genome, perhaps in a diluted form,” predicts Dr. Marco Antonio Zago, of the School of Medicine of USP at Ribeirão Preto. “Groups throughout the world are going to publish work using the information of the sequencing.”
As soon as the articles in Science and Nature came out, the physicist Dr. Murilo da Silva Baptista, of the Physics Institute of USP, called the biologists he knew. He didn’t leave them in peace until he managed to obtain the complete sequence of chromosome X, one of the smallest, in order to study the pattern of organization of the DNA molecule and the cycles of repetition of codons, groups of three bases which code for amino acids, the components of proteins.
Proof of the existence of a rule which controls the occurrence of codons could help to anticipate when one of them should appear again. “The human genome is extremely fertile ground and we only need to plant the seeds,” says the physicist, who last year found these patterns of repetition when analyzing the genomes of the fruit fly Drosophila and of the bacterium Mycoplasma genitalium. They are systems with their own rules but which, though nobody has any idea why, present mathematical characteristics in common with the oscillation of the economic indicators of the stock market and with the behavior of electrically charged atomic particles, the so-called plasma.
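To make the idea of codon repetition concrete, here is a minimal sketch of how a DNA string can be cut into codons, how often each one occurs, and how far apart successive occurrences of a given codon lie. It is only an illustration: it is not the method used by Dr. Baptista, the function names are invented for this example, and the sequence is a toy string rather than real chromosome data.

```python
from collections import Counter


def split_codons(sequence):
    """Cut a DNA string into consecutive groups of three bases (codons).

    Any trailing one or two bases that do not fill a codon are dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_counts(sequence):
    """Count how many times each codon appears in the sequence."""
    return Counter(split_codons(sequence))


def gaps_between(sequence, codon):
    """Return the distances, in codons, between successive occurrences."""
    positions = [i for i, c in enumerate(split_codons(sequence)) if c == codon]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


toy_sequence = "ATGGCCATGTTTATGGCC"  # toy example, not real genomic data
print(codon_counts(toy_sequence).most_common(2))
print(gaps_between(toy_sequence, "ATG"))
```

A regular pattern in the list of gaps would be the kind of rule that could help anticipate when a codon should appear again.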
The post-genomic era: transcriptome and proteome
With the work of determining the order of the 3 billion pairs of bases well under way, the new wave in the genomic, or post-genomic, area is the study of the proteome, the set of proteins within an organism. As is well known, the genes produce proteins, molecules which form the cells and tissues, and whose excess or scarcity can cause disease. Many known molecules are proteins: hemoglobin, insulin, hormones and neurotransmitters such as dopamine and serotonin – without mentioning the enzymes, indispensable for chemical reactions. The proteins account for close to 90% of the dry weight of blood, 80% of muscle and 70% of skin.
In man, the study of the interaction of proteins promises to be a task which will be even more complicated than that of deciphering the genome. One of the reasons is that nobody has a very clear idea of the size of our proteome. In contrast with DNA, identical in whatever part of the body, the proteins produced in one type of cell are not the same as those found in others. Since man has close to 100 trillion cells, this time it will be difficult to quickly arrive at a consensus of opinion.
For now, the estimates vary between 100,000 and 1 million proteins. Both the Public Consortium and Celera have begun research in this field. In Brazil, there are few groups specialized in this area. Outside of São Paulo, one of the few is the Brazilian Center of Services and Research into Proteins at the University of Brasilia (UnB).
For some scientists, the race towards the proteome is inevitable, but it is happening too hastily. “We should do things differently,” says Dr. Simpson. Before tackling the proteins, he says, it would be better to understand the transcriptome, the group of genes that, as they are expressed, generate the molecules of RNA necessary for the synthesis of proteins. When they manage to demonstrate that a region of DNA is transcribed, the researchers establish that it holds one or more genes.
The Initiative for the Validation of Human Transcriptome, a recently initiated project between FAPESP and the Ludwig Institute, intends to find 4,000 transcribed genes over the next two years. Thirty-one laboratories are participating in the project, budgeted at US$ 1 million. “For us, the fact that Celera and the Public Consortium have not finished all of the work was an excellent piece of news,” says Dr. Anamaria Aranha Camargo, of Ludwig, one of the Coordinators of the Human Transcriptome project. “We’ve still got a lot of genes to find and validate.”
Between cities and deserts
Not everyone resorted to graphs, tables and long scientific dissertations in order to explain the human genome. Dr. Bob Waterston, Director of the Genome Center of Washington University in St. Louis, United States, didn’t hesitate to put forward metaphors to clarify the irregularity with which the genes are distributed along the human chromosomes. “In some regions the genes are pretty piled up, like the buildings in a city,” he said. “But there are as well large deserts where the junk DNA can be found, and each region contains unique information about the history of our species.” This scenario contrasts strongly with the genomes of other species such as Arabidopsis, C. elegans or Drosophila – much more uniform, sprawling out into suburbs, with a relatively regular distribution of genes on the chromosomes.
The urban centers, dense in genes, are made up primarily of blocks of two nitrogenous bases, guanine and cytosine, G and C. The deserts, or junk DNA areas, on the other hand, are rich in adenines and thymines, A and T. On each chromosome there are long stretches with different proportions of GC: one with a density of 60% and another with only 30%, for example. These stretches do not follow a regular pattern, and they constitute what Dr. Waterston calls neighborhoods, with distinct accents. “It is as if the regions of genes and of junk DNA had made an agreement, so that the former would occupy the cities and the latter the deserts,” says Dr. Eric Lander, Director of the Genome Research Center of the Whitehead Institute of the United States.
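The GC density mentioned above is simply the fraction of bases in a stretch that are G or C. The sketch below shows one way to compute it for a whole sequence and for consecutive windows along it; the function names and the toy sequence are assumptions made for illustration, not the procedure used by either sequencing group.

```python
def gc_content(sequence):
    """Return the fraction of bases in the sequence that are G or C."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(sequence, window):
    """Return the GC fraction of each consecutive, non-overlapping window.

    A trailing stretch shorter than the window is ignored.
    """
    return [
        gc_content(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, window)
    ]


# Toy sequence: a GC-rich "city" followed by an AT-rich "desert".
toy_sequence = "GGCCGCGCCG" + "ATATTAATAT"
for index, fraction in enumerate(windowed_gc(toy_sequence, 10)):
    print(f"window {index}: {fraction:.0%} GC")
```

Run along a real chromosome with a much larger window, a profile like this would show the alternation between gene-dense and gene-poor neighborhoods that the metaphor describes.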
Close to the cities there are stretches in which only the bases G and C repeat themselves, 30,000 times or more. These are the CpG islands, sparsely represented throughout the genome, which help to regulate the functions of the genes. Another peculiarity is that each human gene can give rise, on average, to three proteins, more than the genes of worms and flies. This is a consequence of so-called alternative splicing, in which the parts of a protein can be rearranged in different ways – ABC, CBA or BAC in the hypothetical case of only three elements – like the pieces of a toy to be put together. This process is possible because the genes are spread out along the DNA and the regions which code for the proteins are not necessarily continuous.
The human species has expanded the families of proteins. It is calculated that close to 60% of the families of human proteins contain more members than in any other species. And the majority of the groups of proteins are associated with physiological functions more developed in the vertebrates. The Science article lists 247 genes which are development regulators or are associated with the nervous system, the interaction of proteins, signaling molecules or the response of the immune system, in man, in Drosophila, in C. elegans, in yeast and in Arabidopsis – and we beat them all handsomely.
The refinement is evident: only the human organism, among those that have been studied, has genes for interleukin, a type of antibody. We also have close to three times more of a group of proteins which regulate the response to infections, called immunoglobulins, than fruit flies and worms do, and they are absent in fungi and plants. We also have about ten genes belonging to four families of proteins involved in the production of myelin, the coating of nerves; the fruit flies have only one of these genes and C. elegans none. An analysis of the genome detected the presence of remnants of a migration which occurred in our first vertebrate ancestors.
As these ancestors had little defense against parasitic invaders, bacteria were able to take up residence inside their organisms. The result of this coexistence is that the human genome houses close to 200 genes which appear to come from bacteria or from intermediary genomes of viruses, though the hypothesis that the bacteria might also have stolen genes from ancestral vertebrates cannot be entirely discarded. According to the article in Nature, close to half of the genome is derived from the so-called jumping genes or transposons – genes which leap from one point of the chromosome to another, or even from one chromosome to another, and regulate the function of other genes.
Another interesting point is what the scientists are calling our “mania for collecting things”, in contrast to other species. The quantity of junk in our genome exceeds that of much older species, with the exception of the amoeba. There are repetitions in half of our genome, much more than in Arabidopsis (11%), in C. elegans (7%) and in Drosophila (3%). “This fact suggests that we are very lazy in cleaning out our home,” comments Dr. Arian Smit, a bioinformatics specialist at the Institute for Systems Biology. It has been calculated that Drosophila cleaned its house close to 12 million years ago, while the mammals did so about 800 million years ago.