Puzzles of complexity : Revista Pesquisa Fapesp

Emmanuel Dias Neto

The study of biology is perhaps as ancient as the emergence of the brain structures that allowed language to be established and conscious thought to develop. This process of elaboration, some millions of years old, allowed humanity to occupy itself with understanding its origins and the processes related to life, sickness and death. Over the last fifty years, we have been able to follow fantastic discoveries in diverse areas of knowledge. In biology we have made a tremendous leap that culminated in the recently completed sequencing of the human genome. The ability to accumulate a vast quantity of genetic information has allowed us to understand the structural basis of various genomes, from those of primitive bacteria (the archaebacteria) to man himself, passing through fungi, parasites, worms, plants and model organisms such as fruit flies and mice.

This accumulated knowledge has allowed us to reconstruct the evolutionary relationships among a large number of living species, retracing the history of life on our planet. The capacity to read genomes fundamentally changed the study of biology, of medicine and of the various associated fields, influencing a varied group of industries that includes pure chemistry, pharmacology and agriculture, among others. The large quantity of data produced represents a rich source of information that must be carefully studied so as to allow the most useful advances in our knowledge. The greatest impact of these findings is yet to come, and will certainly come when we truly manage to decode, understand and associate the information contained in our genome.

For this to happen we must, above all, be aware that knowledge of the complete sequence of a genome, in spite of being an important piece, is a long way from allowing, on its own, the assembly of the intricate puzzle of our complexity. In order to understand what permits us to have this marvelous complexity (our enormous behavioral repertoire, the ability to take conscious actions, our creative, musical and scientific capabilities, the capacity to learn, our memory, among others), we cannot simply count upon our genetic load of 3.2 billion nucleotides and a number of genes not much greater than that of a fruit fly. We must be aware that mastery over a genome means the possession of a map.

In the case of the human genome, it is a complex map, as yet not fully understood, that will help enormously in the search for the origins of diseases based on genetic variations, on diversity, on complexity and on the behavior of proteins within cells. Neuropsychiatric studies show that in monozygotic twins separated at birth and raised in distinctly different environments, the concordance in the development of neuropsychiatric diseases is around 50%. This shows that under certain circumstances a balance exists between the importance of the genes and that of the environment in determining certain conditions. If on the one hand genetics plays a large part in the development of illnesses, environmental factors also have a considerable role. Genetics is not absolute. We still have a lot to learn about the interactions of the genome with the environment, as well as about the subtleties of our genome.

We already know a lot, but it's still little

It is curious to observe that, even after the enthusiasm generated by the completion of the draft sequence of the human genome, a tremendous effort is still needed before we can understand the significance of the vast majority of the sequences obtained. Among the first questions that come up: how can we identify the important regions of the genome? How can we determine their function in the organism? The regions of the genome with the most obvious function are the genes, which are involved in the production of proteins. Nevertheless, these regions are restricted to around 3% of our genome.

The number of genes reported in the celebrated papers that described the human genome was, according to the lowest estimates, around thirty thousand; some studies indicate that there may be as many as one hundred and twenty thousand genes. Even with this small number, only in about half of them do we find some type of domain that allows the prediction of physiological activity. While it is believed that the estimates of the number of genes may well increase with the development of better computer programs for gene prediction and with the accumulation of more experimental data, it is very clear that the number of genes is only one of the mechanisms that create the biochemical diversity found in our proteins.

In our genome, the genes are formed of blocks of interchangeable information called exons, which are separated by blocks without protein-coding information, known as introns. The exons can be recombined just as one would combine syllables to form a word, forming distinct messages. In this manner, the sequence of a single gene can start and end in different regions and its internal portion can be assembled by alternating different blocks, thus generating proteins with distinct functional characteristics.
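The syllable analogy can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the first and last exons are always kept, any subset of the internal exons may be skipped, and exon order never changes. The exon sequences and the function name `splice_variants` are hypothetical, and real splicing is a regulated process that does not produce every combination.

```python
from itertools import combinations

def splice_variants(exons):
    """Enumerate messages that keep the first and last exon and any
    ordered subset of the internal exons (a simplified model)."""
    first, *internal, last = exons  # needs at least two exons
    variants = []
    for k in range(len(internal) + 1):
        # combinations() preserves the original order of the exons
        for kept in combinations(internal, k):
            variants.append("".join((first, *kept, last)))
    return variants

# Hypothetical four-exon gene: two internal exons give 2**2 = 4 messages.
example_exons = ["ATG", "GCC", "TTA", "TGA"]
for message in splice_variants(example_exons):
    print(message)
```

Under this model a gene with n optional internal exons yields 2**n distinct messages, which is why a modest number of genes can still give rise to a large variety of proteins.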

These combinations (known as alternative splicing) represent an efficient mechanism for generating diversity without the need to maintain an immense number of different functional genes. Besides alternative splicing, diverse mechanisms known as epigenetic, acting through the methylation of DNA or the modification of histones, can change the expression of a gene. These events of epigenetic regulation control the activity of genes, silencing them or remodeling the structure of the chromosomes, exposing or hiding particular genes according to the need for their expression. In this manner a complex system of intercellular regulation is set in motion, switching genes on or off in particular tissues or in specific phases of development.

Junk DNA?

Genes are distributed unevenly within our chromosomes. Sequencing data show that some chromosomes, such as 17, 19 and 22, are rich in genes when compared with chromosomes 4, 8, 13, 18 and Y. The packing of genetic material in the nuclei of our cells is a complex process, since the DNA of a single human cell is close to two meters in length. Some years ago it was discovered that the distribution of chromosomes in cells, in the process of packing, is extremely well organized. The chromosomes with lower gene density remain on the periphery of the cell's nucleus, while the richer chromosomes are situated in the inner portion of the nucleus.

It has been determined that this chromosomal distribution has been maintained for some thirty million years, since it is conserved in primates. This conservation indicates an important functional role. Some researchers suggest that the chromosomes with more genes sit in the more central portion of the nucleus so that the others around them protect them from external mutational agents. Furthermore, various studies have demonstrated that the chromosomes move frequently within the cell nucleus.

This dance of the chromosomes shows that the structure of DNA and its packing in the cells is not something rigid. The chromosomes seem to be able to move, making possible both the exchange of genetic material between them and the exposure of genes that must be active under particular circumstances. The study of chromosomal distribution has even been suggested as a possible criterion for the diagnosis of cancer. If only 3% of the genome codes for proteins, could it be that the remainder of our DNA is an evolutionary leftover that serves only for protection? One of the ways to analyze our genome is to compare it with those of other organisms.

This is called comparative genomics. These studies start from the premise that a block of DNA conserved for millions of years must have some important function, which would be jeopardized if the sequence were altered. This is called physiological conservation. Studies of comparative genomics demonstrate that approximately 95% of our genome is very similar (in fact, close to 99% identical) to that of the chimpanzee. Nevertheless, our time of divergence from the chimpanzee (the period of time that separates us from a common ancestor) is only five million years.

Perhaps this period has not been long enough for non-functional regions to have diverged, and what we see is passive conservation. When we deepen the comparisons and investigate the similarities between our genome and that of the mouse, whose last common ancestor with man existed some one hundred and forty-five million years ago, we see that a significant portion of this junk DNA is still conserved. While the chimpanzee is genetically so close that passive conservation cannot be distinguished from functional conservation, the mouse is so distant that changes in the most recently acquired DNA cannot be detected.

While the comparison with the genome of a distant organism, such as the mouse, offers an important window into genomic regions with functional potential, the long divergence between the two species does not allow the identification of some subtleties. The regions of the genome that have changed between these species and have allowed our evolution as primates (and afterwards as Homo sapiens) are not in the genome of the mouse and must be found by other means.

In an article published in the journal Science at the end of February (Boffelli et al., 2003), a group of researchers compared non-coding regions of the human genome with similar areas of the genomes of non-human primates. The scientists discovered various conserved regions, even when species of tropical primates, very distant from our species, were used. They were able to identify conserved elements and to prove that they have functional activity: they act in the regulation of gene expression in the different species. For this purpose, various species of primates were used, including the DNA of various Brazilian primates.

The importance of biodiversity for deciphering our genome became clear. Nevertheless, each genome possesses unique characteristics whose functional consequences allow us to differentiate the species. How can one investigate the functional regions (those with physiological activity) unique to the human genome? We know that they are not restricted to the genes or to the regions conserved in other primates.

They are unique characteristics of our species. A very interesting study on this subject was carried out by researchers from an American company jointly with a researcher from the National Cancer Institute in the United States (Kapranov et al., 2002). Using the sequence of human chromosomes 21 and 22, the scientists designed small fragments of artificial DNA, copying the entire sequence of these chromosomes at short intervals of thirty-five nucleotides, the constituent blocks of DNA. The millions of fragments produced were used to investigate whether human cell lines were producing RNA complementary to these fragments.

The strategy was able to prove the functional activity of new regions and to allow a large-scale transcriptional analysis of these two human chromosomes. To everyone's surprise, an extremely high percentage of these fragments proved to be associated with mature RNAs from the cell lines. The authors of the study demonstrated that the transcriptionally active regions of our genome are at least ten times more extensive than we could have imagined. Perhaps these regions contain very rare genes, as yet not demonstrated through any technique, or regulatory molecules as yet unknown but of central importance to an understanding of the physiology of our genome. In this manner, if before we had imagined that 3% of the genome contained genes, this work suggests that this percentage could be much higher.
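The idea of covering a chromosome with short fragments taken at regular intervals can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the spacing of 35 nucleotides comes from the text, while the fragment length of 25 nucleotides, the function name `tile_probes` and the toy sequence are assumptions made only for the example.

```python
def tile_probes(sequence, step=35, probe_len=25):
    """Return (start, fragment) pairs taken every `step` nucleotides.

    `step` follows the 35-nucleotide interval mentioned in the text;
    `probe_len` is an assumed value chosen for illustration.
    """
    probes = []
    # Stop early enough that every fragment has the full length.
    for start in range(0, len(sequence) - probe_len + 1, step):
        probes.append((start, sequence[start:start + probe_len]))
    return probes

# Toy 100-nucleotide sequence standing in for a chromosome.
toy_sequence = "ACGT" * 25
for start, fragment in tile_probes(toy_sequence):
    print(start, fragment)
```

On a real chromosome of tens of millions of nucleotides, the same loop yields on the order of a million fragments per chromosome, which is consistent with the "millions of fragments" mentioned for chromosomes 21 and 22 together.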

Genes and new drugs

In the fraction of genes currently known, some hundreds code for proteins with great potential for the treatment of diseases. Several of these proteins, as well as drugs based on monoclonal antibodies, are in the final phases of experimentation and some have already been tested on humans. For this reason, researchers today are looking for more efficient and less costly ways of producing medicines. One of the promising avenues is the genetic manipulation of food. Producing a more nutritious bean, corn containing human growth hormone or carrots containing vaccines are dreams that have been running around researchers' heads for years.

These dreams are getting closer day by day, and an important step in this direction was announced some time ago by a North American company that joined forces with the renowned Scottish Roslin Institute (the same one that amazed the world with the cloning of the sheep named Dolly), this time to produce drugs within chickens' eggs. While animals such as goats, cows, sheep and rabbits have been used to produce medicines in their milk, the technology based on birds is surging forward with the promise of being faster, cheaper and practically unlimited thanks to the capacity of egg production. The first product should be a monoclonal antibody directed towards combating melanoma, one of the most aggressive and common tumors that occur in Brazil. The mastery of this technology, allied to the discovery of the totality of human genes and the determination of their biological functions, permits us to imagine a promising future for this new form of production of medicines.

Polymorphisms of DNA

Every single one of the billions of human beings on our planet, with the exception of monozygotic twins, has a unique genome of his or her own. In spite of being unique, the genomes of two unrelated human beings have an average identity of some 99.9%. In a genome of around 3.2 billion base pairs, the subtle difference of 0.1% represents a collection of some millions of nucleotides, responsible for our fabulous diversity. Most of these differences take the form of single-base substitutions, known as Single Nucleotide Polymorphisms or simply SNPs. SNPs are a key element for understanding human genetic variability and its association with various illnesses. Recently there has been an explosive increase in the number of SNPs deposited in public databases.
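The arithmetic behind "some millions of nucleotides" can be checked directly. The sketch below uses only the two rounded figures quoted in the text (a genome of about 3.2 billion base pairs and an average identity of about 99.9%); the function name `expected_differences` is hypothetical, and the result is an order-of-magnitude estimate rather than a measured count.

```python
def expected_differences(genome_size_bp, identity_per_mille):
    """Approximate number of differing positions between two genomes.

    Identity is given in parts per thousand (999 means 99.9%) so that
    the calculation stays in exact integer arithmetic.
    """
    return genome_size_bp * (1000 - identity_per_mille) // 1000

GENOME_SIZE_BP = 3_200_000_000   # about 3.2 billion base pairs
IDENTITY_PER_MILLE = 999         # about 99.9% identical

differences = expected_differences(GENOME_SIZE_BP, IDENTITY_PER_MILLE)
print(f"{differences:,} differing nucleotides")
```

With these rounded inputs the estimate is about 3.2 million positions at which two unrelated people differ.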

Just about a year ago, the dbSNP database alone, linked to the National Institutes of Health of the United States, had close to four million SNPs deposited. Today this number has grown by nearly 50%, passing six million SNPs. Nevertheless, only 0.3% of these polymorphisms have been studied in depth, and numerous polymorphisms still remain to be discovered. The scientific literature expresses the concern that an immense number of highly relevant polymorphisms have yet to be revealed and will not be found through computational strategies alone.
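The figures in this paragraph can be put side by side with a short calculation. It is a rough sketch that treats the approximate values in the text ("close to four million", "six million", "0.3%") as exact; the function names are hypothetical.

```python
def percent_growth(before, after):
    """Whole-number percentage growth from `before` to `after`."""
    return (after - before) * 100 // before

def studied_in_depth(total_snps, per_mille_studied):
    """Number of SNPs studied, given a fraction in parts per thousand."""
    return total_snps * per_mille_studied // 1000

SNPS_LAST_YEAR = 4_000_000   # "close to four million"
SNPS_TODAY = 6_000_000       # "six million"

growth = percent_growth(SNPS_LAST_YEAR, SNPS_TODAY)
studied = studied_in_depth(SNPS_TODAY, 3)   # 0.3% is 3 per thousand

print(f"growth: about {growth}%")
print(f"studied in depth: about {studied:,} SNPs")
```

Read this way, roughly eighteen thousand of the six million deposited polymorphisms have been examined closely, which gives a sense of how much remains to be done.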

This concern is due to the fact that the central portion of the genes is only slightly represented in the data from the so-called ESTs (expressed sequence tags), while the data from the Human Genome Project are based on a very limited number of individuals, leading to an under-representation of polymorphisms. In this respect, the Brazilian initiative for the generation of ESTs of the ORESTES type was extremely positive.

In the context of the Human Cancer Genome Project (HCGP, financed jointly by the Ludwig Institute for Cancer Research and FAPESP), the Brazilian group produced some 1.2 million ESTs derived from the internal portion of the genes, one of the largest such data sets in the world. Besides having been derived from various types of human tumors, these samples are of great value for having come from a population with a high degree of ethnic mixture, contributing variations that are difficult to find in more homogeneous populations.

The national data will be essential for us to reach a wide coverage in the analysis of clinically relevant polymorphisms, searching for associations between polymorphisms of DNA and illnesses and evaluating the levels of polymorphism in our genome. These polymorphisms should be the key to predisposition to, or protection from, the development of numerous diseases, as well as being directly associated with the way in which different people respond to medicines. For example, we already know that a difference of a single base in the sequence of the APOE gene confers greater susceptibility to the development of Alzheimer's disease and to cardiovascular illnesses. Knowledge of the effect of these alterations on our response to drugs opens up a tremendous space for genetic pharmacology.

We know that the cost of developing a new drug is around US$ 600 million. Sometimes drugs that are extremely efficient for the vast majority of the population have to be withdrawn from the market because they provoke serious side effects in some people. These studies of polymorphisms will alter the evolution of medicine. In drug therapy we can envision the end of the trial and error approach. In the clinical area, we can forecast that prevention will be prioritized over treatment, resulting in a reduction in the cost of treating illnesses. For this to occur, the interaction between scientific research and the private sector is fundamental, allowing for the translation and incorporation of scientific findings into the day-to-day life of ordinary people.

Although the genome still holds a series of discoveries yet to come, the path already opened by scientific research allows for a series of current practical possibilities, ready to be implemented in the routine of our society. The study of polymorphisms will also allow us to calculate the diversity of DNA existing between individuals of different human ethnic groups. When we compare the DNA of two individuals of the same ethnic group, we see that the number of differences found is as large as the number observed between individuals of different ethnic groups. In this manner science demonstrates that the concept of race, looked at from the point of view of DNA differences, does not make the least bit of sense.

Diversity and individuality are the most fundamental characteristics of each human being. Data from the human genome are allowing these characteristics to be seen very clearly and are showing us that, no matter how unique we are in the Universe, we still have a lot in common with the rest of humanity. Our genome can be looked upon as the patrimony of our species. Leaving aside an anthropocentric posture, DNA shows that all known forms of life are coded by the same basic primary material, the nucleotides that make up the genomes.

This leads to the conceptual view that the emergence of life on this planet must have been a unique event. The philosophical and even poetical vision tells us that we are all members of a large family, made up of the various diverse forms of life on the planet. All of us had a common ancestor, and this must lead us to reflect on our posture when faced with issues such as pollution, degradation of the environment and the preservation of life on the planet.

The future

In spite of the fact that our forecasts can always surprise us, the current stage of research allows us to glimpse what the scenario will be in fifteen to twenty years' time. Among the wide range of possibilities, a few things appear to be certain:

- We will have a wide list of human gene products offering an enormous potential for replacement drugs (similar to the insulin or recombinant human growth hormone available today), with dramatic preventative and curative effects on various illnesses.
- Shortly the patient's medical file will contain a list with the status of various polymorphisms linked to genetic pharmacology, as well as the propensity for the development of a series of illnesses.
- Obtaining the complete genomic sequence of an individual will be possible in a few years' time, and we must be ready to deal responsibly with keeping this information confidential.
- Gene therapy will become a reality for illnesses caused by the alteration of a single gene. Defective genes could be replaced by functional versions.
- The understanding of the genetic basis of complex illnesses will allow for the rational design of drugs, directed to metabolic pathways that work inadequately, eventually making possible the modeling of preventative strategies.
- An understanding of the specific genetic alterations of certain tumors will allow for the early diagnosis of the majority of human tumors.
- The genomic pharmacology industry will become established and expand, generating personalized medicine in which drugs are developed in accordance with the genetic features of different groups of individuals.

It is believed that by around 2010, effective genetic markers will be available for a large number of illnesses and human conditions. It is estimated that a diagnostic test including a large list of markers will cost around US$ 100. As tests that permit an evaluation of genetic predisposition to certain diseases become possible, society will face issues involving the availability of this information to employers or to health insurance firms. The law should protect citizens from the misuse of this information, and we must question the validity of using it in hiring decisions. Over the next few years, the public will have more and more opportunities to take genetic tests and speculate about their genetic destiny.

It is urgent that legislation keep pace with scientific advances, incorporating and using the fruits of these discoveries and imposing limits in the most delicate areas. Without public debate and the appropriate controls, people could be discriminated against because of their genetic characteristics. We need to discuss what genetics can and cannot do and what type of society we want. The double helix, with all of its beauty and simplicity, definitively brought together biochemistry, physiology and genetics. Its structure offered an immediate explanation for the process of copying DNA and for the mechanisms of genetic inheritance, mutation and diversity. Nevertheless, the double helix has not as yet clarified the details of the interaction between genetics and the environment.

Human individuality revealed through DNA has caused various concepts to be reviewed and medicine to focus again on the individual. Predictions at the level of the population do not have the same predictive power at the individual level. After having gone so far in the understanding of the “book of life”, perhaps this is a good moment for us to reevaluate our expectations and the very concept of our existence. If on the one hand we have our genetic individuality, we are also all very similar to each other and similar to other forms of life on the planet. I am ending this text with a song that I came across during a trip to the city of Recife. Popular knowledge surprises us, and I believe that the number of genes proposed by the verse below should be much closer to reality than the thirty thousand suggested in the studies published in Nature and Science.

“The world finds itself well advanced,
Science reaches progress without total,
In the wonderful research of the genome,
All of the human body has been mapped,
And there on this map all has been counted,
Eighty thousand genes we can count,

Science makes rain and wets,
Makes clones of sheep,
Makes copies complete,
But I doubt if science can make a poet,
Singing at the gallop on the sea shore?”

Geraldo Amâncio, Pernambuco, Brazil.

Emmanuel Dias Neto is a researcher with the Psychiatry Institute of the Medical Faculty of the University of Sao Paulo and one of the inventors of the ORESTES method.
