
Interview

José Nelson Onuchic: Genetic structure modeler

Brazilian physicist who specializes in protein and DNA folding helped restore the genome of the woolly mammoth

Onuchic during a visit to São Paulo in early October

Léo Ramos Chaves / Pesquisa FAPESP

José Nelson Onuchic enjoys having the freedom to move between different research topics. “The beauty of academic life is being able to study whatever you want,” he said in an interview with Pesquisa FAPESP on October 9.

Onuchic, a researcher at Rice University in Houston, USA, was in Brazil to participate in a symposium on molecular biophysics. He also visited the Brazilian Center for Energy and Materials Research (CNPEM) in Campinas, where the Sirius synchrotron light source is based, to propose a collaboration. He is interested in helping Brazil make use of a genome sequencing technique developed by his colleagues in the USA and in using Sirius to analyze the three-dimensional structure of genomes.

José is the son of two mathematics professors: Nelson Onuchic [1926–1999] and Lourdes de La Rosa Onuchic, who remains active at 93 years old. He earned his degree in electrical engineering in 1980 and in physics in 1981, both from the University of São Paulo (USP) in São Carlos.

After completing his PhD at the California Institute of Technology (Caltech), supervised by physicist John Hopfield, winner of the 2024 Nobel Prize in Physics, he returned briefly to São Carlos before becoming a professor at the University of California, San Diego (UCSD). There, he began studying protein folding. Together with colleagues, he proposed two concepts that earned him great recognition in the field. In 2011, he moved to Rice University in Texas, where he began a line of cancer research. Aged 66, he has published more than 400 scientific articles, cited 45,000 times by other authors, and is codirector of the Center for Theoretical Biological Physics, which focuses on studies at the frontier of knowledge. You can read the highlights from the interview below.

In October, you took part in a symposium on molecular biophysics in São Paulo. What did you talk about?
The structure of the genome, how it is organized in the cell nucleus. Most of the time, the DNA molecule [which contains the genes and non-protein-coding sections] is in the form of chromatin, which is DNA loosely wrapped around proteins. The three-dimensional structure of chromatin is important for controlling which genes are read and when. It can hide some genes while exposing others to the reading mechanism of cells. Conventional books on molecular genetics do not talk about the three-dimensional structure of this molecule, an area of knowledge that is changing rapidly.

In 2016, you published an article in PNAS about how the structure of DNA changes and allows the gene to be read, right?
Exactly. We started working in this area with Erez Aiden from Baylor College of Medicine, who uses a technique called Hi-C. It produces a contact map that allows us to identify sections of chromatin that are far apart along the DNA molecule but spatially close in the folded strand. We used this two-dimensional contact map to generate the 3D structure of chromatin. The model has evolved, and today we no longer need Hi-C. Now we take segments of 50,000 base pairs of the genome (10 times the size of a gene) and separate each segment into categories. In the simplest version of the model, we divide the chromatin into two categories: euchromatin A, which is less condensed and more fluid, where more genes are expressed, and heterochromatin B, which is more folded and holds less genetic information. The model also uses information from the ChIP-seq technique, which investigates how proteins are associated with DNA, to obtain epigenetic information (gene activation patterns) and classify chromatin types.
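To make the contact-map idea concrete, here is a minimal Python sketch. It runs in the opposite direction from the group's work: it starts from 3D coordinates and derives the contacts, whereas they started from the contacts and inferred the 3D structure. The toy coordinates, the distance cutoff, and the function names are illustrative assumptions, not part of Hi-C or of the group's model. Only the 50,000-base-pair segment size comes from the interview.

```python
import math

BIN_SIZE = 50_000  # base pairs per segment, the figure quoted in the interview


def contact_map(coords, cutoff):
    """Return a symmetric 0/1 matrix: 1 where two distinct segments lie within `cutoff`."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def long_range_contacts(cmap, min_separation):
    """List pairs (i, j) in contact that are at least `min_separation` segments apart along the chain."""
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]


# Hypothetical chain of six segments folded into a hairpin (arbitrary length units).
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]

cmap = contact_map(coords, cutoff=1.0)
pairs = long_range_contacts(cmap, min_separation=3)

for i, j in pairs:
    print(f"segments {i} and {j}: {(j - i) * BIN_SIZE:,} bp apart along the DNA, but touching in space")
```

In this toy hairpin, the two ends of the chain are the farthest apart along the sequence yet sit side by side in space, which is the kind of signal a contact map records.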

How does it work?
The model has four parts. Three are more important than the fourth. First, we consider DNA as a soft polymer that can be cut, as occurs naturally because of certain proteins. Then the model separates types A and B by similarity: A segments tend to group with other As, and Bs with Bs. The third part is the ideal chromosome, in which spatially close parts of the chain tend to attract each other. This creates local points of contact, which leads to compaction and the formation of helices. Increasing the local points of contact prevents the formation of knots. This is important in chromosomes. When their strands are expressed, they need to be unfolded, and you do not want knots to appear. The way chromatin is folded prevents this. Part of the local compaction is what we call motorized, meaning it is promoted by proteins. Ninety-five percent of the time, the cell is in interphase, during which the chromatin is unwound and duplicated. In the other 5%, it enters mitosis, which means cell division. In mitosis, compaction increases and the chromosome is more coiled. Our model shows that when motorization is increased, compaction and helix formation also increase.
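The three main ingredients described in this answer can be written as terms of a toy energy function. The sketch below is an illustration only and is not the group's published model: the functional forms, the parameter values, and the names `contact_prob` and `toy_energy` are all assumptions chosen for simplicity, and the fourth part of the model, which the interview does not detail, is left out.

```python
import math


def contact_prob(r, r_cut=1.5):
    """Smooth switch: close to 1 for beads nearer than r_cut, close to 0 for distant beads."""
    return 0.5 * (1.0 + math.tanh(3.0 * (r_cut - r)))


def toy_energy(coords, types, k_bond=10.0, eps_same=-0.3, eps_diff=-0.1, gamma=-0.2):
    """Toy energy of a bead chain; every bead stands for one chromatin segment of type 'A' or 'B'.

    All parameter values are hypothetical.
    """
    n = len(coords)
    energy = 0.0

    # 1) Polymer term: harmonic springs (rest length 1) between consecutive beads.
    for i in range(n - 1):
        r = math.dist(coords[i], coords[i + 1])
        energy += 0.5 * k_bond * (r - 1.0) ** 2

    for i in range(n):
        for j in range(i + 2, n):
            f = contact_prob(math.dist(coords[i], coords[j]))
            # 2) Type term: beads of the same type attract more strongly than mixed pairs.
            energy += (eps_same if types[i] == types[j] else eps_diff) * f
            # 3) "Ideal chromosome" term: an attraction that depends only on the
            #    separation along the chain and fades as that separation grows.
            energy += gamma / (j - i) * f

    return energy


# Hypothetical hairpin of six beads: beads 0-2 on one strand, beads 3-5 folded back over them.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]

facing_same = toy_energy(coords, "AABBAA")   # like types face each other across the hairpin
facing_mixed = toy_energy(coords, "AAABBB")  # unlike types face each other

print(facing_same < facing_mixed)
```

For the same geometry, the labeling in which like types sit face to face has the lower energy, which is the sense in which As tend to group with As and Bs with Bs.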

This is all governed by the chemical and electrical characteristics of the molecules?
That’s right. But it is important to remember that the genome has very specific functions, which need to be preserved by its conformation. The genome must have the capacity to be transcribed [read by the cell and encoded in the form of RNA] in order to duplicate and separate. Although our models treat it as a polymer, just like proteins, the functionality of the genome and proteins is distinct. Proteins are more rigid, while the genome is more malleable. We are now trying to understand how the structural characteristics of the genome are important to its functioning. This has already been done for proteins. For a long time, scientists have been studying protein folding to define their structure. But ultimately, what we want to know is the function of the protein.

But in the case of proteins, structure defines function, right?
Many biology books talk about the relationship between structure and function. But understanding the function is more complicated than that. Many proteins have multiple structures. Some proteins are ordered. Others are disordered and become ordered when they bind to another molecule. When we began to identify the first protein structures, we started with the easiest ones—enzymes—which have a well-defined structure. With enzymes, function is strictly related to form. But proteins are much more diverse than that. These polymers, made by combining 20 types of amino acids, have an incredible ability to assume different structures. They can be enzymes, signalers, fibers.

How do physicists help biologists understand this complexity?
Protein folding can be analyzed in two ways. Biologists ask: if I give you this sequence of amino acids, can you determine the structure and function of the protein? As a physicist, I think differently: from a sequence of amino acids, is it possible to know whether or not there is a defined structure? What I am more interested in is knowing what differentiates a protein from a random polymer of amino acids. From this perspective, I proposed the concept of the folding funnel in 1992 [in an article in the journal PNAS, written with Peter Leopold and Mauricio Montal of the University of California, San Diego]. In 1995, Paul Wolynes, Joe Bryngelson, Nicholas Socci, Zaida Luthey-Schulten, and I finalized the energy landscape theory.

What do these ideas propose?
Since the 1960s, there has been a paradox about protein folding. Based on studies of small globular proteins, it had been assumed that they fold and reach their final configuration when they reach a state of minimum energy. But proteins can assume such a large number of conformations that it would take a very long time to find this state. Any given polymer has several minimum energy states that are structurally very different. For proteins, we proposed the idea that having one state that is more attractive than others is not enough. They need to be more attractive to the desired structure and less so to those they do not want. The number of favorable states decreases as they approach the most stable structure.

Hence the idea of a funnel?
Yes. There is also a concept known as configuration entropy. If a protein is above its folding temperature, there will be several possible states. If it is below, there will be a state that begins to prevail and is more stable. This state has enough attractive energy to offset the configuration entropy it is competing against. In other words, there is a temperature at which energy wins over entropy. However, this temperature that favors folding needs to be reached before the protein molecule reaches the so-called glassy state, in which it loses mobility and becomes stuck in a trap. The energy landscape theory says that there is an attractive state that is much deeper than these traps. It is possible to understand a lot about a protein using this hypothesis: if a protein folds, it has managed to optimize the stable structure over others. Based on this idea, I used the protein structure to create a model in which all native [stable] states are attractive and non-native states are repulsive. With this model, it is possible to understand the entire transition and every intermediate state of the folding process. In an article published in the journal Science this year, my team and another led by Paul Whitford at Northwestern University collaborated with a group led by Walther Mothes, of Yale University. Mothes had used cryogenic electron tomography to determine the structures assumed by the spike protein of the novel coronavirus during cellular invasion. But their results were at low resolution. We had already made our own model of the cellular invasion process and we noticed that our spike configurations were very close to the results they obtained. So we worked together to create a model combining experimental data with our simulations.
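The competition between energy and entropy described at the start of this answer can be illustrated with a two-state toy model: one folded state lying `energy_gap` below a large set of equally likely unfolded states. The model, the numbers, and the function names are assumptions made for illustration. The sketch ignores the glassy traps mentioned above and is not the energy landscape theory itself.

```python
import math


def folding_temperature(energy_gap, n_unfolded_states, k_b=1.0):
    """Temperature at which the energy gain of folding equals the entropy loss, k_b * T * ln(states)."""
    return energy_gap / (k_b * math.log(n_unfolded_states))


def folded_fraction(temperature, energy_gap, n_unfolded_states, k_b=1.0):
    """Equilibrium probability of the single folded state in the two-state toy model."""
    # Log of the ratio (unfolded weight / folded weight); working in logs avoids overflow.
    log_ratio = math.log(n_unfolded_states) - energy_gap / (k_b * temperature)
    return 1.0 / (1.0 + math.exp(log_ratio))


gap, states = 50.0, 1e10  # hypothetical values in arbitrary units
t_f = folding_temperature(gap, states)

for factor in (0.5, 1.0, 2.0):
    t = factor * t_f
    print(f"T = {factor:.1f} x T_f: folded fraction = {folded_fraction(t, gap, states):.3g}")
```

Below the folding temperature the folded state dominates because energy wins. Above it, the sheer number of unfolded states wins. At the folding temperature the two are evenly matched.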

And what was the conclusion?
The spike is on the surface of the coronavirus. When it attaches to the receptor on the surface of the human cell, the spike undergoes structural changes that bring the virus membrane closer to the cell membrane.

Has this mechanism been seen before in other viruses?
The first protein we saw this in was hemagglutinin, in the influenza virus.

It is hard to see because everything is always moving.
That is why it is necessary to use antibodies that bind to the intermediate states of the spike to remain stable. We are interested in this. If I have an antibody that binds to an intermediate state, I can modify that antibody so that it stops the transition to the final state. If you block this transition, you might be able to prevent the virus from entering the cell.

You recently participated in a study on the woolly mammoth genome. What did you discover?
The people behind the project had been working on it for years, and they needed to do some modeling with the data, so they called us. They had to develop a way to collect DNA samples from the animal, which had been frozen for 52,000 years. After doing so, they made a Hi-C map and saw that some of the genetic structure was preserved. The DNA had been frozen and dried. Although the chromosome was broken into several segments, the pieces had not moved much and the 3D structure of the DNA was partially preserved. With Hi-C and modeling, we were able to recover this structure.

How did your group help?
We said: “Although this DNA is 52,000 years old, we believe that the rules of how it is organized and how the sections are connected remain the same.” We then decided to look at the genome of the mammoth’s cousin, the elephant. We used information from the elephant genome to reconstruct that of the woolly mammoth. Then, with our genome models, we were able to greatly reduce the noise level in the sample and generate the genome structure in three dimensions. Next, we looked at which parts of the mammoth genome were more active and which were less so. We also looked at which genetic regions were active in the mammoth and not in the elephant. One is the region that generates hair, which is much more abundant in mammoths. This indicates that our work was consistent.

What is your opinion on the current research landscape in Brazil?
We are not paying enough attention to basic research, setting ourselves up for failure. Basic science is the repository of society’s intellectual capacity, while applied science, which is also important, generates wealth. In today’s world, national security is more associated with the control of knowledge than with military power. In Brazil, everyone wants to file a patent. The number of unnecessary patents written in this country is incredible. They are important, but we should place a greater emphasis on controlling knowledge.

Are things not the same worldwide?
No. In the USA, we only occasionally write patents. Everyone knows that maintaining good patents is not easy. It is expensive, you need a legal body to protect them. You have to be selective. But this is just part of the learning process. Intellectual property protection laws in Brazil were weak. We are only now learning how to do it properly.

How many patents does your group file per year?
One patent every four or five years, and we write about 15 articles per year. What I said about patents also applies to scientific articles. There is a tendency to look at scientific production or patent numbers when promoting a researcher. But I ask: “Has anyone actually read the articles? Do you know what they’re about? What is their impact?” It is easy to get hold of these numbers. It is part of the process. But I think things are moving in the right direction.

In Brazil too?
In Brazil especially. The quality of Brazilian science has improved a lot. There are several groups that have stopped merely following others and now compete on equal terms at an international level. This applies to most of the RIDCs [Research, Innovation, and Dissemination Centers funded by FAPESP], for example.

In a few days, you will visit Sirius in Campinas. What for?
I am going to speak with Antônio José Roque da Silva, general director of the CNPEM [Brazilian Center for Energy and Materials Research], and other researchers from the National Biosciences Laboratory. We want to establish a partnership between our group, Erez Aiden’s group, and the CNPEM. The aim is to make use of Sirius’s window of opportunity, the capacity of which exists in very few places in the world. Sirius can perform tomography with a resolution that other equipment cannot. We want to use it for genome tomography.

What do you want to look at in the genome?
Erez Aiden and Olga Dudchenko, collaborators in our group, are developing a project called DNA Zoo. They want to sequence the DNA of various plant and animal species. The challenge of sequencing the genomes of these species is that there is no reference for them. When sequencing these genomes, there will be many errors. Dudchenko developed a method that uses the Hi-C technique [which allows you to know which parts are spatially close] to correct errors during alignment. It is a much cheaper way of sequencing. She and Aiden are using the approach to sequence the genomes of several species. We want to transfer this technology here, to sequence the genomes of Brazilian species.

What will synchrotron light allow you to see?
How the genome appears in three dimensions in the nucleus of cells.

Why is this important?
The genome is spectacular. During cell division, it undergoes an enormous structural change. As a physicist, I want to know if this system can become uncoordinated and then shape itself again, or if it has a memory that allows it to leave the state it is in during the interphase, move to mitosis, and then return. If it has memory, how does that work? What is the consequence? What are the control mechanisms?

Why did you decide to move from the University of California to Rice?
There were several reasons. In science, as in everything in life, you can have a midlife crisis. San Diego was very good to me. I arrived in 1990, they made me a tenured professor in 1992 and full professor in 1995. In 2006, I became a member of the National Academy of Sciences. I was 54 years old, I had done a lot of work in the proteins field, and I thought, “If I stay here, I will be doing this for the rest of my life. Why not make a change?” At the time, Rice offered me the chance to study cancer, with a budget of US$10 million from the Cancer Prevention and Research Institute of Texas (CPRIT) to start a theoretical project relating to cancer.

What do you study about cancer?
We made metabolism models using genetic networks. When I got there, I started working with Eshel Ben-Jacob [of Tel Aviv University, Israel] and we proposed an explanation for the gene activation that leads cells to become invasive in cancer when they undergo epithelial-mesenchymal transition. In this transition, epithelial cells [which are static, such as skin cells or cells in the lining of internal organs] undergo biochemical changes and acquire the characteristics of mesenchymal cells [capable of migrating and invading tissue]. We showed that it is governed by the interaction between a gene and a microRNA. This interaction allows for the generation of hybrid states, in which cells have both epithelial and mesenchymal characteristics simultaneously. We showed that this can occur when cells are subjected to stress. Hybrid cells are able to move and join together, creating clusters, which are more difficult to destroy.
