The explosion of data generated worldwide by a vast range of devices in a diverse array of contexts has led scientists to begin studying the use of synthetic DNA to store digital information. Deoxyribonucleic acid, commonly known as DNA, is the data storage system of most living things. It is a molecule that holds an organism’s genetic information and is found in all cells. Thanks to the study of DNA preserved in nature, we have access to biological information about the Neanderthals, who became extinct more than 30,000 years ago, and mammoths, which lived more than one million years ago.
The potential benefits of this new technology are significant. According to the DNA Data Storage Alliance, an association formed by global technology companies with the aim of stimulating the technical research and innovation ecosystem in this field, the data storage capacity of DNA is 115,000 times greater than the magnetic media currently used in data centers. In the same physical space as one LTO-9 magnetic tape cartridge capable of storing 18 terabytes (TB)—the equivalent of 18 trillion bytes—it is possible to store about 2 million TB in DNA.
“Facebook’s data center in Oregon, USA, occupies tens of thousands of square meters [m²], the size of a large shopping mall, to store around 1 million TB of data. All that content could be stored in just 5 grams of DNA, in a device that fits in the palm of your hand,” says Bruno Marinaro Verona, head of the Micromanufacturing Laboratory at the Institute for Technological Research (IPT) in São Paulo. Verona is leading a research project in this field in Brazil.
In addition to its vast density, DNA storage has other important attributes. “It is an environmentally sustainable system,” highlights Brazilian electrical engineer Luis Ceze, a professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, USA. Data centers use huge amounts of electricity to run the equipment and to air-condition the rooms where files are kept on hard disks (HD) and magnetic tapes. DNA, however, can be kept at room temperature. In addition, current magnetic media are manufactured from rare-earth materials and petroleum derivatives and need to be replaced within 30 years at most. Researchers estimate that digital data stored in DNA will be readable for thousands of years.
Advances in digital data storage are needed due to the enormous amount of digital information being generated by the ever-increasing use of information technology (IT), with computers and smartphones creating, processing, and exchanging endless types of data. According to a report by the American consultancy IDC, the world generated and backed up three zettabytes (meaning a 3 followed by 21 zeros) of data in 2010. By 2020, that figure had jumped to 64 zettabytes (ZB) and it is projected to reach 180 ZB in the next two years.
As stated by the DNA Data Storage Alliance, this is just the beginning of what has been called the information age, in which artificial intelligence and the internet of things will play an increasing role in day-to-day life, from health and education to commerce, driving, and running a factory. According to the association, just one self-driving car generates 15 TB of raw data every eight hours. Not all of this data is stored, but a significant portion is kept for various reasons, such as public safety and automotive maintenance.
This data is currently archived in remote, cloud-based storage. “Data centers consume about 1% of all electricity produced in the world. The IT industry predicts that consumption will rise to 30% within the next few years,” warns Hildebrando Lima, director of research and development at Lenovo in Brazil. “We need to create an alternative capable of reducing this impact, and DNA storage technology is the most promising option.”
Scientists involved in the development of DNA data storage acknowledge that it will be some time before the solution is available to the public; estimates point to the next decade. There are still a number of open questions about the processes that will be used. “What we can say is that storing digital data in DNA is feasible,” says Brazilian electrical engineer Karin Strauss, senior research manager at Microsoft Research in Redmond, USA. “We have already done it in the lab and by all indications it seems to be possible on a commercial scale and economically, but we still have a long way to go to reach that goal.” Microsoft Research was one of the founders of the DNA Data Storage Alliance.
The biotechnology industry already produces synthetic DNA for data storage by the health sector, but the production scale is small and the speed of the process would be too slow for the needs of the IT industry. “Tens of megabytes [MB] can be written to a conventional magnetic hard drive per second. To store a single MB in DNA takes a day of work,” points out Verona, from the IPT. “All the studies have the same objective: to establish the best techniques for storing data in DNA and then to improve upon them,” says Ceze, from the University of Washington.
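To put the two writing speeds Verona quotes side by side, the short calculation below offers a rough sketch. It assumes 10 MB per second for the hard drive (the low end of “tens of megabytes”) and takes one megabyte per day for DNA at face value; the function name and the decimal definition of a terabyte are illustrative choices, not figures from the researchers.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal terabyte: 1 TB = 1 trillion bytes

HDD_MB_PER_SECOND = 10  # assumption: low end of "tens of megabytes" per second
DNA_MB_PER_DAY = 1      # figure quoted in the text


def days_to_write(megabytes: float, mb_per_day: float) -> float:
    """Days needed to write `megabytes` at a constant rate of `mb_per_day`."""
    return megabytes / mb_per_day


# Express both speeds in the same unit (MB per day) before comparing them.
hdd_mb_per_day = HDD_MB_PER_SECOND * SECONDS_PER_DAY
speed_ratio = hdd_mb_per_day / DNA_MB_PER_DAY

dna_days = days_to_write(MB_PER_TB, DNA_MB_PER_DAY)
hdd_days = days_to_write(MB_PER_TB, hdd_mb_per_day)

print(f"Hard drive vs. DNA writing speed: about {speed_ratio:,.0f} to 1")
print(f"1 TB written to DNA: {dna_days:,.0f} days")
print(f"1 TB written to disk: {hdd_days * 24:.1f} hours")
```

Under these assumptions, writing a single terabyte would take a million days with today's synthesis speed, against a little over a day on the hard drive.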
The process for storing data in DNA does not involve manipulating the genetics or cells of living organisms. The DNA is manufactured through chemical synthesis and each molecule is constructed specifically as the data files are generated. This means that to store 1 TB of data, a set of molecules with a storage capacity of 1 TB has to be synthesized (see infographic).
These new devices will consist of structures capable of synthesizing and sequencing DNA strands, which contain the digital information of interest. One possible design is to use microcavities, a few nanometers deep, in which the DNA molecules can be synthesized. Scientists have dubbed the first stage of manufacturing a DNA file bits to base. In this step, computer programs are used to convert bits—the binary system of 1s and 0s used in computing to represent characters, numbers, and images—into the four nitrogenous bases that make up DNA molecules: adenine (A), cytosine (C), guanine (G), and thymine (T) (see Pesquisa FAPESP issue nº 235). Inside the molecule, these nitrogenous bases form the two spiral filaments known as the double helix, which constitute the DNA.
As an example, the bit pair 00 can be encoded as base A, 01 as T, 10 as C, and 11 as G. To access the data and read the files, the sequence just has to be decoded, converting the nitrogenous bases back into bits. This stage is known in the industry as base to bits. The data stored in DNA can be accessed online or locally. The bases are converted into bits by a fast computer process, although it is still slower than reading traditional magnetic files.
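The two conversions can be illustrated with a minimal sketch that uses the example mapping given in the text (00 to A, 01 to T, 10 to C, 11 to G). The function names are hypothetical, and the sketch leaves out anything a real system might add on top of the mapping, such as addressing or error correction.

```python
# Example mapping from the text: 00 -> A, 01 -> T, 10 -> C, 11 -> G.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bits_to_base(data: bytes) -> str:
    """Encode bytes as a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def base_to_bits(sequence: str) -> bytes:
    """Decode a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Hi".encode("ascii")
sequence = bits_to_base(message)   # four bases per byte
restored = base_to_bits(sequence)  # round trip back to the original bytes
print(sequence, restored)
```

Each byte becomes four bases, so the two-character message above is written as a sequence of eight bases and recovered unchanged when decoded.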
The main challenge still facing researchers is to improve the chemical synthesis methods used to write the encoded sequences of nitrogenous bases as the synthetic DNA molecules are built.
Léo Ramos Chaves / Pesquisa Fapesp
Reagents being prepared for injection into DNA synthesis equipment at the IPT Micromanufacturing Laboratory
There are two established processes in bioengineering. The older is phosphoramidite chemical synthesis, created in the 1980s by American biochemist Marvin H. Caruthers. It is the predominant method for biomedical applications. The other potential route is enzymatic synthesis, which has been refined by several research groups over the last 15 years but has not yet reached the commercial phase.
Enzymatic synthesis uses organic protein molecules that function as catalysts for chemical reactions, accelerating the speed of the processes. One of the advantages of this process is that it uses nontoxic, aqueous reagents. It thus has less of an environmental impact compared to phosphoramidite synthesis, which uses fossil reagents. Ceze believes that enzymatic synthesis, a technology still in its infancy, has greater potential for advances and is likely to prevail.
Microsoft has been researching DNA storage technologies since 2015, in partnership with the Molecular Information Systems Lab at the University of Washington. Of their various joint studies that have resulted in scientific articles, Strauss highlights two in particular. A 2019 paper published in Nature Scientific Reports highlighted the feasibility of automating chemical synthesis, eliminating the laborious manual DNA pipetting process (the transfer of liquids) that is used today. “Automation will generate scale and reduce the costs of the DNA storage process,” predicts the scientist.
Another joint article by the two institutions, published in Science Advances in 2021, presented a nanoscale DNA recording system that uses a molecular environment control method to enable a large number of unique DNA sequences to be generated at the same time (in parallel). “The result presented in this paper was the miniaturization of the sequence writing unit and the chemical process that controls it, so that more of these units fit on the same chip,” explains Strauss. Current DNA strands cannot hold more than 300 nitrogenous bases, which means they store less than 30 bytes (a byte is a set of eight bits) per sequence.
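The per-sequence limit explains why writing many sequences in parallel matters. As a rough sketch, the calculation below estimates how many separate sequences a file would need, taking the 30-byte figure above as an upper bound on what each sequence carries; the function name and the decimal terabyte are illustrative assumptions.

```python
BYTES_PER_SEQUENCE = 30  # upper bound per sequence quoted in the text
TB = 10**12              # decimal terabyte: 1 trillion bytes


def sequences_needed(n_bytes: int, payload: int = BYTES_PER_SEQUENCE) -> int:
    """Minimum number of sequences needed to hold n_bytes (ceiling division)."""
    return -(-n_bytes // payload)


print(f"{sequences_needed(TB):,} sequences for 1 TB")
```

Because each sequence stores less than 30 bytes in practice, the result is a lower bound: a single terabyte calls for more than 33 billion distinct sequences.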
In Brazil, the only research group in the DNA Data Storage Alliance is a partnership between the IPT and Chinese electronics manufacturer Lenovo. Named Prometheus, the project was started in 2021 and is led by Verona. The multidisciplinary team comprises 40 researchers, of whom 13 have master’s degrees and 21 have PhDs, including biologists, computer scientists, and molecular, chemical, and materials engineers. The IPT is Lenovo’s only global partner in DNA storage technology development.
According to Lima, from Lenovo, the partnership has produced four international patent applications: one from the data encoding and decoding team, two relating to chemical synthesis, and one relating to enzymatic synthesis. “We have another six studies underway that will soon reach the patent application stage,” reveals the executive.
Verona believes DNA data storage technology will advance gradually and will not initially be available for computers and smartphones. At first, it will be used for storing cold data, meaning information that users do not routinely access in day-to-day life, such as historical records and photo and video albums. In current data centers, these files are stored on magnetic tape.
Frequently accessed data, known as hot data, are usually stored on hard disk drives. “There is still no reliable projection of when DNA storage will be able to serve as an alternative for hot data storage,” says Verona. Ceze predicts that several data storage systems will coexist simultaneously in the future, each with different characteristics best suited to different purposes.
Scientific articles
TAKAHASHI, C. N. et al. Demonstration of end-to-end automation of DNA data storage. Nature Scientific Reports. Mar. 21, 2019.
NGUYEN, B. H. et al. Scaling DNA data storage with nanoscale electrode wells. Science Advances. Nov. 24, 2021.