Storing Data in DNA

From a blade of grass to the wing of a falcon, from the seed of a coconut to the stem cells in your bone marrow, almost all known life uses DNA as an instruction manual of sorts to carry out its respective functions. Recently, a team of researchers at the European Bioinformatics Institute led by Nick Goldman successfully stored 739 kilobytes of data in synthetic DNA, sequenced it, and recovered the original content with 100 percent accuracy, according to Nature.

So what really is DNA? The name is an abbreviation of deoxyribonucleic acid, a double-helical molecule found in the nucleus of our cells. It is hereditary: one copy of a person’s genes is inherited from the mother and the other from the father. Genes are linear segments of the DNA molecule, written in an alphabet of four nucleic bases known as A, T, G, and C, that provide a blueprint for protein synthesis, which is carried out through an intermediary known as RNA (ribonucleic acid). The resulting proteins perform myriad functions inside our bodies: antibodies mount the immune response to pathogens, enzymes regulate metabolism, and hemoglobin delivers oxygen, to name a few.

Goldman’s team encoded 5.2 million bits of information into DNA using a new code in which every byte (a string of eight ones or zeroes) was represented by a word of five letters, each an A, T, G, or C. The team broke the encoded sequence into overlapping strings, each 117 letters long, and added indexing information to show where each string belongs in the overall code. Because the strings partially overlap, every stretch of data appears in four different strings, so an error in one string can be cross-checked against the other three. The strings were synthesized by Agilent Technologies in Santa Clara, CA, and shipped to the researchers, who were then able to reconstruct the files with complete accuracy.
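The scheme described above can be sketched in a few lines of Python. This is a simplified illustration, not the team's actual code: the byte-to-word mapping (plain base-4 digits padded to five letters), the function names, and the short fragment length and step are all assumptions chosen to keep the example small. With a fragment length of 20 and a step of 5, every letter away from the two ends of the sequence appears in four fragments, so a single bad fragment is outvoted by the other three.

```python
from collections import Counter

BASES = "ACGT"  # four-letter alphabet; the digit-to-letter assignment is arbitrary


def byte_to_word(b):
    """Map one byte (0-255) to a five-letter word (assumed base-4 mapping)."""
    letters = []
    for _ in range(5):
        letters.append(BASES[b % 4])
        b //= 4
    return "".join(reversed(letters))


def word_to_byte(word):
    """Inverse of byte_to_word."""
    value = 0
    for ch in word:
        value = value * 4 + BASES.index(ch)
    return value


def encode(data):
    """Turn a bytes object into one long string of letters."""
    return "".join(byte_to_word(b) for b in data)


def decode(seq):
    """Turn a string of letters back into bytes, five letters at a time."""
    return bytes(word_to_byte(seq[i:i + 5]) for i in range(0, len(seq), 5))


def fragment(seq, length=20, step=5):
    """Cut seq into overlapping (offset, text) pairs; the offset is the index."""
    last_start = max(len(seq) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, seq[s:s + length]) for s in starts]


def reassemble(fragments, total_length):
    """Rebuild the sequence by majority vote at every position."""
    votes = [Counter() for _ in range(total_length)]
    for offset, text in fragments:
        for i, ch in enumerate(text):
            votes[offset + i][ch] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)


# Round trip: encode, fragment, reassemble, decode.
message = b"Storing data in DNA"
sequence = encode(message)
pieces = fragment(sequence)
recovered = decode(reassemble(pieces, len(sequence)))
```

A fixed five-letter word per byte is wasteful, since five letters from a four-letter alphabet could distinguish 1,024 values, but it keeps the mapping easy to follow.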

Goldman’s team reported a projected storage capacity of 2.2 petabytes per gram of DNA in the Nature article, based on storing 757 kilobytes of data in 337 picograms of DNA. That equates to 2,200 terabytes, or 2.2 million gigabytes, per gram, a reflection of how compactly DNA packs information at the molecular level.
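The density figure can be checked with simple arithmetic. The short sketch below assumes decimal prefixes (one kilobyte is 1,000 bytes and one petabyte is 10^15 bytes); the variable names are illustrative.

```python
# Figures quoted above: 757 kilobytes stored in 337 picograms of DNA.
data_bytes = 757e3       # 757 kilobytes, assuming 1 kB = 1,000 bytes
dna_grams = 337e-12      # 337 picograms expressed in grams

bytes_per_gram = data_bytes / dna_grams

petabytes_per_gram = bytes_per_gram / 1e15
terabytes_per_gram = bytes_per_gram / 1e12
gigabytes_per_gram = bytes_per_gram / 1e9

print(f"{petabytes_per_gram:.1f} PB per gram")
print(f"{terabytes_per_gram:,.0f} TB per gram")
print(f"{gigabytes_per_gram:,.0f} GB per gram")
```

The result comes out slightly above 2.2 petabytes per gram, which matches the reported figure once it is rounded.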

“What struck me most,” said Dr. Joseph Coyle, Associate Professor and Director of Financial Mathematics, “was the willingness to experiment with the idea that may not see any ‘real-world’ results or success immediately.” Employing synthetic DNA in this endeavor to store information was “creative and innovative,” remarked Coyle.

Shivam Patel, a senior biology major, said, “This innovative technology is extremely useful in modern society. With a storage device that may fit in our cells, we may be able to expand our capacity of memory with further research in biotechnology.”

Nick Kulka, also a senior biology major, added, “This is an extremely impressive advancement in technology. The applications are awe inspiring and futuristic. One could potentially store information in organic material which can open up many doors in the area of biotechnology.”

This past decade is replete with examples of how storage media have evolved. The floppy disk, capable of storing 1.4 megabytes, was replaced around the turn of the millennium by compact discs boasting 700 megabytes. More recently, Blu-ray discs have taken over popular storage with upwards of 50 gigabytes per disc, to say nothing of the terabyte-caliber hard drives integral to today’s high-end PCs. But in the end, none of these media comes close to DNA, which is capable of storing the equivalent of about 46,000 dual-layer 50-gigabyte Blu-ray discs per gram.

Despite the outstanding storage capacity, immediate concerns such as cost must be considered. One researcher on the team estimated the cost at about $12,400 to encode each megabyte of data and $220 to read it back. Still, the “cost of reading and writing DNA has changed by a million-fold in the past nine years, which is unheard of even in electronics,” said George Church, a Harvard geneticist, in an article in the journal Science. If that trend continues, the technique could become manageable for long-term archival storage that will rarely be accessed.

Goldman said in an interview with the Wall Street Journal, “In ten years, it’s probably going to be about 100 times cheaper. At that time, it [will] probably become economically viable.”

The true practical appeal of DNA as a medium for information storage, however, stems from both its ubiquity and its longevity. Because you can “stick the DNA in a cave in Norway for a thousand years and we will still be able to read it,” archivists who are forced to keep investing in the latest equipment to update their archives in modern formats may experience significant savings, according to Nature.

As Goldman noted, DNA should also be “Apocalypse-proof.” He added that after a hypothetical disaster, future generations might eventually find the stores and be able to read them. They would quickly notice that this isn’t like any DNA they’ve seen, due to the absence of repeats and the constant length of the strings. “It’s obviously not from a bacterium or human…maybe it’s worth investigating.”

“Being involved in computational chemistry research here at Monmouth University, I am aware of the computer space and expense associated with storing data,” said Samantha Silvent, a senior chemistry major. “I feel that this discovery shows the incredible advancements and capabilities of current technology. However, I believe that anything based off of the human genome can be subjected to ethical considerations. I feel that it will be worrisome to individuals who already feel threatened by the over-use of technology in modern day society.”

DNA’s longevity was dramatized in Steven Spielberg’s Jurassic Park, where dinosaur DNA from a mosquito preserved in amber millions of years ago was used to clone a T. rex. If we persevere in this endeavor, exciting organic-based technologies that take information storage to new horizons may be within our grasp sooner than you think.
