In a Nature Letter (and I’m sorry if that’s behind a paywall; I can’t tell from here), researchers at the European Bioinformatics Institute and Agilent Technologies introduced a method of encoding information onto DNA. The concept is not terribly new, but I appreciate their thoughtful approach to addressing many of the problems in DNA data storage.
To test their method, the researchers encoded Shakespeare’s sonnets, an excerpt from Martin Luther King Jr.’s “I Have a Dream” speech, a photo of their lab, the famous Watson and Crick DNA paper (I just love this tip of the hat to those who made the work possible), and a description of their encoding procedure, about 3/4 of a MB in total. The DNA was then shipped from the USA to Germany, where the information was decoded with a stated 100% recovery rate.
I really appreciate the work of these researchers, because they’ve moved beyond ‘because we can’ and have given us very good practical reasons why one would want to store information on DNA. Among them is that DNA fragments will remain viable for thousands of years with a fairly modest storage scheme. In addition, encoding onto DNA ensures we will not forget how to decode the information: as long as we don’t bomb ourselves back to the stone age, we will likely continue to be able to read and write DNA (unlike some of my own data from graduate school). Also, the data density is on the order of 2 petabytes per gram (a petabyte is 10^15 bytes, or 1000 terabytes). They even compare the costs of DNA storage with those of magnetic tape.
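To put that density in perspective, here is a back-of-the-envelope calculation. It is only a sketch: it takes the rough figure of 2 petabytes per gram at face value, and the function name `grams_needed` is my own.

```python
# Rough density quoted above: about 2 petabytes (2 x 10^15 bytes) per gram of DNA.
BYTES_PER_GRAM = 2e15


def grams_needed(n_bytes, bytes_per_gram=BYTES_PER_GRAM):
    """Mass of DNA (in grams) needed to hold n_bytes at the given density."""
    return n_bytes / bytes_per_gram


one_petabyte = 1e15   # 10^15 bytes, i.e. 1000 terabytes
test_files = 0.75e6   # the roughly 3/4 MB of test files

print(grams_needed(one_petabyte))  # half a gram for a full petabyte
print(grams_needed(test_files))    # well under a nanogram for the test files
```

At that density, the whole test set would in principle fit in a fraction of a nanogram of DNA.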
Their encoding scheme was interesting to me as well. From the paper, it is my understanding that runs of repeated nucleotides are error-prone when the DNA is read back. So rather than encoding information in a base-4 system, one digit per nucleotide, they encoded it in ternary (base 3): each digit is written with one of the three nucleotides that differ from the one just before it, so the same nucleotide never appears twice in a row (that’s a bit of a simplification). They write the data with about fourfold redundancy and use other clever methods, like adding indexing nucleotides to the ends of the fragments.
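Here is a minimal Python sketch of the no-repeat idea. It is not the authors’ implementation, and several things are my own assumptions: every function name, the fixed six ternary digits per byte (chosen only to keep the example short), the starting nucleotide `"A"`, the alphabetical ordering used to pick the next nucleotide, and the fragment length and step (picked so that length / step = 4 to mimic the fourfold redundancy).

```python
NUCLEOTIDES = "ACGT"


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 covers all 256 values)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_dna(trits, start="A"):
    """Map each trit to one of the three nucleotides that differ from the previous one."""
    prev, out = start, []
    for t in trits:
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, start="A"):
    """Invert trits_to_dna by replaying the same choice lists."""
    prev, trits = start, []
    for n in dna:
        choices = [m for m in NUCLEOTIDES if m != prev]
        trits.append(choices.index(n))
        prev = n
    return trits


def trits_to_bytes(trits):
    """Rebuild bytes from groups of six base-3 digits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data):
    return trits_to_dna(bytes_to_trits(data))


def decode(dna):
    return trits_to_bytes(dna_to_trits(dna))


def overlapping_fragments(dna, length=100, step=25):
    """Cut dna into overlapping pieces; interior bases land in length // step fragments."""
    return [dna[i:i + length] for i in range(0, len(dna) - length + 1, step)]


message = b"Shall I compare thee to a summer's day?"
dna = encode(message)
assert decode(dna) == message
assert all(a != b for a, b in zip(dna, dna[1:]))  # no nucleotide repeats
fragments = overlapping_fragments(dna)
```

Because every nucleotide is chosen from the three that differ from its predecessor, repeats are impossible by construction, at the price of carrying one ternary digit per nucleotide instead of two bits. The index information the authors add to each fragment is left out of this sketch.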
It’s research like this that drives industry, and I’m excited about its implications. Sure, this method isn’t for rapid storage and recovery, but (as the authors point out) for organizations like CERN, which produce massive quantities of data, this is a boon. Imagine other applications: encoding literature, music, art, scientific advances, all onto a storage medium that can last for tens of thousands of years. I think the benefits of this research become clear.
Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E., Sipos, B., & Birney, E. (2013). Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature. DOI: 10.1038/nature11875