How Shakespeare and MLK Got Encoded in DNA

Here's how the process, outlined yesterday on the website of leading scientific journal Nature, works: The scientists took these writers' famous words, encoded them using a cipher that maps onto DNA's four bases (A, C, G, and T), synthesized strands of DNA according to that code, and chilled the resulting samples in dark, dry conditions, where they should last for millennia. Goldman tells NPR's Adam Cole that one of our generation's biggest problems, organizing and storing the deluge of data we face every day, could be solved using DNA:

The data we're being asked to be guardians of is growing exponentially. But our budgets are not growing exponentially ... We realized that DNA itself is a really efficient way of storing information.
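
To make the encoding step concrete, here is a minimal Python sketch that maps text onto the four letters A, C, G, and T. It is an illustration only: the two-bits-per-base mapping and the function names `encode` and `decode` are assumptions made for this example, not the cipher Goldman's team used, which the article does not spell out.

```python
# Hypothetical 2-bit-per-base mapping (A=00, C=01, G=10, T=11).
# This is NOT the cipher used by Goldman's team; it only shows the idea.
BASES = "ACGT"


def encode(text: str) -> str:
    """Turn UTF-8 text into a string of A/C/G/T, four bases per byte."""
    out = []
    for byte in text.encode("utf-8"):
        # Read the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> str:
    """Invert encode(): pack every four bases back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


line = "Shall I compare thee to a summer's day?"
strand = encode(line)
print(strand[:16])             # first four characters as bases
print(decode(strand) == line)  # round trip recovers the text
```

Under this mapping each byte becomes four bases, so the letter "A" comes out as CAAC. A practical scheme would also have to cope with errors in synthesis and sequencing, which this sketch ignores.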

DNA stores information far more compactly than existing formats like hard drives or magnetic tape. Or paper-bound books. Consider that a physical copy of Shakespeare's Sonnets from the Folger Shakespeare Library weighs 7 ounces. Project Gutenberg's digital version of the poems takes up 95 KB on your Kindle. That might seem pretty compact, but physical books and e-books are hugely inefficient storage methods compared with genetic encoding. Shall we compare these to a strand of DNA? Goldman's team calculated that they could fit the entire database of pioneering particle physics lab CERN (which holds approximately 90 petabytes of information) onto just 41 grams of DNA. By comparison, every sonnet Shakespeare ever wrote could fit on a mere speck of genetic material.
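
A rough back-of-the-envelope calculation shows what those figures imply. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 KB = 1,000 bytes) and that storage scales linearly with mass; the article states neither, so treat the result as an order-of-magnitude estimate.

```python
# Figures quoted in the article.
CERN_BYTES = 90e15     # about 90 petabytes (assumes 1 PB = 10**15 bytes)
DNA_GRAMS = 41.0       # mass of DNA said to hold the CERN database
SONNETS_BYTES = 95e3   # 95 KB e-book (assumes 1 KB = 1,000 bytes)

# Implied storage density, assuming capacity scales linearly with mass.
bytes_per_gram = CERN_BYTES / DNA_GRAMS

# Mass of DNA the Sonnets would need at that density.
sonnets_grams = SONNETS_BYTES / bytes_per_gram

print(f"{bytes_per_gram / 1e15:.1f} PB per gram")
print(f"{sonnets_grams * 1e12:.0f} picograms for the Sonnets")
```

On those assumptions the density works out to roughly 2.2 petabytes per gram, which would put the 95 KB Sonnets at around 43 picograms of DNA.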


These findings aren't entirely new: Harvard geneticist George Church encoded a book in DNA last summer. And some adventurous poets are even using DNA to encode original works. In Canadian poet Christian Bök's four-line Xenotext, the stanza "Any style of life / is prim" is encoded in DNA that always spits out proteins reading "The faery is rosy / of glow." But even Church acknowledges the strides made by Goldman and his colleagues. "I think it's a really important milestone," he told Nature's Ed Yong. Currently, storing information in DNA is expensive. It costs about $12,400 to store each megabyte, and about $220 per megabyte to read it back. But the expense is going down every year. "In 10 years, it's probably going to be about 100 times cheaper," Goldman told The Wall Street Journal's Gautam Naik. "At that time, it probably becomes economically viable."
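
To put those prices in perspective, the short Python sketch below applies the quoted storage cost to the 95 KB Sonnets file mentioned earlier. It assumes 1 MB = 1,000 KB and, for the yearly figure, that the price falls at a steady rate over the decade. Both are assumptions for illustration, not claims from Goldman's team.

```python
WRITE_USD_PER_MB = 12_400  # quoted cost to store one megabyte
SONNETS_MB = 95 / 1000     # 95 KB file (assumes 1 MB = 1,000 KB)
DROP_FACTOR = 100          # "about 100 times cheaper" in 10 years
YEARS = 10

cost_now = WRITE_USD_PER_MB * SONNETS_MB
cost_later = cost_now / DROP_FACTOR

# Assumption: a constant yearly decline that compounds to 100x in 10 years.
annual_factor = DROP_FACTOR ** (1 / YEARS)
yearly_drop = 1 - 1 / annual_factor

print(f"Storing the Sonnets today: ${cost_now:,.0f}")
print(f"Storing the Sonnets in {YEARS} years: ${cost_later:,.2f}")
print(f"Implied price drop per year: {yearly_drop:.0%}")
```

At the quoted rate, writing the Sonnets to DNA would cost roughly $1,178 today and about $12 if the hundredfold drop materializes, which corresponds to prices falling by a little over a third each year.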