I wasn't directly involved, but the research group I'm part of did some earlier work mentioned in this paper (ref. 10), and I'd be happy to answer any questions people may have.
This is really interesting stuff! One of the potential applications they call out is long-term data storage. How do you know that the stored data will actually last hundreds of years?
Also not part of this study, but of an earlier one (ref. 6). Basically, DNA has been recovered from corpses that are thousands of years old, and if stored properly it's extrapolated to last up to millions of years [1].
You don't actually need to do that much, as long as you can keep it dry and reasonably cool. One idea we had kicking around was to use a cave of the sort that's used for long-term seed storage (https://en.wikipedia.org/wiki/Svalbard_Global_Seed_Vault).
In principle a few hours, but currently you have to outsource synthesis (i.e. writing) to a company, and it's an expensive process. Sequencing (reading) can be done more easily and has been dropping in price faster than Moore's law would predict.
hey there, i have two questions on your earlier work.
the assumed context is that sending/reading a message via DNA is equivalent to de novo sequencing with 100% accuracy.
error-correcting via 4x overlap:
how many insertions, deletions, and substitutions can it correct for? are some combinations harder to fix than others? for example, are three insertions much worse than one deletion, or than 5 substitutions, etc.?
information storage / information blocks:
i'm guessing the 100bp segments have to do with the limits of the sequencing hardware, but what limits the overall message size to 739 kB?
There are four copies of each part of the message, so you can lose entire chunks and still be able to recover everything. As for substitutions, unless you get the same error in 2 copies out of 4, there should be no problem.
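To make the 4x idea concrete, here's a toy sketch in Python. It's my own illustration, not the paper's actual pipeline: cut the sequence into 100-base windows that start every 25 bases, so interior bases are covered four times, then rebuild each position by majority vote across the copies that cover it.

```python
from collections import Counter

WINDOW, STEP = 100, 25  # 100-base segments starting every 25 bases -> 4x coverage

def segment(seq):
    # cut the sequence into overlapping windows; interior bases land in 4 of them
    return [(i, seq[i:i + WINDOW]) for i in range(0, len(seq) - WINDOW + 1, STEP)]

def reconstruct(segments, length):
    # majority vote at each position across every segment that covers it
    votes = [Counter() for _ in range(length)]
    for start, frag in segments:
        for offset, base in enumerate(frag):
            votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)

message = "ACGT" * 100              # toy 400-base sequence
segs = segment(message)
start, frag = segs[3]               # corrupt a single base in one copy only
segs[3] = (start, frag[:10] + "T" + frag[11:])
assert reconstruct(segs, len(message)) == message  # 3-vs-1 vote fixes it
```

A 2-vs-2 tie is exactly the "same error in 2 copies out of 4" failure case. Indels are harder than substitutions because they shift every downstream position within a copy rather than a single base, which is why the real decoding aligns segments instead of naively stacking them as this sketch does.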
The 739 kB isn't a limit in any sense; the main limitation is that DNA synthesis is currently expensive.
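To see why cost rather than capacity is the ceiling: the number of synthesized bases, and hence the price, scales linearly with message size. A rough back-of-envelope, where both the bits-per-base figure and the per-base price are assumptions on my part (the price especially varies by vendor and year):

```python
MESSAGE_BYTES = 739_000   # size of the stored files
BITS_PER_BASE = 1.58      # assumed ~log2(3), i.e. a base-3 code that avoids repeats
COVERAGE = 4              # each part of the message is synthesized ~4 times
COST_PER_BASE = 0.05      # hypothetical $/base; plug in a real vendor quote

bases = MESSAGE_BYTES * 8 / BITS_PER_BASE * COVERAGE
print(f"~{bases / 1e6:.0f}M bases, ~${bases * COST_PER_BASE:,.0f} at the assumed price")
```

Millions of bases per megabyte is why nobody is storing terabytes this way yet, and also why the cost drops in synthesis matter so much more than read-side improvements.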
related question, on putting messages into dynamic, living systems:
let's say the 'message' is so large it can only be inserted into a 'junk DNA' (non-conserved) region of the genome.
is it correct to assume there are less active/robust dna repair mechanisms to 'fix' the insertions, deletions, and substitutions described above than in conserved regions?
what might some numbers be for error rates in non-conserved regions (2x, 10x, ...) compared to conserved regions? or maybe one type of error is relatively much higher than another kind?
i'm guessing the sources of insertion/deletion/substitution 'error' are mutations over the lifetime of the cell, and also replication errors in daughter cells.