Scientists store digital images in DNA and retrieve them perfectly (washington.edu)
95 points by victorbojica on April 8, 2016 | 20 comments


I wasn't directly involved, but the research group I'm part of did some earlier work mentioned in this paper (ref. 10), and I would be happy to answer any questions people may have.


This is really interesting stuff! One of the potential applications they call out is long-term data storage. How do you know that the stored data will actually last hundreds of years?


I also wasn't part of this study, but of an earlier one (ref. 6). Basically, DNA has been recovered from corpses that are thousands of years old, and if stored properly it has been extrapolated to last up to millions of years [1].

1. http://onlinelibrary.wiley.com/doi/10.1002/anie.201411378/fu...


What sort of supporting infrastructure is required to reliably store it for a long time? Infrastructure such as cooling, desiccation and warehousing.


You don't actually need that much, as long as you can keep it dry and reasonably cool -- one idea we had kicking around was to use a cave of the sort that's used for long-term seed storage (https://en.wikipedia.org/wiki/Svalbard_Global_Seed_Vault).


What is the read/write time like?


In principle a few hours, but currently you have to outsource synthesis (i.e. writing) to a company and it's an expensive process. Sequencing (reading) can be done more easily, and its price has been dropping faster than Moore's law would predict.


Hey there, I have two questions on your earlier work.

The assumed context is that sending/reading a message via DNA is equivalent to de novo sequencing with 100% accuracy.

Error correction via 4x overlap:

How many insertions, deletions, and substitutions can it correct for? Are some combinations harder to fix than others -- for example, are three insertions much worse than one deletion, or than 5 substitutions?

Information storage / information blocks:

I'm guessing the 100 bp segments have to do with the limits of the sequencing hardware, but what limits the overall message size to 739 kB?

Thanks very much for your thoughts!


There are four copies of each part of the message, so you can lose entire chunks and still be able to recover everything. As for substitutions, unless you get the same error in 2 copies out of 4, there should be no problem.
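
To make that concrete, here's a toy sketch (in Python, and not the actual encoding used in the paper) of how per-position majority voting over four copies recovers from isolated substitutions; the recover() function and the example sequences are purely illustrative:

    from collections import Counter

    def recover(copies):
        """Majority-vote each position across redundant copies of a segment."""
        out = []
        for bases in zip(*copies):         # bases at one position, across all 4 copies
            base, _ = Counter(bases).most_common(1)[0]
            out.append(base)               # can only fail if 2+ copies share the same error
        return "".join(out)

    # One copy carries a substitution (T -> G); the other three out-vote it.
    copies = ["ACGTACGT", "ACGTACGT", "ACGGACGT", "ACGTACGT"]
    assert recover(copies) == "ACGTACGT"

Insertions and deletions are messier, since they shift the alignment between copies, so a scheme like this would need to realign or index the fragments before voting.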

The 739 kB isn't a limit in any sense; the main limitation is that DNA synthesis is currently expensive.


Related question, on putting messages into dynamic, living systems:

Let's say the 'message' is so large it can only be inserted into a 'junk DNA' (non-conserved) region of the genome.

Is it correct to assume there are less active/robust DNA repair mechanisms to 'fix' the insertions, deletions, and substitutions described above than in conserved regions?

What might some numbers be for error rates in non-conserved regions (2x, 10x, ...) compared to conserved regions? Or maybe one type of error is relatively much more common than another?

I'm guessing the sources of insertion/deletion/substitution 'errors' are mutations over the lifetime of the cell, and also replication errors in daughter cells.



Church encoding


I wonder what access times are like.



It's all fun and games until this DNA storage downloads a virus!


I'm off to NPM to post "dna-fs". Storage capacity is huge but read/write latency is on the order of days.


This is awe inspiring.


Basic smartphone as a unit of measure?


Brilliant


“I’ll just keep writing over my junk DNA for storage.”

http://dresdencodak.com/2009/07/12/fabulous-prizes/



