I wasn't directly involved, but the research group I'm part of did some earlier work mentioned in this paper (ref. 10), and I'd be happy to answer any questions people may have.
This is really interesting stuff! One of the potential applications they call out is long-term data storage. How do you know that the stored data will actually last hundreds of years?
Also not part of this study, but of an earlier one (ref. 6). Basically, DNA has been recovered from corpses that are thousands of years old, and if stored properly it's extrapolated to last up to millions of years [1].
You don't actually need to do that much, as long as you can keep it dry and reasonably cool. One idea we had kicking around was to use a cave of the sort that's used for long-term seed storage (https://en.wikipedia.org/wiki/Svalbard_Global_Seed_Vault).
In principle a few hours, but currently you have to outsource synthesis (i.e. writing) to a company, and it's an expensive process. Sequencing (reading) can be done more easily and has been dropping in price faster than Moore's law would predict.
hey there, i have two questions on your earlier work.
the assumed context is that sending/reading a message via DNA is equivalent to de novo sequencing with 100% accuracy.
error-correcting via 4x overlap:
how many insertions, deletions, and substitutions can it correct for? are some combinations harder to fix than others? for example, are three insertions much worse than one deletion, or than 5 substitutions, etc.?
information storage / information blocks:
i'm guessing the 100bp segments have to do with the limits of the sequencing hardware, but what limits the overall message size to 739 kB?
There are four copies of each part of the message, so you can lose entire chunks and still be able to recover everything. As for substitutions, unless you get the same error in 2 copies out of 4, there should be no problem.
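To make the 4x idea concrete, here's a toy sketch in Python. It's my own illustration, not the paper's actual pipeline: cut the sequence into 100-base windows that start every 25 bases, so interior bases are covered four times, then rebuild each position by majority vote across the copies that cover it.

```python
from collections import Counter

WINDOW, STEP = 100, 25  # 100-base segments starting every 25 bases -> 4x coverage

def segment(seq):
    # cut the sequence into overlapping windows; interior bases land in 4 of them
    return [(i, seq[i:i + WINDOW]) for i in range(0, len(seq) - WINDOW + 1, STEP)]

def reconstruct(segments, length):
    # majority vote at each position across every segment that covers it
    votes = [Counter() for _ in range(length)]
    for start, frag in segments:
        for offset, base in enumerate(frag):
            votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)

message = "ACGT" * 100              # toy 400-base sequence
segs = segment(message)
start, frag = segs[3]               # corrupt a single base in one copy only
segs[3] = (start, frag[:10] + "T" + frag[11:])
assert reconstruct(segs, len(message)) == message  # 3-vs-1 vote fixes it
```

A 2-vs-2 tie is exactly the "same error in 2 copies out of 4" failure case. Indels are harder than substitutions because they shift every downstream position within a copy rather than a single base, which is why the real decoding aligns segments instead of naively stacking them as this sketch does.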
The 739 kB isn't a limit in any sense; the main limitation is that DNA synthesis is currently expensive.
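To see why cost rather than capacity is the ceiling: the number of synthesized bases, and hence the price, scales linearly with message size. A rough back-of-envelope, where both the bits-per-base figure and the per-base price are assumptions on my part (the price especially varies by vendor and year):

```python
MESSAGE_BYTES = 739_000   # size of the stored files
BITS_PER_BASE = 1.58      # assumed ~log2(3), i.e. a base-3 code that avoids repeats
COVERAGE = 4              # each part of the message is synthesized ~4 times
COST_PER_BASE = 0.05      # hypothetical $/base; plug in a real vendor quote

bases = MESSAGE_BYTES * 8 / BITS_PER_BASE * COVERAGE
print(f"~{bases / 1e6:.0f}M bases, ~${bases * COST_PER_BASE:,.0f} at the assumed price")
```

Millions of bases per megabyte is why nobody is storing terabytes this way yet, and also why the cost drops in synthesis matter so much more than read-side improvements.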
related question, on putting messages into dynamic, living systems:
let's say the 'message' is so large it can only be inserted into a 'junk DNA' (non-conserved) region of the genome.
is it correct to assume there are less active/robust dna repair mechanisms to 'fix' the insertions, deletions, and substitutions described above than in conserved regions?
what might some numbers be for error rates in non-conserved regions (2x, 10x, ...) compared to conserved regions? or maybe one type of error is relatively much higher than another kind?
i'm guessing the sources of insertion/deletion/substitution 'error' are mutations over the lifetime of the cell, and also replication errors in daughter cells.