Hacker News

The article massively undersells the information content of the genome in several key ways. A non-comprehensive list of these (before my morning coffee, forgive me) includes:

- DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)

- Interactions of alleles (what the article refers to as the "two versions of each base pair")

- Duplications, deletions, inversions, and other structural variations (https://www.genome.gov/genetics-glossary/Structural-Variatio...)

- Physical proximity interactions in 3-dimensional space (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)

- Combinatorial effect (massive) of different alleles in complex systems

Overall, it's not sensible to compare a linear sequence of bits, like a CD (sibling comment) or DVD (the article), to the linear sequence of the genome and conclude that their information content, based on length alone, is in any way comparable.



Exactly. The compression level of DNA is orders of magnitude better than anything we can even come close to. DNA usually doesn't even contain specific counts (like 5 fingers on a hand) or sizes of organs and so on - these are given by the processes that run in parallel and cause the cells to hit spatial / chemical / electrical or other limits. It's like putting lots of house builders at specific places where the house should be, and each one would just keep building a wall until he hits another one. There is no compressed house plan, it's a compressed "engine" that builds the result.


Comparing it to machine code on CD/DVD might make more sense then. Machine code where every line has been hand-optimised by nature's hackers over 500 million years.

And in that context, hundreds of MBs is a heck of a lot of complexity.
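For a sense of where "hundreds of MBs" comes from, here is a back-of-the-envelope sketch. It assumes roughly 3.1 billion base pairs for one copy of the human genome and 2 bits per base (four possible bases); both figures are my assumptions, not numbers quoted from the article, and the result ignores everything beyond the raw sequence.

```python
# Rough raw-sequence size of one copy of the human genome.
BASE_PAIRS = 3_100_000_000  # assumption: approximate haploid genome length
BITS_PER_BASE = 2           # four bases (A, C, G, T) fit in 2 bits


def raw_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store the bare sequence with a fixed-width code."""
    return base_pairs * bits_per_base // 8


size = raw_size_bytes(BASE_PAIRS)
print(f"{size / 1e6:.0f} MB")  # prints "775 MB"
```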


You put my reaction to this in much more educated terms. I’ve always felt that thinking of DNA as bits was a bit simplistic. Just because we store information as bits it doesn’t mean that nature does.

Not that it means they can’t be right, but the author also doesn’t seem to have any particular expertise in genetics. Their ideas need to survive a lot more criticism by people who know what they’re talking about before you could start to see them as convincing.


The raw bits of the base pairs are just one component of the information, but it’s like a maximally compressed version of the info.

The laws of physics are another component.

From there you would need to simulate nature to be able to decompress all the data, like how computer programs can use procedural generation.

Imagine a game like Minecraft. You can generate practically infinitely many screenshots of Minecraft worlds, but all that data can be derived from the game code and the jvm.
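A toy sketch of the procedural generation idea, not how Minecraft actually does it: a single integer seed plus a fixed generator deterministically expands into a much larger "world". The function name, the terrain rule, and the dimensions are all made up for illustration.

```python
import random


def generate_world(seed: int, width: int = 64, height: int = 16) -> list[str]:
    """Expand a small seed into a width x height terrain map (hypothetical rule)."""
    rng = random.Random(seed)  # same seed always yields the same sequence
    ground = height // 2
    heights = []
    for _ in range(width):
        # Random walk for the ground level, clamped to the map.
        ground = min(height - 1, max(1, ground + rng.choice((-1, 0, 1))))
        heights.append(ground)
    # Render top row first: '#' is solid ground, '.' is air.
    return [
        "".join("#" if heights[x] >= y else "." for x in range(width))
        for y in range(height, 0, -1)
    ]


world = generate_world(seed=42)
print("\n".join(world))
print(f"{sum(len(row) for row in world)} cells generated from one integer")
```

The output is fully determined by the seed and the generator code, so the map adds no information beyond those two inputs; it is just an expanded view of them.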


> The raw bits of the base pairs is just one component of the information, but it’s like a maximally compressed version of the info.

This sounds a bit suspect. A maximally compressed version would be very sensitive to mutations, which wouldn't be great for adaptation via mutations. My understanding is that only a small fraction of mutations lead to unviable phenotypes.

Also, AFAIK, the current understanding is that the majority of DNA is "junk", i.e. it doesn't seem to affect the phenotype. That would be a partial explanation for the above.

The process of genetic expression is indeed something like procedural generation, but if maximal compression is about something like Kolmogorov complexity, the produced phenotype doesn't contain more information than the genetic information.
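To make the mutation-sensitivity point concrete, here is a rough sketch that flips every single bit, one at a time, in two encodings of the same toy sequence and checks whether the original can still be recovered. It uses zlib as a stand-in for "compressed" (it is nowhere near maximal compression) and a naive three-copy majority vote as a stand-in for "redundant"; the toy sequence and all helper names are invented for illustration, and neither encoding is a model of real DNA.

```python
import zlib


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with one bit flipped (a toy 'point mutation')."""
    mutated = bytearray(data)
    mutated[index // 8] ^= 1 << (index % 8)
    return bytes(mutated)


def decode_compressed(blob: bytes):
    """Decompress, or return None if the stream is no longer valid."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def decode_redundant(blob: bytes) -> bytes:
    """Bitwise majority vote over three stored copies."""
    n = len(blob) // 3
    a, b, c = blob[:n], blob[n:2 * n], blob[2 * n:]
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def tolerated_fraction(blob: bytes, decode, message: bytes) -> float:
    """Fraction of single-bit flips after which decoding still gives message."""
    total = len(blob) * 8
    ok = sum(decode(flip_bit(blob, i)) == message for i in range(total))
    return ok / total


message = b"ATGGCCAAATGA" * 20       # toy sequence, not real data
compressed = zlib.compress(message, 9)
redundant = message * 3              # three plain copies

print(len(message), len(compressed), len(redundant))
print("compressed:", tolerated_fraction(compressed, decode_compressed, message))
print("redundant: ", tolerated_fraction(redundant, decode_redundant, message))
```

The redundant encoding survives every single-bit flip by construction, while the compressed stream is much smaller and does not. That is the trade-off the comment is pointing at.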


He does mention structural interactions as well as duplications/deletions/inversions. I would argue methylation is more like an annotation of DNA and not part of the DNA itself, but that's a matter of opinion.

In the end, the author literally says: "nobody knows". Yes, you cannot compare a linear sequence of bits to a macromolecule that interacts structurally with its environment, and the author does not make that claim. The question he tries to answer is: how much data is needed to re-create a similar macromolecule that interacts in a similar way? His main point, on which you both agree: the exons alone are surely not enough, because the encoded proteins are just a (small?) part of how DNA interacts.


Exons are almost like functions, whereas a gene is almost like a class definition. In different tissues in the body a gene might be alternatively spliced to lead to different protein isoforms. In effect, making use of only a subset of available functions in the class depending on certain input parameters or how the class is called.
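Taking the analogy literally for a moment, a minimal sketch might look like the following. The gene name, exon sequences, tissue names, and splice rules are all hypothetical, and real splicing is regulated in far messier ways than a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy 'class definition': a pool of exons plus per-tissue splice rules."""
    name: str
    exons: dict[str, str]                     # exon id -> sequence
    splice_rules: dict[str, tuple[str, ...]]  # tissue -> exons kept, in order

    def transcript(self, tissue: str) -> str:
        """Join only the exons this tissue retains (one isoform per tissue)."""
        return "".join(self.exons[exon_id] for exon_id in self.splice_rules[tissue])


# Hypothetical gene with four exons and two tissue-specific isoforms.
toy_gene = Gene(
    name="TOY1",
    exons={"e1": "ATG", "e2": "GCC", "e3": "AAA", "e4": "TGA"},
    splice_rules={
        "muscle": ("e1", "e2", "e4"),
        "neuron": ("e1", "e3", "e4"),
    },
)

print(toy_gene.transcript("muscle"))  # prints "ATGGCCTGA"
print(toy_gene.transcript("neuron"))  # prints "ATGAAATGA"
```

The same "class" yields different products depending on the tissue argument, which is the part of the metaphor that maps onto alternative splicing.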


This is a Star Trek version of the subject, in that it is pure technobabble which happens to mention a few real terms.


As someone with both a biochem and CS background, I found the comment insightful and clear. Zero technobabble to my ears.


What does casting biochem in the metaphor of CS abstractions, in this example, clarify? What does it elucidate? What further predictions does it allow us to make about either subject of the metaphor? Can those predictions be tested? Do they make sense enough for that to be a meaningful question?

Show me how this isn't a more confusing than useful explanation, even for the bright ten-year-old or so at whose level it appears to be pitched, and I'll grant it may have some value.


I find that even if this just provides a lower bound it is still an interesting piece of information.


Yeah...

We now know that environmental factors also change how DNA is expressed, through epigenetics.

I don't know how any of it works. Something to do with the shape of the DNA when it is wound up and how it changes the output when RNA produces proteins.

This is how parents can do things like pass some of the athleticism they earn through training to their children. It is possible for athletic parents to pass genes in such a way that it produces children even more athletic than they were.

All of this means that DNA has the ability to encode information and produce proteins in different ways using the same sequences.

So I am guessing that a lot of the DNA that is considered "junk" may not actually be. They are just missing a piece of the puzzle in how it gets read in.


But all of those emergent effects are accounted for in the DNA sequence [1], so the estimate is fine.

1. Maaaaybe you could make a case for DNA methylation, but that still requires some DNA signatures so ...



