It's almost unbelievable how DNA seems to "just work". DNA from one organism breaking apart and slipping into another somehow leads to working gene expression across species, via horizontal transfer - it's absurd.
It's like taking a random byte sequence from some binary, shoving it randomly into another, and the new binary gets useful new features.
The involvement of even more complex systems like parasites makes it that much more insane to me.
> It's like taking a random byte sequence from some binary, shoving it randomly into another, and the new binary gets useful new features.
If you think about it for a moment, our genetic code is kind of designed to work that way.
You get half of your genetic code from your mom, the other half from your dad, and somehow, all of these genes "just work" together. It's kind of miraculous when you think that there are very many genes that encode how your brain works, and how your liver works, your muscles, etc. Somehow, provided the baby can be born, a mishmash of genes from two different individuals almost always works out.
Probably because they do not really encode how anything works, and because, perhaps by necessity, the growth of organisms is a swarm intelligence that is quite self-healing.
In particular with conjoined twins, it's quite remarkable how much the systems for body development still produce something that connects the inner workings, which was obviously not their "purpose", but the self-healing growth mechanisms that correct for errors simply lead to that.
Consider the Hensel twins, who have two mouths, but whose digestive tracts actually merge at one point into a system that is capable of digesting - though they have two stomachs.
"a mishmash of genes from two different individuals almost always works out" => different individuals of the _same_ species (which btw is how a "species" is defined).
The evolution of organisms that gene mishmash (aka sexual reproduction) is thought to be the result of an ongoing arms race between gene sequences that "try" to stay unchanged (in higher level species) and gene sequences that "try" to "free ride" (from viruses etc.) Being able to build members of your species from "mishmash of genes from two different individuals" has the effect of scrambling the DNA of each species member which makes attack harder.
Organisms that do not do this and reproduce via cloning (aka parthenogenesis) are often entirely wiped out once a pathogen figures out how to target their DNA -- hence the banana varieties we eat change over time.
> The evolution of organisms that gene mishmash (aka sexual reproduction) is thought to be the result of an ongoing arms race between gene sequences that "try" to stay unchanged (in higher level species) and gene sequences that "try" to "free ride" (from viruses etc.)
Sexual reproduction means your species has a very large gene pool, and individuals with new combinations of genes can be produced very quickly. That's not just an advantage against viruses. It's also very useful for adapting rapidly and competing against other species when your environment changes. New threats (and new opportunities) show up all the time, be it dwindling or changing availability of food, climate change (e.g. a new ice age), new predators or new prey, and also a group of individuals migrating to a new region of the world with a different climate.
That seems to be the point the parent is making. Tautology is the only way we can explain the way life happens to be -- it's because it's advantageous for it to be that way.
It should also be pointed out that about two thirds of human conceptions result in early embryonic death, so evidently it is not as smooth a ride as suggested.
Do very much agree it's miraculous. Biological organisms are robust to error and chance in ways no designed system comes close to matching. It's awe-inspiring
It's less so when you know about some evolutionary programming techniques, such as using Lisp with a subset of the code that defines behavior held in a tree structure, which allows parts of that tree to be swapped in and out from other programs using the same design while still yielding a program that executes. Combined with a fitness function you can "breed" programs for a task.
As I understand it there are attributes of Lisp and attributes of the program structure (such as putting much of the logic in that tree structure with defined split points) which makes this much more feasible than otherwise.
My guess is that DNA has evolved similarly, where the ways in which it splits and the mechanism in which it is interpreted and executed help, and also the organisms we're talking about (us and the other complex ones) have iterated to a design that's more amenable to bits being swapped. That is large chunks of us may be more similar to a lisp program with attributes that make it easy to swap parts than to a bunch of object bytecode with absolute and relative jumps all over.
Note: A lot of this is poorly remembered from a survey of AI class two decades ago, so it bears someone with a stronger background verifying I'm not making a complete hash of it.
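The tree-swap idea above can be sketched in a few lines. This is a hedged illustration, not any particular genetic programming system: programs are nested Python lists standing in for Lisp S-expressions, the helper names are made up, and the fitness loop is omitted.

```python
import random

# Programs as nested lists: [operator, child, child, ...] or a terminal.
# Crossover grafts a random subtree from one program into a random
# cut point of another, always yielding another well-formed tree.

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace_at(copy[path[0]], path[1:], new)
    return copy

def crossover(a, b, rng=random):
    """Swap a random subtree of b into a random cut point of a."""
    cut_a = rng.choice(list(subtrees(a)))
    cut_b = rng.choice(list(subtrees(b)))
    return replace_at(a, cut_a[0], cut_b[1])

parent1 = ["+", ["*", "x", "x"], 1]   # x*x + 1
parent2 = ["-", ["+", "x", 2], "x"]   # (x + 2) - x
child = crossover(parent1, parent2)   # still a well-formed expression tree
```

The key property is the one the comment above describes: because splice points are always whole subtrees, the result is guaranteed to parse and run, unlike splicing raw bytes of a compiled binary.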
This is crazy from the perspective of "old fashioned binaries", but less so in the context of neural networks. You can do all sorts of splicing and dicing of the bits in neural networks (their weights) and end up with useful networks. Dropout, for example, specifically trains a network to be resilient to having swaths of the network removed, and makes individual features in the network resilient to having a random selection of other features present or not-present. If I remember right, the original dropout paper even analogizes this to how genes have evolved to be resilient to this type of random pairing.
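That resilience-by-design is easy to show concretely. A minimal sketch of inverted dropout in plain Python (real implementations operate on tensors inside a framework; this only shows the mechanism):

```python
import random

def dropout(activations, p, rng=random):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p), so the expected activation is unchanged."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Each forward pass sees a different random sub-network, so no unit can
# rely on any other specific unit being present -- the "resilient to
# random pairing" property described above.
out = dropout([0.5, 1.0, 2.0, 0.25], p=0.5)
# each survivor is scaled by 1/keep; dropped units become 0.0
```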
Don't try to think about genomics in programming terms. At best you'll only confuse yourself; at worst, others also. Both a computer program and a genome encode information, but that's about where the similarities end.
I think a working analogy here would be that source code (dna) gets compiled (physics/chemistry) into an executable (living organism with even more dna) which gets executed by the os (phys/chem again) to produce changes on some data (organism interacts with the environment), and on and on..
The details are of course endless and they aren’t interchangeable between the two fields, but the analogy is still there..
Sure, but is it meaningful? What predictions does it enable that are sufficiently borne out by reality to make it seem likely that less easily testable predictions on the same basis may likewise prove sound?
I mean, I can as well say that chalk and cheese are alike in that both have mass, occupy space, and leave a streak behind when you rub them on something. It is a true statement, but what does it help me predict about either?
I think the only real takeaways on the coding/biology comparison are applying base-level informatic systems ideas to explain some biological developments, and in reverse looking at biological system mechanics as inspiration for designed systems.
I don't think the 'chemistry is an OPERATING SYSTEM' level of handwaving is sufficient to glean insights, but understanding general systems-level interactome patterns of how proteins interact does help provide knowledge about how natural and designed systems can self-regulate, how they fail, how they can be structured, etc.
Sure, at that level it makes sense. The trouble seems to be that in order to know that that's the level at which it makes sense, you need to know considerably more about informatics than is the default among programmers who like to indulge in this kind of speculation. Kind of a Dunning-Kruger problem, maybe; there certainly was a time when I likewise didn't know what I didn't know.
I'm not sure what it is you think I'm trying to say, but much of your point seems to be "don't talk about things you don't understand", which I have no interest in abiding. I like talking about things I don't understand, and I've enjoyed the posts from you and others on the topic, even if I'll only ever be a layman.
Talking about things you don't understand is no problem by me! What I'm trying to point to here is the hazard of making assumptions about something one doesn't understand, and then trying to reason about the thing based on those assumptions.
>Don't try to think about genomics in programming terms.
At some point, some Newton will figure it out. It always happens.
As for now, it might be interesting to understand why exactly the analogy between genomics and programming fails. It might bring interesting insights into both fields.
I'm profoundly ignorant in neither (PhD in biophysics, software engineer for 20 years). Genomics and programming analogies are cool, but the most important thing is this: the understanding that molecular structures can encode information in a replicable way, together with the discovery that entropy applies to data storage and transmission, demonstrates that information is a universal concept; that the genome is a data storage system; and that the enzymes operating on it are operating on information, in a computational way. To me that's a pretty useful comparison.
Software changes over spans of minutes to decades; genomes change over spans of millions of years. Software is written; genomes are not. The complexity of software is constrained by programmers' ability to comprehend it; the complexity of genomes is not. The environment in which software functions is determined by humans; the environment in which genomes function is not.
Those are trivial surface level differences relative to the central idea of encoding, storing, replicating, editing digital information, which interfaces with other digital and analog systems.
Not that there's much point to saying so, since you appear to be here for no other reason than to assert that my argument is false because you would prefer it be so, but here's another: software is digital; genomes are not.
FWIW all of these differences still feel extremely surface level. I'm no expert, but so far I'm aware of everything you've said with regard to how they differ - I'm kind of hoping for more, given the strong assertion you made that one cannot relate the two without being fundamentally ignorant of one or the other.
I also think it's somewhat ironic that you're accusing them of only being here to say "you're wrong" but that's what you've done in this thread? I only bring this up because I think we're all after the same thing here - to understand an incredibly interesting topic.
I suspect most of us are really here to learn and discuss. You seem like you have a background in the area, I'm sure we would all benefit from learning about the differences.
If it's the case that the similar is that DNA and code both encode information, and the differences are based on how they do so, it's hard to see why you think they can't be related at all. You've been relating the two.
If I've given the impression that the difference is merely a question of varying encodings, then I have to agree my arguments have thus far been lacking.
The idea that a genome as expressed in nucleic acid is purely, and only, an informational medium, is fundamentally in error. It does encode information in the sequence of base pairs, this is true. But it is also a physical structure in its own right, and properties of that structure incidental to the encoded information have what recently looks to be at least as important a role in the process of transcription as the sequence itself.
There are, for example, some sequences which will cause a ribosome to transcribe the surrounding genes differently or with varying frequency, due to the physical interaction between the molecules involved. (I recently discussed this here in the context of recent research on causes of eye color; it should not be too far back in my comment history.) We also see, for example, that both viral and eukaryotic DNA can be and often are transcribed in ways that produce different proteins from the same sequence, again as a result of physical constraints affecting the interaction with the ribosome.

This is one reason why "junk DNA" is a bit of a misnomer, and why we more recently see the term fall out of use in favor of "noncoding DNA" - these regions carry no information in their own right, but nonetheless can strongly affect the outcome of transcription because transcription is not only an informatic process.

This isn't true of software; there is no general case in which two programs varying only in nonsyntactic ways will be evaluated differently under otherwise identical conditions - we create programming languages as we do in part to ensure that won't happen, and it's also part of the reason why we use transistors instead of vacuum tubes or relays: in order to engineer that kind of variance as much as we can out of existence. What is therefore an accidental property in software is an essential one in gene expression, and cannot be overlooked without reaching an inaccurate conception of how the latter process works.
That's just one example, and it's true that processes like these can be modeled in software to variously imperfect degrees of fidelity and that information-theoretical models can be useful in understanding some aspects of how they work. But that's not the same thing as them working similarly enough that understanding one very well suffices to reason about the other. I definitely can see how it's easy to assume otherwise! It's an assumption I shared, before my own yearlong exposure to the field at a sufficient level of detail to start to understand what I hadn't understood about it before, and considerable reading and study thereafter.
Unfortunately, I was there to provide engineering support to people doing that work, not to do it myself, and the knowledge I've derived from that experience apparently does not extend so far as producing a concise and positive statement of the fundamental difference between the two fields of study - I spent considerably more time teaching informaticists how to program, formally and otherwise, than I spent learning about bioinformatics. That leaves me able to recommend little beyond seeking out similar experience of your own, which I do recommend if the depth of your interest suffices, although I do also have to say that working in academia as a nonacademic has very little else to recommend it.
I know there are some folks on HN with formal knowledge and training greatly exceeding my own, and some of whom have probably also had experience teaching the basics in an accessible way. Perhaps one of them might give a more useful answer here than I've been able to.
>some sequences which will cause a ribosome to transcribe the surrounding genes differently
Not to be a negative nancy here, but if we're being precise, ribosomes do not transcribe. They translate.
Under the fairly reductive central dogma of biology:
DNA -> RNA (Transcription)
RNA -> Protein (Translation)
Transcription and translation are separate mechanics that don't occur in the same area of the cell, and both use very different complexes to mediate the rates of each in different physical environments.
I don't disagree with any of the substantive points being made, but I think the proper terminology only adds to your argument so I found it strange that it was left out.
It's one of the drawbacks of being an autodidact; I pretty much always have to check to be sure I'm not confusing these two similar terms, and I didn't stop to check this time. Thanks for the correction.
Thanks, this was much more interesting to read, and educational for someone with a software background, which I think kind of goes to show that discussing analogs is actually a reasonable way to approach the unknown :)
Again I agree with you, because I had a similar experience. But, again, my conclusion is different than yours.
You write that we should not talk about biochemistry as computation, as far as I understand. Instead I'd say that we have not studied enough how nature does computation without programmers or even human friendly semantics.
It's still computation, involving space and physics. Too complex to efficiently simulate (for now), but not big enough that the emergent behaviour is simple, like for a gas.
Genomes are absolutely digital. GATC is no different from 1 and 0. It's just using a different base (pun intended).
Files on disks have end of file markers, just like the start and stop sequences in DNA. Operating systems have cron jobs (themselves digital) that control when other programs execute.
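The "different base" point can be made literal: four discrete symbols is exactly two bits per symbol. A toy round-trip encoder (the helper names are made up, purely to show the radix change):

```python
# Each base is one of four discrete symbols, so a sequence packs into
# two bits per base -- a change of radix, nothing more.
BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq):
    """Encode a DNA string into an integer, two bits per base."""
    n = 0
    for base in seq:
        n = (n << 2) | BITS[base]
    return n

def unpack(n, length):
    """Decode `length` bases back out of the integer."""
    inv = {v: k for k, v in BITS.items()}
    bases = [inv[(n >> (2 * i)) & 0b11] for i in range(length)]
    return "".join(reversed(bases))

seq = "GATTACA"
assert unpack(pack(seq), len(seq)) == seq  # round-trips losslessly
```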
Genomes are much more than just their sequence. Their spatial organisation, their methylation, their folding, their packing, etc., have no equivalents in a filesystem.
You're talking about a digital<->analog interface. Take a digitally encoded audio file, read it out and turn it into sound waves using a digital analog converter, play it out on physical speakers, record it back with a microphone, use that information to control a robotic arm with a magnet that will swipe over the physical medium... etc. They are absolutely analogous.
False by definition: Digital data is "information represented as a string of discrete symbols each of which can take only one of a finite number of values"
I agree with you here but I get to a happy conclusion. The (self- or culturally imposed) constraint on computation to be semantically meaningful for humans does not apply for genomes. But this is already useful, because it means we at least have a hint about where to dig more in programming.
There is Theory of Computation and there is Theory of Programming. Your arguments apply to TOP but not to TOC.
Plenty of software is neither written nor comprehensible I can assure you of that.
Like, I don't think you're necessarily wrong, but pointing out the literal differences between the two topics doesn't explain to me why the analogy is wrong, and therefore doesn't support your argument.
It's like saying "I'm nothing like my mother; I don't even have long hair"
I love this. It's a little black and white, but the comparison is as between video game worlds and the real world. Only enough to fool the willing eye.
I use a variation of this form: 'people whose science and religion conflict don't know enough about either one'.
As an enthusiastically former staff engineer at a bioinformatics institute, I'm happy to have been of help! Please feel free to do so without attribution; if nothing else, it'd be a shame at this late date to have my opinions of the caste system in academia disturbed by the novel experience of receiving credit for my contributions to the work of people with letters after their names. :D
A program has to run those sequences mostly in order. Rather than swapping around blobs of binary it's more like each gene being its own small program, and things working is much less surprising in that context.
What I'm very interested in ATM (just now, after reading this topic) is how the process of evolution really works. Not the selection so much, but the actual mutations.
The last I ever learned about it, and perhaps the common belief, is that random-ish gene mutations account for it. 4 billion years doesn't seem like enough time to account for all that unless changes are heavily weighted towards doing something somewhat useful. Like there is a system at play.. Lego blocks vs bits. IDK.
If you think evolution is just selection and mutations, I can see how you'd think it's not enough (even though selection is usually a very strong force that "locks good mutations in place", which can create a compounding effect on fitness, so a long time of mutations + locking good ones in place should be almost enough to convince you if you do the math). It's been some time since I studied this, so I'm just going to write whatever comes to mind.
Maybe a key part of what's missing in your understanding is not so much "micro" genetics but biogeography and population genetics. You'd also want to check simulations and comparisons with real-world data on some models to see how a population evolves to see that it "really works". It's important to understand that there are different models, and for each one there's a set of "forces" and important parameters.
The bigger picture is, it's the whole interplay between mutation, selection, genetic drift, gene flow; things like differences in population size over time and space, migrations, isolation and reconnection, etc. that makes it work. You might also want to take a look into genetic/functional/morphological modularity. I've just skimmed these articles, but they seem relevant:
There's much more but my memory is murky. Ideally you would want to take a course or read a textbook on evolution. A few popsci books are ok (Dawkins, E. O. Wilson).
TLDR: What the other commenter said -- 4 billion years is a very, very, very long time.
The closest programming analogy for a new gene is probably dropping a new listener/sender on a message bus. It can send messages independently in response to messages that were already on the bus before it arrived. If there's a little bit of a shared language (which there is here, since the bus is chemistry itself), that can lead to new behaviors of the system without necessarily breaking anything.
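A toy version of that picture, assuming a made-up Bus class (nothing biological here, just the "drop a new listener on a shared bus" mechanic):

```python
# Toy message bus: handlers subscribe and react to published messages.
# A new handler can be dropped in without touching the existing ones.
class Bus:
    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, message):
        for h in list(self.handlers):
            h(self, message)

log = []
bus = Bus()
bus.subscribe(lambda bus, m: log.append(("old", m)))

# "New gene": reacts to traffic that was already on the bus and
# produces its own product, without anything else being rewritten.
def new_gene(bus, m):
    if m == "sugar":
        log.append(("new", "enzyme"))

bus.subscribe(new_gene)
bus.publish("sugar")
print(log)  # [('old', 'sugar'), ('new', 'enzyme')]
```

Because every participant speaks through the shared medium rather than calling each other directly, adding a component extends behavior without breaking what was already there, which is the point of the analogy.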
I think the binary example is incredibly poor and makes understanding this much harder.
Genes code for proteins (and promoters, etc.) and wind up in a chemical soup in flux. They're going to bounce around and do things.
Their presence will be more akin to new kinds of cars or trucks entering a highway, and they'll have different impacts to traffic (kinetics, thermodynamics).