The top comment there (from a structural biologist) is worth reading. Here's my ...

COGlory · on Aug 2, 2022

For an unrelated reason, shortly after making that comment, I put 31 genes from a viral genome (the whole genome, assuming we have the reading frames correct and nothing else funky is going on) through AlphaFold. We're getting ready to do some proteomics to see what's in the capsid, and I wanted to inform the proteomics by doing some sequence analysis. Only three genes of the 31 came back with any sort of confidence. Two of the three were crystallized and solved by my group a few years back.
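For anyone wondering what "came back with any sort of confidence" means in practice: AlphaFold reports a per-residue confidence score (pLDDT, 0 to 100) and writes it into the B-factor column of the PDB files it outputs. Below is a minimal triage sketch. It assumes standard fixed-column PDB ATOM records, scores each model by mean C-alpha pLDDT, and uses the commonly quoted cutoff of 70 for "confident". The function names and the threshold constant are hypothetical choices for illustration.

```python
from statistics import mean

# Assumption for this sketch: pLDDT >= 70 is treated as "confident".
CONFIDENT_PLDDT = 70.0

def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over C-alpha atoms, read from the B-factor field (columns 61-66) of ATOM records."""
    scores = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return mean(scores)

def confident_models(models: dict) -> list:
    """Names of models (name -> PDB text) whose mean C-alpha pLDDT meets the threshold."""
    return [name for name, text in models.items() if mean_plddt(text) >= CONFIDENT_PLDDT]
```

A whole-model average is a blunt instrument. A protein with a confident folded core and long disordered tails can average below the cutoff, so the per-residue values are worth a look before discarding a model.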

westurner · on Aug 2, 2022

Is the AlphaFold team winning Folding@home? (which started at Washington University in St. Louis, home of the Human Genome Project)

https://foldingathome.org/

FWIU, Folding@home has additional problems for AlphaFold, if not the AlphaFold team:

> Install our software to become a citizen scientist and contribute your compute power to help fight global health threats like COVID19, Alzheimer’s Disease, and cancer. Our software is completely free, easy to install, and safe to use. Available for: Linux, Windows, Mac

ejstronge · on Aug 2, 2022

> which started at Washington University in St. Louis, home of the Human Genome Project

While Wash U was a contributor, I am confused about why you call it the home of the Human Genome Project. The Project seems a lot more strongly linked to the Whitehead/MIT in terms of press and the site of key figures.

westurner · on Aug 2, 2022

https://en.wikipedia.org/wiki/Human_Genome_Project#History

"The Cost of Sequencing a Human Genome" https://www.genome.gov/about-genomics/fact-sheets/Sequencing...

ejstronge · on Aug 2, 2022

If we're just sharing links, I have one too:

https://www.genome.gov/human-genome-project/Completion-FAQ

westurner · on Aug 2, 2022

Together, these teams have achieved a very significant cost reduction: the link I shared cites a sub-$1K cost to sequence a genome today; a cost savings of millions of dollars per genome.

flobosg · on Aug 2, 2022

Both projects tackle related problems, but each is trying to answer a different question: https://news.ycombinator.com/item?id=32264059

westurner · on Aug 2, 2022

> Folding@home answers a related but different question. While AlphaFold returns the picture of a folded protein in its most energetically stable conformation, Folding@home returns a video of the protein undergoing folding, traversing its energy landscape.

Is there any NN architectural reason that AlphaFold could not learn and predict the Folding@home protein folding interactions as well? Is there yet an open implementation?

dekhn · on Aug 2, 2022

I think it would be much harder to do that, since it probably requires modelling physics at some level, while AlphaFold is really just mining statistical correlations of structures and sequences.

Yes, there are open implementations of nearly-AlphaFold at this point.

westurner · on Aug 3, 2022

FWIU there's no algorithmic reason that AlphaZero-style self play w/ rules could not learn the quantum chemistry / physics. Given the infinite monkey theorem, can an e.g. bayesian NN learn quantum gravity enough to predictively model multibody planetary orbits given an additional solar mass in transit through the solar system? (What about with "try Bernoulli's on GR and call it superfluid quantum gravity" or "the bond yield-curve inversion is a known-good predictor, with lag" as Goal-programming nudges to distributedly-partitioned symbolic EA/GA with a cost/error/survival/fitness function?)

E.g. re-derivations of Lean Mathlib would be the strings to evolve.

bpiche · on Aug 2, 2022

RIP Folding@home on Playstation 3! Bring it back!

dekhn · on Aug 2, 2022

Pretty much everything you said doesn't make any sense. Folding@Home started at Stanford, not WashU. WashU was also not "the home of the human genome project"; that was a distributed effort. AlphaFold doesn't contribute to Folding@Home; it's an entirely different problem.

Protostome · on Aug 2, 2022

Disclaimer: I'm a professional (computational) structural biologist. My opinion is slightly different than the researcher that commented on the linked post.

I didn't see any claim by DeepMind that protein structure prediction is a solved problem. I think these guys are pretty diligent when it comes to communicating their science. What you may have seen is a non-scientist reporter making inaccurate claims.

The difficulty with structure prediction is not a loss/energy function problem: even if we had an accurate model of all the forces involved, we'd still not have an accurate protein structure prediction algorithm.

Protein folding is a chaotic process (similar to the 3-body problem). There's an enormous number of interactions involved: between different amino acids, the solvent, and more. Numerical computation can't reliably follow a chaotic system over long times, because floating point numbers have a finite representation, which leads to rounding errors that grow until accuracy is lost.
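To make the floating-point argument concrete, here is a small sketch. It swaps in the logistic map at r = 4, a textbook one-dimensional chaotic system, as a stand-in for a molecular system. The identical update rule runs twice from the same starting value, once in double precision and once with every state rounded to single precision. The function names and the 0.1 divergence threshold are arbitrary choices for illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def logistic_step(x: float) -> float:
    """One step of the logistic map with r = 4, a standard chaotic map on [0, 1]."""
    return 4.0 * x * (1.0 - x)

def first_divergence(x0: float, steps: int = 200, threshold: float = 0.1):
    """Return the first step at which the double-precision and single-precision
    trajectories differ by more than `threshold`, or None if they never do."""
    hi = x0
    lo = to_float32(x0)
    for n in range(1, steps + 1):
        hi = logistic_step(hi)
        # Same rule, but each state is stored in single precision.
        lo = to_float32(logistic_step(lo))
        if abs(hi - lo) > threshold:
            return n
    return None

print(first_divergence(0.2))
```

Both runs use exactly the same deterministic rule. The only difference is how many bits each intermediate value keeps, yet after a few dozen steps the two trajectories bear no resemblance to each other.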

Besides, short-range electrostatic and van der Waals interactions are pretty well understood, and before AlphaFold many algorithms (like Rosetta) were pretty successful in a lot of protein modeling tasks.
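For a sense of how simple the well-understood short-range terms are, here is a minimal sketch of a pairwise nonbonded energy in the generic textbook form used, with variations, by classical force fields. It combines a 12-6 Lennard-Jones term for van der Waals interactions with a Coulomb term. This is not Rosetta's score function. The conversion constant is approximate, since force fields differ slightly in the exact value, and the epsilon/sigma numbers in the usage line are illustrative.

```python
# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2); exact value varies by force field.
COULOMB_KCAL_ANGSTROM = 332.06

def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy: steep repulsion inside sigma, weak dispersion attraction beyond it."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r: float, q1: float, q2: float, dielectric: float = 1.0) -> float:
    """Coulomb energy (kcal/mol) between point charges q1, q2 (units of e) at distance r (Angstrom)."""
    return COULOMB_KCAL_ANGSTROM * q1 * q2 / (dielectric * r)

def pair_energy(r: float, epsilon: float, sigma: float,
                q1: float, q2: float, dielectric: float = 1.0) -> float:
    """Total nonbonded energy of one atom pair: van der Waals plus electrostatics."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2, dielectric)

# The LJ minimum sits at r = 2**(1/6) * sigma, with depth -epsilon.
r_min = 2 ** (1 / 6) * 3.4
print(lennard_jones(r_min, 0.238, 3.4))  # approximately -0.238
```

Each term is cheap and well characterized. The hard part is that a protein in water has an enormous number of such pairs, and the conformational search over their sum is what makes prediction difficult.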

Therefore, we need a *practical* way to look at protein structure determination that is akin to AlphaFold2.

dmix · on Aug 2, 2022

Now I really want to read a long-form book like this comment: ‘A Computer Scientist’s Guide to an Intuitive Understanding of Biochemistry’

I’ve found it extremely hard to get a casual understanding of biology, unlike math, where I feel like I have a solid high-level sampling of the field. I’ve done a few bio and chemistry courses and books, but it’s so deep and ill-suited for a programmer who is used to asking how things work underneath at every level (you have to constantly stop yourself from asking why something does what it does and just go with it until it starts to connect later, which is more of a commitment than I could give).

Anyway thanks for your comment

inciampati · on Aug 2, 2022

I would suggest carefully reading a deep textbook on biology like Molecular Biology of the Cell. You can't get a casual but realistic understanding of biology without a significant effort. That's a big problem in modern society. Biology is subtle and yet ever-important to us earth-bound organisms. The vast majority of people have only the most trivial understanding of biology, but scientifically we have a rather complete perspective and mental model that, due to its recent development, hasn't yet become common.

feet · on Aug 2, 2022

Great book suggestion! Absolutely agree as someone in the field

Biology and biochemistry are unbelievably complicated and difficult to grasp without truly going deep into the fundamentals

sseagull · on Aug 2, 2022

Slightly OT, but I am a computational chemist (PhD) looking to learn more about molecular biology (to, say, an undergraduate or beginning graduate level). I want to see ways in which advances in computational chemistry tools could be applicable outside of our usual domains.

I am looking at Molecular Biology of the Cell (Alberts) and Cell Biology (Pollard). Both were recommended to me, but wondering what the pros and cons of each are (if you are familiar with both of them).

feet · on Aug 2, 2022

I'm not familiar with Cell Biology by Pollard but MBoC has incredible diagrams and flow charts that make pathways and other concepts incredibly easy to understand

imranq · on Aug 2, 2022

I would suggest taking MIT's Secret of Life course on EdX. It's taught by Eric Lander, who was a key figure in the Human Genome Project and was a mathematician beforehand, so he follows an axiomatic approach that is very different from the way other schools teach biology.

https://www.edx.org/course/introduction-to-biology-the-secre...

Alternatively, Harvard Extension School has some great biology courses you can sign up for and get credit for, though those are mostly for pre-med career changers.

joshuahedlund · on Aug 2, 2022

Two recs:

- There is a (short) book called "A Computer Scientist's Guide to Cell Biology" by William Cohen which is a little pricey but very dense and helpful with a lot of concepts.

- Combine that with David Goodsell's "The Machinery of Life" which has a lot of great illustrations and practical examples.

t_serpico · on Aug 2, 2022

Only way to truly learn biology imo is to read and do experiments. The feedback loop between those two things is what actually gives someone real intuition.

the__alchemist · on Aug 2, 2022

Is it possible or likely that the folding process is more procedurally deterministic than it seems? (given sequence, temperature etc) The degrees of freedom perhaps seem intractable because we don't know what steps the structure takes between the linear extrusion and final fold. AlphaFold, if I understand correctly, doesn't attempt to solve this problem. Your comment implies we should be skeptical of it because it's solving a potentially-intractable problem; perhaps it's both tractable, and AlphaFold doesn't solve it.

Let's say you have a car (or Lego set, etc.). The number of possible ways the parts could go together is astronomical! Does that mean it's not possible to figure out how it fits together, or how you might build one?
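For proteins, that "astronomical" count is the classic Levinthal argument. Here is a back-of-the-envelope sketch. Following the textbook version, it assumes about 3 backbone conformations per residue, a 100-residue chain, and 10^13 conformations sampled per second. All three numbers are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(residues: int, states_per_residue: int,
                            samples_per_second: float) -> float:
    """Years needed to visit every backbone conformation once by blind enumeration."""
    conformations = states_per_residue ** residues  # exact Python integer
    return conformations / samples_per_second / SECONDS_PER_YEAR

print(f"{exhaustive_search_years(100, 3, 1e13):.1e} years")  # 1.6e+27 years
```

Real proteins fold in microseconds to seconds, so they clearly are not enumerating conformations blindly. That gap is exactly why the question of whether folding follows a more directed, procedural path is a reasonable one.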

tsimionescu · on Aug 2, 2022

Yes, if you have a Lego set, or a series of car parts, there are many ways to put them together to make something. What AF is doing, as far as I understand, is essentially looking at a catalog of all Lego sets ever produced, or all car models ever produced, and choosing the one that most closely matches the pieces it is seeing.

But there is no reason to expect this process to produce the right end-result for a Lego set that has never been seen before.
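A toy version of this "catalog lookup" mental model is nearest-template search by sequence identity. This is the commenter's analogy, not how AlphaFold is actually implemented. The sketch assumes the sequences are already aligned to equal length, whereas real tools use proper alignment algorithms. The function names and the catalog contents are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned positions with identical residues (sequences must be pre-aligned, equal length)."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def closest_template(query: str, catalog: dict) -> tuple:
    """Return (name, identity) for the catalog entry most similar to the query."""
    return max(
        ((name, percent_identity(query, seq)) for name, seq in catalog.items()),
        key=lambda pair: pair[1],
    )

catalog = {"known_1": "MKTAYIAKQR", "known_2": "GSHMLEDPVA"}
print(closest_template("MKTAYIAKQL", catalog))  # ('known_1', 0.9)
```

For a sequence with no close relative in the catalog, the best identity is low, and the "prediction" is whatever the least-bad template suggests. That is the failure mode described here.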

FartyMcFarter · on Aug 2, 2022

Didn't AlphaFold win a competition based on folding proteins that had a secret result?

simiones · on Aug 2, 2022

Yes, but that competition is using lots of proteins that are similar to other known proteins, as far as I understand.

There is also a lot of sub-structure that helps: similar parts of proteins tend to fold in similar ways, so even if you don't have real predictive power on unknown sequences, you may do quite well for a protein that is 90% the same as one in the training set. You will be quite correct on ~90% of the folds, even if you're pretty far off on the remaining 10%.

Note that all of this is not to minimize the success of what AlphaFold achieved. I am just trying to explain how you can do well at this problem without having discovered some deeper deterministic structure in protein folds.
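The sub-structure point can be illustrated with a toy fragment-coverage measure. It is loosely in the spirit of fragment-based methods such as Rosetta's, but it is a simplification and not their actual procedure. The measure asks what fraction of a query's positions fall inside some length-k window that also appears verbatim in a known sequence. The window length k = 5 and the sequences are arbitrary assumptions.

```python
from typing import Iterable

def fragment_coverage(query: str, known: Iterable[str], k: int = 5) -> float:
    """Fraction of query positions covered by at least one length-k window seen in a known sequence."""
    known_windows = {s[i:i + k] for s in known for i in range(len(s) - k + 1)}
    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in known_windows:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(query) if query else 0.0

# A query differing from a known sequence only at its last residue is 90% covered.
print(fragment_coverage("ACDEFGHIKL", ["ACDEFGHIKW"]))  # 0.9
```

Where the difference falls matters, though. In this 10-residue example, moving the single mismatch to the middle (position 5) would break every window that spans it and drop the coverage to 0.5.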

COGlory · on Aug 3, 2022

Yes, but many proteins can be boiled down to basically two parts: the folded portion and the unfolded portion. The folded portions are typically shared (shared is a loose term, there's a lot of leeway) among almost all proteins.

So, I can pull a protein out of thin air and there's a good chance it'll have an overall fold similar to another protein that's got a structure. Unfortunately, the devil is almost always in the details. An amino acid here or there, a short extension here or there, a missing charged residue or an extra glycine and now you have a different target and entirely different behavior in a biological system.

One cool thing I found, actually, was a protein in an Archaeal virus that had no known homology a few years ago, but when I checked the other day, it now most closely matches an (otherwise thought to be) entirely synthetic protein out of David Baker's lab at UW. That means this Archaeal virus and David Baker converged on the same fold somehow (likely because it was "stable").

xvilka · on Aug 2, 2022

Given how quantum physics and chemistry works, highly unlikely.

sebzim4500 · on Aug 2, 2022

>When someone claims to have "solved" folding, you should be as skeptical as you would be if someone claimed to have solved the halting problem for arbitrary machine code

That's absurd. The halting problem is provably undecidable, on conventional computers and quantum computers alike.

This is clearly not true for protein folding, although it is possible that it is computationally intractable with a conventional computer.

rch · on Aug 2, 2022

I think the parent comment is saying that it's impossible to arrive at a specific folding endpoint because that state is dependent on continuously changing environmental variables.

Take a look at the configs for Amber (molecular dynamics simulation -- https://ambermd.org). QC might help map the space of inputs that would converge, but it probably couldn't identify a hypothetical 'done folding' state for any given protein.
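For readers unfamiliar with molecular dynamics: at its core, an MD engine repeatedly integrates Newton's equations of motion with a small timestep. Here is a minimal one-dimensional sketch using velocity Verlet, one standard MD integrator, on a harmonic spring where the exact answer is known. Production codes like Amber add thermostats, constraints, full force fields, and much more, so this shows only the basic loop.

```python
import math
from typing import Callable, Tuple

def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, steps: int) -> Tuple[float, float]:
    """Advance one particle by `steps` velocity-Verlet updates; returns the final (position, velocity)."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position from current velocity and acceleration
        a_new = force(x) / mass             # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity uses the average of old and new accelerations
        a = a_new
    return x, v

# Harmonic spring with k = m = 1: starting from x=1, v=0, the exact solution is x(t) = cos(t).
k = 1.0
x, v = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, mass=1.0, dt=0.01, steps=1000)
print(x, math.cos(10.0))              # numerical vs exact position at t = 10
print(0.5 * v * v + 0.5 * k * x * x)  # total energy should stay close to the initial 0.5
```

The timestep is the main knob. Proteins need femtosecond-scale steps to resolve bond vibrations, which is why reaching folding timescales takes the enormous amounts of compute that projects like Folding@home pool together.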

dekhn · on Aug 2, 2022

I don't think it's super valuable to spend time thinking about which complexity class protein folding (or structure prediction) is in. It's clear now that approaches that approximate the expensive physics and extended sampling, using every bit of additional information available, are going to be much more successful at providing the data people need from structures.

the__alchemist · on Aug 2, 2022

I propose this as a thought experiment: Nature has solved this. How? Some lines of reasoning:

#1 The quantum interactions of electrons that are the basis for chemical bonds behave in ways our computers and intuition are incapable of simulating

#2 It's a matter of degree, not kind, and nature is more sophisticated than our computers, reasoning and thought processes.

#3 Nature is magic, whatever you define that to be

#4 When stipulating the degrees of freedom involved (i.e., from dihedral angles), the possibility of additional information we haven't discovered is being overlooked. Is there a recipe or algorithm that could help?
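For concreteness on degrees of freedom from dihedral angles: a backbone torsion such as phi or psi is the dihedral angle defined by four consecutive atoms. Below is a minimal sketch of the standard atan2-based calculation. It follows the IUPAC sign convention and takes coordinates as plain 3-tuples. The helper names are just for this example.

```python
import math
from typing import Sequence

Vec = Sequence[float]

def sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis, about 0 degrees
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans, about 180 degrees
```

A chain of N residues has roughly two freely rotatable backbone torsions per residue (phi and psi; omega is nearly fixed by the planar peptide bond). This is where the "degrees of freedom" count in the search-space argument comes from.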

jhbadger · on Aug 2, 2022

#5 Proteins don't fold in isolation. We know some proteins need chaperone proteins to fold, for instance. Others form part of a complex. The problem can't be solved in the general case based only on the sequence of the protein whose structure you want. That's also a problem experimentally: we don't know if the structure of a crystallized protein is really the biologically meaningful form.

GuB-42 · on Aug 2, 2022

I'd go with #1. Especially considering that there are quantum approaches to protein folding.

But nature hasn't really "solved" the problem, it is just doing its thing, but the way it does things is completely different from what our computers do.

It is like trying to reproduce a guitar sound using a synthesizer. A guitar solves the problem of sounding like a guitar, but that doesn't mean it is more sophisticated than a synthesizer. In fact, a synthesizer can do much more; it is just that the processes by which the guitar makes sounds are hard to simulate.

hexmiles · on Aug 2, 2022

Couldn't it just be brute force? Nature operates on a much bigger temporal scale than we do.

the__alchemist · on Aug 2, 2022

Could be! Are you thinking thermodynamic fluctuations from surrounding water molecules jostling things around into many combinations? In this view, do you think the final protein would be found by chance, or through intermediate assemblies?
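One standard way to model the "jostling" picture is Metropolis Monte Carlo. Random moves are always accepted if they lower the energy, and accepted with Boltzmann probability exp(-dE/kT) otherwise. Below is a toy sketch on a made-up one-dimensional rugged landscape: a broad funnel with small bumps. The landscape, step size, temperature, and seed are all arbitrary assumptions, and this is not a protein model.

```python
import math
import random

def energy(x: float) -> float:
    """Toy rugged landscape: a broad funnel (x**2) with small bumps (cosine term) on top."""
    return x * x + 0.5 * math.cos(8.0 * x)

def metropolis(x0: float, kT: float, steps: int, step_size: float = 0.3, seed: int = 0):
    """Random-walk Metropolis sampling; returns (lowest energy seen, position where it was seen)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        delta = e_trial - e
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_e, best_x

print(metropolis(x0=4.0, kT=0.2, steps=20_000))
```

In a funnel-shaped landscape like this one, the walk does not find the bottom by pure chance. Thermal noise gets it over the small bumps while the overall slope steers it downhill, which is roughly the "intermediate assemblies" side of the question.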

simiones · on Aug 2, 2022

Isn't #1 the most likely, given that most quantum interactions take exponential time to simulate on classical computers with any known algorithm?
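The exponential cost is easiest to see in memory rather than time. A brute-force classical simulation that stores the full quantum state of n two-level systems (qubits, or spin orbitals in an occupation-number representation) needs 2**n complex amplitudes. A quick sketch, assuming 16 bytes per double-precision complex amplitude:

```python
BYTES_PER_AMPLITUDE = 16  # one complex number stored as two 64-bit floats

def state_vector_bytes(n: int) -> int:
    """Memory needed for the full state vector of n two-level quantum systems."""
    return (2 ** n) * BYTES_PER_AMPLITUDE

for n in (10, 30, 50):
    print(n, state_vector_bytes(n))  # 16 KiB, 16 GiB, 16 PiB
```

Approximate methods such as DFT avoid storing the full state, which is how practical chemistry calculations sidestep this wall at the cost of exactness.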

heavenlyblue · on Aug 2, 2022

Why did we escape the land of NP?

FartyMcFarter · on Aug 2, 2022

NP problems are ones whose positive solutions are verifiable in polynomial time.

For example, the problem "is there a route in this graph that visits all nodes and has length <= L" can be quickly verified with a classical computer, as long as you're given a "yes" answer accompanied by such a route. Finding the answer from scratch might be much slower, but checking it is quick.
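The quick-to-check half of that example can be sketched directly. This verifier assumes an undirected graph given as a dict mapping node pairs to edge lengths, a representation chosen only for illustration. It runs in time linear in the route length, whereas no polynomial-time algorithm is known for finding such a route in general.

```python
from typing import Dict, FrozenSet, Hashable, List

Graph = Dict[FrozenSet[Hashable], float]  # undirected edge {u, v} -> length

def verify_route(graph: Graph, nodes: set, route: List[Hashable], max_length: float) -> bool:
    """Check a claimed "yes" certificate: the route visits every node exactly once,
    uses only existing edges, and has total length <= max_length."""
    if len(route) != len(nodes) or set(route) != nodes:
        return False
    total = 0.0
    for u, v in zip(route, route[1:]):
        edge = frozenset((u, v))
        if edge not in graph:
            return False
        total += graph[edge]
    return total <= max_length

g = {frozenset("AB"): 1.0, frozenset("BC"): 2.0, frozenset("CD"): 1.5, frozenset("AD"): 4.0}
print(verify_route(g, {"A", "B", "C", "D"}, ["A", "B", "C", "D"], 5.0))  # True (1 + 2 + 1.5 = 4.5)
print(verify_route(g, {"A", "B", "C", "D"}, ["A", "D", "C", "B"], 5.0))  # False (4 + 1.5 + 2 = 7.5)
```

Checking one proposed route is cheap. Finding one may mean trying a number of candidate orderings that grows factorially with the number of nodes.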