How to determine a protein’s shape (economist.com)
52 points by Nuance on Aug 14, 2017 | 10 comments



I was surprised to see the "breakthrough" in this article. The idea of using correlated mutations to identify contacting residues has been around since the 1990s [1] (if not earlier), but it seems like it's only now getting implemented and benchmarked in reliable prediction workflows. As someone finishing a PhD in this field, my perspective is that there's a big lack of software engineering talent here.
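The correlated-mutation idea can be sketched in a few lines: score pairs of alignment columns by how strongly they co-vary, then rank the top-scoring pairs as candidate 3D contacts. This is a minimal toy using mutual information over a tiny hand-made alignment I invented for illustration -- real pipelines use much larger MSAs and direct-coupling-style methods that correct for indirect correlations.

```python
# Toy sketch of correlated-mutation contact prediction: columns of a
# multiple sequence alignment (MSA) that co-vary are candidate contacts.
# The MSA below is made up for illustration, not real data.
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

def rank_contacts(msa):
    """Rank position pairs by MI computed down the alignment columns."""
    cols = list(zip(*msa))  # transpose: rows (sequences) -> columns
    scores = {(i, j): mutual_information(cols[i], cols[j])
              for i, j in combinations(range(len(cols)), 2)}
    return sorted(scores, key=scores.get, reverse=True)

# Positions 0 and 3 co-vary perfectly (A pairs with L, V pairs with I),
# mimicking a conserved contact; they should rank first.
msa = ["ALKL", "VLKI", "AGKL", "VGKI", "AAKL", "VAKI"]
print(rank_contacts(msa)[0])  # -> (0, 3)
```

Plain MI is the 90s-era version of the signal; the more recent workflows the article alludes to replace it with global statistical models precisely because MI conflates direct and transitive couplings.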

There are ongoing efforts in the computational drug design field to make these problems more accessible to AI/ML specialists. While some biochem knowledge is required, packages like OpenMM, PyRosetta, and MDTraj abstract away many of the details of working with protein structures. Further, there are a number of public contests to identify the best-performing approaches, with no degree or publication requirements for entry. These challenges include CASP (run every other year for 3D structure prediction -- Jinbo, mentioned in the article, did very well in this), CAMEO (run weekly for structure prediction), the D3R Grand Challenges/CELPP (run at various frequencies for drug design), and probably many others I'm not remembering.

Folks new to the field might start with FoldIt [2], a sort of protein-folding video game with a Lua scripting interface, to get familiar with protein folding. I'd be interested to know what sorts of resources we could make available to make the field more accessible to AI/ML talent.

[1] http://csbg.cnb.csic.es/rev_coevol_nrg/ [2] https://fold.it/portal/


Is there a FoldIt that doesn't track what I do for other people to profit from?

I want to approach this topic (and have been interested since undergrad), but the "be a forced collaborator -- we'll even include your name while taking the credit and patenting through our organization!" arrangement is a complete no-go. (Though it is more reasonable than a lot of such terms.)

As it stands, I would have to consider whether exposing an ML algorithm's trial pattern to the team would violate any IP agreements about ML optimization I'm party to before I could even get started. If I weren't disclosing the intermediate trials, I could become invested in the topic before having to make such a choice (or rather, before having to manage such issues). The forced collaboration is too high a barrier to entry for me to find out whether I'm actually interested in the topic.

So... uh... a non-spyware SDK would be a really good start.


> As someone finishing a PhD in this field, my perspective is that there's a big lack of software engineering talent here.

As someone with a PhD in computer science who did a postdoc in a bioinformatics lab, I can vouch for that.


I'm a data scientist, and I left a biolab to do adtech.

My salary tripled. That may have something to do with it...


Yep, I did software development in academia for a biochem lab and they paid less than 1/3 of what I make in industry. Not only that: I was lower on the totem pole than a first-semester PhD student, there was zero potential for career growth of any sort (the prof I worked for laughed out loud when I asked about it), and my job security was entirely governed by the grant approval/extension whims of the NSF and NIH.

Foreknowledge of all that wasn't enough to keep me from working at the job for a while. It was a super interesting experience, and I learned an enormous amount about biochem, comp bio, synthetic bio, and several other fascinating subjects.

What eventually caused me to leave was the continuous, losing battle for sane software development practices. It wasn't just that lab: everyone I encountered on the techy side of bio - save for the oddball comp bio or synth bio prof/student with a CS background that included industry experience - was completely averse to treating their software as anything other than a means to an end. In the year and a half I lasted before taking a job in industry, that one lab easily wasted hundreds of work hours navigating easily preventable tech debt, writing the exact same code for the Nth time, and fixing the same deployment or revision control mistakes for the Nth time, while any attempt on my part to introduce standards and practices to alleviate any of it was dismissed out of hand as a waste of time.

In short, I agree that there's a shortage of software engineering knowledge and skills in the field, but beyond the obvious financial, organizational, and career development hurdles keeping talent away, there's a major attitude adjustment required by the researchers themselves.


Similarly, I left a computational chemistry lab to work in web dev and my salary tripled, and that was before I realized how much they'd lowballed me compared to my peers and asked for a raise.


Coevolution was part of it, but the actual "breakthrough" in their paper was the use of metagenome sequences for protein structure prediction [1]. I don't disagree that engineering talent is needed in this field, but in this case it's not really fair to boil it down to an engineering problem. I mean, this work relies on the development of the Rosetta energy function, which is a wild beast of a scientific problem [2].

[1] https://www.bakerlab.org/wp-content/uploads/2017/01/ovchinni... [2] http://www.biorxiv.org/content/early/2017/02/07/106054


> Coevolution was part of it, but the actual "breakthrough" in their paper was the use of metagenome sequences for protein structure prediction

Ah -- I hadn't read into the metagenomics aspect. That is quite substantial. Thanks for the link.

> I don't disagree that engineering talent is needed in this field, but in this case it's not really fair to boil it down to an engineering problem. I mean, this work relies on the development of the Rosetta energy function, which is a wild beast of a scientific problem

Interesting -- I guess my disagreement here stems from the definitions of "science" and "engineering". I agree that Rosetta scoring function development is extremely valuable. I also agree that figuring out how to efficiently balance coevolution-derived constraints against the traditional Rosetta score is a substantial accomplishment. However, the profs on my committee consider this type of work (which I find really meaningful) to be more "engineering" (or "optimization") than "science". Their definitions have never sat well with me, and I'm glad you agree that this is valuable science.


Protein folding researchers typically use Monte Carlo and gradient descent methods. It makes sense to look to machine learning to expand the toolset available for determining protein structure.
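For anyone unfamiliar with the Monte Carlo side, here's a toy Metropolis sampler minimizing a made-up one-dimensional double-well "energy" rather than a real protein force field. The landscape, temperature, and step size are all illustrative assumptions, not anything an actual folding package uses.

```python
# Toy Metropolis Monte Carlo minimization on an invented 1D energy
# landscape. Real folding software applies the same accept/reject rule
# to moves in torsion-angle or Cartesian space under a physical force field.
import math
import random

def energy(x):
    """Toy landscape: a symmetric double well with minima at x = -2 and x = +2."""
    return (x * x - 4) ** 2

def metropolis(steps=20000, temperature=0.5, step_size=0.3, seed=1):
    rng = random.Random(seed)
    x = rng.uniform(-4.0, 4.0)
    best_x, best_e = x, energy(x)
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        delta = energy(trial) - energy(x)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE/T), which lets the walk escape
        # shallow local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = trial
            if energy(x) < best_e:
                best_x, best_e = x, energy(x)
    return best_x

# The walk settles into one of the two wells; it does not guarantee the
# global minimum, which is one reason folding runs launch many trajectories.
print(metropolis())
```

The same accept/reject rule scales up to thousands of degrees of freedom, which is what makes it a workhorse here despite its simplicity.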


There's quite a difference between modelling a structure and actually determining it. Methods that predict folding entirely from sequence, without using any experimental data, will need a lot of testing before they'll be trusted at all.

These models are probably most likely to fail on the really interesting cases; the boring ones that closely resemble existing structures are much easier to predict.



