I've spent a lot of time working with pipelining software, first at my last job doing bioinformatics research, and now for handling analytics workflows at Custora. We ultimately decided to write our own (which we are considering open sourcing; email me if you are interested in learning more).

The initial system that I used was pretty similar to Paul Butler's technique, with a whole bunch of hacks to inform Make as to the status of various MySQL tables, and to allow jobs to be parallelized across the cluster.

At Custora, we needed a system specifically designed for running our various machine learning algorithms. We are always making improvements to our models, and we need versioning to see how those improvements change our final predictions about customer behavior, and how the predictions stack up against reality. So in addition to versioning code and rerunning analysis when the code is out of date, we also need to keep track of different major versions of the code and figure out exactly what needs to be recomputed.
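
A minimal sketch of that kind of bookkeeping, in Python for brevity and not the actual Custora code: each step gets a fingerprint built from its own code version plus the fingerprints of its upstream steps, and a step needs recomputing when no result is stored under its current fingerprint. The step names, version strings, and the `step_key` and `stale_steps` helpers are all hypothetical.

```python
import hashlib


def step_key(name, steps, memo=None):
    """Fingerprint a step from its code version and its upstream fingerprints."""
    memo = {} if memo is None else memo
    if name not in memo:
        version, deps = steps[name]
        parts = [name, version] + [step_key(d, steps, memo) for d in sorted(deps)]
        memo[name] = hashlib.sha256("|".join(parts).encode()).hexdigest()
    return memo[name]


def stale_steps(steps, built):
    """Return the steps whose current fingerprint has no stored result."""
    memo = {}
    return sorted(n for n in steps if step_key(n, steps, memo) not in built)


# Hypothetical pipeline: step name -> (code version, upstream steps).
steps_v1 = {
    "load": ("1.0", []),
    "features": ("1.0", ["load"]),
    "model": ("1.0", ["features"]),
    "report": ("1.0", ["model"]),
}

# Pretend every step has already been run once with the v1 code.
built = {step_key(n, steps_v1) for n in steps_v1}

# Bump the major version of the model code only.
steps_v2 = dict(steps_v1, model=("2.0", ["features"]))

print(stale_steps(steps_v2, built))  # ['model', 'report']
```

Because results are stored under fingerprints rather than step names, the outputs of the old model version stay available next to the new ones, which is what makes it possible to compare predictions across versions.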

We surveyed a number of workflow management systems, such as JUG, Taverna, and Kepler. We ended up finding a reasonable model in an old configuration management program called VESTA. We took the concepts from VESTA and wrote a system in Ruby and R to handle all of our workflow needs. The general concepts are pretty similar to Drake, but it is specialized for our Ruby and R modeling.

Some more useful links for those interested:

JUG https://github.com/luispedro/jug

Taverna http://www.taverna.org.uk/

Kepler https://kepler-project.org/

VESTA http://vesta.sourceforge.net/