
I have been in discussions about this with one of my friends working in academic materials research. It's amazing how much work is done today by scientists at universities writing code without even the most basic software development tools.

I'm talking about opening their code in Notepad, 'versioning' files by sending around zip files with numbers manually appended to the file name, etc.

This doesn't even begin to scratch the surface of the 'reproducible results' problem. Often, the software I've seen is 'rough', to be kind. Most of the time it's not even possible to get the software running (it's missing some very specific library, or it depends on changes to a dependency that were never distributed), or it's built for one super-specific environment and makes huge assumptions about the system it runs on. This same software produces results that end up being published in journals.

If any of these places had money to spend, I think there could be a valuable business in teaching science types how to better manage their software. It's really unfortunate that, outside of a few core libraries (numpy, etc.), the default is for each researcher to rebuild the components they need.
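Even the most basic git workflow would be a big step up from numbered zip files. A minimal sketch of what that looks like (the file names and messages here are made up for illustration):

```shell
# One-time setup: turn the project directory into a repository
git init analysis-code
cd analysis-code

# First-time setup on a new machine: tell git who you are
git config user.name  "A. Researcher"
git config user.email "researcher@example.edu"

# Instead of emailing analysis_v3_final2.zip around, commit a snapshot
echo "print('results')" > simulate.py
git add simulate.py
git commit -m "Fix boundary condition in simulate.py"

# The commit history replaces the numbered zip files
git log --oneline

# Tag the exact state used for a paper so it can be recovered later
git tag paper-submission
```

The tag is the important part for reproducibility: `git checkout paper-submission` gets you back the exact code that produced the published numbers.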

I'm surprised that only 11% of results are reproducible; that's lower than I'd have expected. I agree we don't want to optimize for reproducibility alone, but clearly there is a problem here that needs to be addressed.




> It's amazing how much work is done today by scientists at universities writing code without even the most basic software development tools.

I agree 100%. I recently quit my PhD, so I still know a lot of people on the front lines of science. One of these friends recently asked me to help with a coding issue, so they gave me an SSH login to their group's server. I logged in and started reading the source.

It was all Fortran, with comments throughout like "C A major bug was present in all versions of this program dated prior to 1993." What bug, and of what significance for past results? Unknowable. As far as I can tell from the comments, the software has been hacked on intermittently by various people of various skill since at least 1985 without ever using source control or even starting a basic CHANGELOG describing the program's evolution. The README is a copy/paste of some old emails about the project. There are no tests.

So even though computer modeling projects should, in theory, be highly reproducible... it often seems like researchers are not taking the necessary steps to know what state their codebase was in at the time certain results were obtained.


This is an entirely different issue from code: code mostly does the same thing when you run it twice. There's no such guarantee in biology. A cancer cell line growing in one lab may behave differently from descendants of those cells in a different lab. This may be due to slight differences in timing between feeding the cells and running the experiments, stochastic responses built into the biology, slight variations between batches of input materials for the cells, mutations accumulating in the genomes as the cell line grows, or even one cell line being mistaken for another.

Reproducibility of software is a truly trivial problem in comparison.


Also, sometimes just doing the experiment is extremely hard. I know a guy who only half-jokingly claims he got his Ph.D. on one brain cell. He spent a couple of years building a setup to measure the electrical activity of neurons, and 'had' one cell for half an hour or so (you stick an electrode in a cell, hope it doesn't die in the process, and then hope your subject animal stays perfectly subdued and that no external vibrations make your electrode move, losing contact with the cell or killing it).

Reproducible? Many people could do it, if they made the effort, but how long it would take is anybody's guess.

Experiments like that require a lot of Fingerspitzengefühl from those performing them. Worse, that doesn't readily translate between labs. For example, an experimental setup in a small lab might force an experimenter into a body posture that makes his hand vibrate less while doing the experiment. If he isn't aware of that advantage, he won't be able to repeat his experiment in a better lab. (I also know guys who jokingly claimed they got their best results with a slight hangover; there may have been some truth to that.)


Oh, I agree. Biological experiment reproducibility is an incredibly hard problem. You are probably right that it is 'trivial' by comparison, in the same way that landing on Mars is trivial compared to reaching Alpha Centauri.



Have you seen: http://matt.might.net/articles/crapl/

"Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about "proof of concept." These rough edges make academics reluctant to release their software. But, that doesn't mean they shouldn't.

Most open source licenses (1) require source and modifications to be shared with binaries, and (2) absolve authors of legal liability.

An open source license for academics has additional needs: (1) it should require that source and modifications used to validate scientific claims be released with those claims; and (2) more importantly, it should absolve authors of shame, embarrassment and ridicule for ugly code."


I think that's what the folks at Software Carpentry [0] are trying to do. I went on one of their courses, and you're taught the basics of writing good software, version control and databases (SQLite). I've frequently recommended it to fellow scientists.

[0] http://software-carpentry.org/


This is great! Thanks for sharing.


Recent article on git and reproducibility in science: http://www.scfbm.org/content/8/1/7

It is badly needed.


That article says "Data are ideal for managing with Git."

At one point I tried using git to manage my data. The problem is, I frequently have thousands of files and gigabytes of data, and git just does not handle that well.[1]

I even tried building a git repo containing just the history of PDB snapshots. The PDB is updated frequently, and I have run into many cases where an analysis of a structure was done in a paper three years ago, but the structure has been updated and changed since then, so the paper made no sense until I thought to look at the history of changes to the structure. Unfortunately, git could not handle this at all: building the repo took days, and the resulting repo was unbearably slow to use.

Git would probably work well for storing the data used by most bench scientists, but for a computational chemist puking up gigabytes of data weekly on a single project, it is sadly horrible for handling the history of your data.

[1] http://osdir.com/ml/git/2009-05/msg00051.html


You might find git-annex useful:

http://git-annex.branchable.com/
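As I understand the docs, git-annex keeps large file content out of git's object store: git tracks only small pointer files (symlinks to content-addressed keys), so history operations stay fast even with gigabytes of data. A rough sketch of the workflow, with made-up file names and a hypothetical remote called `backup-server`:

```shell
# A repo where git tracks metadata and git-annex tracks the
# multi-gigabyte trajectory files themselves
git init md-data
cd md-data
git annex init "workstation"

# Large files go into the annex, not the git object store;
# git records only a symlink to a content-addressed key
git annex add trajectory_run1.dcd
git commit -m "Add run 1 trajectory"

# Content can live on several machines and move on demand
# (assumes a git remote named backup-server has been configured)
git annex copy trajectory_run1.dcd --to backup-server

# Free local space while keeping the file in history; this only
# succeeds once git-annex can verify another copy exists
git annex drop trajectory_run1.dcd
```

That last property is the relevant one here: the full history of which data existed when is preserved in git, without git itself ever having to diff or pack the gigabyte-sized files.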


As someone who, fresh out of high school, coded for a widely published astrophysicist at a major government research institution, I can confirm that I had no idea what I was doing.



