
Behold, academia.

The only maintainers of this code, ever, have been grad students and postdocs. I estimate there have been about 12-15 generations worth. This code has supported hundreds of publications in its lifespan.

A codebase that began life in 1987, in C. First ported to MATLAB in 1999. Source control was first added (as SVN) in 2015. Between 2015 and 2018, there were 6 commits total, yet 3 people graduated out of the lab from it. Probably 100,000 LOC total, of which I estimate maybe a third is ever used. 1400-line MATLAB functions are normal-ish. I've found loops nested 11 levels deep.

It's a series of psychophysical experiments. Each experiment exists in at least 4 different versions side by side in source, each named slightly differently, often by an incorrect datestamp of last modification. Version control across machines is not well maintained, so you have to diff everything before you can copy or move files lest you accidentally blow something away completely.

Oh, and it's mexed and wrapped for use on a Mac, on exactly one Snow Leopard machine with hardware from 2007.

edit: I think this counts as a job, not a student experience, because I am not a student. I just have to clean this mess up once in a while.



Yeah. At this point I think teaching source control and abstraction should be part of the "scientific method" part of the course.


I think it is a problem in general with code for experiments. You just need to change a tiny bit for a new experiment, and you don't want to ruin an earlier experiment.


There's a very simple solution: guard your tiny bit of code with e.g. a command line flag that defaults to "off", and commit that.

It's a little more work sometimes, especially when your experiment changes things structurally, but it pays off over and over.
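To make that concrete, here is a minimal sketch in Python of what such a guard might look like. Everything in it is made up for illustration: the `--trimmed-mean` flag, the `score` function, and the sample data are hypothetical stand-ins for whatever your "tiny bit" is. The point is only that the old behaviour stays the default and the new behaviour lives next to it in mainline.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Score one experiment run.")
    # Hypothetical flag. action="store_true" means it defaults to False,
    # so existing experiments behave exactly as before unless it is passed.
    parser.add_argument(
        "--trimmed-mean",
        action="store_true",
        help="experimental: drop the lowest and highest response before averaging",
    )
    return parser


def score(responses, use_trimmed_mean=False):
    """Average the responses; the new variant is guarded by the flag."""
    if use_trimmed_mean:
        # New experiment's change: discard the two extremes.
        trimmed = sorted(responses)[1:-1]
        return sum(trimmed) / len(trimmed)
    # Original behaviour, untouched.
    return sum(responses) / len(responses)


def main(argv=None):
    args = build_parser().parse_args(argv)
    responses = [1.0, 2.0, 3.0, 10.0]  # placeholder data for the sketch
    return score(responses, use_trimmed_mean=args.trimmed_mean)
```

Calling `main([])` runs the original path and `main(["--trimmed-mean"])` runs the new one, so both experiments stay reproducible from the same commit.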


Isn't this what git branches are for?


No!

There's a great article somewhere about how the normal version control flow doesn't really work for this style of computing.

You want to keep both "versions" of code live and active in the same place at the same time (often in the same notebook).

People end up with methods named methodName, methodName2 etc, which isn't very good. But once you see the workflow you understand why normal version control doesn't work either.

There should be a solution to this, but AFAIK there isn't yet.


No, branches break down badly when you have many experiments. You end up with a bunch of incompatible versions of the system that you need to merge together later, which can be a huge mess (depending on the size of the changes).

By all means, branches are great for super-prototypey early code, but once you know that you want to keep the ability to run the experiment around, guard it with a flag and merge it into mainline to avoid nightmare merges later!



