We should be clear which of two kinds of scientific code we're talking about:
1. A program that implements a new technique which forms an important part of a research project. Maybe a program that is the research project, which will be described in a paper.
No doubt this code should be included with the publication, no matter how "ugly" it is. Some journals, e.g. Bioinformatics, already require that an article about software include the software itself. This is the stuff the Bioinformatics Testing Consortium would run a smoke test on (a rough sketch of what I mean is at the end of this comment), because amazingly, a lot of programs that have been written up as journal articles just don't compile or work at all on somebody else's machine; many articles don't include the source code, and some don't even say how to get a redistributable binary. That's wrong, and we can fix it.
2. The mountain of single-use scripts and shell commands that are used in a research project that's not really about software at all, only a small fraction of which produce some output that the scientist follows up on.
Key points: (1) this code is very unlikely to work on anyone else's machine as-is; (2) crucial parts of these pipelines are lost in the Bash history, or were executed on a 3rd-party web server, or depend on a data set on loan from a collaborator who is not ready to release the data yet; (3) almost all of the code is dead; (4) whatever comments or notes exist are usually misleading or completely wrong.
As an example of what can go wrong when this code is released as-is, remember when the East Anglia Climatic Research Unit's "hide the decline" stuff hit the fan? It wasn't clear which code was dead, the comments made no sense, and people freaked out because they couldn't be sure how the published results came out of that godawful mess. The eventual solution, way too late, was to make a proper open-source, openly developed software project out of the important bits. That, in a nutshell, is why scientists won't release ALL the code -- even the hard drive itself is not the whole story; the scientist still needs to be available to explain it and navigate around the red herrings. And getting code into a state where it's self-explanatory takes time.
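To make the first kind concrete, here's roughly what I mean by a smoke test -- nothing fancy, just proof that the published program runs at all on a machine other than the author's and produces some output on its own example data. The tool name and input file below are made up for illustration; the Consortium's real checklist is more thorough than this.

    # Rough sketch of a smoke test for a published tool. "./aligner" and
    # "examples/sample.fasta" are placeholders -- substitute the program and
    # example data that ship with the paper.
    import subprocess
    import sys

    def smoke_test(cmd):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode != 0:
            sys.exit("FAIL: %r exited with code %d\n%s" % (cmd, proc.returncode, err))
        if not out.strip():
            sys.exit("FAIL: %r produced no output" % (cmd,))
        print("OK: %r" % (cmd,))

    if __name__ == "__main__":
        smoke_test(["./aligner", "--version"])
        smoke_test(["./aligner", "examples/sample.fasta"])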
> That, in a nutshell, is why scientists won't release ALL the code -- even the hard drive itself is not the whole story; the scientist still needs to be available to explain it and navigate around the red herrings.
If said scientist can't do that, how does anyone know what was actually run?
> That's why we write papers. Plain English can be more coherent than a pile of code.
"Plain english" doesn't analyze data - software does.
If the software is a mess, how likely is it that the "plain English" description is correct? How do you know? Why should anyone believe that the description is correct?
Right -- which is why the novel parts should get more attention and undergo code review. That's the goal of the Bioinformatics Testing Consortium.
To be clear, I'm all for open science and even open notebooks where it's a good fit for the project. I just don't think a pile of single-use scripts is a sufficient replacement for a clear English description of the analysis workflow and the reasons for each step. If I can't understand how an analysis was done from the article itself and the documentation for any associated software, I would not trust the article. Including more code, particularly the code further down the Pareto curve of relevance to the final article, does not make the article more correct -- most journal articles are wrong or flawed in some way, even if the code works as advertised.
If pre-publication peer review goes out of fashion, then another possibility is that the brand of major journals becomes more important. We still need a quick gauge of the quality of an article, other than its Google rank or number of page views. Nature can retract popular articles that are later proven flawed; I don't think Google would attempt to wield that kind of authority.
Relevant example: You published these two posts in TechCrunch to get a wide audience. (And I'm glad you did!) I read them partly because they appeared in TechCrunch.
The major non-governmental funding agencies recently banded together to solve the problem roughly the way you suggest, by creating their own open-access journal, which they will encourage their grantees to submit their work to. It will be called eLife:
It would probably be considered dubious/anti-competitive if NIH and NSF launched their own journals, but because of the NIH Public Access Policy (which the RWA attempts to reverse), NIH is able to host copies of already-published articles on PubMed Central.
Unladen Swallow isn't dead, actually -- it just merged into the main CPython code base. The first few quarters of optimizations are in Python 2.7 (it's noticeably faster than Python 2.6) and the more adventurous bits are on separate branches in SVN.
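If you want to spot-check that speedup yourself, something like this does the trick, assuming both interpreters are on your PATH. The microbenchmark is an arbitrary one I picked for illustration, so treat the numbers as a rough indication only:

    # Run the same (arbitrary) timeit microbenchmark under 2.6 and 2.7 and
    # eyeball the difference. Swap in whatever workload you actually care about.
    import subprocess

    BENCH = ["-m", "timeit",
             "-s", "words = [str(i) for i in range(1000)]",
             "'-'.join(words)"]

    for interp in ("python2.6", "python2.7"):
        print(interp)
        subprocess.call([interp] + BENCH)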
This is also called "flow" or being "in the zone" -- focusing on one thing, intensely, without interruptions. It's one more reason to lump programming in with the other creative arts.
LTS versions have stable patch-level releases every 6 months while they're supported (e.g. 10.04.1 was released in July, and there will probably be a 10.04.2 in January), so they're able to add drivers to the installer when they become available.
If there's a fundamental difference between the weaselly narratives constructed by Fox News and the psychedelic screeds Thompson put out, it's that most reporters don't make it explicit that their stories are fully personal, opinionated interpretations of true events -- they record some isolated facts, sample a few quotes, and make vague references to public sentiment to back up any narrative they need, yet they present all of this as objective information. This was happening well before H.S.T. (see "yellow journalism") and happens outside the U.S. too (see the Daily Mail).
Thompson's approach was (1) a veil of entertaining literary showmanship over (2) complete, self-accountable interpretations of the events being covered. He was clear that his stories were subjective, and that freed him to explain exactly why he felt the way he did about Nixon, drug laws, Southern culture, etc.
The other fundamental difference is that Thompson was driven by actually trying to express the truth of the situation as he saw it, by piecing together things that by themselves wouldn't add up to "journalistic integrity" as defined at Georgetown cocktail parties.
He once wrote a lengthy, 15-page feature piece for Rolling Stone about how the front-runner for the Democratic nomination in '72, Muskie, I believe, was addicted to an obscure stimulant found in an African root. The whole tale was entirely fabricated, and he never let on that he was joking. I'm pretty sure it would have qualified as libel. But in the process of telling the (depraved) story he managed to pinpoint everything wrong with Muskie's campaign at the time. Muskie sank to those very weaknesses (basically being a weakling/faker who was led around by his staff, in HST's estimation), and lost the sure-thing nomination to a nobody named McGovern.
HST also once shaved his head before a debate while running for Sheriff of some county out in Colorado. He then spent the whole debate referring to the Republican in the race, a clean-cut guy with a crew cut, as "my long-haired opponent". That one's not as profound, but it's hilarious. And says something about the media as well.