This article is about the poor state of research in the software field. The 'bugs are 100x more expensive to fix in production' bit is just a glaring example of that. The article isn't in any major way refuting 'bugs are 100x more expensive to fix in production'; it's just pointing out that the study behind it may not exist, so there's no need to defend your beliefs, as many are doing in the comments.


I don't recall encountering this specific 100x claim, but I do recall Steve McConnell making a similar case early in Code Complete. In chapter three, he has a chart showing that the average cost of fixing defects post-release is 10-100 times higher than fixing them during requirements, which he supports with the following citations:

"Design and Code Inspections to Reduce Errors in Program Development" (Fagan 1976)

Software Defect Removal (Dunn 1984)

"Software Process Improvement at Hughes Aircraft" (Humphrey, Snyder, and Willis 1991)

"Calculating the Return on Investment from More Effective Requirements Management" (Leffingwell 1997)

"Hughes Aircraft's Widespread Deployment of a Continuously Improving Software Process" (Willis et al. 1998)

"An Economic Release Decision Model: Insights into Software Project Management" (Grady 1999)

"What We Have Learned About Fighting Defects" (Shull et al. 2002)

Balancing Agility and Discipline: A Guide for the Perplexed (Boehm and Turner 2004)

I only have one of these (Dunn), and it's boxed up in the attic, so I can't readily check their sources, but I somehow doubt that all of these simply launder the study under discussion.

I don't want to trivialize some of the good points that Hillel Wayne makes about soft-science research applied to software productivity, but he would have us dismiss all of these citations out of hand, simply because they predate capital-A Agile, which of course changes everything. That doesn't strike me as a particularly compelling approach either.


> … he would have us dismiss all of these citations out of hand…

Does he? I thought the point of his talk [1] was that we developers ascribe way too much weight to a few small studies. So rather than saying that we should dismiss the claims, we should instead take great care, because the claims may not be generalizable at all.

[1] https://www.hillelwayne.com/talks/what-we-know-we-dont-know/


This is what I'm referring to:

> A lot will be from before 2000, before we had things like "Agile" and "unit tests" and "widespread version control", so you can't extrapolate any of their conclusions to what we're doing. As a rule of thumb I try to keep to papers after 2010.

While I admit that even in the late 90s, it seemed strange to me to see citations from the 70s or 80s about projects from the 60s (like, say, OS/360), I'm not convinced that so much has changed in the last ten to twenty years as to render all previous research irrelevant.


Yeah I'll admit it was an off-the-cuff quip that really isn't all that accurate. I don't put as much time into editing the newsletter as I do into my proper essays, so stuff that I'd normally polish out gets through. I do prefer to keep to papers after 2000 in general, less because of dramatic quality differences and more because it leaves fewer ways for people to dismiss stuff without looking at it.


It's also bizarre to see claims that unit tests are so new. I can't say I really know about other communities, but Perl at least was doing a lot of unit testing using the Test Anything Protocol (TAP) back in 1987.

https://en.wikipedia.org/wiki/Test_Anything_Protocol#History
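
For anyone who hasn't seen it, TAP is just a line-oriented text format (a plan line like "1..N" followed by "ok"/"not ok" lines), which is part of why it was practical so early. Here's a toy sketch in Python rather than Perl; the tap_report helper is invented for illustration, not a real library:

    # Toy TAP emitter: the protocol is just a plan line plus one
    # "ok"/"not ok" line per test, so a harness can be a few prints.
    def tap_report(tests):
        """Run (description, check) pairs and print TAP output."""
        print(f"1..{len(tests)}")                  # the plan
        for i, (desc, check) in enumerate(tests, start=1):
            try:
                ok = check()
            except Exception:
                ok = False
            print(f"{'ok' if ok else 'not ok'} {i} - {desc}")

    tap_report([
        ("addition works",      lambda: 1 + 1 == 2),
        ("string upper-casing", lambda: "tap".upper() == "TAP"),
    ])
    # Output:
    # 1..2
    # ok 1 - addition works
    # ok 2 - string upper-casing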


If anything, it raises a much better question that a survey of past research might help answer: has productivity actually changed over the last 20 years, since Agile, source control, and unit testing became popularized?


I enjoyed the book _Leprechauns of Software Engineering_, which did track down the chains of citations to find where the original work was misunderstood or even nonexistent. I would bet that it covers most or even all of these citations, but I'm not taking the time to pull it out and cross-reference. https://leanpub.com/leprechauns


One of the most recent citations (Shull et al. 2002) is freely downloadable:

https://www.computer.org/csdl/proceedings-article/metrics/20...

Its own citations may not be satisfying, but I find it nevertheless interesting. Here's the summary of the eWorkshop discussion of "Effort to find and fix":

> A 100:1 increase in effort from early phases to post-delivery was a usable heuristic for severe defects, but for non-severe defects the effort increase was not nearly as large. However, this heuristic is appropriate only for certain development models with a clearly defined release point; research has not yet targeted new paradigms such as extreme programming (XP), which has no meaningful distinction between "early" and "late" development phases.


Yes, and it’s worth noting that Hillel was referring to _Leprechauns_ when he talked about the cost to fix claim.


Heck, the subtitle even says it: "It's probably still true, though, says formal methods expert"

What's more interesting anyway is the circumstances around the nonexistence of the study. It's not a case of falsified research! It's a case of someone citing apocrypha as fact, and then other people citing that citation, etc. And nobody questioned it because it's so obviously true!


Is it obviously true though?

More expensive, sure. But 100x? Why not 50x? Or 300x? Without some actual measurements I'd want to put some very, very big error bars on my feelings there!


Exactly: if bugs cost 50x more in production, but only the bugs relevant to the core product get reported, then I've still saved 90% of the development time. That's what I did for my startup (no tests, no 3-tier architecture), and it was extremely valuable: all the reports focused on a few areas, half of them were really requests for improvement, and they were extremely useful for identifying 90% of the critical path and proving the ballpark of revenue people would put into it. Now that we can hire, we're fully 3-tier with modern tech and all.

Good management is judging where to put the team’s energy.


I would be very surprised if there is a meaningful number without adding much more context. Some early-stage startup might be able to fix a low-impact bug in production at only 10x cost, but I wouldn't be surprised if the two bugs that showed up on the first test flight of Boeing's Starliner crew capsule have cost them over a million times as much as fixing them earlier would have, with the retests, the massive timeline slip, and all the knock-on effects from the delay that majorly benefited their most promising competitor.


It's a case of stretching a number to make a point. It's a rhetorical device, and people shouldn't read the number 100 as literal.

We could go so far as to say that focusing on the number 100 is not seeing the forest for the trees. There are no literal trees or forest here, and there's no need for a study to find them.


And it all depends on how you define "bug."

Misallocating memory? Use after free? Off-by-one? Probably cheap to fix even in production.

A misunderstood or missed requirement? Sure, that could be very expensive if it affects a data structure or some other interface with other systems that then also have to change. But that goes beyond what I normally think of as a "bug".


That depends on how expensive it is to get your code into production, how many other dependencies need to move in lockstep, how much manual regression testing there is, etc.

Fixing a ten-second bug on my PC might easily be 100x cheaper than organising a release.


That subtitle really just reads as doubling down on information that is in dispute.

That is, we have research showing that weakening the evidence for a belief can actually strengthen the belief.

Now, to your point: this weakened the evidence, it did not falsify it. Still, it feels very similar.


I came across it in a book that we were supposed to read at university thirty years ago, but the book was outrageously expensive. It cost more than all of my other textbooks put together, and the university book shop didn't have it and told me I'd have to wait months.

It's probably different now that I can click around a little on some web sites and three days later a book arrives from New Zealand. But that was thirty years ago.

Can't remember how I read that book, but I did, and I know that most others in the course didn't. I find it very easy to believe that people might assume things about what it said rather than actually read the text, and hand those assumptions down to others as fact. And so an illustrative anecdote transforms into supposedly well-researched fact.


Replying to myself... I wonder what book that was. It was a large slim hardback published sometime in the 1970s, around A4, so much earlier than the 1987 book. I don't have any course materials any more.

As I recall, they argued that making a change in the requirements specification was ~10 times cheaper than finding and fixing a bug while writing the implementation, which in turn was ~10 times cheaper than fixing it after delivery, i.e. on other people's production computers at other sites (which compounds to roughly the 100:1 figure). There was some argument around it, but the text didn't imply exactness, as far as I can remember.


Truly seems to get to the root of some of software's troubles...


It's buried a bit, but it actually does kind of refute that.

> Here is a 2016 paper [PDF] whose authors "examined 171 software projects conducted between 2006 and 2014," all of which used a methodology called the Team Software Process. The researchers concluded that "the times to resolve issues at different times were usually not significantly different."

Link is to https://arxiv.org/pdf/1609.04886.pdf


Except that "time to fix" isn't the only factor that contributes to cost once a bug makes it into production.

There are damages caused by the bug, possible downtime due to deployment, testing and possibly certification, a potential need for data migration including dry runs, and more. And those are just the direct costs; there's also potential damage to reputation, cancelled orders, etc.

It shouldn't really surprise anyone that it doesn't really matter when you fix the bug in terms of the effort required to just fix the bug itself. It's the process involved in communicating, planning and performing the rollout of the fix that might sting real hard.


It comes down to:

What's the cost of testing/specifying/formal methods/etc. to a level that catches a bug before production, versus the cost to fix the bug and its downstream effects?

If I work in an unregulated environment with on-demand deployment, the cost to fix a bug isn't that big, except for bugs that persist bad state/data, especially anything involving schema changes; those costly-to-fix changes are what justify costly testing. If I produce a game ROM for unconnected systems in million+ unit batches, which sit on shelves for an unknown time before people use them, it would be very costly to fix bugs, and a costly test procedure for the whole thing makes sense.

If it's life safety, then yeah, lots of testing (and don't hire me)
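
To make that tradeoff concrete, here's a back-of-the-envelope expected-cost sketch in Python; every number and the escape-rate guesses are invented purely for illustration:

    # Toy expected-cost model for "test more up front" vs "fix in production".
    # All numbers are made up; plug in your own estimates.
    def expected_cost(prevention_cost, bugs, escape_rate,
                      fix_early_cost, fix_prod_multiplier):
        """Prevention cost plus the cost of bugs caught early and bugs that escape."""
        escaped = bugs * escape_rate
        caught = bugs - escaped
        return (prevention_cost
                + caught * fix_early_cost
                + escaped * fix_early_cost * fix_prod_multiplier)

    # On-demand web deployment: rollouts are cheap, so even a 10x production
    # multiplier doesn't justify heavy up-front process.
    web = expected_cost(prevention_cost=5, bugs=100, escape_rate=0.30,
                        fix_early_cost=1, fix_prod_multiplier=10)

    # ROM shipped in million-unit batches: escapes are brutally expensive,
    # so a big testing budget pays for itself.
    rom = expected_cost(prevention_cost=200, bugs=100, escape_rate=0.02,
                        fix_early_cost=1, fix_prod_multiplier=1000)

    print(web, rom)   # compare the totals under your own assumptions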


All of that is true. Also, deployment during the cited date range of 1967 to 1981 didn't mean the same thing for a lot of software as it does today. For an integrated mainframe and minicomputer hardware and software business with a bunch of isolated customer sites and huge support contracts, we're talking customer outreach and physical media at the barest minimum. There was no "push to production" automated pipeline that published to a web server, and no package update getting picked up from the repo by the next system cron at the customer's site over a fiber connection.


A significant issue with this is simply designing a study or a way of measuring.

First of all, define "bug". Is that a ticket filed in a system? What about the "bug" that was found and fixed by the developer testing it a few minutes after finishing the new feature (and thus never had a ticket filed)? What about QA finding a problem with a new feature while testing it (and sending the feature ticket back to the developer)?

Consider: Filing a bug ticket itself takes a bunch of time. If you find and fix a bug early enough then even the overhead time of filing/verifying/closing the ticket isn't spent. How do you measure the time compared to a bug that effectively didn't exist?

You can't do any like-for-like tests: two developers could take significantly different amounts of time to fix the same bug. If essentially the same bug occurs again in another area of code, the fix still can't be compared: if the developer applies essentially the same fix, it'll be much faster than the first time; if they apply a more systematic fix like refactoring the code to avoid that type of bug, that's not comparable.

There's also some bias that will make production bugs take longer. A team that has good quality controls (unit and automated testing, QA team, staging environments) will likely catch a lot of bugs before they get to production. As a result, the bugs that do end up in production are more likely to be complicated ones -- such as requiring several obscure conditions to be met, or only happening at large scale -- and those naturally take more time to address, even if in some cases the time is just spent figuring out the reproduction steps or replicating the environment.

This all makes comparing overall trends pretty difficult -- maybe to the point of being entirely useless.
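
As a rough sketch of that last selection effect, here's a toy simulation; the lognormal difficulty distribution and the "harder bugs escape to production more often" rule are pure assumptions, not data:

    # Toy simulation of survivorship bias in "production bugs take longer to fix".
    # Assumptions (all invented): intrinsic fix effort is lognormal, and harder
    # bugs are more likely to slip past pre-production quality controls.
    import random

    random.seed(0)
    pre_release, production = [], []
    for _ in range(100_000):
        hours = random.lognormvariate(0, 1)    # intrinsic fix effort, same either way
        p_escape = min(1.0, 0.05 * hours)      # assumption: harder bugs escape QA more often
        (production if random.random() < p_escape else pre_release).append(hours)

    avg = lambda xs: sum(xs) / len(xs)
    print(f"avg fix effort, caught early:  {avg(pre_release):.2f}h")
    print(f"avg fix effort, in production: {avg(production):.2f}h")
    # The production bugs look several times more "expensive" even though the
    # simulated fix effort doesn't depend on where the bug was caught.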



