A couple of quick notes, from someone who has actually put this into practice, and in a non-manufacturing context, to boot!
(From a brief reading of this thread, it seems like kqr, jacques_chester, and I are the only ones who have put this to practice in non-manufacturing contexts — though correct me if I'm wrong.)
The bulk of the debate in this HN thread seems to be centred around what is or isn't a 'stable process'. I think this is partially a terminology issue, which Donald Wheeler called out in the appendix of Understanding Variation. He recommends not using words like 'stable' or 'in-control', or even 'special cause variation', as the words are confusing ... and in his experience lead people to unfruitful discussions.
Instead, he suggests:
- Instead of calling this 'Statistical Process Control', call this 'Methods of Continual Improvement'
- Use the terms 'routine variation' and 'exceptional variation' whenever possible. In practice, I tend to use 'special variation' in discussion, not 'exceptional variation', simply because it's easier to say.
- Use the term 'process behaviour chart' instead of 'process control chart': we use these charts to characterise the behaviour of a process, not merely to 'control' it.
- Use 'predictable process' and 'unpredictable process' (instead of 'stable'/'in-control' vs 'unstable'/'out-of-control' processes) because these are more reflective of the process behaviours. (e.g. a predictable process should reliably show us data between two limit lines).
Using this terminology, the right question to ask is: are there processes in software development that display routine variation? And the answer is yes, absolutely. kqr has given a list in this comment: https://news.ycombinator.com/item?id=39638491
In my experience, people who haven't actually tried to apply SPC techniques outside of manufacturing do not typically have a good sense for what kinds of processes display routine variation. I would urge you to see for yourself: collect data, and then plot it on an XmR chart. It usually takes only a couple of seconds to see whether it applies, at which point you may discard the chart if you do not find it useful. But you should discover that a surprisingly large chunk of processes do display some form of routine variation. (Source: I've taught this to a handful of folk by now, in various marketing/sales and software engineering roles, and they typically find some way to use XmR charts relatively quickly within their work domains.)
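If you want to try the 'collect data, then plot it on an XmR chart' step without special tooling, here is a minimal sketch of the arithmetic. It assumes the conventional XmR scaling constants (2.66 for the limits on the individual values, 3.268 for the upper range limit); the function names and the sample numbers are hypothetical, made up purely for illustration.

```python
from statistics import mean


def xmr_limits(values):
    """Compute the lines of an XmR chart from a series of individual values."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute a moving range")
    # Moving range: absolute difference between each pair of successive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    avg_mr = mean(moving_ranges)
    return {
        "centre": centre,
        # Limits for the individual values (X chart), conventional 2.66 constant.
        "lower": centre - 2.66 * avg_mr,
        "upper": centre + 2.66 * avg_mr,
        # Upper limit for the moving ranges (mR chart), conventional 3.268 constant.
        "range_upper": 3.268 * avg_mr,
    }


def exceptional_points(values, limits):
    """Return the indices of values that fall outside the two limit lines."""
    return [
        i for i, v in enumerate(values)
        if v < limits["lower"] or v > limits["upper"]
    ]


if __name__ == "__main__":
    # Hypothetical weekly counts, for illustration only.
    weekly_counts = [10, 12, 10, 12, 10]
    limits = xmr_limits(weekly_counts)
    print(limits)
    print(exceptional_points(weekly_counts + [30], limits))
```

Values that stay between `lower` and `upper` are what the terminology above calls routine variation; anything outside is a candidate for exceptional variation worth investigating.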
[Note: this 'XmR charts are surprisingly useful' is actually one of the major themes in Wheeler's Making Sense of Data — which was written specifically for usage in non-manufacturing contexts; the subtitle of the book is 'SPC for the Service Sector'. You should buy that book if you are serious about application!]
I realise that a bigger challenge with getting SPC adopted is as follows: why should I even use these techniques? What benefits might there be for me? If you don't think SPC is a powerful toolkit, you won't be bothered to look past the janky terminology or the weird statistics.
So here's my pitch: every Wednesday morning, Amazon's leaders get together to go through 400-500 metrics within one hour. This is the Amazon-style Weekly Business Review, or WBR. The WBR draws directly from SPC (early Amazon exec Colin Bryar told me that the WBR is but a 'process control tool' ... and the truth is that it stems from the same style of thinking that gives you the process behaviour chart). What is it good for? Well, the WBR helps Amazon's leaders build a shared causal model of their business, at which point they may loop on that model to turn the screws on their competition and to drive them out of business.
But in order to understand and implement the WBR, you must first understand some of the ideas of SPC.
If that whets your interest, here is a 9000 word essay I wrote to do exactly that, which stems from 1.5 years of personal research, and then practice, and then bad attempts at teaching it to other startup operator friends: https://commoncog.com/becoming-data-driven-first-principles/
I don't get into it too much, but the essay calls out various other applications of these ideas, amongst them the Toyota Production System (which was bootstrapped off a combination of ideas taught by W Edwards Deming, including the SPC theory of variation), Koch Industries's rise to a powerful conglomerate, Iams pet foods, and so on.
> (From a brief reading of this thread, it seems like kqr, jacques_chester, and I are the only ones who have put this to practice in non-manufacturing contexts — though correct me if I'm wrong.)
And roenxi.
> So here's my pitch: every Wednesday morning, Amazon's leaders get together to go through 400-500 metrics within one hour.
Amazon's core value proposition is that they maintain a large and very physical fleet of machines that they rent out, with serious standards for up-time that they can take real pride in.
They don't sell themselves as a software house. I'm sure they have tentacles everywhere and they aren't bad at it (if anything I'd expect them to be pretty good on a given project), but they've greatly benefited from using other people's software - they don't have their own DB for example, they reuse others and have a couple of PostgreSQL forks for more at-scale use cases.
I'm sure they get huge value from SPC (anything physical generally benefits from it), and I'm sure they use SPC for software out of reflex; but it doesn't follow that it is driving productive behaviour in the software branch of the business. A fleet of ~infinite servers benefits from controlling 400 metrics. Software development does not.
What would you say if I told you Bryar has lots of stories of this style of thinking applied in early Amazon? This is pre-AWS Amazon, mind you — where they were trying to figure out how to build e-commerce web software at scale, from scratch. Granted, the bulk of their process control was directed at customer-facing controllable input metrics, but the software engineers were as much a part of it as the operational folks.
(To be fair to you, you are adamant that SPC does not apply to software development, which I take to mean measuring the productivity or act of building software. And I think we are all in agreement there! That said, like kqr and jacques_chester, I want to believe that this has not been sufficiently explored. But it's not true that SPC has no place in software development. One way I've used this: because XmR charts detect changes in variation, you can use them in a customer-facing software context to see if a feature change has resulted in a change in user behaviour, without running an A/B test. Naturally, it makes sense to have the software engineer be responsible for observing this behaviour change themselves, since XmR charts are easy enough for the layman to use, and it gives them a sense of ownership for the feature or change. Some detail (on usage vs A/B tests) here: https://commoncog.com/two-types-of-data-analysis/)
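To make the feature-change idea concrete, here is a rough sketch of one way to do it: compute limits from the data before the release, then check the data after the release against them. It assumes two commonly used detection rules (a point outside the limits, or a run of eight successive points on the same side of the centre line) and the conventional 2.66 scaling constant. The function names and the numbers are hypothetical, and this is not a description of any particular team's tooling.

```python
from statistics import mean


def baseline_limits(baseline):
    """Centre line and limits computed from pre-release data only."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    centre = mean(baseline)
    spread = 2.66 * mean(moving_ranges)  # conventional XmR scaling constant
    return centre, centre - spread, centre + spread


def behaviour_changed(baseline, after, run_length=8):
    """Return True if post-release data shows a signal against the baseline."""
    centre, lower, upper = baseline_limits(baseline)

    # Rule 1: any single point outside the limits.
    if any(v < lower or v > upper for v in after):
        return True

    # Rule 2: a run of `run_length` successive points on one side of the centre.
    run, last_side = 0, 0
    for v in after:
        side = (v > centre) - (v < centre)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        if run >= run_length:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical daily usage counts before and after a feature change.
    before = [100, 104, 98, 102, 101, 97, 103, 99]
    after = [103, 103, 104, 102, 103, 105, 103, 104]
    print(behaviour_changed(before, after))
```

A signal here only says that the process behaviour changed around the release; deciding whether the feature caused it still needs judgement about what else changed at the same time.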
Saw this on twitter...I actually think SPC can apply to Software Development in that the concept of normal variation, and being able to understand and measure the range, can be pretty useful. More detailed comment here if interested...
Very interesting to get the perspective of someone who did this in a non-manufacturing environment. One interesting bit, for someone like me who knows SPC from manufacturing-related processes, is the discussion around what a stable process is, because I cannot remember a single one of those discussions ever happening in manufacturing-related fields. Intriguing, especially since on HN discussions sometimes miss the point by turning into disputes about the exact definition of a term, something that sounds very similar to the "misunderstandings" about stuff like special-cause variation you described.
Edit: Fully agree on the Amazon-style WBR; what you said is exactly what is happening at Amazon. It runs daily during Q4 peak for a large enough subset of metrics.