When the world calls for blood against your organization, it's a test of the organization's character: will they throw a scapegoat under the bus (even if there is a directly responsible person), or will they defend their staff, accept fault, and demonstrably improve their process?
More importantly, what about the companies that enabled auto-updates from a vendor straight to production rather than having a validation process? This sort of issue can happen with any vendor; penalising the vendor won't help the next time it happens.
It’s both. If you’re an engineer and you push out shitty code that takes down 911 systems and ambulances, you f’ed up. Push back against processes that cause harm, or have the potential to cause harm. You are ultimately responsible for your actions. No one else. The excuse of “I was just following orders” has been dead and buried since WW2.
Yeah, ideally management should know better. But management aren’t usually engineers. Even when they are, they don’t deal with the code on a day-to-day basis. They usually know much less about the actual processes and risks than the engineers on the ground.
If one of the people I manage is not up to the task, the fault is mine. I hired them. I should set up a system of hard-won trust and automation to prevent, or at least minimize, them fucking up. When fuckups happen, they are my fuckups. Critical systems obviously don't survive on trust alone. If I don't set up the teams and the systems properly, my bosses will also take the blame for having put me in that position.
I'm not advocating for lower layers to avoid responsibility. But if a head needs to roll, you should look above. That said, people are hardened by fuckups, so there are usually better solutions than rolling heads.
Right. In one sense, what we're talking about is different ideas about how companies and teams work. There's a wonderful book called "Reinventing Organizations" by Laloux that I recommend to basically everyone. In the book, the author lays out a series of different organisational structures that have been invented and used throughout the ages. It covers everything from early tribes where the big man tells everyone what to do (e.g. mobsters), to rigid hierarchies with fixed roles (the church, schools), to modern corporations with a flexible hierarchy, and some organisational structures beyond that.
The question of "who is ultimately responsible" changes based on how we see the organisation. In organisations where the chief decides everything, it's up to the chief to decide whether to place blame on someone. In a modern corporation, people at the bottom of the hierarchy are shielded from the consequences of their actions by the corporation. But there's also a weird form of infantilisation that goes along with that. We don't actually trust people on the ground to take responsibility for the work they do. All responsibility goes up the management hierarchy, along with control, power and pay. It's sort of assumed that people who haven't been promoted are too incompetent to make important choices.
I don't think that's the final form of how high-functioning teams should work. It's noble that you're willing to put your head on the chopping block, but I think it's also really important to give maximal agency to your employees. That includes making people feel responsible for, and empowered to fix, problems when they see them. You get more out of people by treating them like adults, not children. They learn more, too, and I think that's usually better for everyone in the long run.
I agree that if a company has a bad process, employees shouldn't be fired over it. But I also think if you're an employee in a company with a bad process, you should fight to make the process better. Never let yourself be complicit in a mistake like this.
> It’s both. If you’re an engineer and you push out shitty code that takes down 911 systems and ambulances, you f’ed up.
This is wrong. If a company is developing that kind of software, it is the company's responsibility to provide a certain level of QA before releasing it. And no, it's not that "engineers are pushing out shitty code"; it's that the shitty company allows shitty code to be deployed to customers' machines.
Many major companies hold post-mortem reviews for this kind of thing. Most of the big failures we see are a mix of people being rushed, detection processes failing, and miscommunication or misunderstanding of the effects of a small change.
One analogy is rounding: a single rounding makes no difference to a transaction, but multiple systems rounding in the same direction can have a large-scale impact. It's not always rounding money; it can be error handling. A stops at the error, B carries on, and it turns out they're not in sync.
Which guy is it? The person who pressed the button? The manager who gave that person more than one task that day? The people who didn't sufficiently test the detection process? The people who wrote the specs without sufficiently understanding the full impact? The person who decided three months ago to lay off the people who knew the impact?