"no manager gets dinged for "internet-wide" outages"
Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
But, seems like for uptime, someone should be identifiable. If your job is uptime, and there is a world wide outage, I'd think it would roll down hill onto someone.
> Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
I wouldn't necessarily say IBM or SAP are "crap". It's much more likely that orgs buying into IBM or SAP don't the due diligence on what the true costs to properly set it up and keep it running, therefore cut tons of corners.
They basically want to own a Ferrari and when it comes to maintenance, they want run Regular gas and try to get their local mechanic to slap Ford parts on it because its too expensive to keep going back to the dealership.
The thing is usually this argument goes something like this:
A: Should prod be running a failover / <insert other safety mechanism>?
B: Yes!
A: This is how much it costs: <number>
B: Errm... Let me check... OK I got an answer, let's document how we'd do it, but we can't afford the overhead of an auto-failover setup.
And so then there will be 2 types of companies, the ones that "do it properly" will have more costs, their margins will be lower, over time they'll be less successful as long as no big incident happens. When a big incident happens though, for most businesses - recent history proves that if everyone was down, nobody really complains. If your customers have 1 vendor down due to this issue, they will complain, but if your customers have 10 vendors down, and are themselves down, they don't complain anymore. And so you get this tragedy of the commons type dynamic where it pays off to do what most people do rather than the right thing.
And the thing is, in practice, doing the thing most people do is probably not a bad yardstick - however disappointing that is. 20 years ago nobody had 2FA and it was acceptable, today most sites do and it's not acceptable anymore not to have it.
Parents may teach this to kids but the kids usually notice their parents don't practice what they preach. So they don't either.
The world is filled with people following everybody else off a cliff. If you're warning people or even just not playing along in a time of great hysteria, people at best ignore your warnings and direct verbal abuse at you. At worst, you can face active persecution for being right when the crowd has gone insane. So most people are cowards who go along to get along.
Kind of like, nobody gets fired for hiring IBM, or using SAP. They are just so big, every manager can say, "look how many people are using them, how was I supposed to know they are crap".
But, seems like for uptime, someone should be identifiable. If your job is uptime, and there is a world wide outage, I'd think it would roll down hill onto someone.