The problem with the "everyone" model being pitched here is that it may as well be a synonym for "nobody."
I've worked for a few orgs where quality and testing were "everyone's" responsibility and it ultimately led to everyone pushing it off their plates and lots of it simply not getting done. Why? We could collectively borrow against the future and "everyone" being responsible meant that nobody could be held accountable, as then the debate would be in deciding fractions of responsibility.
It also encouraged those with other incentives, like product, to lean heavily on that ambiguity and ship more features instead of doing reliability work, as they figured the debt would be someone else's problem down the road.
People have this naive idea that people who are given responsibility will step up. There are those that do, but the rest often see the far easier path of externalizing problems and frankly most jobs reward that as they don't see externalities well.
I would have it so that the platform team is responsible for identifying the problem and engineering is responsible for fixing it. I am not sure that either team would have the skills needed to prevent such things from happening, so perhaps canary deployments would be the way to go if it is a substantial risk in your domain.
> The problem with the "everyone" model being pitched here is that it may as well be a synonym for "nobody."
Can't agree with this enough!
Thanks for your input. A lot of it resonates with what I've observed, which is that this is as much a cultural/people problem as it is a technical problem. If teams took ownership by just building visibility, then it'd be an easier problem to solve.
You bring up a good point of doing canary deployments for solving this problem. I'll check this out.
But it's interesting that you say "... if it is a substantial risk in your domain". Isn't this a problem that most engineering teams have been struggling with, especially in the last few years? Having been part of a few DevOps meetups in my area (Seattle) for a while and having attended a bunch of conferences in the last couple of years, I've noticed cost coming up as one of the most recurring discussion topics.
Just curious why cloud costs wouldn't be a risk in any domain.
It is a risk for any company, but the possible harm is variable.
At a prior employer, cloud costs could have doubled or even gone up an order of magnitude and because the margins were so good and the tech costs so low, it wouldn't have mattered and may barely have been noticed. Compute wasn't a substantial business cost in any way, as customers were paying for domain expertise in the product.
At another prior employer, costs scaled with revenue pretty linearly, so while bad, it wouldn't be catastrophic before being noticed as it would also mean increased revenue.
However, for, say, a company that does video streaming where cloud costs are already enormous, poor cloud usage can cut months off runway. Same with AI, where the money is overwhelmingly being burned on compute.
Cloud waste can happen anywhere, but the harm can range from still a tiny number to destroying the ability to make payroll depending on what you are doing.
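To put rough numbers on that range, here is a back-of-the-envelope sketch in Python. Every figure and function name in it is made up for illustration and not taken from any real company; it just assumes runway is cash divided by monthly net burn.

```python
def months_of_runway(cash, monthly_revenue, monthly_cloud_cost, other_monthly_costs):
    """Months until cash runs out, or None if the company is not burning cash."""
    burn = monthly_cloud_cost + other_monthly_costs - monthly_revenue
    if burn <= 0:
        return None  # profitable or break-even: no runway limit
    return cash / burn


def runway_after_cloud_increase(cash, monthly_revenue, monthly_cloud_cost,
                                other_monthly_costs, multiplier):
    """Same calculation with the cloud bill scaled by `multiplier`."""
    return months_of_runway(cash, monthly_revenue,
                            monthly_cloud_cost * multiplier, other_monthly_costs)


# Hypothetical high-margin company: compute is a rounding error.
high_margin = dict(cash=1_000_000, monthly_revenue=500_000,
                   monthly_cloud_cost=5_000, other_monthly_costs=300_000)

# Hypothetical streaming company: compute is the biggest line item.
streaming = dict(cash=6_000_000, monthly_revenue=200_000,
                 monthly_cloud_cost=400_000, other_monthly_costs=300_000)

for name, company in [("high margin", high_margin), ("streaming", streaming)]:
    before = months_of_runway(**company)
    after = runway_after_cloud_increase(**company, multiplier=2)
    print(name, "before:", before, "after doubling cloud bill:", after)
```

With these invented inputs, doubling the cloud bill leaves the high-margin company profitable, while the streaming company goes from 12 months of runway to under 7.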