> I was partly expecting the rest of the article to explain to me why exactly it...

depereo · on Nov 22, 2022

Slack space leads to innovations, like developing infrastructure automation and improving capacity planning. SRE as a practice needs slack space for operations teams to work on improvements and fixes in addition to BAU fault fixing, deployments and patching.

bjarneh · on Nov 22, 2022

> Slack space leads to innovations

I'm certainly no supporter of "lean operations" with minimum staff etc., and I fully agree that you need people who are well rested (for lack of a better analogy) to do great stuff. But I do think that some of these internet giants have too many people working there; wasn't LinkedIn 14_000 strong when Microsoft bought it?

I've always felt that the American model of doing business is based on how we optimize network traffic, i.e. double the amount of data until failure, then turn it down a bit. Fire on all cylinders until people are truly worn out, then turn down the pace a bit. I haven't worked in the US, so I'm pulling this info out of thin air...
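For anyone who hasn't seen that "double until failure, then turn it down a bit" loop, here is a rough sketch of it, loosely in the spirit of how TCP slow start probes for capacity. The names `probe_capacity` and `succeeds`, and the 0.75 back-off factor, are made up for illustration and not taken from any real stack.

```python
def probe_capacity(succeeds, start=1, backoff=0.75, limit=1 << 20):
    """Double the load until `succeeds(load)` fails, then back off a bit.

    `succeeds` is a hypothetical callback that returns True while the
    system still copes with `load`. The return value is the backed-off
    load to settle on.
    """
    load = start
    while load < limit and succeeds(load):
        load *= 2  # exponential ramp-up while everything still works
    # The current load failed (or we hit the safety limit): turn it down a bit.
    return max(start, int(load * backoff))


if __name__ == "__main__":
    # Assumed toy system that copes with up to 100 units of load.
    print(probe_capacity(lambda load: load <= 100))
```

With that toy system the ramp goes 1, 2, 4, ... 64, 128; the 128 step fails, so the function settles on 96.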

pas · on Nov 22, 2022

Well, half of the staff prepares for some conference, meetup, tech talk, or helps organize one, or does 20% time, or sits in unnecessary meetings, or sits in necessary but inefficient meetings, or is on PTO or on unpaid leave in some retreat.

Plus, above a certain headcount the communication overhead becomes seriously large, so just to compensate for the lost velocity you need to break out into smaller, more agile, more autonomous teams, which further increases the coordination requirements (and thus the comms overhead) but allows overall throughput to scale.

And the leading-edge technologies commonly used require a large headcount to begin with. (I mean, just to start running something Twitter/LinkedIn-sized requires at least 1 engineer per million users, so a few hundred folks is a given.) You need someone who understands networking, from BGP to TLS to VPN to whatever, internal IT, CI/CD, ops/SRE ... at that scale, if you use anything, you need an expert for it. You use Kafka with a hundred million users? You might need at least a few people who actually know what the fuck a partition means. Unless you want to just directly give all of your money to Jeff in the form of egress fees, you might need folks to set up CDNs and whatnot.

So, without naming names, in a big password manager company (around a hundred million users?) a few years ago there was a certain rewrite project. 3+ people worked on it for 8-10 months, then it was put on hold "temporarily". And then, obviously, nobody ever spoke of it again.

Inefficiencies like that do happen: for months, not one line from certain individuals goes into production.

It was bad management, yes. But if good management was easy to find then we would be talking about different things :)