To me, Cloud is all about the shift left of DevOps. It’s not a cost play. I’m a Dev Lead / Manager and have worked in both types of environments over the last 10 years. The difference in velocity for system provisioning between the two approaches is hard to overstate. In the hardware space, it took months to years to provision new machines or upgrade OSes. In the cloud, it’s a new Terraform script and a CI deploy away. Need more storage? It’s just there, available all the time. Need to add a new firewall between machines or redo the network topology? Free. Need a warm standby in 4 different regions that costs almost nothing but can scale to full production capacity within a couple of minutes? Done. Those types of things are difficult to do with physical hardware. And if you have an engineering culture where the operational work and the development work are at odds (think the old style of Dev / QA / Networking / Servers / Security all being separate teams), processes and handoffs eat your lunch and cripple your ability to innovate. Cloud and DevOps are to me about reducing the differentiation between these roles so that a single engineer can work on any part of the stack, which significantly cuts the communication overhead, the handoff time, and the process.
If you have predictable workloads, a competent engineering culture that fights against process culture, and are willing to spend the money to have good hardware and the people to man it 24x7x365 then I don’t think cloud makes sense at all. Seems like that’s what y’all have and you should keep up with it.
> In the hardware space, it took months to years to provision new machines or upgrade OSes.
If it takes this long to provision or upgrade a machine, I strongly suspect that the engineers who initially designed the system failed to account for provisioning and upgrades for some reason. Was that true in your case?
From the late '00s until the mid '10s, I worked for an ISP startup as a SWE. We had a few core machines (database, RADIUS server, self-service website, etc) - ugly mess TBH - initially provisioned and originally managed entirely by hand, as we didn't know any better back then. Naturally, maintaining those was a major PITA, so they sat on the same dated distro for years. That was before Ansible was a thing, and we hadn't really heard about Salt or Chef until we started to feel the pains and search for solutions. Virtualization (OpenVZ, then Docker) helped to soften a lot of issues, making it significantly easier to maintain the components, but the pains from our original sins were felt for a long time.
But we also had a fleet of other machines, where we understood our issues with the servers enough to design new nodes to be as stateless as possible, with automatic rollout scripts for whatever we were able to automate. Provisioning a new host took only a few hours, with most time spent unpacking, driving, accessing the server room, and physically connecting things. Upgrades were pretty easy too - reroute customers to another failover node, write a new system image to the old one, reboot, test, re-route traffic back, done.
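To make the order of those upgrade steps concrete, here's a rough Python sketch. It is not the actual rollout script, just a hypothetical in-memory simulation: the `Node` class, `upgrade_node`, and the `health_check` callback are all names made up for illustration, and "rerouting" and "imaging" are reduced to flipping fields.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a stateless node; not a real API."""
    name: str
    image: str
    serving: bool = True


def upgrade_node(
    node: Node,
    failover: Node,
    new_image: str,
    health_check: Callable[[Node], bool],
    log: List[str],
) -> bool:
    """Simulate: reroute, write image, reboot, test, route back.

    Returns True if the node is back in service on the new image.
    """
    if not failover.serving:
        raise RuntimeError(f"failover node {failover.name} is not serving")

    # 1. Reroute customers to the failover node.
    node.serving = False
    log.append(f"reroute {node.name} -> {failover.name}")

    # 2. Write the new system image (simulated by replacing a field).
    node.image = new_image
    log.append(f"image {node.name} = {new_image}")

    # 3. Reboot (a no-op in this simulation).
    log.append(f"reboot {node.name}")

    # 4. Test before taking traffic again.
    if not health_check(node):
        # Customers stay on the failover node until someone investigates.
        log.append(f"test failed on {node.name}")
        return False
    log.append(f"test ok on {node.name}")

    # 5. Route traffic back.
    node.serving = True
    log.append(f"reroute {failover.name} -> {node.name}")
    return True
```

The point of the ordering is that the node never serves customers between the image write and a passing test, so a bad image just leaves traffic on the failover node.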
So it's not like self-owned bare metal is harder to manage - the lesson I learned is that one just has to think ahead about what the future will require. Same as with the clouds, I guess: one has to follow best practices or they'll end up with crappy architectures that will be painful to rework. Just a different set of practices, because of the different nature of the systems.
Exactly this. It is culture and organisation (structure) dependent. I'm in the throes of the same discussion with my leadership team, some of whom have built themselves an ops/qa/etc. empire and want to keep their moat.
Are you running a well understood and predictable (as in, little change, growth, or feature additions) system? Are your developers handing over to central platform/infra/ops teams? You'll probably save some cash by buying and owning the hardware you need for your use case(s). Elasticity is (probably) not part of your vocabulary, perhaps outside of "I wish we had it" anyway.
Have you got teams and/or products that are scaling rapidly or unpredictably? Have you still got a lot of learning and experimenting to do with how your stack will work? Do you need flexibility but can't wait for that flexibility? Then cloud is for you.
n.b. I don't think I've ever felt more validated by a post/comment than yours.
I think I understand your point, and this is not directed at you personally, but: I think "shift left" is another one of those phrases that's lost all meaning, like "synergy" or "agile" before it.
My first job in tech was building servers for companies when they needed more compute: physically building them from our warehouse of components, driving them to their site, and setting them up in their network.
You could get same day builds deployed on prem with the right support bundle!