
You think like an engineer :-)

I think cloud services are appealing for a couple of reasons from a top-brass perspective.

a) You free the capital, and the balance sheet looks nicer.

b) It's less complicated on a PowerPoint level. Just a box. (Probably not the reason Spotify opted for Google, though.)

c) You don't think it's your core competence to keep servers safe and running, and rather would like to spend the time and effort on developing services while enjoying reason a) although you know it won't be cheaper.

d) Maybe it's hard to find the right people.



I don't think he thinks like an engineer (no insult intended); I think he's overlooking the challenges of running large-scale systems.

> Add a second for failover.

That's not how it works. How does the second server pick up where the first left off, when the requests are stateful? How does it know when to take over? (What detects the failure, and how?) Or if a human makes the call, how does that human make the judgment?
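To make the objection concrete, here is a minimal sketch (all names and the timeout value are invented for illustration) of the naive detector a "second for failover" implies. A watchdog promotes the standby when the primary misses heartbeats, but a missed heartbeat can't distinguish "primary crashed" from "primary is slow or network-partitioned", which is exactly the judgment-call problem: promote too eagerly and both nodes accept writes.

```python
# Hypothetical heartbeat-based failover detector. This is the easy part;
# it says nothing about recovering the primary's in-flight state, and a
# timeout can't tell a dead primary from a slow or partitioned one
# (promote anyway and you risk split brain: two active primaries).

HEARTBEAT_TIMEOUT = 3.0  # seconds; an assumed tuning knob


def should_promote_standby(last_heartbeat: float, now: float) -> bool:
    """Naive rule: promote if the primary has been silent too long."""
    return (now - last_heartbeat) > HEARTBEAT_TIMEOUT


# Primary silent for 4.5s -> the detector fires, ready or not.
print(should_promote_standby(last_heartbeat=100.0, now=104.5))
```

Even this toy version shows why "add a second for failover" is not one line of effort: the detector, the promotion mechanism, and state handoff each need real design work.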

Cloud services are absolutely appealing from an engineering perspective. I am capable of building an application and running it on a fleet of machines such that the infrastructure takes care of itself completely. Essentially no failure mode can involve me, unless either (a) my application has a defect in its logic, which is unavoidable under any hosting model, or (b) the provider itself fails, in which case a lot of customers are probably similarly affected and someone is already working on it. The data stores that I use are hosted and will not be impacted by any typical failure mode. The application that I've designed is deployed onto machines automatically, and if any of them encounter a failure that takes them offline, they'll be replaced automatically in a way that's a non-event. Furthermore, I have a blueprint for such systems, so building a new application in this way requires no more effort than filling out a single web form, after which an automation layer sets everything up, from the source control to the continuous build and deployment onwards.
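The "failed machines are replaced automatically, as a non-event" model boils down to a reconciliation loop: compare desired capacity against healthy instances and launch replacements for the difference. A toy sketch, with an in-memory "fleet" standing in for a real cloud API (the field names are made up):

```python
# Toy reconciliation loop: the controller continuously drives the fleet
# back toward the desired instance count. No human is paged for an
# ordinary machine failure; it is simply reconciled away.


def reconcile(fleet: list[dict], desired: int) -> list[dict]:
    """Drop unhealthy instances and launch replacements up to `desired`."""
    healthy = [m for m in fleet if m["healthy"]]
    while len(healthy) < desired:
        healthy.append({"id": f"replacement-{len(healthy)}", "healthy": True})
    return healthy


fleet = [
    {"id": "i-1", "healthy": True},
    {"id": "i-2", "healthy": False},  # crashed overnight; nobody noticed
    {"id": "i-3", "healthy": True},
]
print(len(reconcile(fleet, desired=3)))  # fleet back at full strength
```

Real autoscaling groups and orchestrators implement essentially this loop, plus the health-checking and draining details the sketch omits.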

Not all systems are simple enough to fit into this model, but if you account for it while designing new software, quite a lot of it can be. With the right building blocks you can operate largely operations-free infrastructure at the application layer, even at large scale. These problems start to look different once you consider challenges that require tens, hundreds, thousands, or more individual machines to solve.

Another challenge is ensuring that new systems are built from the beginning in a way that can scale and that will be easy for other people to maintain. You don't want some random new hire at your company setting up the "opaque box from hell" that has all the data on its disk (and, ahem, a "second for failover" that somehow takes over for the first through a mysterious and unknown mechanism) and that no one else will be able to maintain during a failure. You also don't want to be surprised a year or two down the line when you're scaling up for the Super Bowl or Tax Day or Black Friday, or whatever, and your software falls over because "Bob the Intern, who thought every software problem can be solved with a single box and 100 lines of assembly code" didn't anticipate that the business would be larger one day. If you standardize on a certain set of building blocks, you can build software that, in addition to being operations-free, can scale to a large degree.

Note: I'm speaking hyperbolically. No software is truly operations-free; it's all a matter of degree. However, I have worked with teams that are expected to hand-manage their own machines and with teams that use effective cloud automation, and the pattern I've seen is a huge difference in the operational load that routine engineering tasks put on them. I have also never seen systems that scale themselves completely automatically, but there is still a huge difference between a system where you simply say "I want 10 machines" or "I want 10 requests per second to my data store", and a system like "I guess it's time to tell our DBA to figure out how to shard our MySQL server".
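The contrast in that last sentence can be sketched in a few lines. With a provisioned-throughput service you declare a rate and stop; with a hand-run database, someone has to do the capacity math below and then actually build and migrate to the shards (all numbers here are invented):

```python
import math

# The capacity math a DBA does by hand before re-sharding MySQL.
# A provisioned-throughput data store hides exactly this step: you
# declare "50,000 reads/sec" and the provider handles partitioning.


def shards_needed(target_rps: int, rps_per_shard: int) -> int:
    """How many shards are required to absorb the target request rate."""
    return math.ceil(target_rps / rps_per_shard)


# e.g. 50k req/s at ~8k req/s per box -> 7 shards to stand up,
# plus the (much harder) data migration the arithmetic doesn't show.
print(shards_needed(target_rps=50_000, rps_per_shard=8_000))
```

The arithmetic is trivial; the point is that the declarative system absorbs the re-partitioning work that follows it, which is where the real operational load lives.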

I also think that a large company could achieve a lot of the benefits provided by the cloud in-house, with its own data centers, if it had really strong internal automation and APIs, and strong teams operating shared services for other teams (e.g., managed data layers). But the really strong software I'm aware of for solving these problems has been kept as a competitive advantage, not released as open source. So the practical reality is that few companies have through-and-through automation of the quality and degree that you can get from the top cloud providers.

> a) You free the capital, and the balance sheet looks nicer.

Yes, I agree that this is a genuine reason for some companies to prefer the cloud, but these are largely not the companies that have "top brass". Companies large enough to have "top brass" are typically not so capital-constrained that this would majorly affect their purchasing decisions. The companies that care about this, rather, tend to be smaller and more capital-constrained.

I think the more mundane reason why companies prefer the cloud, which is frequently overlooked, is that individual business units can purchase cloud services with more autonomy than they could traditionally order IT services in-house. Cloud budgets often fit within what an executive can expense on their own authority month to month, whereas the same capital purchase would require more review and approval. So some companies tend toward the cloud because it makes them more agile: it breaks purchasing into smaller decisions that fit within the scope of less-senior people.


>individual business units can purchase cloud services with more autonomy

Yes, this is also probably true. Hardware costs are very noticeable because they come as a few large purchases rather than many small ones. The old trick of splitting the bill works fine :-)

Anyway, I'm a control freak. I would very much want my teams to keep control of everything that is important to our core product and services, including servers and infrastructure. Everything that has to be ordered from another party (internal or not; actually more so if it's in-house) is a potential delay, or an outage that isn't fixed as soon as it could be.

But then again, everyone should know their limits... :-)




