
The whole push to the cloud has always fascinated me. I get it - most people aren't interested in babysitting their own hardware. On the other hand, a business of just about any size that has any reasonable amount of hosting is better off with their own systems when it comes purely to cost.

All the pro-cloud talking points are just that - talking points that don't persuade anyone with any real technical understanding, but serve to introduce doubt to non-technical people and to trick people who don't examine what they're told.

What's particularly fascinating to me, though, is how some people are so pro-cloud that they'd argue with a writeup like this with silly cloud talking points. They don't seem to care much about data or facts, just that they love cloud and want everyone else to be in cloud, too. This happens much more often on sites like Reddit (r/sysadmin, even), but I wouldn't be surprised to see a little of it here.

It makes me wonder: how do people get so sold on a thing that they'll go online and fight about it, even when they lack facts or often even basic understanding?

I can clearly state why I advocate for avoiding cloud: cost, privacy, security, a desire to not centralize the Internet. The reason people advocate for cloud for others? It puzzles me. "You'll save money," "you can't secure your own machines," "it's simpler" all have worlds of assumptions that those people can't possibly know are correct.

So when I read something like this from Fastmail which was written without taking an emotional stance, I respect it. If I didn't already self-host email, I'd consider using Fastmail.

There used to be so much push for cloud everything that an article like this would get fanatical responses. I hope that it's a sign of progress that that fanaticism is waning and people aren't afraid to openly discuss how cloud isn't right for many things.



"All the pro-cloud talking points are just that - talking points that don't persuade anyone with any real technical understanding,"

This is false. AWS infrastructure is vastly more secure than almost all company data centers. AWS has a rule that the same person cannot have logical access and physical access to the same storage device. Very few companies have enough IT people to have this rule. The AWS KMS is vastly more secure than what almost all companies are doing. The AWS network is vastly better designed and operated than almost all corporate networks. AWS S3 is more reliable and scalable than anything almost any company could create on their own. To create something even close to it you would need to implement something like MinIO using 3 separate data centers.


> AWS infrastructure is vastly more secure than almost all company data centers

Secure in what terms? Security is always about a threat model and trade-offs. There's no absolute, objective term of "security".

> AWS has a rule that the same person cannot have logical access and physical access to the same storage device.

Any promises they make aren't worth anything unless there are contractually stipulated damages that AWS must pay in case of a breach, those damages actually correspond to the costs of said breach for the customer, and there's a history of actually paying out said damages without shenanigans. They've already got a track record of lying on their status pages, so it doesn't bode well.

But I'm actually wondering what this specific rule even tries to defend against? You presumably care about data protection, so logical access is what matters. Physical access seems completely irrelevant no?

> Very few companies have enough IT people to have this rule

Maybe, but that doesn't actually mitigate anything from the company's perspective? The company itself would still be in the same position, aka not enough people to reliably separate responsibilities. Just that instead of those responsibilities being physical, they now happen inside the AWS console.

> The AWS KMS is vastly more secure than what almost all companies are doing.

See first point about security. Secure against what - what's the threat model you're trying to protect against by using KMS?

But I'm not necessarily denying that (at least some) AWS services are very good. Question is, is that "goodness" required for your use-case, is it enough to overcome its associated downsides, and is the overall cost worth it?

A pragmatic approach would be to evaluate every component on its merits and fitness to the problem at hand instead of going all in, one way or another.


Physical access is pretty relevant if you could bribe an engineer to locate some valuable data's physical location, then go service that particular machine, copy the disk (while servicing "degraded hardware"), and thus exfiltrate the data without any traces of a breach.


> They've already got a track record of lying on their status pages, so it doesn't bode well.

???


Physical access and logical root access can't hide things from each other. It takes both to hide an activity. If you only have one, then the other can always be used to uncover or detect it in the first place, or at least diagnose it after.


OTOH:

1. big clouds are very lucrative targets for spooks, your data seem pretty likely to be hoovered up as "bycatch" (or maybe main catch depending on your luck) by various agencies and then traded around as currency

2. you never hear about security problems (incidents or exposure) in the platforms, there's no transparency

3. better than most corporate stuff is a low bar


>3. better than most corporate stuff is a low bar

I think it's a very relevant bar, though. The top level commenter made points about "a business of just about any size", which seems pretty exactly aligned with "most corporate stuff".


If you don't want your data to be accessible to "various agencies", don't share it with corporations, full stop. Corporations are obliged by law to make it available to the agencies, and the agencies often overreach, while the corporations almost never mind the overreach. There are limitations for stuff like health or financial data, but these are not impenetrable barriers.

I would just consider all your hosted data to be easily available to any security-related state agency; consider them already having a copy.


That depends where it's hosted and how it's encrypted. Cloud hosts can just reach into your RAM, but dedicated server hosts would need to provision that before deploying the server, and colocation providers would need to take your server offline to install it.


Colocated / Dedicated is not Cloud, AFAICT. It's the "traditional hosting", not elastic / auto-scalable. You of course may put your own, highly tamper-proof boxes in a colocation rack, and be reasonably certain that any attempt to exfiltrate data from them won't be invisible to you.

By doing so, you share nothing with your hosting provider, you only rent rack space / power / connectivity.


And this is why I colocate, because all the data that hits my server is my data.

Sure I do have an AUP/T&C but without proper warrant no one is allowed to touch my server.

Case is monitored if it's opened. Encrypted on start-up, USB disabled. I just wish I had my own /24.


At least you can get your own /48, if you're under RIPE.

You should only do it if you expect to multihome though, or you're doing some experimentation that absolutely needs a PI address. Please don't pollute the default-free zone just for no reason.


There's much variation by jurisdiction. E.g., US-based big-cloud companies would seem more risky here if you're from a country with traditionally less invasive (and less funded) spooks.


4. we keep hitting hypervisor bugs and having to work around the fact that your software coexists on the same machine with third-party untrusted software that might in fact be actively trying to attack you. All this silliness with encrypted memory buses and the various debilitating workarounds for silicon bugs.

So yes, the cloud is very secure, except for the very thing that makes it the cloud that is not secure at all and has just been papered over because questioning it means the business model is bust.


What hypervisor bugs are you referring to? AWS does offer bare metal servers.


Most corporations (which is the vast majority of cloud users) absolutely don't care about spooks, sadly enough. If that's the threat model, then it's a very very rare case to care about it. Most datacenters/corporations won't even fight or care about sharing data with local spooks/cops/three letter agencies. The actual threat is data leaks, security breaches, etc.


> you never hear about security problems (incidents or exposure) in the platforms

Except that one time...

https://www.seattlemet.com/news-and-city-life/2023/04/how-a-...


If I remember right, the attacker’s AWS employment is irrelevant - no privileged AWS access was used in that case. The attacker working for AWS was a pure coincidence, it could’ve been anyone.


one of my greatest learnings in life is to differentiate between facts and opinions - sometimes opinions are presented as facts and vice-versa. if you think about it, the statement "this is false" is a response to an opinion (presented as a fact) but is not itself a fact. there is no way one can objectively define and defend what "real technical understanding" means. the cloud space is vast, with millions of people having varied understanding and thus varied opinions.

so let's not fight the battle that will never be won. there is no point in convincing pro-cloud people that cloud isn't the right choice and vice-versa. let people share stories where it made sense and where it didn't.

as someone who has lived in the cloud security space since 2009 (and was founder of redlock - one of the first CSPMs), in my opinion, there is no doubt that AWS is indeed better designed than most corp. networks - but is that what you really need? if you run your entire corp and LOB apps on aws but have poor security practices, will it be the right decision? what if you have the best security engineers in the world, but they are best at Cisco-type security - configuring VLANs and managing endpoints - and are not good at detecting someone using IMDSv1 in ec2 exposed to the internet and running a vulnerable (to csrf) app?

when the scope of discussion is as vast as cloud vs on-prem, imo, it is a bad idea to make absolute statements.


Great points. Also, if you end up building your apps as Rube Goldberg machines living up to "AWS Well-Architected" criteria (pushed by staff with lots of AWS certifications, whose paychecks now depend on following AWS-recommended practices), the complexity will kill your security, as nobody will understand the systems anymore.


about security: most businesses using AWS invest little to nothing in securing their software, or even in adopting basic security practices for their employees

having the most secure data center doesn't matter if you load your secrets as env vars in a system that can be easily compromised by a motivated attacker

so i don't buy this argument as a general reason pro-cloud


This exactly, most leaks don't involve any physical access. Why bother with something hard when you can just get in through an unmaintained Wordpress/SharePoint/other legacy product that some department can't live without.


The cloud is someone else’s computer.

It’s like putting something in someone’s desk drawer under the guise of convenience at the expense of security.

Why?

Too often, someone other than the data owner has or can get access to the drawer directly or indirectly.

Also, Cloud vs self hosted to me is a pendulum that has swung back and forth for a number of reasons.

The benefits of the cloud outlined here are often a lot of open source tech packaged up and sold as manageable from a web browser, or a command line.

One of the major reasons the cloud became popular was networking issues in Linux to manage volume at scale. At the time the cloud became very attractive for that reason, plus being able to virtualize bare metal servers to put into any combination of local to cloud hosting.

Self-hosting has become easier by an order of magnitude or two for anyone who knew how to do it, except it's not something people who haven't done both self-hosting and cloud can really discuss.

Cloud has abstracted away the cost of horsepower, and converted it to transactions. People are discovering a fraction of the horsepower is needed to service their workloads than they thought.

At some point the horsepower got way beyond what they needed and it wasn’t noticed. But paying for a cloud is convenient and standardized.

Company data centres can be reasonably secured using a number of PaaS or IaaS solutions readily available off the shelf. Tools from VMware, Proxmox and others are tremendous.

It may seem like there's a lot to learn, except most problems that are new to someone have often been thought through a ton by others, both with and without experience beyond cloud-only.


> The cloud is someone else’s computer.

And in the case of AWS it is someone else's extremely well designed and managed computer and network.


Extremely well designed? I doubt it.

Usually the larger the company and the more mission critical the product: the worse the implementation.

Twitch source code (which, I guess counts as Amazon already), Disney leaks- and my own experience working with very large companies. (Nokia, Ubisoft, Facebook, Activision/Blizzard).


Your comment tells me you have never read any of AWS's many documents about how they engineer their components. They put a huge amount of effort into it. AWS is much more reliable than Azure. They have built the largest and most reliable storage system in the world with S3. AWS has stated that some customers have S3 buckets using over 1 million hard drives. Netflix relies heavily on AWS for its streaming services. Lyft runs its ride-sharing platform on AWS. Capital One migrated its entire infrastructure to AWS. Slack relies on AWS for its messaging platform. GE utilizes AWS for industrial IoT (Internet of Things) solutions, predictive maintenance, and data analytics. Twitch streams video to 31 million viewers from AWS.

https://www.amazon.science/publications/cloud-resource-prote...

https://www.amazon.science/tag/formal-verification

https://aws.amazon.com/security/provable-security/resources/

https://www.amazon.science/blog/custom-policy-checks-help-de...

https://www.amazon.science/publications/formal-verification-...

AWS is an industry leader in using formal methods and automated reasoning to prove the security and reliability of critical software and detect insecure configurations


Generally I look to people who could build an AWS themselves for takes on its value versus doing it in-house, because they can speak to both.

Happy to hear more.


One of the ways the NSA and security services get so much intelligence on targets isn't by direct decryption of what targets are storing or by listening in. A great deal of their intelligence is simply metadata intelligence. They watch what you do. They watch the amount of data you transport. They watch your patterns of movement.

So even if AWS is providing direct security and encryption in the sense most security professionals are concerned with (key strength, etc.), AWS still has a great deal of information about what you do, because they get to watch how much data moves from where to where, and other information about what those machines are.


> The cloud is someone else’s computer

Isn’t it more like leasing in a public property? Meaning it is yours as long as you are paying the lease? Analogous to renting an apartment instead of owning a condo?


Not at all. You can inspect the apartment you rent. The cloud is totally opaque in that regard.


Totally opaque is a really nice way to describe it.


Nope. It's literally putting private data in a shared drawer in someone else's desk where you have your area of the drawer.


Literally?

I would just like to point out that most of us who have ever had a job at an office, attended an academic institution, or lived in rented accommodation have kept stuff in someone else’s desk drawer from time to time. Often a leased desk in a building rented from a random landlord.

Keeping things in someone else’s desk drawer can be convenient and offer a sufficient level of privacy for many purposes.

And your proposed alternative to using ‘someone else’s desk drawer’ is, what, make your own desk?

I guess, since I’m not a carpenter, I can buy a flatpack desk from ikea and assemble it and keep my stuff in that. I’m not sure that’s an improvement to my privacy posture in any meaningful sense though.


It doesn’t have to be entirely literal, or not literal at all.

A single point of managed/shared access to a drawer doesn’t fit all levels of data sensitivity and security.

I understand this kind of wording and analogy might be triggering for the drive-by downvoters.

A comment like the above though allows both people to openly consider viewpoints that may not be theirs.

For me it shed light on something simpler.

Shared access to shared infrastructure is not always as secure as we want to tell ourselves. It's important to be aware when it might be security through abstraction.

The dual security and convenience of self-hosting IaaS and PaaS even at a dev, staging or small scale production has improved dramatically, and allows for things to be built in a cloud agnostic way to allow switching clouds to be much easier. It can also easily build a business case to lower cloud costs. Still, it doesn’t have to be for everyone either, where the cloud turns to be everything.

A small example? For a stable homelab, put a couple of USFF small servers running Proxmox or something on residential fibre behind a Tailscale funnel or Cloudflare tunnel, and compare the cost for the uptime. It's surprising how much time servers and apps spend idling.

Life and the real world is more than binary. Be it all cloud or no cloud.


> Keeping things in someone else’s desk drawer can be convenient and offer a sufficient level of privacy for many purposes.

To torture a metaphor to death: are you going to keep your bank passwords in somebody else's desk drawer? Are you going to keep 100 million people's bank passwords in that drawer?

> I guess, since I’m not a carpenter, I can buy a flatpack desk from ikea and assemble it and keep my stuff in that. I’m not sure that’s an improvement to my privacy posture in any meaningful sense though.

If you're not a carpenter I would recommend you stay out of the business of building safe desk drawers altogether. Although you should probably still be able to recognize that the desk drawer you own, inside your own locked house, is a safer option than the one at the office accessible by any number of people.


If you have something physical of equivalent value to 100 million people's bank passwords, you may well not want to risk keeping it in a desk drawer at all, and instead want to look into renting a nice secure drawer from someone else to keep it in. That would be a safety deposit box.

Which I would argue is rather more like what cloud providers offer than 'someone else's desk drawer' is.


AWS is so complicated that we usually find more impactful permission problems there than in any company using their own hardware


The other part is that when us-east-1 goes down, you can blame AWS, and a third of your customer's vendors will be doing the same. When you unplug the power to your colo rack while installing a new server, that's on you.


It's not always a full availability zone going down that is the problem. Also, despite the "no one ever got fired for buying Microsoft" logic, in practice I've never actually found stakeholders to be reassured by "it's AWS and everyone is affected" when things are down. People want things back up and they want some informed answers about when that might happen, not "ehh, it's AWS, out of our control".


When there's little trust between the business and IT, both are incentivized to move to the cloud.

It's harder to build trust than the opposite.


OTOH, when your company's web site is down you can do something about it. When the CEO asks about it, you can explain why it's offline and, more importantly, what is being done to bring it back.

The equivalent situation for those who took a cloud based approach is often... ¯\_(ツ)_/¯


The more relevant question is whether my efforts to do something lead to a better and faster result than my cloud provider's efforts to do something. I get it - it feels powerless to do nothing, but for a lot of organizations I've seen, the average downtime would still be higher.


I worked in IT for a state government and they had a partial outage of their Exchange server that lasted over 2 weeks. It triggered a full migration to Exchange online.


With the cloud, in a lot of cases you can have additional regions that incur very little cost as they scale dynamically with traffic. It’s hard to do that with on-prem. Also many AWS services come cross-AZ (AZ is a data center), so their arch is more robust than a single Colo server even if you’re in a single region.


Cross region from on-prem to the cloud for a website is easy. In fact, as long as you don't buy into "cloud native" ("cloud lock-in"?), it's probably more cost effective than two on-prem regions or two cloud regions.


Being able to choose from so many different Availability Zones in so many different regions is one of the best things about AWS. Combined with the sophisticated routing strategies that Route 53 supports, it allows for some very effective designs.
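
For illustration, a latency-based record set is one small API call. A minimal sketch with boto3; the zone ID, hostname, and IPs below are made up:

    import boto3

    route53 = boto3.client("route53")

    def upsert_latency_record(zone_id, name, region, set_id, ip):
        # UPSERT an A record that Route 53 resolves by lowest measured latency
        route53.change_resource_record_sets(
            HostedZoneId=zone_id,
            ChangeBatch={
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": name,
                        "Type": "A",
                        "SetIdentifier": set_id,  # distinguishes the regional variants
                        "Region": region,         # enables latency-based routing
                        "TTL": 60,
                        "ResourceRecords": [{"Value": ip}],
                    },
                }]
            },
        )

    # two regional endpoints answering for the same name
    upsert_latency_record("Z123EXAMPLE", "app.example.com.", "us-east-1", "use1", "203.0.113.10")
    upsert_latency_record("Z123EXAMPLE", "app.example.com.", "eu-west-1", "euw1", "198.51.100.20")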


When AWS goes down you can tell your boss that dozens of people are working to get it back up.


Hey boss, I go to sleep now, site should be up anytime. Cheers


Making API calls from a VM on shared hardware to KMS is vastly more secure than doing AES locally? I'm skeptical to say the least.


Encrypting data is easy, securely managing keys is the hard part. KMS is the Key Management Service. And AWS put a lot of thought and work into it.

https://docs.aws.amazon.com/kms/latest/cryptographic-details...
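
To make the "managing keys is the hard part" point concrete, the usual pattern is envelope encryption: KMS only ever handles a small data key, and the bulk data is encrypted locally. A minimal sketch, assuming boto3, the cryptography package, and a pre-existing key alias (the alias is made up):

    import os
    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    kms = boto3.client("kms")

    def encrypt_blob(plaintext: bytes, key_id="alias/app-data"):
        # Ask KMS for a fresh data key; we get the plaintext key plus an
        # encrypted copy that only KMS can decrypt later.
        dk = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
        nonce = os.urandom(12)
        ciphertext = AESGCM(dk["Plaintext"]).encrypt(nonce, plaintext, None)
        # Store the wrapped key next to the ciphertext; discard the plaintext key.
        return dk["CiphertextBlob"], nonce, ciphertext

    def decrypt_blob(wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
        key = kms.decrypt(CiphertextBlob=wrapped_key)["Plaintext"]
        return AESGCM(key).decrypt(nonce, ciphertext, None)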


KMS access is granted by either environment variables or by authorizing the instance itself. Either way, if the instance is compromised, then so is access to KMS. So unless your threat model involves preventing the government from looking at your data through some theoretical sophisticated physical attack, then your primary concerns are likely the same as running a box in another physically secure location. So the same rules of needing to design your encryption scheme to minimize blowout from a complete hostile takeover still apply.


An attacker gaining temporary capability to encrypt/decrypt data through a compromised instance is painful. An attacker gaining a copy of a private key is still an entirely different world of pain.


Painful is an understatement. Keys for sensitive customer data should be derived from customer secrets either way. Almost nobody does that though, because it requires actual forethought. Instead they just slap secrets in KMS and pretend it's better than encrypted environment variables or other secrets services. If an attacker can read your secrets with the same level of penetration into your system, then it's all the same security wise.
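
For what it's worth, a minimal sketch of what "derive keys from customer secrets" can look like, assuming the cryptography package; the KDF parameters and salt handling are illustrative, not a recommendation:

    import hashlib
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def derive_key(customer_secret: str, salt: bytes) -> bytes:
        # PBKDF2-HMAC-SHA256; iteration count is a placeholder, tune for your hardware
        return hashlib.pbkdf2_hmac("sha256", customer_secret.encode(), salt, 600_000)

    def encrypt_for_customer(secret: str, plaintext: bytes):
        # The data key only exists while the customer-supplied secret is in hand.
        salt, nonce = os.urandom(16), os.urandom(12)
        key = derive_key(secret, salt)
        return salt, nonce, AESGCM(key).encrypt(nonce, plaintext, None)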


There are many kinds of secrets that are used for purposes where they cannot be derived from customer secrets, and those still need to be secured. TLS private keys for example.

I do disagree on the second part - there's a world of difference between whether an attacker obtains a copy of your certificate's private key and can impersonate you quietly, or whether they gain the capability to perform signing operations on your behalf temporarily while they maintain access to a compromised instance.


It's all unencrypted secrets from perspective of an attacker. If they somehow already have enough access to read your environment variables, then they can definitely access secrets manager records authorized for that service. By all means put secrets management in a secondary service to prevent leaking keys, but you don't need a cloud service to do that.


It's the same pain, since the resolution is the exact same. You have to rotate.


It's now been two years since I used KMS, but at the time it seemed like little more than an S3-style API with Twitter-sized payload limits

Fundamentally why would KMS be more secure than S3 anyway? Both ultimately have the same fundamental security requirements and do the same thing.

So the big whirlydoo is that KMS has hardware keygen. I'm sorry, but that sounds like something almost guaranteed to have an NSA backdoor, or to have had so much NSA attention that it has been compromised.


If your threat model is the NSA and you’re worried about backdoors then don’t use any cloud provider?

Maybe I’m just jaded from years doing this, but two things have never failed me for bringing me peace of mind in the infrastructure/ops world:

1. Use whatever your company has already committed to. Compare options and bring up tradeoffs when committing to a cloud-specific service (i.e. AWS Lambda) versus more generic solutions, around cost, security and maintenance.

2. Use whatever feels right to you for anything else.

Preventing the NSA from cracking into your system is a fun thought exercise, but life is too short to make that the focus of all your hosting concerns


I guess since this is Hacker News, I shouldn’t be surprised that there are a bunch of commenters who are absolutely certain they and their random colo provider will do a better job of defeating the almighty NSA than AWS.

You won’t even know when they serve your Colo provider with a warrant under gag order, and I’m certain they’ll be able to bypass your own “tamper-proof” protections.


Soo..... you're saying that KMS hardware key generation isn't that great anyway...

so, again, why bother with KMS? What does it offer?

My point about the hardware was to ask why KMS hardware key generation has any real value vs a software-generated key - and then why bother with KMS and its limited secret size, when you access KMS with a policy/security user or role that could equally be used to lock down S3?

What is the value of KMS?


If the NSA is part of your threat model then good luck. I'm not sure any single company could withstand the NSA really trying to hack them for years. The threat of possible NSA backdoors is not a reasonable argument against a cloud provider, as the NSA could also have backdoors in every CPU that AMD, Intel, and AWS make.


You can securely store your asymmetric key for signing, but if I remember correctly the logs are pretty useless, basically you just know the key was used to make a signature, no option to log the signature or additional metadata, which would help auditing after an account/app compromise.
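
A rough sketch of one possible workaround: log the digest and request metadata yourself at the call site, since the service-side trail only shows that a Sign call happened. Assumes boto3; the key alias and the logging destination are made up:

    import datetime
    import hashlib
    import json
    import logging
    import boto3

    kms = boto3.client("kms")
    audit = logging.getLogger("signing-audit")  # ship this somewhere append-only

    def sign_with_audit(payload: bytes, context: dict, key_id="alias/release-signing"):
        digest = hashlib.sha256(payload).digest()
        resp = kms.sign(
            KeyId=key_id,
            Message=digest,
            MessageType="DIGEST",
            SigningAlgorithm="ECDSA_SHA_256",
        )
        # App-level audit record: what was signed, when, and on whose behalf.
        audit.info(json.dumps({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "key_id": key_id,
            "sha256": digest.hex(),
            "context": context,  # e.g. build ID, requesting user
        }))
        return resp["Signature"]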


Taking for granted all these points. How many businesses out there actually need this kind of security/scalability, compared to how many use cloud services and pay extra cost for something they don't need?


From a critical perspective, your comment made me think about the risks posed by rogue IT personnel, especially at scale in the cloud. For example, Fastmail is a single point of failure as a DoS target, whereas attacking an entire datacenter can impact multiple clients simultaneously. It all comes down to understanding the attack vectors.


Cloud providers are very big targets but have enormous economic incentive to be secure and thus have very large teams of very competent security experts.


You can have full security competence but be a rogue actor at the same time.


You can also have rogue actors in your company, you don’t need 3rd parties for that


That doesn't sum up my comments in the thread. A rogue actor in a datacenter could attack zillions of companies at the same time while rogue actors in a single company only once.


And I bet AWS is also better at detecting rogue actors.


And I bet AWS is better at detecting them.


I don't understand what this is trying to say.


<citations needed>


AWS hires the same cretins that inhabit every other IT department, they just usually happen to be more technically capable. That doesn't make them any more or less trustworthy or reliable.


"cretins"?


This trivializes some real issues.

The biggest problem the cloud solves is hardware supply chain management. To realize the full benefits of doing your own build at any kind of non-trivial scale you will need to become an expert in designing, sourcing, and assembling your hardware. Getting hardware delivered when and where you need it is not entirely trivial -- components are delayed, bigger customers are given priority allocation, etc. The technical parts are relatively straightforward; managing hardware vendors, logistics, and delivery dates on an ongoing basis is a giant time suck. When you use the cloud, you are outsourcing this part of the work.

If you do this well and correctly then yes, you will reduce costs several-fold. But most people that build their own data infrastructure do a half-ass job of it because they (understandably) don't want to be bothered with any of these details and much of the nominal cost savings evaporate.

Very few companies do security as well as the major cloud vendors. This isn't even arguable.

On the other hand, you will need roughly the same number of people for operations support whether it is private data infrastructure or the cloud, there is little or no savings to be had here. The fixed operations people overhead scales to such a huge number of servers that it is inconsequential as a practical matter.

It also depends on your workload. The types of workloads that benefit most from private data infrastructure are large-scale data-intensive workloads. If your day-to-day is slinging tens or hundreds of PB of data for analytics, the economics of private data infrastructure are extremely compelling.


> managing hardware vendors, logistics, and delivery dates on an ongoing basis is a giant time suck

You can rent servers and it's still not cloud.

I'm pretty neutral and definitely see the value of cloud. But a lot of cloud proponents seem to lack, what to me, seems like basic knowledge.


> don't want to be bothered with any of these details

Isn't the job to be bothered with the details? 90% of employment for most people is doing shit you don't really want to be doing, but that's the job.


<ctoHatTime> Dunno man, it's really really easy to set up an S3 bucket and use it to share datasets with users authorized via IAM....

And IAM and other cloud security and management considerations are where the opex/capex and capability argument can start to break down. Turns out, the "cloud" savings come from not having capabilities in-house to manage hardware. Sometimes, for most businesses, you want some of that lovely reliability.

(In short, I agree with you, substantially).

Like code. It is easy to get something basic up, but substantially more resources are needed for non-trivial things.


I feel like IAM may be the sleeper killer-app of cloud.

I self-host a lot of things, but boy oh boy if I were running a company it would be a helluvalotta work to get IAM properly set up.


I strongly agree with this and also strongly lament it.

I find IAM to be a terrible implementation of a foundationally necessary system. It feels tacked on to me, except now it's tacked onto thousands of other things and there's no way out.


like Terraform! isn't Pulumi 100% better? but there's no way out of Terraform.


That's essentially why "platform engineering" is a hot topic. There are great FOSS tools for this, largely in the Kubernetes ecosystem.

To be clear, authentication could still be outsourced, but authorizing access to (on-prem) resources in a multi-tenant environment is something that "platforms" are frequently designed for.


My firm belief after building a service at scale (tens of millions of end users, > 100K tps) is that AWS is unbeatable. We don’t even think about building our own infrastructure. There’s no way we could ever make it reliable enough, secure enough, and future-proof enough to ever pay back the cost difference.

Something people neglect to mention when they tout their home grown cloud is that AWS spends significant cycles constantly eliminating technical debt that would absolutely destroy most companies - even ones with billion dollar services of their own. The things you rely on are constantly evolving and changing. It’s hard enough to keep up at the high level of a SaaS built on top of someone else’s bulletproof cloud. But imagine also having to keep up with the low level stuff like networking and storage tech?

No thanks.


I've done it. It's nowhere near as complicated as you make it seem. It definitely doesn't kill - no more than failing to manage your software tech debt. In fact, the latter is both harder to keep up with and more risky, because it changes faster than the low-level stuff, to support business needs.

With the cloud you have IT/DevOps deal only with scaling the software components of the infra. When doing on-prem they take on the physical layer as well. Do you have enough trust in them to scale the physical part where needed?


...and power, backup power, HVAC, physical security...


Or buy colo space and they do it for you. It's not all cloud vs owning a datacenter - there's a thousand shades of grey


> All the pro-cloud talking points... don't persuade anyone with any real technical understanding

This is a very engineer-centric take. The cloud has some big advantages that are entirely non-technical:

- You don't need to pay for hardware upfront. This is critical for many early-stage startups, who have no real ability to predict CapEx until they find product/market fit.

- You have someone else to point the SOC2/HIPAA/etc auditors at. For anyone launching a company in a regulated space, being able to checkbox your entire infrastructure based on AWS/Azure/etc existing certifications is huge.


You can over-provision your own baremetal resources 20x and it will be still cheaper than cloud. The capex talking point is just that, a talking point.


As an early-stage startup?

Your spend in the first year on AWS is going to be very close to zero for something like a SaaS shop.

Nor can you possibly scale in-house baremetal fast enough if you hit the fabled hockey stick growth. By the time you sign a colocation contract and order hardware, your day in the sun may be over.


> You have someone else to point the SOC2/HIPAA/etc auditors at.

I would assume you still need to point auditors to your software in any case


You do, which makes it very nice to not have to answer questions about the physical security of your servers.


Cloud expands the capabilities of what one team can manage by themselves, enabling them to avoid a huge amount of internal politics.

This is worth astronomical amounts of money in big corps.


I’m not convinced this is entirely true. The upfront cost if you don’t have the skills, sure – it takes time to learn Linux administration, not to mention management tooling like Ansible, Puppet, etc.

But once those are set up, how is it different? AWS is quite clear with their responsibility model that you still have to tune your DB, for example. And for the setup, just as there are Terraform modules to do everything under the sun, there are Ansible (or Chef, or Salt…) playbooks to do the same. For both, you _should_ know what all of the options are doing.

The only way I see this sentiment being true is that a dev team, with no infrastructure experience, can more easily spin up a lot of infra – likely in a sub-optimal fashion – to run their application. When it inevitably breaks, they can then throw money at the problem via vertical scaling, rather than addressing the root cause.


I think this is only true for teams and apps of a certain size.

I've worked on plenty of teams with relatively small apps, and the difference between:

1. Cloud: "open up the cloud console and start a VM"

2. Owned hardware: "price out a server, order it, find a suitable datacenter, sign a contract, get it racked, etc."

Is quite large.

#1 is 15 minutes for a single team lead.

#2 requires the team to agree on hardware specs, get management approval, finance approval, executives signing contracts. And through all this you don't have anything online yet for... weeks?

If your team or your app is large, this probably all averages out in favor of #2. But small teams often don't have the bandwidth or the budget.


I work for a 50 person subsidiary of a 30k person organisation. I needed a domain name. I put in the purchase request and 6 months later eventually gave up, bought it myself and expensed it.

Our AWS account is managed by an SRE team. It’s a 3 day turnaround process to get any resources provisioned, and if you don’t get the exact spec right (you forgot to specify the iops on the volume? Oops) 3 day turnaround. Already started work when you request an adjustment? Better hope as part of your initial request you specified backups correctly or you’re starting again.

The overhead is absolutely enormous, and I actually don’t even have billing access to the AWS account that I’m responsible for.


> 3 day turnaround process to get any resources provisioned

Now imagine having to deal with procurement to purchase hardware for your needs. 6 months later you have a server. Oh you need a SAN for object storage? There goes another 6 months.


At a previous job we had some decent on-prem resources for internal services. The SRE guys had a bunch of extra compute and you would put in a ticket for a certain amount of resources (2 CPU, SSD, 8GB memory, x2 on different hosts). There wasn't a massive amount of variability between the hardware, and you just requested resources to be allocated from a bunch of hypervisors. Turnaround time was about 3 days too. Except you weren't required to be self-sufficient in AWS terminology to request exactly what you needed.


> Our AWS account is managed by an SRE team.

That's an anti-pattern (we call it "the account") in the AWS architecture.

AWS internally just uses multiple accounts, so a team can get their own account with centrally-enforced guardrails. It also greatly simplifies billing.


That’s not something that I have control over or influence over.


Managing cloud without a dedicated resource is a form of resource creep, with shadow labour costs that aren't factored in.

How many things don't end up happening because of this, when they only need a sliver of resources at the start?


You're assuming that hosting something in-house implies that each application gets its own physical server.

You buy a couple of beastly things with dozens of cores. You can buy twice as much capacity as you actually use and still be well under the cost of cloud VMs. Then it's still VMs and adding one is just as fast. When the load gets above 80% someone goes through the running VMs and decides if it's time to do some house cleaning or it's time to buy another host, but no one is ever waiting on approval because you can use the reserve capacity immediately while sorting it out.
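
The "check headroom before adding a VM" step can be a few lines if the hosts run KVM. A rough sketch using the libvirt Python bindings (assuming libvirt-python is installed and the host exposes qemu:///system; the 80% threshold mirrors the comment above):

    import libvirt

    conn = libvirt.open("qemu:///system")

    # Host totals: getInfo() -> [arch, memory_MB, cpus, mhz, nodes, sockets, cores, threads]
    host = conn.getInfo()
    total_mem_mb, total_cpus = host[1], host[2]

    # Sum what the running guests have been allocated.
    alloc_mem_mb = alloc_vcpus = 0
    for dom in conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
        info = dom.info()  # [state, maxMem_KiB, mem_KiB, vcpus, cpuTime]
        alloc_mem_mb += info[1] // 1024
        alloc_vcpus += info[3]

    print(f"memory allocated: {alloc_mem_mb}/{total_mem_mb} MB "
          f"({100 * alloc_mem_mb / total_mem_mb:.0f}%)")
    print(f"vCPUs allocated:  {alloc_vcpus}/{total_cpus}")

    if alloc_mem_mb / total_mem_mb > 0.8:
        print("above 80% -- time for house cleaning or another host")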


The SMB I work for runs a small on-premise data center that is shared between teams and projects, with maybe 3-4 FTEs managing it (the respective employees also do dev and other work). This includes self-hosting email, storage, databases, authentication, source control, CI, ticketing, company wiki, chat, and other services. The current infrastructure didn’t start out that way and developed over many years, so it’s not necessarily something a small startup can start out with, but beyond a certain company size (a couple dozen employees or more) it shouldn’t really be a problem to develop that, if management shares the philosophy. I certainly find it preferable culturally, if not technically, to maximize independence in that way, have the local expertise and much better control over everything.

One (the only?) indisputable benefit of cloud is the ability to scale up faster (elasticity), but most companies don’t really need that. And if you do end up needing it after all, then it’s a good problem to have, as they say.


Your last paragraph identifies the reason that running their own hardware makes sense for Fastmail. The demand for email is pretty constant. Everyone does roughly the same amount of emailing every day. Daily load is predictable, and growth is predictable.

If your load is very spiky, it might make more sense to use cloud. You pay more for the baseline, but if your spikes are big enough it can still be cheaper than provisioning your own hardware to handle the highest loads.

Of course there's also possibly a hybrid approach, you run your own hardware for base load and augment with cloud for spikes. But that's more complicated.


I’ve never worked at a company with these particular problems, but:

#1: A cloud VM comes with an obligation for someone at the company to maintain it. The cloud does not excuse anyone from doing this.

#2: Sounds like a dysfunctional system. Sure, it may be common, but a medium sized org could easily have some datacenter space and allow any team to rent a server or an instance, or to buy a server and pay some nominal price for the IT team to keep it working. This isn’t actually rocket science.

Sure, keeping a fifteen year old server working safely is a chore, but so is maintaining a fifteen-year-old VM instance!


The cloud is someone else’s computer.

Renting a VM from a provider or installing a hypervisor on your own equipment is another thing.


Obligation? Far from it. I've worked at some poorly staffed companies. Nobody is maintaining old VMs or container images. If it works, nobody touches it.

I worked at a supposedly properly staffed company that had raised 100's of millions in investment, and it was the same thing. VMs running 5 year old distros that hadn't been updated in years. 600 day uptimes, no kernel patches, ancient versions of Postgres, Python 2.7 code everywhere, etc. This wasn't 10 years ago. This was 2 years ago!


There is a large gap between "own the hardware" and "use cloud hosting". Many people rent the hardware, for example, and you can use managed databases, which is one step up from "starting a VM".

But your comparison isn't fair. The difference between running your own hardware and using the cloud (which is perhaps not even the relevant comparison but let's run with it) is the difference between:

1. Open up the cloud console, and

2. You already have the hardware so you just run "virsh" or, more likely, do nothing at all because you own the API so you have already included this in your Ansible or Salt or whatever you use for setting up a server.

Because ordering a new physical box isn't really comparable to starting a new VM, is it?


I've always liked the theory of #2, I just haven't worked anywhere yet that has executed it well.


Before the cloud, you could get a VM provisioned (virtual servers) or a couple of apps set up (LAMP stack on a shared host ;)) in a few minutes over a web interface already.

"Cloud" has changed that by providing an API to do this, thus enabling IaC approach to building combined hardware and software architectures.


You have omitted the option between the two, which is renting a server. No hardware to purchase, maintain or set up. Easily available in 15 minutes.


While I did say "VM" in my original comment, to me this counts as "cloud" because the UI is functionally the same.


3. "Dedicated server" at any hosting provider

Open their management console, press order now, 15 mins later get your server's IP address.


For purposes of this discussion, isn't AWS just a very large hosting provider?

I.e. most hosting providers give you the option for virtual or dedicated hardware. So does Amazon (metal instances).

Like, "cloud" was always an ill-defined term, but in the case of "how do I provision full servers" I think there's no qualitative difference between Amazon and other hosting providers. Quantitative, sure.


> Amazon (metal instances)

But you still get nickel & dimed and pay insane costs, including on bandwidth (which is free in most conventional hosting providers, and overages are 90x cheaper than AWS' costs).


Qualitatively, AWS is greedy and nickels and dimes you to death. Their Route53 service doesn't even have all the standard DNS options I need, which I can get everywhere else or even on my own running bind9. I do not use IPv6 for several reasons, so when AWS decided to charge for IPv4, I went looking elsewhere to get my VMs.

I can't even imagine how much the US Federal Government is charging American taxpayers to pay AWS for hosting there, it has to be astronomical.


Out of curiosity, which DNS record types do you need that Route53 doesn't support?


More like 15 seconds.


You gave me flashbacks to a far worse bureaucratic nightmare with #2 in my last job.

I supported an application with a team of about three people for a regional headquarters in the DoD. We had one stack of aging hardware that was racked, on a handshake agreement with another team, in a nearby facility under that other team's control. We had to periodically request physical access for maintenance tasks and the facility routinely lost power, suffered local network outages, etc. So we decided that we needed new hardware and more of it spread across the region to avoid the shaky single-point-of-failure.

That began a three year process of: waiting for budget to be available for the hardware / license / support purchases; pitching PowerPoints to senior management to argue for that budget (and getting updated quotes every time from the vendors); working out agreements with other teams at new facilities to rack the hardware; traveling to those sites to install stuff; and working through the cybersecurity compliance stuff for each site. I left before everything was finished, so I don't know how they ultimately dealt with needing, say, someone to physically reseat a cable in Japan (an international flight away).


There is a middle ground between the extremes of those pendulums of all cloud or physical metal.

You can start with using a cloud only for VMs and only run services on it using IaaS or PaaS. Very serviceable.


You can get pretty far without any of that fancy stuff. You can get plenty done by using parallel-ssh and then focusing on the actual thing you develop instead of endless tooling and docker and terraform and kubernetes and salt and puppet and ansible. Sure, if you know why you need them and know what value you get from them OK. But many people just do it because it's the thing to do...


Do you need those tools? It seems that for fundamental web hosting, you need your application server, nginx or similar, postgres or similar, and a CLI. (And an interpreter etc if your application is in an interpreted lang)


I suppose that depends on your RTO. With cloud providers, even on a bare VM, you can to some extent get away with having no IaC, since your data (and therefore config) is almost certainly on networked storage which is redundant by design. If an EC2 instance fails, or even if one of the drives behind your EBS volume fails, it'll probably come back up as it was.

If it's your own hardware, if you don't have IaC of some kind – even something as crude as a shell script – then a failure may well mean you need to manually set everything up again.


All EBS volumes except io2 have advertised durability of 99.8%, which is pretty low, so don't count it in the magic networked storage category.


Get two servers (or three, etc)?


Well, sure – I was trying to do a comparison in favor of cloud, because the fact that EBS Volumes can magically detach and attach is admittedly a neat trick. You can of course accomplish the same (to a certain scale) with distributed storage systems like Ceph, Longhorn, etc. but then you have to have multiple servers, and if you have multiple servers, you probably also have your application load balanced with failover.


For fundamentals, that list is missing:

- Some sort of firewall or network access control. Being able to say "allow http/s from the world (optionally minus some abuser IPs that cause problems), and allow SSH from developers (by IP, key, or both)" at a separate layer from nginx is prudent. Can be ip/tables config on servers or a separate firewall appliance.

- Some mechanism of managing storage persistence for the database, e.g. backups, RAID, data files stored on fast network-attached storage, db-level replication. Not losing all user data if you lose the DB server is table stakes.

- Something watching external logging or telemetry to let administrators know when errors (e.g. server failures, overload events, spikes in 500s returned) occur. This could be as simple as Pingdom or as involved as automated alerting based on load balancer metrics. Relying on users to report downtime events is not a good approach.

- Some sort of CDN, for applications with a frontend component. This isn't required for fundamental web hosting, but for sites with a frontend and even moderate (10s/sec) hit rates, it can become required for cost/performance; CDNs help with egress congestion (and fees, if you're paying for metered bandwidth).

- Some means of replacing infrastructure from nothing. If the server catches fire or the hosting provider nukes it, having a way to get back to where you were is important. Written procedures are fine if you can handle long downtime while replacing things, but even for a handful of application components those procedures get pretty lengthy, so you start wishing for automation.

- Some mechanism for deploying new code, replacing infrastructure, or migrating data. Again, written procedures are OK, but start to become unwieldy very early on ('stop app, stop postgres, upgrade the postgres version, start postgres, then apply application migrations to ensure compatibility with new version of postgres, then start app--oops, forgot to take a postgres backup/forgot that upgrading postgres would break the replication stream, gotta write that down for next time...').

...and that's just for a very, very basic web hosting application--one that doesn't need caches, blob stores, the ability to quickly scale out application server or database capacity.

Each of those things can be accomplished the traditional way--and you're right, that sometimes that way is easier for a given item in the list (especially if your maintainers have expertise in that item)! But in aggregate, having a cloud provider handle each of those concerns tends to be easier overall and not require nearly as much in-house expertise.
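
As a concrete example of the monitoring item above, even the crudest version is an out-of-band probe run from cron on a box outside your hosting environment. A minimal sketch; the target URLs and alert webhook are placeholders for whatever you already use:

    import json
    import urllib.error
    import urllib.request

    TARGETS = ["https://example.com/healthz"]              # endpoints to probe
    ALERT_WEBHOOK = "https://alerts.example.invalid/hook"   # hypothetical alerting hook

    def alert(message: str) -> None:
        # Post a JSON payload to the alerting webhook.
        body = json.dumps({"text": message}).encode()
        req = urllib.request.Request(ALERT_WEBHOOK, data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=10)

    for url in TARGETS:
        try:
            urllib.request.urlopen(url, timeout=10)          # raises HTTPError on 4xx/5xx
        except (urllib.error.URLError, TimeoutError) as exc:  # HTTPError subclasses URLError
            alert(f"{url} check failed: {exc}")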


I have never ever worked somewhere with one of these "cloud-like but custom on our own infrastructure" setups that didn't leak infrastructure concerns through the abstraction, to a significantly larger degree than AWS.

I believe it can work, so maybe there are really successful implementations of this out there, I just haven't seen it myself yet!


You are focusing on technology. And sure of course you can get most of the benefits of AWS a lot cheaper when self-hosting.

But when you start factoring internal processes and incompetent IT departments, suddenly that's not actually a viable option in many real-world scenarios.


Exactly. With the cloud you can suddenly do all the things your tyrannical Windows IT admin has been saying are impossible for the last 30 years.


It is similar to cooking at home vs ordering cooked food every day. If someone guarantees the taste & quality, people would be happy to outsource it.


All of that is... completely unrelated to the GP's post.

Did you reply to the right comment? Do you think "politics" is something you solve with Ansible?


> Cloud expands the capabilities of what one team can manage by themselves, enabling them to avoid a huge amount of internal politics.

It's related to the first part. Re: the second, IME if you let dev teams run wild with "managing their own infra," the org as a whole eventually pays for that when the dozen bespoke stacks all hit various bottlenecks, and no one actually understands how they work, or how to troubleshoot them.

I keep being told that "reducing friction" and "increasing velocity" are good things; I vehemently disagree. It might be good for short-term profits, but it is poison for long-term success.


> I keep being told that "reducing friction" and "increasing velocity" are good things

As always, good rules are good, and bad rules are bad.

Like most people on the internet, you are assuming only one of those sets exist. But you are just assuming a different set from everybody that you are criticizing.


Our big company locked all cloud resources behind a floating/company-wide DevOps team (git and CI too). We have an old on-prem server that we jealously guard because it allows us to create remotes for new git repos and deploy prototypes without consulting anyone.

(To be fair, I can see why they did it - a lot of deployments were an absolute mess before.)


This is absolutely spot on.

What do you mean, I can't scale up because I've used my hardware capex budget for the year?


I have said for years that the value of cloud is mainly its API; that's the selling point in large enterprises.


Self-hosted software also has APIs, and Terraform libraries, and Ansible playbooks, etc. It’s just that you have to know what it is you’re trying to do, instead of asking AWS what collection of XaaS you should use.


Even as an Anti-Cloud (or more accurately Anti-everything-Cloud) person I still think there are many benefits to cloud. Just most of them are oversold and people don't need them.

Number one is company bureaucracy and politics. No one wants to beg another person or department, or go to endless meetings, just to have extra hardware provisioned. For engineers that alone is worth perhaps 99% of all current cloud margins.

Number two is also company bureaucracy and politics. CFOs don't like CapEx. Turning it into OpEx makes things easier for them. Along with end-of-year company budget turning into cloud credits for different departments. Especially for companies with government funding.

Number three is really company bureaucracy and politics. Dealing with just Google, AWS, or Microsoft means you no longer have to deal with dozens of different vendors for servers, networking hardware, software licenses, etc. Instead it is all pre-approved into AWS, GCP or Azure. This is especially useful for things that involve government contracts or funding.

There are also things like instant worldwide deployment. You can have things up and running in any regions within seconds. And useful when you have site that gets 10 to 1000x the normal traffic from time to time.

But then a lot of small businesses don't have these sorts of issues. Especially non-consumer-facing services. Businesses or SaaS are highly unlikely to get 10x more customers within a short period of time.

I continue to wish there were a middle ground somewhere: rent a dedicated server for cheap as base load and use cloud for everything else.


But isn't using Fastmail akin to using a cloud provider (managed email vs managed everything else)? They are similarly a service provider, and as a customer, you don't really care "who their ISP is?"

The discussion matters when we are talking about building things: whether you self-host or use managed services is a set of interesting trade-offs.


Yes, FastMail is a SaaS. But there are adepts of a religion who would tell you that companies like FastMail should be built on top of AWS and that it is the only true way. It is good to have some counter-narrative to this.


Being cloud compatible (packaged well) can be as important as being cloud-agnostic (work on any cloud).

Too many projects become beholden to one cloud.


The fact is, managing your own hardware is a PITA and a distraction from focusing on the core product. I loathe messing with servers and even opt for "overpriced" PaaS like Fly, Render, Vercel. Because every minute spent messing with and monitoring servers is time not spent on product. My tune might change past a certain size, with a massive cloud bill and room for full-time ops people, but to offset their salary it would have to be huge.


That argument makes sense for PaaS services like the ones you mention. But for bare "cloud" like AWS, I'm not convinced it is saving any effort, it's merely swapping one kind of complexity with another. Every place I've been in had full-time people messing with YAML files or doing "something" with the infrastructure - generally trying to work around the (self-inflicted) problems introduced by their cloud provider - whether it's the fact you get 2010s-era hardware or that you get nickel & dimed on absolutely arbitrary actions that have no relationship to real-world costs.


In what sense is AWS "bare cloud"? S3, DynamoDB, Lambda, ECS?


How do you configure S3 access control? You need to learn & understand how their IAM works.
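
For a concrete taste of what that means, here's a minimal boto3 sketch (bucket name and account ID are made up) that grants one role read-only access to a bucket - you still have to understand principals, actions, and resource ARNs just to write this much:

    import json
    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket and role; the point is that "S3 access control"
    # in practice means writing IAM policy documents like this one.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-read-only"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }],
    }

    s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))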

How do you even point a pretty URL to a lambda? Last time I looked you need to stick an "API gateway" in front (which I'm sure you also get nickel & dimed for).

How do you go from "here's my git repo, deploy this on Fargate" with AWS? You need a CI pipeline which will run a bunch of awscli commands.

And I'm not even talking about VPCs, security groups, etc.

Somewhat different skillsets than old-school sysadmin (although once you know sysadmin basics, you realize a lot of these are just the same concepts under a branded name and arbitrary nickel & diming sprinkled on top), but equivalent in complexity.


How does one install and run Linux/BSD/another UNIX? One needs to learn and understand how a UNIX works.

The essence of the complaint is that one has to have knowledge of something before that something can be used. That seems like a reasonable expectation for just about anything in life.

(The AWS API Gateway is USD 2.35 for 10 million 32 kB requests; a Lambda can have its own private URL if required; and Fargate does not deploy Git repos, it runs Docker images.)
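
For what it's worth, that function URL is a one-call affair these days - a rough boto3 sketch, with a hypothetical function name:

    import boto3

    lambda_client = boto3.client("lambda")

    # Attach a built-in HTTPS endpoint to an existing function - no API Gateway needed.
    resp = lambda_client.create_function_url_config(
        FunctionName="my-function",   # hypothetical function name
        AuthType="AWS_IAM",           # or "NONE"; a public URL additionally needs an
    )                                 # add_permission call for lambda:InvokeFunctionUrl
    print(resp["FunctionUrl"])        # e.g. https://<id>.lambda-url.<region>.on.aws/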


> The essence of the complaint is that one has to have knowledge of something before that something can be used

My point was to disprove that "cloud" is simpler than conventional sysadmin - it is not, and it involves similar effort, complexity and manpower requirements.


I will have to disagree on that.

Cloud is simpler than conventional sysadmin, once its foundational principles are understood and the declarative approach to the cloud architecture is adopted. If I want to run a solution, cloud gives me just that – a platform that simply runs my solution and abstracts the sysadmin ugliness away.

I have experienced both sides, including UNIX kernel and system programming, and I don't want to even think about sysadmin unless I want to tinker with a UNIX box on a weekend as a leisure activity.


EC2


I would actually argue that EC2 is a "cloud smell"--if you're using EC2 you're doing it wrong.


Counterpoint: if you’re never “messing with servers,” you probably don’t have a great understanding of how their metrics map to those of your application, and so if you bottleneck on something, it can be difficult to figure out what to fix. The result is usually that you just pay more money to vertically scale.

To be fair, you did say “my tune might change past a certain size.” At small scale, nothing you do within reason really matters. World’s worst schema, but your DB is only seeing 100 QPS? Yeah, it doesn’t care.


I don’t think you’re correct. I’ve watched junior/mid-level engineers figure things out solely by working on the cloud and scaling things to a dramatic degree. It’s really not rocket science.


I didn't say it's rocket science, nor that it's impossible to do without having practical server experience, only that it's more difficult.

Take disks, for example. Most cloud-native devs I've worked with have no clue what IOPS are. If you saturate your disk, that's likely to cause knock-on effects like increased CPU utilization from IOWAIT, and since "CPU is high" is pretty easy to understand for anyone, the seemingly obvious solution is to get a bigger instance, which depending on the application, may inadvertently solve the problem. For RDBMS, a larger instance means a bigger buffer pool / shared buffers, which means fewer disk reads. Problem solved, even though actually solving the root cause would've cost 1/10th or less the cost of bumping up the entire instance.
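
As a rough illustration, this is the kind of five-line check (using the psutil library, on Linux) that separates "the CPU is busy" from "the CPU is waiting on the disk" - the thresholds you'd alert on are up to you:

    import psutil

    # Sample CPU time shares over one second; on Linux this includes iowait,
    # the share of time the CPU sat idle waiting for disk I/O to complete.
    cpu = psutil.cpu_times_percent(interval=1)
    print(f"user={cpu.user}% system={cpu.system}% iowait={cpu.iowait}%")

    # Per-disk I/O counters: heavy read/write counts plus high iowait above
    # suggest the disk, not the CPU, is the real bottleneck.
    for name, io in psutil.disk_io_counters(perdisk=True).items():
        print(name, "reads:", io.read_count, "writes:", io.write_count)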


> Most cloud-native devs

You might be making some generalizations from your personal experience. Since 2015, at all of my jobs, everything has been running on some sort of cloud. I've yet to meet a person who doesn't understand IOPS. If I were a junior (and from my experience, that's what they tend to do), I'd just google "slow X potential reasons". You'll most likely see some references to IOPS and continue your research from there.

We've all learned these things one way or another. My experience started around 2007ish, when I was renting cheap servers from some hosting providers. Others might be dipping their toes into readily available cloud infrastructure and learning it from that end. Both work.


Anecdotal - but I once worked for a company where the product line I built for them after acquisition was delayed by 5 months, because that's how long it took to get the hardware ordered and installed in the datacenter. Getting it up on AWS would have been a day's work, maybe two.


Yes, it is death by 1000 cuts. Speccing, negotiating with hardware vendors, data center selection and negotiation, DC engineers/remote hands, managing security cage access, designing your network, network gear, IP address ranges, BGP, secure remote console access, cables, shipping, negotiating with bandwidth providers (multiple, for redundancy), redundant hardware, redundant power sources, UPS. And then you get to plug your server in. Now duplicate the other stuff your cloud might provide, like offsite backups, recovery procedures, HA storage, geographic redundancy. And do it again when you outgrow your initial DC. Or build your own DC (power, climate, fire protection, security, fiber, flooring, racks).


Much of this is still required in cloud. Also, I think you're missing the middle ground where 99.99% of companies could happily exist indefinitely: colo. It makes little to no financial or practical sense for most to run their own data centers.


Oh, absolutely, with your own hardware you need planning. Time to deployment is definitely a thing.

Really, the one major thing that bites with cloud providers is their 99.9% margin on egress. The markup is insane.


Writing piles of IaC code like Terraform and CloudFormation is also a PITA and a distraction from focusing on your core product.

PaaS is probably the way to go for small apps.


A small app (or a larger one, for that matter) can quite easily run on infra that's instantiated from canned IaC, like TF AWS Modules [0]. If you can read docs, you should be able to quite trivially get some basic infra up in a day, even with zero prior experience managing it.

[0]: https://github.com/terraform-aws-modules


Yes, I've used several of these modules myself. They save tons of time! Unfortunately, for legacy projects, I inherited a bunch of code from individuals who built everything "by hand" and then copy-pasted everything. No reusability.


But that effort has a huge payoff in that it can be used for disaster recovery in a new region and to spin up testing environments.


I'm with you there, with stuff like fly.io, there's really no reason to worry about infrastructure.

AWS, on the other hand, seems about as time consuming and hard as using root servers. You're at a higher level of abstraction, but the complexity is about the same I'd say. At least that's my experience.


I agree with this position and actively avoid AWS complexity.


> every minute messing with and monitoring servers

You're not monitoring your deployments because "cloud"?


> All the pro-cloud talking points are just that - talking points that don't persuade anyone with any real technical understanding ...

And moreover, most of the actually interesting things, like having VM templates, stateless containers, orchestration, etc., are very easy to run yourself and get you 99.9% of the benefits of the cloud.

Just about any and every service is already available as a container file written for you. And if one doesn't exist, it's not hard to plumb it up yourself.

A friend of mine runs more than 700 containers (yup, seven hundred), split between his own rack at home (half of them) and dedicated servers (the other half); he runs stuff like FlightRadar, AI models, etc. He'll soon get his own IP address space. It's a complete "chaos monkey"-ready infra where you can cut any cable and the thing keeps working: everything is duplicated, can be spun up on demand, etc. Someone could steal his entire rack and all his dedicated servers, and he'd still be back operational in no time.

If an individual can do that, a company, no matter its size, can do it too. And arguably, 99.9% of all the companies out there don't need an infra as powerful as the one most homelab enthusiasts have.

And another thing: there are even two in-betweens between "cloud" and "our own hardware located at our company". The first is colocating your own hardware in a datacenter. The second is renting dedicated servers from a datacenter.

They're often ready to accept cloud-init directly.

And it's not hard. I'd say learning to configure hypervisors on bare metal, then spinning up VMs from templates, then running containers inside the VMs is actually much easier than learning all the idiosyncrasies of the different cloud vendors' APIs and whatnot.

Funnily enough, when the pendulum swung way too far to the "cloud all the things" side, those saying we'd at some point read stories about repatriation were made fun of.


> If an individual can do that, a company, no matter its size, can do it too.

Fully agreed. I don't have physical HA – if someone stole my rack, I would be SOL – but I can easily ride out a power outage for as long as I want to be hauling cans of gasoline to my house. The rack's UPS can keep it up at full load for at least 30 minutes, and I can get my generator running and hooked up in under 10. I've done it multiple times. I can lose a single server without issue. My only SPOF is internet, and that's only by choice, since I can get both AT&T and Spectrum here, and my router supports dual-WAN with auto-failover.

> And arguably 99.9% of all the companies out there don't have the need for an infra as powerful as the one most homelab enthusiast have.

THIS. So many people have no idea how tremendously fast computers are, and how much of an impact latency has on speed. I’ve benchmarked my 12-year-old Dells against the newest and shiniest RDS and Aurora instances on both MySQL and Postgres, and the only ones that kept up were the ones with local NVMe disks. Mine don’t even technically have _local_ disks; they’re NVMe via Ceph over InfiniBand.

Does that scale? Of course not; as soon as you want geo-redundant, consistent writes, you _will_ have additional latency. But most smaller and medium companies don't _need_ that.


I hear this debate repeated often, and I think there's another important factor. It took me some time to figure out how to explain it, and the best I came up with was this: It is extremely difficult to bootstrap from zero to baseline competence, in general, and especially in an existing organization.

In particular, there is a limit to paying for competence, and paying more money doesn't automatically get you more competence, which is especially perilous if your organization lacks the competence to judge competence. In the limit case, this gets you the Big N consultancies like PWC or EY. It's entirely reasonable to hire PWC or EY to run your accounting or compliance. Hiring PWC or EY to run your software development lifecycle is almost guaranteed doom, and there is no shortage of stories on this site to support that.

In comparison, if you're one of these organizations, who don't yet have baseline competence in technology, then what the public cloud is selling is nothing short of magical: You pay money, and, in return, you receive a baseline set of tools, which all do more or less what they say they will do. If no amount of money would let you bootstrap this competence internally, you'd be much more willing to pay a premium for it.

As an anecdote, my much younger self worked in mid-sized tech team in a large household brand in a legacy industry. We were building out a web product that, for product reasons, had surprisingly high uptime and scalability requirements, relative to legacy industry standards. We leaned heavily on public cloud and CDNs. We used a lot of S3 and SQS, which allowed us to build systems with strong reliability characteristics, despite none of us having that background at the time.


Well, cloud providers often give more than just VMs in a data center somewhere. You may not be able to find good equivalents if you aren’t using the cloud. Some third-party products are also only available on clouds. How much of a difference those things make will depend on what you’re trying to do.

I think there are accounting reasons for companies to prefer paying opex to run things on the cloud instead of more capex-intensive self-hosting, but I don’t understand the dynamics well.

It’s certainly the case that clouds tend to be more expensive than self-hosting, even when taking account of the discounts that moderately sized customers can get, and some of the promises around elastic scaling don’t really apply when you are bigger.

To some of your other points: the main customers of companies like AWS are businesses. Businesses generally don’t care about the centralisation of the internet. Businesses are capable of reading the contracts they are signing and not signing them if privacy (or, typically more relevant to businesses, their IP) cannot be sufficiently protected. It’s not really clear to me that using a cloud is going to be less secure than doing things on-prem.


> All the pro-cloud talking points are just that - talking points that don't persuade anyone with any real technical understanding,(...)

This is where you lose all credibility.

I'm going to focus on a single aspect: performance. If you're serving a global user base and your business, like practically all online businesses, is greatly impacted by performance problems, the only solution to a physics problem is to deploy your application closer to your users.

With any cloud provider, that's done with a few clicks and an invoice of a few hundred bucks a month. If you're running your own hardware... what solution do you have to show for it? Do you hope to create a corporate structure to rent a place to host your hardware, manned by a dedicated team? What options do you have?


Is everyone running online FPS gaming servers now? If you want your page to load faster, tell your shitty frontend engineers to use less of the latest frameworks. You are not limited by physics, 99% aren't.

I ping HN, it's 150ms away, it still renders in the same time that the Google frontpage does and that one has a 130ms advantage.


Erm, 99%'s clearly wrong and I think you know it, even if you are falling into the typical trap of "only Americans matter"...

As someone in New Zealand, latency does really matter sometimes, and is painfully obvious at times.

HN's ping for me is around: 330 ms.

Anyway, ping doesn't really describe the latency of the full DNS resolution, TCP connection establishment, and TLS handshake: full responses for HN are around 900 ms for me till the last byte.


> latency does really matter sometimes

Yes, sometimes.

You know what matters way more?

Whether you throw 12 MB at the client across multiple connections on multiple domains to display 1 KB of information. E.g. 'new' Reddit.


The complexity of scaling out an application to be closer to the users has never been about getting the hardware closer. It's always about how do you get the data there and dealing with the CAP theorem, which requires hard tradeoffs to be decided on when designing the application and can't be just tacked on - there is no magic button to do this, in the AWS console or otherwise.

Getting the hardware closer to the users has always been trivial - call up any of the many hosting providers out there and get a dedicated server, or a colo and ship them some hardware (directly from the vendor if needed).


> This is where you lose all credibility.

People who write that, well...

If you're greatly impacted by performance problems, how does that become a physics problem whose only solution is being closer to your users?

I think you're mixing up your sales points. One, how do you scale hardware? Simple: you buy some more, and/or you plan for more from the beginning.

How do you deal with network latency for users on the other side of the planet? Either you plan for and design for long tail networking, and/or you colocate in multiple places, and/or you host in multiple places. Being aware of cloud costs, problems and limitations doesn't mean you can't or shouldn't use cloud at all - it just means to do it where it makes sense.

You're making my point for me - you've got emotional generalizations ("you lose all credibility"), you're using examples that people use often but that don't even go together, plus you seem to forget that hardly anyone advocates for all one or all the other, without some kind of sensible mix. Thank you for making a good example of exactly what I'm talking about.


If you have a global user base, then depending on your workload, a simple CDN in front of your hardware can often go a long way with minimal cost and complexity.


> If you have a global user base, then depending on your workload, a simple CDN in front of your hardware can often go a long way with minimal cost and complexity.

Let's squint hard enough to pretend a CDN does not qualify as "the cloud". That alone requires a lot of goodwill.

A CDN distributes read-only content. Any use case that requires interacting with a service is automatically excluded.

So, no.


> Any usecase that requires interacting with a service is automatically excluded

This isn't correct. Many applications consist of a mix of static and dynamic content. Even dynamic content is often cacheable for a time. All of this can be served by a CDN (using TTLs) which is a much simpler and more cost effective solution than multi-region cloud infra, with the same performance benefits.
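
To make that concrete, caching "dynamic" content at the edge is usually just a matter of response headers - a minimal Flask sketch (the endpoint and the 60-second TTL are arbitrary choices for illustration):

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/leaderboard")
    def leaderboard():
        # Dynamic content, but fine if it's up to 60 seconds stale: let the CDN
        # serve cached copies from its edge locations for that long.
        resp = jsonify(top_players=["alice", "bob", "carol"])
        resp.headers["Cache-Control"] = "public, max-age=60, s-maxage=60"
        return resp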


I have about 30 years as a Linux engineer, starting with OpenBSD, and spent a LOT of time with hardware building webhosts and CDNs until about 2020; since then my last few roles have been 100% AWS/GCloud/Heroku.

I love building the cool edge network stuff with expensive bleeding-edge hardware, SmartNICs, NVMe-oF, etc., but it's infinitely more complicated and stressful than terraforming an AWS infra. For every cluster I set up, I had to interact with multiple teams: networking, security, storage, sometimes maintenance/electrical, etc. You've got some random tech across the country in one of your POPs with a blown server whom you have to rely on. Every single hardware infra person has had a NOC tech kick/unplug a server at least once if they've been in long enough.

And then, when you get the hardware, sometimes you have different people doing different parts of the setup: the NOC does the boot and maybe bootstraps the hardware with something that works over SSH before an agent is installed (Ansible, etc.), then your Linux eng invokes their magic with a ton of Bash or Perl, then your k8s person sets up the k8s clusters, usually with something like Terraform/Puppet/Chef/Salt probably calling Helm charts. Then your monitoring person gets it into OTel/Grafana, etc. This all organically becomes more automated as time goes on, but I've seen it many times from a brand-new infra where you've got no automation.

Now you're automating 90% of this via scripts and IaC, etc., but you're still doing a lot of tedious work.

You also have a much more difficult time hiring good engineers. The market's gone so heavily AWS (I'm no help) that it's rare that I come across an ops resume that's ever touched hardware, especially not at the CDN/distributed-systems level.

So... AWS is the chill infra that stays online and that you can rely on basically 99.99-something percent of the time. Get some Terraform blueprints going and your own developers can self-serve. No need for hardware or ops people to be involved.

And none of this is even getting into supporting the clusters. Failing clusters. Dealing with maintenance, zero downtime kernel upgrades, rollbacks, yaddayadda.


This, 1000%. There are so many cool networking/virtualization/hardware things I love dealing with. But the stress of doing Ceph upgrades usually isn't the right trade-off.


Most companies severely understaff ops, infra, and security. Your talking points might be good but, in practice, won’t apply in many cases because of the intractability of that management mindset. Even when they should know better.

I’ve worked at tech companies with hundreds of developers and single digit ops staff. Those people will struggle to build and maintain mature infra. By going cloud, you get access to mature infra just by including it in build scripts. Devops is an effective way to move infra back to project teams and cut out infra orgs (this isn’t great but I see it happen everywhere). Companies will pay cloud bills but not staffing salaries.


It's the exact same reason most companies don't run their own power stations, and instead buy power from a power company.

Computation has become a utility these days - this includes the fat ISP lines and connectivity etc., not just the CPUs and hard drives. These things have economies of scale that smaller companies cannot truly reach, and they will pay a huge fixed cost if they want state-of-the-art management, monitoring, and redundancy. So unless you are a massive consumer, just as with power stations, you really don't need nor want to build your own.


Using a commercial cloud provider only cements understaffing in, in too many cases.


There is a whole ecosystem that pushes cloud onto ignorant/fresh graduates/developers. Just take a look at the sponsors for all the most popular frameworks. When your system is super complex and depends on the cloud, they make more money. Just look at the PHP ecosystem: Laravel needs 4 times the servers to serve something that a pure PHP system would need. Most projects don't need the cloud. Only around 10% of projects actually need what the cloud provides. But they were able to brainwash a whole generation of developers/managers into thinking that they do. And so it goes.


Having worked with Laravel, this is absolutely bull.


>What's particularly fascinating to me, though, is how some people are so pro-cloud that they'd argue with a writeup like this with silly cloud talking points. They don't seem to care much about data or facts, just that they love cloud and want everyone else to be in cloud, too.

The irony is absolutely dripping off this comment, wow.

Commenter makes an emotionally charged comment with no data or facts and dismisses anyone who disagrees with them as using "silly talking points" and not caring about data and facts.

Your comment is entirely talking about itself.


My take on this whole cloud fatigue is that system maintenance got overly complex over the last couple of years/decades. So much so that management now thinks it's too expensive to hire people who can do it, compared to the higher managed-hosting costs.

DevOps and Kubernetes come to mind. A lot of people using Kubernetes don't know what they're getting into, and k0s or another single-machine solution would have been enough for 99% of SMEs.

In terms of cyber security (my field), everything got so ridiculously complex that even the folks who use 3 different dashboards in parallel end up guessing whether or not they're affected by a bug/RCE/security flaw/weakness, because all of the data sources (even the expensively paid-for ones) are human-edited text databases. They're so buggy that they even have Chinese characters instead of a dot character in the version fields, without anyone ever fixing it upstream in the NVD/CVE process.

I started to build my EDR agent specifically for POSIX systems because I hope that at some point this can help companies ditch the cloud and allow them to self-host again - which in turn would indirectly prevent 13-year-old kids like the ones from LAPSUS from pwning major infrastructure via simple tech-support hotline calls.

When I think of it in terms of hosting, the vertical scalability of EPYC machines is so high that most of the time, if you need that many resources, you are either doing something completely wrong and should refactor your code, or you are a video streaming service.


There was a time when cloud was significantly cheaper than owning.

I'd expect that there are people who moved to the cloud then, and over time started using services offered by their cloud provider (e.g., load balancers, secret management, databases, storage, backup) instead of running those services themselves on virtual machines, and now even if it would be cheaper to run everything on owned servers they find it would be too much effort to add all those services back to their own servers.


The cloud wasn’t about cheap, it was about fast. If you’re VC-funded, time is everything, and developer velocity comes above all else on the way to hyperscale and exit. That time (ZIRP) has passed, and the public cloud margin just doesn’t make sense when you can own and operate on-prem (their margin is your opportunity) with similar cloud primitives around storage and compute.

Elasticity is a component, but has always been from a batch job bin packing scheduling perspective, not much new there. Before k8s and Nomad, there was Globus.org.

(Infra/DevOps in a previous life at a unicorn, large worker cluster for a physics experiment prior, etc.; what is old is new again, you’re just riding hype-cycle waves from junior to retirement [mainframe -> COTS on-prem -> cloud -> on-prem cloud, and so on].)


That was never true except in the case that the required hardware resources were significantly smaller than a typical physical machine.


1. People are credulous

2. People therefore repeat talking points which seem in their interest

3. With enough repetition these become their beliefs

4. People will defend their beliefs as theirs against attack

5. Goto 1


The one convincing argument I've seen from technical people, which would be offered in reply to your comment, is that by now you don't find enough experienced engineers to reliably set up some really big systems. Because so much went to the cloud, a lot of the knowledge is buried there.

That came from technical people who I didn't perceive as being dogmatically pro-cloud.


I think part of it was a way for dev teams to get an infra team that was not empowered to say no. Plus organizational theory, empire building, etc.


Yep. I had someone tell me last week that they didn't want a more rigid schema because other teams rely on it, and anything adding "friction" to using it would be poorly received.

As an industry, we are largely trading correctness and performance for convenience, and this is not seen as a negative by most. What kills me is that at every cloud-native place I’ve worked at, the infra teams were responsible for maintaining and fixing the infra that product teams demanded, but were not empowered to push back on unreasonable requests or usage patterns. It’s usually not until either the limits of vertical scaling are reached, or a SEV0 occurs with these decisions as the root cause, that leadership even begins to consider changes.


It seems that the preference is less about understanding or misunderstanding the technical requirements and more about moving a capital expenditure (with some recurring operational expenditure) entirely into the OpEx column.


Cloud solves one problem quite well: Geographic redundancy. It's extremely costly with on-prem.


Only if you’re literally running your own datacenters, which is in no way required for the majority of companies. Colo giants like Equinix already have the infrastructure in place, with a proven track record.

If you enable Multi-AZ for RDS, your bill doubles until you cancel. If you set up two servers in two DCs, your initial bill doubles from the CapEx, and then a very small percentage of your OpEx goes up every month for the hosting. You very, very quickly make this back compared to cloud.
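
(For reference, the cloud side of that is roughly one flag - a hedged boto3 sketch against a hypothetical instance - which is exactly why it's so easy to double the bill without thinking about it:)

    import boto3

    rds = boto3.client("rds")

    # Turning on Multi-AZ provisions a synchronous standby in another AZ;
    # from that point on you're paying for roughly two instances instead of one.
    rds.modify_db_instance(
        DBInstanceIdentifier="example-db",   # hypothetical instance name
        MultiAZ=True,
        ApplyImmediately=True,
    )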


But reliable connectivity between regions/datacenters remains a challenge, right? Compute is only one part of the equation.

Disclaimer: I work on a cloud networking product.


It depends on how deep you want to go. Equinix for one (I'm sure others as well, but I'm most familiar with them) offers managed cross-DC fiber. You will probably need to manage the networking, to be fair, and I will readily admit that's not trivial.


I use Wireguard, pretty simple, where's the challenge?


I am referring to the layer 3 connectivity that Wireguard is running on top of. Depending on your use case and reliability and bandwidth requirements, routing everything over the “public” internet won’t cut it.

Not to mention setting up and maintaining your physical network as the number of physical hosts you’re running scales.


Except, almost nobody, outside of very large players, does cross region redundancy. us-east-1 is like a SPOF for the entire Internet.


Cloud noob here. But if I have a central database what can I distribute across geographic regions? Static assets? Maybe a cache?


Yep. Cross-region RDBMS is a hard problem, even when you're using a managed service – you practically always have to deal with eventual consistency, or increased latency for writes.


Does it? I've seen outages around "Sorry, us-west_carolina-3 is down". AWS is particularly good at keeping you aware of their datacenters.


It can be useful. I run a latency sensitive service with global users. A cloud lets me run it in 35 locations dealing with one company only. Most of those locations only have traffic to justify a single, smallish, instance.

In the locations where there's more traffic, and we need more servers, there are more cost effective providers, but there's value in consistency.

Elasticity is nice too: we doubled our instance count for the holidays and will return to normal in January. And our deployment style starts a whole new cluster, moves traffic, then shuts down the old cluster. If we were on owned hardware, adding extra capacity for the holidays would be trickier, and we'd have to have a more sensible deployment method. And the minimum service deployment size would probably not be a little quad-processor box with 2 GB of RAM.
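
(For the curious, the "double it for the holidays" part is about this much work - a sketch assuming an Auto Scaling group whose name is made up here:)

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Holiday traffic: bump a location from 1 instance to 2, revert in January.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName="edge-location-asg",  # hypothetical group name
        DesiredCapacity=2,
        HonorCooldown=False,
    )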

Using cloud for the lower traffic locations and a cost effective service for the high traffic locations would probably save a bunch of money, but add a lot of deployment pain. And a) it's not my decision and b) the cost difference doesn't seem to be quite enough to justify the pain at our traffic levels. But if someone wants to make a much lower margin, much simpler service with lots of locations and good connectivity, be sure to post about it. But, I think the big clouds have an advantage in geographic expansion, because their other businesses can provide capital and justification to build out, and high margins at other locations help cross subsidize new locations when they start.


I agree it can be useful (latency, availability, using off-peak resources), but running globally should be a default, and people should opt in to fine-grained control and responsibility.

From the outside it seems that either AWS picked the wrong default to present to their customers, or it's unreasonably expensive, which drives everyone into managing things in depth to try to keep cloud costs down.


If you see that, you are doing it wrong :)


AWS has had multiple outages which were caused by a single AZ failing.


Yup, I was referring to, I guess, one of these,

- https://news.ycombinator.com/item?id=29473630: (2021-12-07) AWS us-east-1 outage

- https://news.ycombinator.com/item?id=29648286: (2021-12-22) Tell HN: AWS appears to be down again

Maybe things are better now, but it became apparent that people might be misusing cloud providers or betting that things work flawlessly even if they completely ignore AZs.


My company used to do everything on-prem. Until a literal earthquake and tsunami took down a bunch of systems.

After that, yeah we’ll let AWS do the hard work of enabling redundancy for us.


> It makes me wonder: how do people get so sold on a thing that they'll go online and fight about it, even when they lack facts or often even basic understanding?

I feel like this can be applied to anything.

I had a manager take one SAFe for Leaders class and then come back wanting to implement it. They had no previous Agile classes or experience. And the Enterprise Agile Office was saying DON'T USE SAFe!!

But they had one class and that was the only way they would agree to structure their group.


The problem with your claims here is they can only be right if the entire industry is experiencing mass psychosis. I reject a theory that requires that, because my ego just isn't that large.

I once worked for several years at a publicly traded firm well-known for their return-to-on-prem stance, and honestly it was a complete disaster. The first-party hardware designs didn’t work right because they didn’t have the hardware-design staffing levels to have de-risked the possibility that AMD would fumble the performance of Zen 1, leaving them with a generation of useless hardware they nonetheless paid for. The OEM hardware didn’t work right because they didn’t have the chops to qualify it either, leaving them scratching their heads for months over a cohort of servers they eventually discovered were contaminated with metal chips. And, most crucially, for all the years I worked there, the only thing they wanted to accomplish was failover from West Coast to East Coast, which never worked, not even once. When I left that company they were negotiating with the data center owner, who wanted to triple the rent.

These experiences tell me that cloud skeptics are sometimes missing a few terms in their equations.


"Vendor problems" is a red herring, IMO; you can have those in the cloud, too.

It's been my experience that those who can build good, reliable, high-quality systems, can do so either in the cloud or on-prem, generally with equal ability. It's just another platform to such people, and they will use it appropriately and as needed.

Those who can only make it work in the cloud are either building very simple systems (which is one place where the cloud can be appropriate), or are building a house of cards that will eventually collapse (or just cost them obscene amounts of money to keep on life support).

Engineering is engineering. Not everyone in the business does it, unfortunately.

Like everything, the cloud has its place -- but don't underestimate the number of decisions that get taken out of the hands of technical people by the business people who went golfing with their buddy yesterday. He just switched to Azure, and it made his accountants really happy!

The whole CapEx vs. OpEx issue drives me batty; it's the number one cause of cloud migrations in my career. For someone who feels like spent money should count as spent money regardless of the bucket it comes out of, this twists my brain in knots.

I'm clearly not a finance guy...


> or are building a house of cards that will eventually collapse (or just cost them obscene amounts of money to keep on life support)

Ding ding ding. It's this.

> The whole CapEx vs. OpEx issue drives me batty

Seconded. I can't help but feel like it's not just a "I don't understand money" thing, but more of a "the way Wall Street assigns value is fundamentally broken." Spending $100K now, once, vs. spending $25K/month indefinitely does not take a genius to figure out.


> Spending $100K now, once, vs. spending $25K/month indefinitely does not take a genius to figure out.

If you multiply your monthly payment by 1/i, where i is the interest rate your business can get over the same period, you get how much up-front money it's worth.
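
A quick worked example, assuming (purely for illustration) the business can get 6% a year, i.e. 0.5% a month:

    monthly_payment = 25_000      # the recurring cloud bill
    monthly_rate = 0.06 / 12      # assumed 6% annual interest, as a monthly rate

    # Present value of paying $25K/month forever at that rate:
    present_value = monthly_payment / monthly_rate
    print(present_value)          # 5,000,000 - versus $100K once for the hardware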

... that is, until next month, when the interest rate will change, a fact that always catches everyone by surprise, and you'll need to rush to fix your cash-flow.

So, yeah, I don't understand that either. Somehow, despite neither of us understanding how it can possibly work, it seems to fail to work empirically too, adding a huge amount of instability to companies.

That is, unless you decide to look at it from the perspective of executive bonuses, which are capped at zero on the downside but can grow indefinitely. So instability is the point.


You forgot COGS.

It's all about painting the right picture for your investors, so you make up shit and classify it as COGS or OpEx depending on what is most beneficial for you in the moment.


> The problem with your claims here is they can only be right if the entire industry is experiencing mass psychosis.

Yes. Mass psychosis explains an incredible number of different and apparently unrelated problems with the industry.


There is, however, a middle ground between running your own colocated hardware and cloud. It's called "dedicated" servers, and many hosting providers (from budget bottom-of-the-barrel to "contact us" pricing) offer them.

Those providers take on the burden of sourcing, managing, and maintaining the hardware for a flat monthly fee, and they carry the associated risk. If they make a bad bet purchasing hardware, you won't be on the hook for it.

This seems like a point many pro-cloud people (intentionally?) overlook.


> The problem with your claims here is they can only be right if the entire industry is experiencing mass psychosis.

What's the market share of Windows again? ;)


You're proving their point though. Considering that there are tons of reasons to use windows, some people just don't see them and think that everyone else is crazy :^) (I know you're joking but some people actually unironically have the same sentiment)


> a desire to not centralize the Internet

> If I didn't already self-host email

this really says all that needs to be said about your perspective. you have an engineer and OSS advocate's mindset. which is fine, but most business leaders (including technical leaders like CTOs) have a business mindset, and their goal is to build a business that makes money, not avoid contributing to the centralization of the internet


> On the other hand, a business of just about any size that has any reasonable amount of hosting is better off with their own systems when it comes purely to cost

From a cost PoV, sure, but when you're taking money out of CapEx it represents a big hit to cash flow, while taking even twice that amount out of OpEx has a lower impact on the company's finances.


Cloud is more than instances. If all you need is a bunch of boxes, then cloud is a terrible fit.

I use AWS cloud a lot, and almost never use any VMs or instances. Most instances I use are along the lines of a simple anemic box for a bastion host or some such.

I use higher level abstractions (services) to simplify solutions and outsource maintenance of these services to AWS.


They spent time and career points learning cloud things and dammit it's going to matter!

You can't even blame them too much, the amount of cash poured into cloud marketing is astonishing.


The thing that frustrates me is it’s possible to know how to do both. I have worked with multiple people who are quite proficient in both areas.

Cloud has definite advantages in some circumstances, but so does self-hosting; moreover, understanding the latter makes the former much, much easier to reason about. It’s silly to limit your career options.


Being good at both is twice the work, because even if some concepts translate well, IME people won't hire someone based on that. "Oh you have experience with deploying RabbitMQ but not AWS SQS? Sorry, we're looking for someone more qualified."


That's a great filter for places I don't want to work at, then.


I want to see an article like this, but written from a Fortune 500 CTO perspective

It seems like they all abandoned their VMware farms or physical server farms for Azure (they love Microsoft).

Are they actually saving money? Are things faster? How's performance? What was the re-training/hiring like?

In one case I know we got rid of our old database greybeards and replaced them with "DevOps" people that knew nothing about performance etc

And the developers (and many of the admins) we had knew nothing about hardware or anything so keeping the physical hardware around probably wouldn't have made sense anyways


Complicating this analysis is that computers have still been making exponential improvements in capability as clouds became popular (e.g. disks are 1000-10000x faster than they were 15 years ago), so you'd naturally expect things to become easier to manage over time as you need fewer machines, assuming of course that your developers focus on e.g. learning how to use a database well instead of how to scale to use massive clusters.

That is, even if things became cheaper/faster, they might have been even better without cloud infrastructure.


>we got rid of our old database greybeards and replaced them with "DevOps" people that knew nothing about performance etc

It seems a lot of those DevOps people just see Azure's recommendations for adding indexes and either allow them to be auto-applied or add them without actually reviewing them or understanding which workloads require them and why. This also lands a bit on developers/product, who don't think critically about and communicate which queries are common, and who should put some forethought into which indexes would be beneficial and worth creating. (Yes, follow-up monitoring of actual index usage and possible missing indexes is still needed.) Too many times I've seen dozens of indexes on tables in the cloud where one covering index could replace all of them. Yes, there still might be worthwhile reasons to keep some narrower/smaller indexes, but again, DBA work and critical query analysis seem to be forgotten and neglected skills. No one owns monitoring and analysing DB queries, and it only comes up after a fire has already broken out.


The real cost wins of self-hosting come from friction: anything that needs new hardware becomes an ordeal, and engineers won’t reach for high-cost, value-added services. I agree that there’s often too little restraint in cloud architectures, but if a business truly believes in a project, it shouldn’t be held up for six months waiting for server budget, with engineers spending their time doing ops work to get three nines of DB reliability.

There is a size where self-hosting makes sense, but it's much larger than you think.


Also, by the way, I found it interesting that you framed your side of this disagreement as the technically correct one, but then included this:

> a desire to not centralize the Internet

This is an ideological stance! I happen to share this desire. But you should be aware of your own non-technical - "emotional" - biases when dismissing the arguments of others on the grounds that they are "emotional" and "fanatical".


I never said that my own reasons were neither personal nor emotional. I was just pointing out that my reasons are easy to articulate.

I do think it's more than just emotional, though, but most people, even technical people, haven't taken the time to truly consider the problems that will likely come with centralization. That's a whole separate discussion, though.


...but your post reads like you do have an emotional reaction to this question and you're ready to believe someone who shares your views.

There's not nearly enough in here to make a judgment about things like security or privacy. They have the bare minimum encryption enabled. That's better than nothing. But how is key access handled? Can they recover your email if the entire cluster goes down? If so, then someone has access to the encryption keys. If not, then how do they meet reliability guarantees?

Three letter agencies and cyber spies like to own switches and firewalls with zero days. What hardware are they using, and how do they mitigate against backdoors? If you really cared about this you would have to roll your own networking hardware down to the chips. Some companies do this, but you need to have a whole lot of servers to make it economical.

It's really about trade-offs. I think the big trade-offs favoring staying off cloud are cost (in some applications), distrust of the cloud providers, and avoiding the US Government.

The last two are arguably judgment calls that have some inherent emotional content. The first is calculable in principle, but people may not be using the same metrics. For example if you don't care that much about security breaches or you don't have to provide top tier reliability, then you can save a ton of money. But if you do have to provide those guarantees, it would be hard to beat Cloud prices.


> What's particularly fascinating to me, though, is how some people are so pro-cloud that they'd argue with a writeup like this with silly cloud talking points.

I’m sure I’ll be downvoted to hell for this, but I’m convinced that it’s largely their insecurities being projected.

Running your own hardware isn’t tremendously difficult, as anyone who’s done it can attest, but it does require a much deeper understanding of Linux (and of course, any services which previously would have been XaaS), and that’s a vanishing trait these days. So for someone who may well be quite skilled at K8s administration, serverless (lol) architectures, etc. it probably is seen as an affront to suggest that their skill set is lacking something fundamental.


> So for someone who may well be quite skilled at K8s administration ...

And running your own hardware is not incompatible with Kubernetes: on the contrary. You can fully well have your infra spin up VMs and then do container orchestration if that's your thing.

And part of your hardware monitoring and reporting tooling can work perfectly fine from containers.

Bare metal -> Hypervisor -> VM -> container orchestration -> a container running a "stateless" hardware monitoring service. And VMs themselves are "orchestrated" too. Everything can be automated.

Anyway, say a hard disk begins to show errors? Notifications get sent (email/SMS/Telegram/whatever) by another service in another container, and the dashboard shows it too (dashboards are cool).

Go to the machine once the spare disk has already been resilvered, move it to where the failed disk was, and plug in a new disk that becomes the new spare.

Boom, done.
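
As a sketch of what that kind of notification service can boil down to (the device path, SMTP host, and addresses are placeholders, and it assumes smartmontools is installed):

    import smtplib
    import subprocess
    from email.message import EmailMessage

    # Ask smartmontools for the drive's overall health verdict.
    result = subprocess.run(
        ["smartctl", "-H", "/dev/sda"],          # placeholder device
        capture_output=True, text=True,
    )

    if "PASSED" not in result.stdout:
        msg = EmailMessage()
        msg["Subject"] = "Disk health warning on storage node"
        msg["From"] = "alerts@example.org"        # placeholder addresses
        msg["To"] = "admin@example.org"
        msg.set_content(result.stdout)
        with smtplib.SMTP("mail.example.org") as smtp:   # placeholder SMTP host
            smtp.send_message(msg)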

I'm not saying all self-hosted hardware should do container orchestration: there are valid use cases for bare metal too.

But something has to be said for controlling everything on your own infra: from the bare metal to the VMs to container orchestration, to even potentially your own IP address space.

This is all within reach of an individual, both skill-wise and price-wise (including obtaining your own IP address space). People who drank the cloud kool-aid should ponder this and wonder how good their skills truly are if they cannot get this up and working.


Fully agree. And if you want to take it to the next level (and have a large budget), Oxide [0] seems to have neatly packaged this into a single coherent product. They don't quite have K8s fully running, last I checked, but there are of course other container orchestration systems.

> Go to the machine once the spare disk as already been resilvered

Hi, fellow ZFS enthusiast :-)

[0]: https://oxide.computer


> And running your own hardware is not incompatible with Kubernetes: on the contrary

Kubernetes actually makes so much more sense on bare-metal hardware.

On the cloud, I think the value prop is dubious - your cloud provider is already giving you VMs, why would you need to subdivide them further and add yet another layer of orchestration?

Not to mention that you're getting 2010s-era performance on those VMs, so subdividing them is terrible from a performance point of view too.


> Not to mention that you're getting 2010s-era performance on those VMs, so subdividing them is terrible from a performance point of view too.

I was trying in vain to explain to our infra team a couple of weeks ago why giving my team a dedicated node of a newer instance family with DDR5 RAM would be beneficial for an application that is heavily constrained by RAM speed. People seem to assume that compute is homogeneous.


I would wager that the same kind of people that were arguing against your request for a specific hardware config are the same ones in this comment section railing against any sort of self-sufficiency by hosting it yourself on hardware. All they know is cloud, all they know how to do is "ScAlE Up thE InStanCE!" when shit hits the fan. It's difficult to argue against that and make real progress. I understand your frustration completely.


I agree, I run PROD, TEST and DEV kube clusters all in VM's, works great.


In the public sector, cloud solves the procurement problem. You just need to go through the yearlong process once to use a cloud service, instead of for each purchase > 1000€.


Capital expenditures are kryptonite to financial engineers. The cloud selling point was to trade those costs for operational expenses and profit in phase 3.


As someone who ran a startup with 100s of hosts: as soon as I start to count the salaries, hiring, desk space, etc. of the people needed to manage the hosts, AWS looks cheap again. Yeah, on raw hardware costs they are aggressively expensive, but TCO-wise they're cheap for any decent-sized company.

Add in compliance, auditing, etc. - all things that you can set up out of the box (PCI, HIPAA, lawsuit retention) - and it gets even cheaper.


I'm curious about what "reasonable amount of hosting" means to you, because from my experience, as your internal network's complexity goes up, it's far better for your to move systems to a hyperscaler. The current estimate is >90% of Fortune 500 companies are cloud-based. What is it that you know that they don't?


The bottom line > babysitting hardware. Businesses are transitioning to cloud because it's better for business.


Actually, there's been a reversal trend going on; for many companies, better now often means on-premises or hybrid.


> If I didn't already self-host email, I'd consider using Fastmail.

Same sentiment on all of what you said.


> how do people get so sold on a thing that they'll go online and fight about it, even when they lack facts or often even basic understanding?

Are you new to the internet?


> All the pro-cloud talking points are just that - talking points that don't persuade anyone with any real technical understanding, but serve to introduce doubt to non-technical people and to trick people who don't examine what they're told.

This feels like "no true scotsman" to me. I've been building software for close to two decades, but I guess I don't have "any real technical understanding" because I think there's a compelling case for using "cloud" services for many (honestly I would say most) businesses.

Nobody is "afraid to openly discuss how cloud isn't right for many things". This is extremely commonly discussed. We're discussing it right now! I truly cannot stand this modern innovation in discourse of yelling "nobody can talk about XYZ thing!" while noisily talking about XYZ thing on the lowest-friction publishing platforms ever devised by humanity. Nobody is afraid to talk about your thing! People just disagree with you about it! That's ok, differing opinions are normal!

Your comment focuses a lot on cost. But that's just not really what this is all about. Everyone knows that on a long enough timescale with a relatively stable business, the total cost of having your own infrastructure is usually lower than cloud hosting.

But cost is simply not the only thing businesses care about. Many businesses, especially new ones, care more about time to market and flexibility. Questions like "how many servers do we need? with what specs? and where should we put them?" are a giant distraction for a startup, or even for a new product inside a mature firm.

Cloud providers provide the service of "don't worry about all that, figure it out after you have customers and know what you actually need".

It is also true that this (purposefully) creates lock-in that is expensive either to leave in place or unwind later, and it definitely behooves every company to keep that in mind when making architecture decisions, but lots of products never make it to that point, and very few of those teams regret the time they didn't spend building up their own infrastructure in order to save money later.


> The whole push to the cloud has always fascinated me. I get it - most people aren't interested in babysitting their own hardware.

For businesses, it's a very typical lease-or-own decision. There's really nothing too special about cloud.

> On the other hand, a business of just about any size that has any reasonable amount of hosting is better off with their own systems when it comes purely to cost.

Nope. Not if you factor in 24/7 support, geographic redundancy, and uptime guarantees. With EC2, you break even at about $2-5m a year of cloud spending if you want your own hardware.


I did compliance for a fintech under heavy regulation.

If we used AWS, we could skip months of certification. If we used a custom data center, we would have to certify it ourselves (muuuuuch more expensive).

From this standpoint, cloud beats on-premise.


capex vs opex



