At $work we run a raytracing render farm on AWS spot instances that are only spun up as needed. This is far cheaper than keeping the same number of physical machines around to serve peak use. The caveat is that we could most likely get by with a smaller pool of machines and queue up jobs overnight, yielding higher utilization.
So the price advantage only exists when lower time-to-completion is a requirement.
> Most people using the cloud aren't really that cost conscious. EC2 is about 8x the cost of bare metal...
I've heard these claims before, but I think people are not doing the correct math. I would love to be proven wrong. It is not about computing the costs of hardware + depreciation + redundant power, adding them together and calling it a day.
You have to add:
– All the engineers that will be maintaining your bare-metal datacenter. If you are using a colo, use their costs.
– Downtime caused by mundane issues that you simply do not see on AWS. Yes, a single hypervisor will go bad. You fix that in _minutes_ by stopping and starting your instance. Yes, VMware or OpenStack (with networked storage!) could do that, but you know what you also won't have to do? Deal with suppliers.
– The ability to quickly scale up and down according to your load. I can fire up 30 servers with a single Terraform script and tear them down before your hardware supplier has even returned your call. There's no one on my end racking and stacking anything (rough sketch of the idea after this list).
– Downtime when shit breaks. If you have 5 thousand engineers impacted because a maintenance mishap knocked out power to a datacenter and cascaded to the backups, you'll quickly burn through your savings.
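Not the Terraform itself, but a minimal sketch of the same scale-up/tear-down idea using boto3; the AMI ID, instance type, and tag value are placeholders, not a real config:

```python
# Rough sketch of "fire up a fleet, then tear it down" using boto3.
# The AMI ID, instance type, and tag values are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

def scale_up(count=30):
    # Launch a fleet of identical workers in one call.
    return ec2.create_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI
        InstanceType="c5.2xlarge",         # placeholder instance type
        MinCount=count,
        MaxCount=count,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "burst-worker"}],
        }],
    )

def tear_down():
    # Terminate everything tagged as a burst worker.
    instances = ec2.instances.filter(
        Filters=[{"Name": "tag:role", "Values": ["burst-worker"]}]
    )
    instances.terminate()
```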
I could go on. There are scenarios where on prem makes sense, but one should not dismiss it outright without accounting for the additional risks you are incurring and quantifying those.
One thing that people often forget (and I have to keep reminding people about) is that a single Availability Zone on AWS is composed of multiple datacenters, not just one. And you have multiple AZs. Heck, if you are deliberate about it you can even recreate your entire stack in a different region entirely. It is not apples to apples.
You may not need them, but if you do, cloud providers offer capabilities you are unlikely to replicate yourself. Especially if it's not your core business.
More to the point: even if your 8x figure is correct, the AWS Elastic offering adds a premium _on top of that_. So they have to offer better capabilities than you could build yourself to account for the premium – in my experience the extra cost is hard to justify (it can make sense, but it is not as clear-cut as EC2).
8x is close to the top end, somewhere between 3x and 10x depending on a number of factors, mostly scale.
For the record I factor in (a rough sketch of the arithmetic follows the list):
- Hardware depreciation (36mo)
- Power
- People
- DC rent + power
- software licences / support
for on premise.
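Something like this toy amortization, purely as an illustration (every dollar figure below is a made-up placeholder, not a real quote):

```python
# Toy per-server monthly cost for on prem, using the factors above.
# All dollar amounts are invented placeholders; substitute your own.
HARDWARE_COST = 12_000        # purchase price per server (USD)
DEPRECIATION_MONTHS = 36      # hardware depreciation window
POWER_PER_MONTH = 150         # power per server
PEOPLE_PER_MONTH = 500        # ops headcount spread across the fleet
DC_RENT_PER_MONTH = 200       # DC rent + power, per-server share
LICENSES_PER_MONTH = 100      # software licences / support

monthly_per_server = (
    HARDWARE_COST / DEPRECIATION_MONTHS
    + POWER_PER_MONTH
    + PEOPLE_PER_MONTH
    + DC_RENT_PER_MONTH
    + LICENSES_PER_MONTH
)
print(f"~${monthly_per_server:,.0f}/month per server")  # ~$1,283 with these numbers
```

Compare that against the fully loaded cloud price for a comparable instance over the same 36 months and you get the multiplier being argued about above.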
What we see now is that compute is dropping in price for on prem every year and density is improving. AMD Rome brings incredible bang for your buck when buying at significant scale.
But it's not comparing apples with apples.
It's virtually impossible to accurately factor in the opportunity cost of doing all this yourself, but you can potentially hire a bunch of engineers with the savings of going on prem, ymmv.
You can never ever recreate the developer experience on prem regardless of your scale; only you can tell if on prem is good enough.
It's difficult to put a value on being able to pay as you go, or to suddenly serve workloads out of a geo close to your users, in the cloud, where on prem there is always a lead time.
Finally, whilst the developer experience is better, it takes a while to adjust to the new challenges of the cloud: outages out of your control, unpredictable performance, poor support, no access to your hardware.
TLDR: Cost is hard to define and isn't a zero-sum game.
I've seen this happen a lot. Basically it continues 'til it starts to eat the company and they struggle to rein it in.
I did some work at $company; they were basically profitable if not for their 7-figure AWS bill. I handed them a plan that would have cut their bill in half with a one-time $75k spend. They also had static load, so moving to dedicated instances would've cut a huge amount off their bill as well with basically zero effort.
The managed ES service makes sense for small things where the operational cost of self-managing dwarfs the overhead. But at scale, it just doesn't make any sense.
EC2 costs are so variable too, due to reserved instances, spot, and elasticity (depending on workload). It can be hard to compare.
Is this figure for an infrastructure that requires failover capability (potentially remotely) and does it assume 100% utilization of the bare metal? That seems crazy high, but I'd also believe it.
Let me share my anecdotal evidence, then, with numbers. I have migrated an on-prem cluster of 150 nodes running Hadoop, Elasticsearch, and Docker apps serving the UI. We achieved a 30% saving in the year-over-year budget for the company, which is ~600,000 USD. This is not about EC2 vs, say, a Dell server in a datacenter, because that comparison would be an apples-to-oranges one. This is the sum of all the costs on-prem vs the sum of all the costs in the cloud. When people try to compare purely EC2 to a node running on-prem, the only thing that is 100% crystal clear is that they do not understand how the cost of an infrastructure is structured. Quite often they forget that we need networking, electricity and cooling in the datacenter. They also forget that datacenter capacity cannot be given back when not needed (auto-scaling), and a few other minor things. This results in the conclusion that the cloud is more expensive than on-prem, which in my experience of moving several Fortune 500 companies to the cloud is not true; quite the opposite, significant cost savings can be achieved.
I was talking about "in a datacenter", but yes, you have almost zero elasticity with datacenter buildouts. You can get metered power, but that can be a mixed bag since it's usually more expensive per kWh.
Most companies that "move to the cloud" also make a lot of changes to the way they do things so they can scale up/down dynamically; that's not an insignificant cost in development time.
If you have a static load, AWS is really expensive.
Also, GPUs are still an insanely bad deal in AWS/Azure.
Let's say you need the equivalent of 60 x p3.16xlarge for a whole year; that's well over a million USD a month. You break even on month 3 in a datacenter, even with all the overhead. Maybe some of that is my ability to get good deals, but even if you break even on month 4, that's crazytown.
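Back-of-envelope version of that, with the hourly rate as an assumption (roughly the us-east-1 on-demand price for p3.16xlarge; substitute current pricing):

```python
# Rough monthly AWS bill for the 60 x p3.16xlarge example.
# HOURLY_RATE is an assumed on-demand price; check current pricing.
HOURLY_RATE = 24.48      # USD/hour per p3.16xlarge (assumed)
INSTANCES = 60
HOURS_PER_MONTH = 730

aws_monthly = HOURLY_RATE * INSTANCES * HOURS_PER_MONTH
print(f"AWS on-demand: ~${aws_monthly:,.0f}/month")  # ~$1.07M/month

# "Break even on month 3" implies the whole on-prem build-out,
# overhead included, costs roughly three months of that bill:
print(f"Implied on-prem budget: ~${3 * aws_monthly:,.0f}")
```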
>> If you have a static load, AWS is really expensive.
Again, without details this is a meaningless claim. My own company's only infrastructure is a website that "runs" on AWS using the free tier of CloudFront and a little bit of the paid tier of S3. This is a static workload. It is really cheap. Without adding all the details on both sides and the workload, you cannot claim that AWS (or for that matter any cloud vendor) is more expensive.
If you can find cases where it's cheap, that's great, but it's not the case for a lot of people with a static compute load.
One of my example datacenters is:
Private servers, access controlled, etc
12 servers (E5-2438L, 128GB of RAM, 6TB of RAID in each, 10Gbps interconnects, etc)
It costs $1200/month to run.
Just the storage cost in AWS is around $3k/month.
The equivalent EC2 cost is around $14k/month.
It requires around 1-2 hours/month of oversight, and the costs are generally fixed except for bandwidth, which is billed at a fraction of AWS pricing.
If you factor in operational cost, AWS can be cheaper, but often it really isn't.