This is a pet peeve of mine, but when you are running your own servers in a datacenter, that is not "your own cloud", it's just your own servers.
I realize the cloud is just a marketing term and it doesn't really mean something, but if I move my single heroku dyno app to my own dedicated server sitting in my basement, did I just move it to my own private cloud?
If you have a server farm that allocates VMs dynamically on demand, you have something much different than a dedicated server, and it is quite reasonable to call this a private cloud.
No, that's ridiculous. You are still paying for all the servers in the server farm, even if you have decided to waste another year of Moore's law to add a bulky abstraction layer on top.
The whole point of a cloud to a business is that you can adapt to changing needs without having to pay for all the infrastructure all the time.
(You didn't think the suits talk about "the cloud" all the time because they fancy the technology, right?)
> You are still paying for all the servers in the server farm
Amazon is paying for all the servers in the EC2 server farm. Does that stop EC2 from being a cloud?
TestingBot is spinning up VMs dynamically to scale out test infrastructure on demand. Their use case is basically the ideal cloud use scenario. They've built a system of servers that enables them to do all of the things typically associated with clouds, which means they've built a cloud.
You're stuck thinking of TestingBot needing a bunch of servers, and seeing the tradeoff as dedicated servers vs cloud servers. In reality, it's TestingBot's customers who need a bunch of servers. So TestingBot built a cloud for their customers. They aren't setting aside N servers for each customer. They're building a cloud with the capacity for X servers and carving out of that on-demand.
Yes, EC2 is not a cloud for Amazon. If tomorrow everyone moves away from it, they will be stuck with many millions in sunk costs for hardware, rack space, datacenters, paid peerings... they didn't rent that stuff by the hour. Feel free to replace Amazon with TestingBot in this description as you see fit, because right now you are confused as to what is a cloud to whom.
(Mind you, I'm still only discussing the financial side here, as that was the reason the original post cited. In technical terms, of course they are still a cloud, but so is the mainframe emulator on my Raspberry Pi.)
Somebody somewhere always has to pay for the infrastructure. I fail to see how the parent is confused between Amazon having infrastructure which they provide cloud services on, and TestingBot having infrastructure which they provide cloud services on. What's your point of distinction on the 'financial side' between the two exactly?
> The whole point of a cloud to a business is that you can adapt to changing needs without having to pay for all the infrastructure all the time.
The point of a cloud is that you can adapt to changing needs by reallocating resources based on demand. For a public cloud, that's exchanging money for more storage/compute capacity/etc. with near-zero latency. For a private cloud, it's not adding total capacity, but reallocating between different uses on demand.
Having worked in large organizations still primarily concerned with allocating physical servers to tasks -- which, just like one with a private cloud, still keeps substantial excess capacity to accommodate changing needs -- I can immediately see that being able to reallocate resources with minimal latency is valuable even if it's from a fixed pool where increasing the total size of the pool has substantial latency.
That's a technical detail, not a financial one. Sure, you can save a few physical servers by being smarter in allocating load, but that is a far cry from the financial flexibility you get from e.g. EC2.
People were making out-of-band HTTP requests long before the irritating term AJAX was coined too. The name being made up after the method started being used does not mean the method can't be called the name if the properties fit (likewise, as much as the marketers think otherwise, making up a new name doesn't make a thing technically revolutionary!)
You just don't like the word "cloud" at all, and that's fine, but that is a different complaint from the original, which was basically that the term is being misused when combined with the term "private". If we are going to use the term "cloud" and assign it a meaning, we can certainly use it to refer to private clouds. If EC2 is a cloud for me, is it not also a cloud when Amazon builds services on it?
If we're going to assign a meaning to the term 'cloud', shouldn't we assign it to something we don't already have a word for, so it actually describes something?
If I start two VMs on my desktop and we call that a 'cloud' then the word 'cloud' isn't terribly useful in describing anything.
I don't think the accepted definition of "cloud" is just "group of VMs". Your two VMs do not constitute a cloud. You can set up ten thousand VMs on a thousand servers and not have anything that would reasonably be considered a cloud if you did it manually. "Cloud" generally implies simplicity and automation. You could build a cloud with physical machines instead of VMs so long as you have automatic ways to allocate and manage the machines' lifecycles.
I have no problem with the word cloud. I have a problem with it being used to mean "anything I call cloud". If we are going to use the term and assign it a meaning, then let's assign it a meaning instead of using it for everything from "webmail" to "virtual machines" to "paying someone else to do something for me".
That's not at all the case. "Cloud" isn't about external hosting, it's about using virtualization to allow dynamic provisioning of resources. Outsourced external hosting was around a long time before cloud computing. It's a key application of cloud computing, but they are no more equivalent than "internal combustion engine" is equivalent to "wheeled land vehicle".
Exactly, a key attribute of "The Cloud" has always been offloading the risk and costs of scaling. You get linear elasticity in scale without step functions in cost.
Once you plunk down all the money for networking and hardware yourself you've lost elasticity and will have a step function in cost the next time you want to scale. When you turn off VMs, you aren't saving anything at all. You just have expensive idle computing resources.
You're looking at this wrong. You're thinking of TestingBot as the customer, so you see one customer replacing AWS with dedicated servers and calling it a cloud. That's not the actual scenario, though. TestingBot built a cloud for their customers. They built a way for their customers to offload the risk and costs of scaling their test infrastructures. They built a simple[1], automated way for their customers to easily scale. They built a way to share the same hardware among many customers and thus lower costs and improve utilization. This is a cloud.
I also don't really agree with your claim that the key attribute of "the cloud" is offloading the risk and cost of scaling. From my perspective, "the cloud" is primarily about simplifying and automating scaling, thus reducing the human and time costs. It's not just about dumping the cost of physical machines onto someone else. I see taking less risk on machine purchases as a nice benefit, but not the core purpose.
[1] I haven't actually used TestingBot, so this isn't an endorsement. I have no idea if they actually do a good job or not.
You are not totally wrong. TestingBot seems to offer their servers as a cloud to their customers. Which means that TestingBot did not move to its own cloud (compare that to the title of the blog post, please), but they moved to their own servers (and built a cloud on them, like you said).
Their customers can scale easily; they cannot. Their customers don't have to pay for all the hardware needed; they do. Hence they are not using a cloud, they did not move to a cloud - they are using servers, they moved onto their own servers. The title is wrong.
Why is that important? Not only because it is a basic misconception, but because saying "we moved to our own cloud" misrepresents their actual ability to scale and their cost structure.
On day one, buying equipment, labor and software to stand up a computing environment is a risk. You don't truly understand your needs.
After a couple of years of operations, you should have a better idea of your typical run state and growth patterns. At that point the risk may shift. If you can deliver compute with AWS for $X, but can also deliver that compute through some other means for $X-Y, you have a new risk/cost evaluation to do.
The key is continuing to use the cloud thought process.
"AWS is expensive, as soon as we start an instance, we’re billed for the entire hour, even if we only need to run a 2 minute test on it."
What you really needed to do was design an algorithm that keeps machines running over a period of time, according to a general trend, not shut down and instantly boot up for every customer deciding to run tests.
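Something like the following is a minimal sketch of that idea (all names are hypothetical; this is not TestingBot's or anyone's actual scheduler): keep a pool of booted machines, hand incoming tests to an idle one, and only terminate a machine when it is idle and near the end of an hour you have already paid for.

    import time

    BILLING_PERIOD = 3600     # classic EC2 billing: whole hours
    TERMINATE_WINDOW = 300    # only consider terminating in the last 5 minutes of an hour

    class WarmInstance:
        def __init__(self, instance_id):
            self.instance_id = instance_id
            self.launched_at = time.time()
            self.busy = False

        def near_hour_boundary(self):
            # True in the last few minutes of an hour we have already paid for.
            into_hour = (time.time() - self.launched_at) % BILLING_PERIOD
            return into_hour > BILLING_PERIOD - TERMINATE_WINDOW

    class Pool:
        def __init__(self, launch, terminate):
            self.launch = launch        # callable that boots a machine, returns an id
            self.terminate = terminate  # callable that kills a machine by id
            self.instances = []

        def acquire(self):
            for inst in self.instances:
                if not inst.busy:
                    inst.busy = True    # reuse a machine we are already paying for
                    return inst
            inst = WarmInstance(self.launch())
            inst.busy = True
            self.instances.append(inst)
            return inst

        def release(self, inst):
            inst.busy = False

        def reap(self):
            # Run periodically: only idle machines about to start a new billed
            # hour get terminated; everything else stays warm for the next test.
            for inst in list(self.instances):
                if not inst.busy and inst.near_hour_boundary():
                    self.terminate(inst.instance_id)
                    self.instances.remove(inst)

All the hourly-billing awareness lives in reap(); on hardware you own outright, or with per-minute billing, that pressure mostly disappears.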
I'm glad I'm not the only person that thought this, because I'm an HPC batch admin and that sort of machine packing is just what I do.
I'm glad they found a solution which seems to work out really well for their use case. Changing the algorithm to use the machine for an hour would imply that some tests may wait to be run, and that could be a bad thing for TestingBot (I'm not sure what their SLA is).
Going to their own "cloud" though will require them to either overbuy capacity or eventually make a push to increase utilization, so they may end up making that algo anyhow.
The reality is that once you reach a reasonable level of scale (multiple physical machines), the cost savings from going to your own cloud will easily allow you to afford that extra capacity and then some.
I disagree that it's the only reality. Definitely there is a scale where paying a sysadmin makes more sense than paying a provider for their sysadmin skills.
There is another situation, which I would argue is the "enterprise" situation, where you will want to pay your programmers to make the changes because they're fixed overhead rather than capital expenditures. This scenario comes about because of an internal push to save money (adding significant capacity costs too much), or because adding any sort of capacity increases operational complexity more than it increases code complexity. It's a much larger scale than where TestingBot is at, but a problem they probably hope to encounter one day.
As far as I know you have to stop the instance to mount a fresh EBS volume, which means as soon as you start up the instance with the new volume you're paying for another full hour again.
I'm pretty interested in your experiences with Linux on Azure.
I've been to both Azure and AWS workshops, and I found the Azure presenters to be somewhat hostile towards alternative operating systems. They had a slide saying that Linux was 100% supported, but they didn't speak to it at all. In fact, it was nearly comical how the presenter didn't want to touch my Mac to help with setup.
From my reading, Azure seems to be pretty awesome as far as integration with Visual Studio goes, but is seriously lacking when it comes to features on their load balancer, etc. Their security model is kinda weird too, where the defaults commonly open your firewall to the rest of Azure.
So those are just my impressions; what are your experiences?
I'm sorry you had a bad experience at the Azure workshop. There are a lot of people running those types of events both inside and outside of Microsoft and the quality is all over the map. Unfortunately a lot of technical people at Microsoft are still very insular. They primarily interact with Microsoft friendly .NET developers and rarely have to stray into the big scary world of *nix. They probably don't talk about the Linux support or help with your Mac because it would show their ignorance on the subject.
Fortunately there is also another side of Microsoft. A side which is doing fantastic work in the open source community as well as with things like the Linux support in Azure. I would be happy to talk more about Linux support in Azure and I am more than capable of helping out with any setup required, regardless of your operating system of choice.
If you're still interested, you can reach me at tistrimp@microsoft.com
Right, 10 minutes up front, then by the minute. The last minute chunk is dropped from the bill if it was actually used for less than 15 seconds.
Or so I remember from reading the ToS yesterday.
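To put rough numbers on why that granularity matters for the short-test workload in the article, here is a comparison of the two billing models (a sketch using the terms as described above; the 15-second rule is ignored and the exact terms may have changed since):

    def gce_billed_seconds(runtime_s, minimum_s=600, granularity_s=60):
        # Per-minute billing with a 10-minute minimum, as described above.
        billed = max(runtime_s, minimum_s)
        return ((billed + granularity_s - 1) // granularity_s) * granularity_s

    def ec2_billed_seconds(runtime_s, hour_s=3600):
        # Classic EC2 billing: any partial hour is charged as a full hour.
        return ((runtime_s + hour_s - 1) // hour_s) * hour_s

    test = 120                          # a 2-minute test on a fresh instance
    print(gce_billed_seconds(test))     # 600  -> billed for 10 minutes
    print(ec2_billed_seconds(test))     # 3600 -> billed for a full hour, 6x as much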
The command-line cloud utility is so much easier than Amazon's. Everything is in one place. ssh keys are managed automatically. In less than 1 hour I was able to do everything with Google, whereas it took me a couple days to feel comfortable with AWS.
Are many people using so few AWS services that they are able to move to Google? It seems like a hassle to use Google for VMs and file storage, while having to still use Amazon for all the other services Google doesn't offer.
The relevant and important difference being that this is a one-time expense. It contributes almost nothing to the cost unless you do this then immediately drop the platform.
For those unfamiliar with Amazon, Amazon AWS consists of over 25 different services. This article focuses on EC2, the virtual server service.
I use S3 and DynamoDB regularly and think the pricing is better than most. I don't use it for pricing though. I use it because I don't have to worry about load balancing, adding multiple servers, running out of disk space. Set it up and forget about it.
I'm confused. Noisy neighbors and the "round up to the hour" AWS billing can both be fixed by more active instance management. It's not trivial but it's also not incredibly complex. Moving completely off of AWS has huge ramifications for your business in the long run, some positive some negative, and it seems weird to make such a bold and disruptive move based upon two issues that are both well known and fairly straightforward to address.
The real reason is his business is probably plateauing and he didn't want to say that publicly. That would make much more economic sense as a reason to move off of AWS.
Might be true. If you have stable load, moving off of EC2 can save you at least 50% over reserved instances, and >90% on bandwidth costs. If you no longer need Amazon's scaling advantage, it's a very easy call to make.
Of course that's assuming you haven't locked yourself into Amazon's databases/load balancer/... But that would be a horrible idea anyway, right up there with running a Microsoft stack licensed per year.
That's funny, I was just reading this blog post from Adrian Holovaty on some of the issues he ran into with deploying Python/Django on Heroku and how great AWS has worked out for him with Soundslice:
Anyway, I'm definitely in the "avoid sysadmin stuff if at all possible" camp. Does anyone have any thoughts on AWS vs Heroku for Django? Does Adrian's solution seem reasonable?
DevOps/Sysadmin here. If you want to avoid the DevOps pain use Heroku until you run into its pain points, and then migrate to AWS (or even Digital Ocean, if you don't need S3, elastic load balancers, etc).
Most developers should have no problem in AWS for small to midsize environments until they're scaling up.
RDS recently added cross region snapshot copies, and more importantly, cross region replication for at least MySQL. Those were the last two features I was waiting for to jump to RDS. In fact, I was in the process of setting up cross region backups using just the cross region snapshot copy when they released the new cross region read replicas feature.
I'm very much looking forward to moving our manually managed cross data center MySQL replication/admin to RDS. But I also don't have to scale MySQL to ridiculous levels either, I just need very high availability.
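For anyone curious what that looks like programmatically, this is roughly the shape of the call through boto3 (the names and ARN below are made up, and the extra parameters needed for encrypted instances are worth checking against the RDS docs):

    import boto3

    # Hypothetical identifiers. For a cross-region replica the source is
    # referenced by its ARN, and the client is created in the *destination*
    # region, which is where the replica will live.
    SOURCE_ARN = "arn:aws:rds:us-east-1:123456789012:db:primary-mysql"

    rds = boto3.client("rds", region_name="us-west-2")
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="primary-mysql-replica",
        SourceDBInstanceIdentifier=SOURCE_ARN,
        DBInstanceClass="db.m1.large",   # placeholder instance class
    )

That one call replaces the usual dance of setting up binlog replication, users, and firewall rules by hand across two data centers.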
E.g. you can have a nice app running with elastic beanstalk, file storage on S3, database on RDS, full text search on CloudSearch, without worrying once about apt/yum, incompatible JVM versions, backups, stuff in /etc or setting up iptables and ssh, which is what most people think of as sysadmin work.
Bad idea. Extraordinarily bad idea. Of course, I'll gladly charge you $450 an hour to recover your data when you accidentally drop a production table. It's part of how I make a living.
> setting up iptables
i.e. setting up security groups - same solution, better gui. Someone still needs to understand what it's doing.
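Exactly -- the rule itself is the same decision you'd make in iptables: which port, open to which addresses. As a hedged boto3 sketch (placeholder group ID and CIDR), an "allow SSH from one network" rule looks like this, and the API fills in nothing you didn't already have to understand:

    import boto3

    ec2 = boto3.client("ec2")

    # Same judgment call as an iptables ACCEPT rule: port 22, but only from
    # a network you trust. The group ID and CIDR are placeholders.
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
        }],
    )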
Elastic Beanstalk... sounds cool, but based on the "let you take over management when you're ready", it's still an EC2 instance under the hood that's just configured for you. Scale at all, and you're probably going to have to start concerning yourself with all those bits you've been able to ignore until then.
AWS' automated offerings are good, but if you're doing anything at scale, they're frequently not good enough. And my experience has shown me that the line is really really painful to cross when you get to it with a fully staffed team, let alone a sole developer.
If I accidentally drop a table, RDS has automatic backups and automatic snapshots.
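To be fair about what those backups buy you: recovery from a dropped table is a point-in-time restore into a brand new instance, roughly like the boto3 call below (hypothetical names and timestamp), and you still have to pull the lost table out of the restored copy and cut your application over yourself.

    import boto3
    from datetime import datetime, timezone

    rds = boto3.client("rds")

    # Restore to just before the accidental DROP TABLE. This creates a new
    # instance from the automated backups; the original is left untouched.
    rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier="prod-mysql",             # placeholder
        TargetDBInstanceIdentifier="prod-mysql-recovered",
        RestoreTime=datetime(2013, 11, 1, 14, 55, tzinfo=timezone.utc),  # placeholder
    )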
> i.e. setting up security groups - same solution, better gui. Someone still needs to understand what it's doing.
Yes, but just like being able to install WordPress doesn't mean you know how to program, setting up Beanstalk+RDS is not on the same level of expertise as setting up N machines from the ground up. When I start an RDS/CloudSearch/DynamoDB instance, the security groups are already set up in a sensible way.
Sure, there are plenty of cases in which what AWS offers is not good enough. Heck, having an ops team is always better than not having one.
But I replied to AWS only getting rid of "someone who can physically access the box". That seems reductive.
"Noisy neighbors: sometimes instances would behave much slower than usual, because other people on the same hypervisor were using all the hypervisor’s resources"
The Xen hypervisor actually has pretty good resource allocation between virtual machines, and although there are academic attacks [1], I'm interested to hear what evidence you've seen of neighbors hogging your resources.
No, the paper is saying the opposite -- that this situation can be the case. What I was interested in was hearing their evidence that this is happening to them.
I think most "noisy neighbor" complaints actually come from the noisy neighbors. EC2 lets you use CPU resources of your neighbors if they aren't using them. So people hammer the machine and expect that level of performance, then when a neighbor does something and suddenly you are scaled back to only using the resources you actually pay for, you feel it is slow and complain.
But I think if EC2 didn't let you borrow CPU, nobody would use it because they would realize how absurdly underpowered EC2 offerings really are.
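One way to gather that kind of evidence from inside a Xen guest is to watch the steal column in /proc/stat, which counts time the hypervisor spent running someone else while your VM had runnable work. A quick sketch (Linux-only; the interval and whatever threshold you apply are arbitrary):

    import time

    def cpu_times():
        # First line of /proc/stat: "cpu user nice system idle iowait irq softirq steal ..."
        with open("/proc/stat") as f:
            fields = f.readline().split()
        return [int(x) for x in fields[1:]]

    def steal_percent(interval=5):
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        delta = [b - a for a, b in zip(before, after)]
        total = sum(delta)
        steal = delta[7] if len(delta) > 7 else 0   # 8th field is steal time
        return 100.0 * steal / total if total else 0.0

    print("steal over the last 5s: %.1f%%" % steal_percent())
    # Sustained double-digit steal suggests either a noisy neighbor or that
    # you're being throttled back to the CPU share you actually pay for.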
What you say might be true for CPU usage, maybe even for network usage, but definitely not for I/O. In that case it's definitely noisy neighbors causing problems for others, but in a way it doesn't matter. Whether it's noisy neighbors or hard resource limits, the variability is still there. For single-machine stuff you just have to assume the worst case. For things that are distributed among large enough numbers of nodes you can do a certain amount of averaging, but high variability still forces you close to the worst-case assumption.
I've run a lot of tests in this area, and even presented some of the results at LISA last month. When it comes to I/O, EC2 is considerably less consistent than many others[1] and that's what really hurts their users. Their prices might be only 2x someone else's if you're only looking at the average case, but it's more like 10x if you consider their extreme variability as well.
[1] Side note: this is a hint that Amazon has an unusually high oversubscription ratio. The same practice has been evident in shared web hosting, with the same effect, since that industry was created.
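For anyone who wants to check this on their own instances, a crude version of that kind of test is just timing a lot of small synced writes and looking at the tail rather than the average (a simplified sketch, not the actual benchmark behind the LISA results):

    import os, time, statistics

    def synced_write_latencies(path="latency_probe.dat", count=500, size=4096):
        # Time `count` small writes, each followed by fsync, in milliseconds.
        latencies = []
        buf = os.urandom(size)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            for _ in range(count):
                start = time.time()
                os.write(fd, buf)
                os.fsync(fd)
                latencies.append((time.time() - start) * 1000.0)
        finally:
            os.close(fd)
            os.unlink(path)
        return latencies

    lat = sorted(synced_write_latencies())
    print("median: %.2f ms" % statistics.median(lat))
    print("99th percentile: %.2f ms" % lat[int(len(lat) * 0.99)])
    # The median is what the price sheet implies; the 99th percentile is what
    # a volume with noisy neighbors actually feels like.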
>But I think if EC2 didn't let you borrow CPU, nobody would use it because they would realize how absurdly underpowered EC2 offerings really are.
yeah, that's pretty much the crux of virtualization; it's a lot like buying bandwidth. Yes, your upstream is oversubscribing. Yes, if they do this right, 99% of the time, you won't notice the oversubscribe; you will get more service for less money vs. something that isn't oversubscribed. But, oversubscription needs to be managed carefully.
One thing I've noticed? If your upstream has a 1000Mbps line, and sells 1000 unlimited 10Mbps ports, you are almost never going to see contention, even though it's a 10x oversubscribe.
If your upstream has a 1000Mbps port, and sells 10 1000Mbps ports? that's the same 10x oversubscribe. But I /guarantee/ that you will hit contention at least once a week. Probably more often. If people expect 1000Mbps reliable off that, they will be very unhappy. (Of course, if you setup your QoS properly, and tell the customers that it's 100Mbps CIR and up to a gigabit of best-effort burst? it can work out just fine. But nobody is going to get a reliable full gigabit out of that deal.)
90% of your users are using like 10% of your resources. But you've always got a few who are running torrents (or, in the case of CPU, mining primecoins). In the 1000 10Mbps port situation? it doesn't really matter if you've got a few bittorrent users. In the 10 users who can all completely fill the pipe situation? it matters a /lot/.
That's the thing about CPU sharing, though; most of the time you don't put that many guests on one machine, and often you give each guest the ability to use the whole machine (when it's otherwise idle) - so you are in the situation of selling 10 1000Mbps links when you only have 1 1000Mbps uplink, which ends in tears if anyone actually expects a reliable 1000Mbps uplink. (Now, if everyone understands that it's actually 100Mbps CIR that can burst to 1000Mbps, then sure, people can be happy. But you have to be careful with those expectations. With the 1000 10Mbps links on a 1Gbps uplink, customers can treat their 10Mbps link as 10Mbps dedicated, and 99% of the time, they will get what they expect.)
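The difference between those two 10x oversubscribes is easy to put numbers on with a toy model: assume each customer independently saturates their port some fraction of the time and ask how often total demand exceeds the 1000Mbps uplink (real traffic is far burstier than independent coin flips, so treat this purely as illustration):

    from math import comb

    def contention_probability(customers, port_mbps, uplink_mbps=1000,
                               busy_fraction=0.05):
        # Contention = total demand exceeds the uplink, i.e. more than
        # uplink/port customers are saturating their ports at the same moment.
        max_busy = uplink_mbps // port_mbps
        p = busy_fraction
        prob_ok = sum(comb(customers, k) * p**k * (1 - p)**(customers - k)
                      for k in range(max_busy + 1))
        return 1 - prob_ok

    # 1000 customers on 10Mbps ports: contention needs more than 100 of them
    # busy at once (the mean is 50) -- effectively never happens.
    print(contention_probability(1000, 10))
    # 10 customers on 1000Mbps ports: any 2 busy at once saturates the uplink.
    print(contention_probability(10, 1000))   # ~0.086, i.e. almost 9% of the time

Same 10x ratio in both cases, completely different experience for the customer, which is the whole argument above.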
"If your upstream has a 1000Mbps port, and sells 10 1000Mbps ports? that's the same 10x oversubscribe. But I /guarantee/ that you will hit contention at least once a week. Probably more often. If people expect 1000Mbps reliable off that, they will be very unhappy. "
I like that example.
Reminds me of accounts receivable and bad debt.
Better to have 1000 customers that owe you $30 each rather than 10 customers that owe you $3,000 each. I'm not factoring into this example the cost of billing or customer service. Strictly that if you have 1000 customers it's much less aggravating and you don't lose sleep at night worrying about a big customer that doesn't pay a bill.
Yup. And either way, you can solve the problem with capital. (Having a bunch of money lying around to cover the shortfall in the case of A/R, or having a very large burstable uplink with a small commit in the case of overselling bandwidth.)
Of course, burstable uplinks all suffer from the 'best effort' issue... if you go beyond your CIR, well, there's usually headroom, but not always.
Looks like "their cloud" couldn't handle Hacker News' traffic. Had they set up their blog in an AutoScaling group in EC2 this would not have been a problem :)
It's really seamless and nice; it just ends up being expensive after a while if you are trying to do enterprise stuff & their add-on services aren't particularly cheap. The customizability of AWS & private stuff means you have to do a bit of server admin, but it is generally cheaper & can give better performance.
EDIT: Oh, also note that (my biggest gripe) I see big performance swings / queueing issues that aren't really correlated with traffic. Plus they introduce some platform/API changes intermittently that make their admin UIs kinda buggy. Or you have to change your workflow to integrate with their services (though the APIs for this are usually ok). I don't know, I feel like they make a lot of money draining people on threads since the performance of RoR overall is kinda questionable, along with the performance of their platform. I'm going to try the Play framework soon & see if there are fewer of these issues.
Nothing's wrong with it (although it's expensive). I think the point was that Heroku is a service built on top of EC2. Switching from Amazon EC2 to Heroku isn't switching your platform at all... it's just paying for convenience. Personally I think Heroku's convenience is worth the difference. The question is more whether the base EC2 pricing (which Heroku adds to) is worth it for your needs.
Also, OpenVZ has faulty POSIX support in some cases. I'm particularly aware of how the lack of xattr support has bitten many GlusterFS users, but the same lack could affect anyone who wants to use SELinux, ACLs, or just xattrs for their own sake.
You're missing the point. EC2 specifically didn't work for them because they were constantly spinning up new instances and killing old ones, so they were repeatedly incurring the minimum one hour cost. If your business model doesn't call for that sort of a workflow, the article is irrelevant to you.