
Hmm, I would like to hear more about these problems that prevent people from spinning up instances. Is it a frequently occurring problem, or does it only happen rarely (e.g. when the APIs are down)? Also, are they managing the instances themselves or using EC2 Auto Scaling groups? I run a dynamically scaling cluster on EC2 and have not run into the problems you mentioned, so I would like to hear more about them if possible. Maybe they are spinning instances up and down too rapidly and exceeding the API rate limit? You'd have to have a buggy or badly designed provisioning system to do that, though...
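For a home-grown provisioning loop (as opposed to an Auto Scaling group, which has its own cooldown setting), the usual guard against that kind of thrashing is a cooldown between scaling actions. Here is a minimal sketch under that assumption. `ScalingCooldown` and `try_acquire` are hypothetical names, and the 300-second value is just an example.

```python
import time
from typing import Callable, Optional


class ScalingCooldown:
    """Hypothetical guard that refuses scaling actions issued too close together.

    Spacing out scale-up/scale-down requests keeps a provisioning loop from
    flapping capacity (and hammering the API) when load hovers near a threshold.
    """

    def __init__(self, cooldown_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the guard can be tested without sleeping
        self._last_action: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True (and record the time) if a scaling action is allowed now."""
        now = self.clock()
        if self._last_action is not None and now - self._last_action < self.cooldown_seconds:
            return False  # still cooling down: skip this scaling action
        self._last_action = now
        return True


guard = ScalingCooldown(cooldown_seconds=300)
if guard.try_acquire():
    pass  # issue the launch/terminate request here
```

A rejected attempt does not reset the timer, so the loop resumes scaling as soon as the window from the last *successful* action expires.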



In the past, when an AZ went down, everyone tried to move to another AZ at once and made lots of API calls.

Consequently, AWS had to rate-limit API requests to avoid taking another AZ down.

Maybe this has changed, though; we'll probably have to wait for another disaster to find out!
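On the client side, the usual way to live with that kind of throttling is to retry with capped exponential backoff plus jitter, so thousands of clients don't all retry in lockstep. Here is a minimal sketch around a generic `call` function. `ThrottledError` and `call_with_backoff` are hypothetical names, not part of any AWS SDK (the official SDKs ship their own retry logic).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for whatever a client raises on a rate-limit response."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on ThrottledError using capped exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the throttle to the caller
            # Full jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The jitter is the important part during an AZ outage. Without it, clients that were throttled together retry together and hit the rate limit again.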

---

But of course, an AZ shouldn't be your infrastructure's single point of failure!


I'm aware of the API rate limiting when an AZ goes down, but the original comment was about problems people run into with highly variable traffic sites that scale up and down on a regular basis (e.g. running 20 instances during the high-traffic day and shutting down 15 of them every night to save cost). This is of course unrelated to any AZ or API downtime, and I'm not aware of any problems that would interfere with normal API usage during normal operations. That's why I wanted to hear the details of those problems.
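The daily pattern described here (20 instances by day, 15 of them shut down at night) boils down to a time-of-day capacity target. Here is a minimal sketch. The 08:00 to 20:00 window and the function name `desired_capacity` are assumptions for illustration. An Auto Scaling group's scheduled actions can do the same thing without custom code.

```python
def desired_capacity(hour: int, day_capacity: int = 20, night_capacity: int = 5,
                     day_start: int = 8, day_end: int = 20) -> int:
    """Hypothetical time-of-day schedule: full fleet by day, trimmed at night."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # Daytime window is [day_start, day_end); everything else counts as night.
    return day_capacity if day_start <= hour < day_end else night_capacity


# Positive delta means launch instances; negative means terminate.
current = 20
delta = desired_capacity(hour=23) - current
```

With the defaults, moving from a daytime hour to 23:00 gives a delta of -15, which matches the "shut down 15 of them" example.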


It's due to rarely occurring problems such as the APIs being down or unresponsive, insufficient or incorrect instances being available, and a third major class of problem that escapes my memory at the moment. In addition to not scaling up and down (automatically or manually), many of these apps are architected to require as little from AWS as possible to reduce their exposure. For example, they'll avoid ELB because ELB depends on EBS.

These decisions were made by companies that started off fully believing in the promise of elasticity and gradually shifted to less elastic architectures as they ran into issues. That said, it's worth noting that these are firms with very high costs of downtime, so each failure had a very large impact.



