It's not just a question of speed. If your machines are slow, that means you need more machines to handle your throughput, which means you are paying for that 20 ms slowdown in actual dollars.
Sure, but the whole thing is predicated on the 20ms slowdown coming from a slow machine, not network latency. And that's a pretty good assumption. Due to RAM limitations and abysmal performance, I could maybe push 15 concurrent requests on a c1.medium running a Rails app in Passenger with a non-CoW Ruby. Forking is terribly slow on EC2. An m1.small was out of the question.
I'm working on a web service (built on top of Scala and the JVM) that's handling between 1500 and 3000 reqs per second per c1.medium instance, with an average time per request of under 15 ms. This is real traffic, with the web service receiving between 16,000 and 30,000 total requests per second during the day. A c1.xlarge can do 7000 reqs per second or more, but for the moment the difference in pricing feels too big; it's cheaper and safer to just start more c1.medium instances (with auto-scaling based on latency). If we end up needing more RAM, we'll probably switch to c1.xlarge.
If scalability matters, you should have picked a better platform. Ruby/Rails/Passenger is a terrible stack for scalability and performance. And even if AWS is slower than other options, the first problem you have is your own heavyweight app and the platform you've chosen. 15 concurrent requests makes me chuckle.
I just wanted to add -- since you're not the first to point out the Rails part -- that I've also run a 42-node Cassandra cluster on m1.xlarges and done a fair amount of CPU-bound work (encryption and compression) on hundreds of TB of data on cc2.8xlarges. I just used the Rails one as an example.
In the case of Cassandra, disk I/O was a constant issue, so we grew the cluster much larger than would have been necessary on another provider. We also lost instances pretty regularly. If we were lucky, Amazon would notify us about degraded hardware, but more often the instance would stay up and just do things like drop 20% of its packets. Replacing a node in Cassandra is easy enough, but you quickly learn how much an instance's I/O load impacts its network performance as well. Nowadays Cassandra can compress data to reduce network load, but then you run into EC2's fairly low CPU performance.
The CPU-bound application I mentioned wasn't so bad, but we paid handsomely for that ($2.40/hour, less some volume discount). At the high end the hardware tends not to be over-subscribed.
Performance, price, and reliability were all issues in all cases. Those are not EC2's strong suits and haven't been for a while.
I don't entirely disagree. All I can say is that REE and Rails 2.3 were far lighter-weight and faster than Ruby 1.9 and Rails 3.2. Given it's a 3.5-year-old app, the landscape was pretty different back then. I looked at Lift and didn't like it. Django was still in a weird place. And ultimately Rails looked like the best option for a variety of reasons.
Things evolve and whole-hog rewrites are difficult. Nowadays we run on JRuby and things are quite a bit better. But we can't run on anything smaller than an m1.large; the low I/O and meager RAM of a c1.medium preclude its use. (BTW, that's where a lot of the original 15 came from -- at roughly 100 MB per process and only 1.7 GB available, you top out around 15-17 workers before the OS gets anything, so it's hard to squeeze much more out of that.)
But the larger point is that with virtually any other provider you can pick a configuration that matches the needs of your app (rather than the other way around), you don't have to fight CPU steal or over-subscribed hardware, and you don't have to deal with machine configurations from 2006. Yeah, Rails is never going to outperform your Scala web service. But if the app would run just fine on the other N - 1 providers, then it's disingenuous to gloss over the execution environment as well.
Run it on top of JDK 7 with the CMS garbage collector, since JRuby (and Scala) tend to generate a lot of short-lived garbage, and experiment with the new-generation proportion (something like -XX:+UseConcMarkSweepGC -XX:NewRatio=1 -XX:MaxGCPauseMillis=850; NewRatio=1 gives the new generation half the heap). You can also profile memory usage to make sure you're not stressing the GC, since that can steal CPU away from your app; for that, I believe you can use Java profilers (YourKit is pretty good).
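For reference, this is roughly how those flags get wired up -- just a sketch, and I'm assuming you launch through the jruby binary, which forwards -J-prefixed options straight to the underlying JVM; a servlet container would take the same flags through its own JAVA_OPTS-style startup config:

    # assumption: app started via the jruby launcher; -J hands options to the JVM
    export JRUBY_OPTS="-J-XX:+UseConcMarkSweepGC -J-XX:NewRatio=1 -J-XX:MaxGCPauseMillis=850"
    # or per invocation (my_app.rb is just a placeholder name)
    jruby -J-XX:+UseConcMarkSweepGC -J-XX:NewRatio=1 -J-XX:MaxGCPauseMillis=850 my_app.rb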
Also, try to do more stuff asynchronously -- in another thread, process, or server. Use caching where it's easy, but don't overdo it, as dealing with complex cache invalidation policies is a PITA.
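As a rough sketch of the "another thread" option on JRuby (the names here are made up -- I don't know your codebase), you can lean on java.util.concurrent so background work stays bounded instead of spawning a thread per request:

    # sketch: a small fixed pool keeps background work from overwhelming the box
    require 'java'
    java_import java.util.concurrent.Executors

    WORKERS = Executors.new_fixed_thread_pool(4)

    # hypothetical example: move slow, non-critical work out of the request cycle
    def send_receipt_async(order)
      WORKERS.execute do
        ReceiptMailer.deliver_receipt(order)  # placeholder for whatever mailer you use
      end
    end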
That's one way to look at it. Another is that when this app started 3.5 years ago, Rails & the app had a drastically different performance profile and Amazon didn't have super-over-subscribed hardware. Not that it matters much, but there's nothing convenient about having to engineer around EC2. And doubling your capacity or constantly upgrading instance sizes is neither cheap nor a scalable solution in any practical sense.
Pick whatever language you like, though: with forking performance that bad, any process-based execution environment is going to have similar issues. And I found running a servlet container on anything smaller than an m1.large to be an utter waste -- 1.7 GB of RAM isn't enough for many JVM-based apps, and threading could easily overwhelm the system. Anything less than high I/O capacity just can't keep up.