For a post titled "How Ravelry Scales to 10 Million Requests Using Rails", the only scaling advice they mention is the technical specs of the site, like:
Tokyo Cabinet/Tyrant is used instead of memcached in some places for caching larger objects. Specifically markdown text that has been converted to HTML.
and this one tip:
The database is the problem. Nearly all of the scaling/tuning/performance related work is database related. For example, MySQL schema changes on large tables are painful if you don’t want any downtime. One of the arguments for schemaless databases.
Not much "how" in that.
It should be illuminating that a site of this size doesn't need a lavish description of arcane scaling strategies. The scaling is fairly straightforward, so how they built the site becomes the most interesting part.
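To illustrate how modest that quoted caching trick is, here's a rough sketch of what caching Markdown-rendered HTML in Tokyo Tyrant might look like from Ruby. This is just my guess at the shape of it (using the rufus-tokyo and rdiscount gems; the names, key scheme, and port are illustrative), not Ravelry's actual code:

    require 'digest/sha1'
    require 'rdiscount'             # Markdown -> HTML
    require 'rufus/tokyo/tyrant'    # hash-like client for a remote ttserver

    CACHE = Rufus::Tokyo::Tyrant.new('127.0.0.1', 1978)   # default Tokyo Tyrant port

    # Render Markdown once, then serve the cached HTML on later requests.
    def rendered_html(markdown_text)
      key = "md:#{Digest::SHA1.hexdigest(markdown_text)}"
      CACHE[key] ||= RDiscount.new(markdown_text).to_html
    end

The point being: swapping Tokyo Tyrant in for memcached for larger values is a one-library change, not an exotic scaling strategy.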
I suspect that was prior to the initial launch. The architecture has presumably evolved to the state described in the HS article since then.
Edit: yes, Casey says in the Tim Bray interview that "As soon as we could, we got alpha testers in to try it out... 4 months later, we had a site that we were ready to announce."
10 million server requests per day sounds kind of impressive, until you actually do the math: divided by how much physical iron they're using, that's a little less than 9 requests per second per server.
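(For anyone checking the math: 10,000,000 requests / 86,400 seconds ≈ 116 requests per second site-wide, which at a little under 9 req/s per server implies the load is spread over roughly 13 machines.)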
It makes me wonder: if they were using something other than Rails, would they need that much iron?
I strongly suspect a flaw in your statistics: I'm willing to put money on this site having a spiky workload, not a constant one. There are probably hours in a row when 6 of those servers sit idle.
Rant: I wish technical sites would stop using req/day as a metric. It leads to the OP's type of analysis. At the very least, such articles could use a format of "X req/day, peaking at Y/s". Maybe if the NYT were writing it would be OK to use req/day, but a site whose tagline is
"High Scalability - Building bigger, faster, more reliable websites" should know better.
Sorry for the late reply. IMO, that title was fine; it did its job well. My rant was about the stats section in the article itself, which still uses a flat time model on the scale of N things/day, instead of a more representative "N things/day (X things per <smaller-than-a-day unit> at peak)".
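For example, using this article's own headline figure, something like "10M req/day, averaging ~116 req/s, peaking at N req/s" would be far more useful; N is the number that actually matters for provisioning, and it's the one the article doesn't give.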
How much development time can you afford to spend to save the cost of four or five servers?
(You can't save the sixth server if you want your site to be up while the seventh one is rebooting or being replaced.)
If the other comments are to be believed, this site was built by one person, working part-time, in four months. He can't afford to lavish time on unimportant problems, like desperately trying to conserve server resources that he could otherwise afford and that cost far less than a programmer's time is worth.
The "small team" is his wife and some community managers -- no other engineers or admins. He builds & runs the whole stack himself, and he has duties outside of the admin/development pieces of the business. So he's still essentially a part time developer/part time admin.
If this were a Java app, I definitely would have been able to get away with less (mostly because of lower memory consumption, but lower CPU consumption wouldn't hurt either).
However, I'd probably still want 2 machines for redundancy.
Wait... they have Nginx out front passing requests to HAProxy and THEN to Apache + mod_rails? That just seems like a bit much, given that mod_rails can be installed with Nginx straight up. Why would you want a setup like this?
Having Nginx in front is a lot more flexible than just having HAProxy listen on port 80. For example, it can serve static files and do redirects, neither of which needs to pass through the whole load-balanced stack.
They could use Nginx -> HAProxy -> Nginx (w/ Passenger), but the Apache version feels slightly more mature (e.g. it has some config options that the Nginx version lacks) and it's likely they were already using it before the Nginx version came out.
HAProxy is better at load balancing. For example, it handles app servers that have gone down more gracefully. It also generates an awesome stats page that gives you way more info about what's going on than you can get from Nginx.
Yep! We used nginx's fair balancing module before switching to haproxy. It also helps me do rolling restarts/hot deployments in a nice way.
It's really a great piece of software. Kudos to Willy.
PS - you're also correct about the nginx->haproxy->apache. nginx makes a fabulous front end and I just plugged in Apache/Passenger where Mongrel used to be. I like that 1) I can easily plug in something else in the future and 2) Passenger on Apache is very stable. Nginx support is newish and I'm running stripped down Apaches that only do Passenger, so I'm not too fussed about it.
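If it helps to picture the whole chain described in this thread, the request path comes out to roughly this (my summary of the comments above, not an official diagram):

    browser -> nginx (static files, redirects) -> HAProxy (balancing, down-server handling, stats) -> Apache + Passenger (the Rails app)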
Purely anecdotal, but does anyone else notice, while browsing around, that the generation times are a bit lousy, even on the (largely static) unauthenticated pages?