How to Survive a Slashdotting on a Small Apache Server

chronomex · on Jan 3, 2011

I'm a staff member on a community site (ticalc.org), the biggest fish in a small pond. We get about 250,000 visitors in a normal month, averaging about 3 pages per visit. For the past 5 years or so, the site was hosted off a dual Pentium Pro; before that we had a 486. Currently it's a VM living in Germany. Postgresql, Apache, Linux.

We've been Slashdotted several times, without any appreciable slowdown. How does that work? The whole site is static content. Dynamic content is either rendered out to disk when it changes, or is a couple of static page fragments that are combined at serving time--and this is only for logged-in users.

(Granted, the last time the site was redesigned was 2001. I'm working on a new design right now. I have no plans to make it any slower.)

CitizenKane · on Jan 3, 2011

If you want to save yourself the headache just use nginx, it responds much better under load than apache. If you're using a dynamic backend such as PHP, RoR, django or something else you can really boost the number of page loads with reasonable caching strategies.

jrockway · on Jan 3, 2011

I just let Varnish do the hard work. I make sure my web app sets caching headers correctly, and then I just point Varnish at it. Instant 12,000 requests per second. I am sure if I spent time tuning the machine and setup, I could do even more. But my users make about 12,000 requests per decade :)

iamjustlooking · on Jan 3, 2011

Nginx also has good http caching built in. I use it in a heavy load scenario and haven't had a single problem.

http://wiki.nginx.org/HttpProxyModule#proxy_cache

towndrunk · on Jan 3, 2011

Maybe off topic but could you explain what the correct caching headers should be for a case like this?

alnayyir · on Jan 3, 2011

Varnish is great, and it's something I'd recommend to many people in many cases, but most people don't have their caching headers set granularly enough to not get caught off-guard by unexpectedly stale content at some point.

Gotta be careful, but if you know what to expect, Varnish is great.

jrockway · on Jan 3, 2011

Yeah, "web developers" should really take a few minutes to learn HTTP.

alnayyir · on Jan 3, 2011

Silly, how could they possibly find time to learn HTTP when they have to keep up with at least three blogs of mountebank victory bloggers in addition to learning the latest Drupal plugin?

yummyfajitas · on Jan 3, 2011

In that case, you are already screwed. There are likely to be other proxies between you and some of your users.

fexl · on Jan 3, 2011

Although my Loom server code is completely standalone and does not use Apache, I'll take this opportunity to discuss Keep-Alive policy since the article mentions it.

The Loom server automatically scales down the Keep-Alive interval as the server load increases. Each child process monitors its own life span, basically like this:

  # These parameters are configurable.

  my $max_children = 128;
  my $min_life = 10;   # seconds
  my $max_life = 600;  # seconds

  # Now compute the lifespan.

  my $num_children = get_current_number_of_child_processes();
  my $free_slots = $max_children - $num_children;

  my $cur_life = int($max_life * ($free_slots / $max_children));
  $cur_life = $min_life if $cur_life < $min_life;

At this point $cur_life is the maximum number of seconds this child process should live. If the child has been alive that long or longer, it voluntarily exits.

An instance of this server code is running at https://loom.cc . You can find the source code via the News page at https://loom.cc/news . The relevant function is Loom::Sloop::Client::check_lifespan .

cstross · on Jan 3, 2011

There's one thing you can do pre-emptively that'll help: if using a CMS of any kind, use it to build your content as static HTML wherever possible. Dynamic content sucks when your wee Athlon box is trying to field 100 requests per second.

(My blog's currently quiet, fielding no more than 15,000 http requests per hour at any time this year so far(!), although it's been more than an order of magnitude above that in the past month. Srsly, unless you've got massive clustering mojo you are not going to be handling that load gracefully unless you're serving static content.)

smutticus · on Jan 4, 2011

Very good advice. How often does a slashdotted page need to accept user input? And if it doesn't need user input it most likely shouldn't need to be generated afresh with every hit.

  1) Render the page
  2) Save it to file
  3) Point everyone to it
  4) Enjoy your 15 min of fame

0xbadcafebee · on Jan 3, 2011

Hopefully-constructive criticism/additions:

* If your ssh connection took forever, it doesn't really matter if it's a load-issue or bandwidth-issue. The site is ungodly slow and so will be your admin access time. Kill the site, put up an "under construction" sign, fix the performance issue, bring the site back. Link to a google-cache, web archive, or other version if possible in the meantime.

* Don't waste time running a ps. If you're running apache, just grep for MaxClients in the error log. Actually, there's really no point in checking because if you're being slashdotted, you hit MaxClients, I guarantee.

* Top isn't going to give you an accurate amount of memory used per process. You need to check smaps and some other things and all of that will take too long. Remove all modules except what you need to serve whatever static content you want to get out there during the slashdotting. You definitely do want to reduce the memory any way you can if you're constantly swapping and its loading you to hell (confirm with vmstat/mpstat/iostat).

* `killall -9 httpd` works faster.

* Your estimates of ram per client are going up when they should be static (25MB for 512 RAM and 54MB for 4G?). If you're lucky your app won't even actually use up all this valuable memory - Copy-on-Write will save as much space as it can unless the individual process needs to reserve some anonymous memory in the process. Once you unburden yourself of extra modules (run 'ldd' on the individual apache modules if you want to see all the shit they can load into your box at runtime) run apache with one or two processes to test and look at the memory use, and go from there.

* I'm kind of on the fence about this one, but in some circumstances it can help a little to reduce MaxReqPerChild to something stupidly low, like 100-1000. You risk overloading with i/o when your process reaps and a new one loads up, but if your processes keep swelling up with more memory as they run (hi mod_perl!) killing them off and starting new may help you.

* Honestly, in a slashdotting situation, use 'wget' or 'curl' to take a snapshot of your dynamic page(s) and put those in place as static files to be served to users. If you don't have a proper caching layer don't even worry about your database because you will almost invariably kill it with queries, which will kill your webservers. If you want a 'dynamic' version of your site to show to people that updates regularly, set up a cron job to wget the dynamic pages every 1-2 minutes and overwrite the static copy (but for god's sake make it back up the old copy and only move it if it's not empty or an error page).

* Looking for the biggest files is good. You can also grep and sort the apache logs to see which files are being requested the most, and staticize/shrink them however possible (css/jsp can have excess whitespace removed with some tools, images can be shrank with 'convert', dynamic pages can be made static as above, etc). `cat $LOG_DIR/access_log | sed -e 's/.*] "/"/' | sort | uniq -c | sort -g | tail` (sort doesn't print the count with 'sort -u' ... somebody should add that). Oh yeah, and anything that prints a log? You should disable that now before /tmp or /var fills up.

sambeau · on Jan 4, 2011

If your server is "business critical" and you have budget you should take a good look at Zeus's Traffic manager.

http://www.zeus.com/products/traffic-manager/index.html

It's faster than anything out there, easier to set-up than anything out there (except the one-button Apache on a Mac) and can accelerate Apache up to 100x just by sitting in front of it. It will do the job of Ngix + Varnish as well as controlling a whole server farm. It's a truly amazing piece of software that sadly gets very little mention in the world. For instance, this is the software that runs the Firefox download sites and the BBC news site. Joyent and Amazon Web Services use it to.

(Disclaimer: I used to work for Zeus 4 years ago. I don't have anything to do with the company other than having friends in the dev team. ZTM is still my baby, though)

As many of HN's readers run web-app companies I recommend you take a look and at least play with the downloadable VM. The software, while expensive by free standards, is remarkably affordable by business standards.

sambeau · on Jan 4, 2011

IF you want to see some of the fun things you can do with it and get some idea of it's power and flexibility (especially the TrafficScript programming language it comes with)take a look here:

http://knowledgehub.zeus.com/articles

The article explaining why TrafficScript was created it especially good: it explains the inner workings of the software.

lazyant · on Jan 3, 2011

why wait to being slashdotted to implement all these changes (limit apache memory use, CDN, caching etc)?

Apache (MaxClients etc) shouldn't be configured to ever take more memory than the server's RAM and during slow hours or on a separate test server you can use 'ab' or any other tool for stress-testing.

If you are 'slashdotted' basically you just want to flip a version of the site with static cached page(s).

Also for web site performance YSlow and Page Speed should be mentioned; before these changes you want to make sure you are not missing on any low-hanging fruit like not compressing pages etc.

dholowiski · on Jan 3, 2011

I just finished pushing out an application to Heroku. I plan on the traffic ramping up sharply along with a couple of slashdottings, HN's and reddits along the way. It gives me a warm fuzzy feeling when I realize I can just "Crank up the dynos" and go have a beer.

zrail · on Jan 3, 2011

Don't forget to crank them down when the traffic is over. There's no refunds for accidentally leaving 1000 dynos running for a few hours longer than necessary.

If/when I actually deploy something worthy of traffic I plan on incorporating something that automatically scales my app, both dynos [1] and workers [2] (up to pre-defined limits, of course)

[1]: https://github.com/ddollar/heroku-autoscale [2]: http://blog.darkhax.com/2010/07/30/auto-scale-your-resque-wo...

rahoulb · on Jan 3, 2011

Don't forget to set your headers - Heroku uses Varnish on the front end so you your app may not even be touched (and hence no extra dynos needed) in a lot of cases.

alexwestholm · on Jan 3, 2011

Why not do this stuff pre-emptively? I understand that moving content out to a CDN when it's not necessary might not be the smartest plan, but shouldn't most sites be optimizing the number of apache procs as a rule of thumb?

_4vyi · on Jan 3, 2011

tl;dr run production with swap off

raz0r · on Jan 3, 2011

Remove Apache and install a real web server.