Serving small static files: which server to use? (nbonvin.wordpress.com)
50 points by peterbe on June 8, 2011 | 29 comments


i don't really see the point of these micro-benchmarking articles at all.

so what if nginx can serve a theoretically higher number of static files per second than something else. are you actually serving that much traffic with no headroom in terms of extra servers and load balancers? do microseconds of computational time per request really matter when your outbound packets can get delayed by milliseconds in the network or dropped completely?

there are plenty of reasons to like one server over another, but is .0000000001 seconds/request of overhead really one of them? http servers can have wildly different behaviors regarding HTTP streaming, worker models, extensions, etc. how about the fact that varnish is a caching proxy that doesn't really replace something like nginx, lighttpd, or apache?

he's also backing varnish with a ramdisk that takes 25% of his memory (for a 100-byte file, no less!) when comparing it to the others. probably not the best-designed test out there.

> Again, keep in mind that this benchmark compares only the servers locally (no networking is involved), and therefore the results might be misleading.

i don't know why anyone would publish "misleading" benchmarks.

i know it's less fun and there are no numbers involved, but what about a real rundown of some of the subtle differences between the servers and some of their more unique features (besides async/threaded)? that's something i would find useful reading, but i guess it's not as easy as firing up ab.


And in the end, the only meaningful benchmark for you is the one performed in your deployment environment and with your application. These generic tests (if the conditions in which they were performed are clear) can only be used to skim through all the available options to identify those that clearly under-perform (agreed, identifying why could be even more useful than the test itself).


they're not even worth skimming.

a while back when tornado (for python) came out, there was a whole slew of benchmarks comparing it to twisted. as a guy who uses twisted a lot, i was interested. all the benchmarks said tornado was faster by maybe a hundred or a couple hundred requests/sec and therefore was the superior framework.

the big question is, what difference does it make? not a single article i read mentioned the fact that twisted has a really awesome streaming API, has great TCP- and UDP-level support, has support for a ton of other protocols (and for writing your own), or is insanely useful for non-web projects. i never read about a single feature tornado had either, or why one is worth investing time in over the other.

same thing for this "benchmark". i might as well write off varnish, since it serves fewer requests/sec than nginx, right? wrong! it's a different thing altogether -- no mention of that anywhere in the article though. the author just says (not in so many words) that it's a piece of crap compared to nginx.


If you're trying to identify and eliminate bottlenecks, benchmarks like this are tremendously helpful. If the theoretical limit of a component exceeds the practical limit of other resources, I know I can look elsewhere to improve performance.


your non-webserver overhead is your non-webserver overhead. it'll be the same no matter what webserver you use, so it doesn't matter what the network is like.

the whole point of micro-benchmarks is to show one single use case and which option comes out the fastest. now if you ever need to serve small static files really god damn fast you know which web server to use.


they're all very fast. what do i do with the information presented by the article? what have i learned that will help me make better decisions about what to use for a given project? nothing, really.


Maybe you should read the title before clicking the link then?


according to the title and article i should be using G-WAN, which i've never heard of before, for my static files. i know nothing about it other than that it apparently serves more files per second than nginx, lighttpd, and apache. but i should be using it anyway.

do you really recommend operating like that?


Old post; as a side note, i performed some tests with nginx and his configuration 1-2 weeks ago on Linode, and the results on the smallest Linode were nearly 10-15% lower than what the author reports in his post (quite good imo).

If someone with a less optimized configuration is wondering what in his test configuration allows him to obtain those results, here is a brief recap (a minimal config sketch follows the list):

1- Tests performed with ab, with keep-alive enabled on both the client and the server

2- open_file_cache or similar options: this enables file caching, so basically the server is no longer i/o bound

3- Furthermore, enabling tcp_nodelay (which disables Nagle's algorithm, useful when we have small tcp responses) and disabling access logging (this depends on how logging is implemented; if it is non-blocking and on a separate thread (not a worker), disabling it doesn't improve the results) could help a bit.
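Putting those together, a minimal nginx sketch along these lines might look like the following. This is illustrative only: the port, document root, worker settings, and cache sizes are assumptions, not the author's exact configuration. The client side would be something like "ab -k -n 100000 -c 100 http://127.0.0.1:8080/100.html", with -k enabling keep-alive and "100.html" standing in for a hypothetical 100-byte file.

    worker_processes  auto;

    events { worker_connections 1024; }

    http {
        sendfile            on;
        tcp_nodelay         on;     # disable Nagle's algorithm for small responses
        keepalive_timeout   65;     # keep-alive on the server side
        access_log          off;    # skip per-request logging

        # cache file descriptors/metadata so the disk isn't touched on every request
        open_file_cache           max=10000 inactive=30s;
        open_file_cache_valid     30s;
        open_file_cache_errors    on;

        server {
            listen 8080;                # hypothetical port
            root   /var/www/static;     # hypothetical document root holding the small file
        }
    }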

Since this is a cpu-bound test, having the client on a separate machine would likely have increased the results, but i doubt it would have changed the performance ratio among the servers; after all, every test had the same client with the same overhead.


I don't know what open_file_cache does specifically, but if you have enough memory to cache the file, then linux would have enough memory to cache it as well? In that case, you aren't really overcoming being IO bound, but rather avoiding the file opening/closing overhead?


The open_file_cache_* options in nginx (other servers have similar options) allow it to cache the file in memory, so the disk is used only when the cached value is no longer valid (set to a 30s lifetime in the linked tests). After a quick refresher on how the buffer cache works, i'd say that you are correct, and that with the option above the server is also using a simpler data structure to retrieve those cached pages (no block list to traverse, no access locks). The difference in performance with and without this option was huge in a quick test: enabling the server-side caching, i saw the number of req/s increase by 400-500%.


previously: http://news.ycombinator.com/item?id=2629631

Also, "The client as well as the web server tested are hosted on the same computer", which is pretty poor design, to be honest.


"Doing a correct benchmark is clearly not an easy task. There are many walls (TCP/IP stack, OS settings, the client itself, …) that may corrupt the results, and there is always the risk to compare apples with oranges (e.g. benchmarking the TCP/IP stack instead of the server itself)."

By hosting the web server and the client on the same computer, he is testing one aspect in isolation of others. This is a good thing, and is generally the way in which scientific tests advance knowledge.


Except he isn't just testing one aspect in isolation of others. He's actually introducing an entirely new aspect (the client) which is a substantial load and won't be there in production.


the extra load incurred is the same across all tests (same client, same args, same tuning), so the results are still valid, just not the highest performance possible.


Complex systems have complex interactions. You can't just hand-wave this away by claiming that the interactions will be identical for all cases without actually demonstrating it.


Ok, so as the client processes more requests against the server, resource use on the box increases; so in theory, the higher the benchmark numbers, the faster the server would actually respond without the extra load of the client. So (in theory) the server with the highest performance actually performs better than perceived (assuming the tester is hitting resource bottlenecks somewhere on his server during the test, which isn't shown).

Luckily this benchmark is incredibly simple. It's not a complex system as the test is using a single set of data with two pieces of software in a single contained environment; the only thing that changes is one piece of software and one configuration: the server. Separate the server/client and your test is still the same, only with extra resources for the server and client to take advantage of (and less network bandwidth and higher latency). Knowing how http clients work, and knowing how http servers work, is it possible that the client or server could be utilizing resources in such a different way after being separated as to skew the results in a significant way?

I don't believe so. Even if you saturated a 1Gbps network link, you would still see differences in CPU time spent processing requests and differences in memory use, and unless they are all fast enough to saturate that link you will see some servers process more requests than others. If you want to verify this, you can follow the benchmark's set-up, try it on two separate machines, and let us know if there's a significant difference.


>he is testing one aspect in isolation of others. This is a good thing, and is generally the way in which scientific tests advance knowledge.

Yes, maintaining constants and only altering one variable. But this test has a 2nd variable that is held in a special state for all of the experimentation, and then we make the leap of assuming that these results are anything other than pretty to look at when that 2nd variable is in any other position.

This is a good test for seeing which server runs best with a load-testing client competing for resources on the box and using the loopback network. That's it.

The problem is you can't test without a network stack at all, you can only test with a different network stack. What's to say the particulars of the loopback network aren't introducing more issues into the validity of the results than the full TCP/IP stack whose influence you're trying to avoid? Nothing, that's what.


I'm happy people have begun to see that scientific tests generally have massive assumptions hidden deep within them, or quietly brushed underneath them.

Personally, I find this to be a welcome change from the typical "oooh-look-at-the-shiny-graphs" mentality that has become so pervasive on HN with regards to performance testing.

Keep up this spirit of questioning, and you'll discover a lot of interesting secrets. (For example, a placebo is 87% as effective as the 5 major antidepressant brands. Also, the CIE 1931 "color space" is a flawed system which is only roughly accurate.)

In general, anyone who presents statistics (of anything) should be subjected to a lot of skepticism. Scientific knowledge can only advance by truly testing one variable and only one variable.


Yes, a complete waste of time.


why is it making it to the front page again and again? seems like foul play.


Look at those memory usage graphs: does this mean those servers are leaking memory?

With the notable exception of nginx, of course.


He writes:

"Regarding the resources used by each server, Nginx is the winner in term of memory usage, as the amount of memory does not increases with the number of concurrent clients."

So I guess the memory consumption is caused by the number of concurrent clients, and not by a memory leak.


You are right. Then it would be interesting to see how much of the memory is freed after all the clients have disconnected.


I think benchmarks like this are very harmful. How many small static files you can serve per second is just one (not very important) criterion when choosing one of these servers.

I think more important criteria are:

1. Stability. How often are you woken up in the middle of the night because your web server is shitting the bed?

2. Configuration. Can you configure it to do all the things you will need it to do? Have others who have come before you been happy with it throughout the entire life of their product, or have they outgrown it?

3. Simplicity. Can you set it up to run efficiently without weeks of study on how this server is properly deployed? Is it easy to mess up the configuration and take your site down when making a change?

4. Generality. Are you going to need something else to sit in front of your dynamic pages, if you require them? This is also a factor in stability: if you have 2 server solutions, all else being held constant, that is twice as likely to break down or get broken during a configuration change as just one. Actually, it is much more than twice as likely, since you are spreading your competency across learning the ins and outs of 2 pieces of software, so you are less capable with each than you would have been if you had just one server solution to worry about.

So, given all this, my advice to anyone trying to make an initial decision on what webserver to use is: (Apache|nginx) (pick one only) should be your default until you believe you have a compelling reason to use something else. Both are capable of doing more or less everything you need, have lots of extensions, are widely used, and have comprehensible configuration. Once you have mastered whichever one you use, you will be able to tune it, debug performance problems, and spend the minimum possible amount of time doing server configuration and testing, and the maximum time implementing features and supporting customers.


I've tested and admired the performance of G-WAN, but the closed-source nature of the project may be a bit of a showstopper for some. Development appears to be narrowed to Debian derivatives, making successful installation of the binary on other Linux/UNIX platforms challenging. It would be nice to be able to inspect and modify the source in order to optimize and compile it for the desired platform.


These results are valid if your static content all fits in memory. I would expect interesting divergences in performance if a certain proportion of requests had to hit the file system.

Also, an interesting source of noise that's missing is slow clients holding on to the connection. If you're serving up multi-megabyte files, I would guess this could become a major factor.


part of why you use caching servers or proxies is to avoid hitting the filesystem, since it's very costly compared to memory (or even network to an extent). the ramdisk in this case is performing the function of the caching server but at much increased speed and much reduced overhead.

afaik the only way you could get over a couple thousand RPS while reading from the filesystem is due to inherent caching of the VFS and disk buffers by the operating system.

and in an asynchronous web server, connections should not be held up by slow clients. if they use a model where one thread or process handles each client connection you could definitely get starvation of resources or connections, but asynchronous models should just process stuff as it comes in and not "wait" on a slow client.
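To make that last point concrete, here is a toy sketch of the event-driven model (illustrative only, not from the article; the port and the 100-byte response are made up). Each connection is handled by a coroutine, and a slow client only ever suspends its own coroutine, never the whole server.

    import asyncio

    BODY = b"x" * 100  # stand-in for a 100-byte static file

    async def handle(reader, writer):
        # Read (and ignore) whatever the client sent; a real server would parse it.
        await reader.read(65536)
        writer.write(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 100\r\n"
            b"Connection: close\r\n"
            b"\r\n" + BODY
        )
        # drain() suspends only this coroutine if the client's socket buffer is
        # full, so a slow reader never blocks the event loop or other connections.
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8080)
        async with server:
            await server.serve_forever()

    asyncio.run(main())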


Off topic, but WordPress has made blogs unreadable for iPad users. I can't even scroll through this article without the screen jumping erratically past many pages of content.



