Well, I didn't know it, so I tried it on my laptop, but sadly my observations don't reflect that claim. Injecting with 2000 concurrent connections results in a very fluctuating load, oscillating between 2000 and 31000 requests/s (column ^h/s) for a small object like mime.types :
And after that it totally stops responding until I restart it. On a 404 it's more like 37000 requests/s. This is with 2 threads.
I looked at the code and saw a select() in use, which limits the number of concurrent connections to ~512 on recent libcs (1 fd for the socket, 1 fd for the file, 1024 fds total with the default FD_SETSIZE). Reducing the number of concurrent connections seemed to help a bit (it delayed the hang a little).
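For reference, here is a minimal illustration (generic code, not taken from filed) of why select() caps the number of usable file descriptors at FD_SETSIZE:

/* Illustration only: why select() caps concurrency. FD_SETSIZE is
 * 1024 on glibc, and fd numbers grow with every open connection and
 * file, so at roughly 2 fds per transfer you hit the wall around
 * ~512 concurrent transfers. */
#include <stdio.h>
#include <sys/select.h>

static int watch_fd(fd_set *set, int fd)
{
    if (fd >= FD_SETSIZE) {
        /* FD_SET() on such an fd is undefined behaviour: it writes
         * past the end of the fd_set bitmap. poll()/epoll don't have
         * this limit. */
        fprintf(stderr, "fd %d >= FD_SETSIZE (%d), cannot select() on it\n",
                fd, FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}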
Also, it doesn't seem to support keep-alive, so we can't get more performance on the client side by reducing the kernel-side work in the TCP stack.
I'm getting the same level of performance out of a single process on thttpd.
Doing the same test with haproxy gives me 72000 requests/s in close mode while delivering a small object with the errorfile trick, at only 80% CPU (my load generator reached its limit), and it reaches 88k requests/s using the cache. So I guess there's still room for improvement, since it's possible to achieve twice the performance with 80% of a single thread on the same machine. I don't know how other servers compare, but I definitely think that some might get better results.
select() should definitely never get used -- it's only even compiled into the executable if you enable the non-default "FILED_NONBLOCK_HTTP" option, which you almost certainly don't want. Did you compile filed that way? It sounds like you grep'd the source for select() but didn't notice that it was #ifdef'd out.
The number of concurrent connections is limited by your resource limits and the fact that file descriptors are cached. You can tune the cache, but if you do you should also raise your resource limits; this is documented in the man page available here: http://filed.rkeene.org/fossil/home?name=Manual
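For illustration, a generic sketch (not filed's code; the function name is mine) of raising the per-process fd limit from within a program before opening many connections and cached files:

/* Generic sketch: bump the soft RLIMIT_NOFILE up to the hard limit. */
#include <stdio.h>
#include <sys/resource.h>

static int raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* soft limit up to the hard limit */

    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    printf("RLIMIT_NOFILE now %llu\n", (unsigned long long)rl.rlim_cur);
    return 0;
}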
Using only 2 threads (way fewer than the default) is also very sub-optimal, since most of those threads will be waiting for the kernel to deliver the file. On average filed makes about 2 system calls before asking the kernel to send the file to the client -- one read() of the HTTP request and one write() of the HTTP response header, followed by the sendfile() of the contents. There's no reason to limit yourself to 2 threads other than to limit I/O, since additional threads do not significantly increase CPU utilization.
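As a rough sketch of that per-request sequence (simplified, not filed's actual code; error handling and request parsing are glossed over):

/* Simplified sketch of the pattern described above: one read() for
 * the request, one write() for the response header, then sendfile()
 * hands the body to the kernel. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/sendfile.h>
#include <unistd.h>

static void serve_one_request(int client_fd, int file_fd, off_t file_size)
{
    char req[8192];
    char hdr[256];
    off_t off = 0;

    /* 1. read the HTTP request (assume it arrives in one read) */
    if (read(client_fd, req, sizeof(req)) <= 0)
        return;

    /* 2. write the HTTP response header */
    int hdr_len = snprintf(hdr, sizeof(hdr),
                           "HTTP/1.1 200 OK\r\n"
                           "Content-Length: %lld\r\n"
                           "Connection: close\r\n\r\n",
                           (long long)file_size);
    if (write(client_fd, hdr, hdr_len) != hdr_len)
        return;

    /* 3. let the kernel push the file contents; blocks until sent */
    while (off < file_size)
        if (sendfile(client_fd, file_fd, &off, file_size - off) <= 0)
            break;
}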
Also, thanks for the link to the code -- I found the issue with keep-alive (in fact there are two such issues). The first one is that you're looking for the Connection header with the Keep-Alive value, but that is only needed for HTTP/1.0. In HTTP/1.1 keep-alive is the default, so you will not always get the header; it depends on the browsers, proxies, etc. The second is that checking for headers and values this way is very unreliable, because the Connection header can contain other tokens, like "TE" to indicate that the TE request header field is present and must not be forwarded to the next hop. In that case the Connection header tokens are delimited by commas and you don't know whether your check will still match. One could argue that all these cases are not very common in the field, but it's the difference between being spec-compliant and thus interoperable, and working most of the time :-)
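To make it concrete, here's a rough sketch (not meant as a drop-in for filed) of a spec-compliant keep-alive decision, treating Connection as a comma-separated, case-insensitive token list:

/* HTTP/1.1 defaults to persistent unless "close" is present;
 * HTTP/1.0 is persistent only if "keep-alive" is present. */
#include <string.h>
#include <strings.h>

static int conn_has_token(const char *conn_hdr, const char *token)
{
    const char *p = conn_hdr;
    size_t tlen = strlen(token);

    while (p && *p) {
        /* skip delimiters and optional whitespace */
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        const char *end = p;
        while (*end && *end != ',' && *end != ' ' && *end != '\t')
            end++;
        if ((size_t)(end - p) == tlen && strncasecmp(p, token, tlen) == 0)
            return 1;
        p = strchr(end, ',');
    }
    return 0;
}

/* http_minor: 0 for HTTP/1.0, 1 for HTTP/1.1; conn_hdr may be NULL. */
static int want_keepalive(int http_minor, const char *conn_hdr)
{
    if (http_minor >= 1)
        return !(conn_hdr && conn_has_token(conn_hdr, "close"));
    return conn_hdr && conn_has_token(conn_hdr, "keep-alive");
}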
So I could inject in keep-alive mode at 100 concurrent connections, but it was very slow (2200 req/s at 12% CPU), idling on I-don't-know-what. And with 1000 threads (still 100 concurrent connections), it immediately ate all my memory and got killed by the OOM killer :
Out of memory: Kill process 7134 (filed) score 779 or sacrifice child
Killed process 7134 (filed) total-vm:7680976kB, anon-rss:7140908kB, file-rss:1988kB, shmem-rss:100kB
oom_reaper: reaped process 7134 (filed), now anon-rss:7143588kB, file-rss:1980kB, shmem-rss:100kB
7 GB for 100 connections is a bit excessive in my opinion :-)
There are definitely still a number of issues to be addressed before it can be used in production; you need a more robust architecture and request parser first.
So you need as many threads as the number of files you're delivering in parallel? That reminds me of the late 90s, when everyone started to discover how easy it was to deal with blocking I/O using threads, until they discovered that threads become very slow because you suffer from context switches all the time. As a rule of thumb, you should never run more threads than you have CPU cores, or you'll get a taste of those context switches.
And it's really hard to scale with a model requiring one thread per connection. You'll hardly sustain one million concurrent connections this way, and that can definitely happen with large objects.
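For contrast, a minimal sketch (generic code, not haproxy's) of the event-driven alternative, where a single thread multiplexes all connections with epoll instead of dedicating a blocking thread to each of them:

#define _GNU_SOURCE          /* for accept4() */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 256

/* Hypothetical application-specific handler: a real server would parse
 * the request here and send the response with non-blocking I/O. */
static void handle_request(int fd, int ep)
{
    char buf[4096];
    if (read(fd, buf, sizeof(buf)) <= 0) {
        epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
        close(fd);
    }
}

static void event_loop(int listen_fd)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    struct epoll_event events[MAX_EVENTS];

    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        int n = epoll_wait(ep, events, MAX_EVENTS, -1);

        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;

            if (fd == listen_fd) {
                /* new connection: register it, don't spawn a thread */
                int conn = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
                if (conn >= 0) {
                    struct epoll_event cev = { .events = EPOLLIN,
                                               .data.fd = conn };
                    epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
                }
            } else {
                /* readable connection: serve it without blocking */
                handle_request(fd, ep);
            }
        }
    }
}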
As you're the author and you're promoting your own work, may I ask what sort of testing supports this claim?