Nginx sucks at SSL, here's what to do instead

randomstring · on July 11, 2011

Without any reference to the nginx options

  ssl_session_cache
  keepalive_timeout
  keepalice_requests

and whether or not the other webservers mentioned support such options and if they are enabled in the tests. There is no way to know if this is alarming or just FUD. Also missing from this comparison is a vanilla Apache SSL benchmark (which will suck, but would serve as a reference point).

Given nginx's track record, I prepared to give nginx the benefit of the doubt and assume that it's a invalid test.

That said, I'm going to run some benchmarks of my own.

wumpus · on July 11, 2011

Mr. randomstring, your test results before blekko launched were much faster than this guy's number. Want me to quote from your email?

pilif · on July 11, 2011

If I see such bad numbers, usually, before I write a post about it, I try to investigate and share the results of that with the post.

I'll try to reproduce these results tomorrow, but if I had to guess, I'd say ssl_session_cache was left to its default (off) which means that every connection has to do the expensive SSL handshake.

yid · on July 11, 2011

Sigh. Sometimes posts like these make me want to go back to academia (it's not perfect, but they generally believe in this little thing called rigor).

From TFA:

I tested nginx as a proxy, serving static files, and serving nginx-generated redirects. I tried changing all the relevant ssl parameters I could find. All setups resulted in the same SSL performance from nginx. I even tried the setup on more than one server (the other server was quad-core nginx got up to 75 requests per second).

So "all the relevant ssl parameters I could find", no details about what those involve, and the surprising result that it made no difference.

In the same situation, I might think I was doing something wrong...

And then this overarching statement:

Never let nginx listen for SSL connections itself.

seiji · on July 11, 2011

Rigor is an interesting point. Do we prefer to have a flawed, but slightly useful, post now - or - do we prefer to wait a month or two for a squeaky clean post with all issues worked out?

whirlycott1 · on July 12, 2011

In two years, people will still be saying "nginx + ssl = bad" because of this post even though the problem may well be fully addressed. Google will continue to surface this article even though it may be totally wrong at some future date. That sucks.

vog · on July 12, 2011

If it was really that easy to spread this questionable message for years, it would be as easy to spread other articles as well.

So it wouldn't need more than a few articles, like "nginx + ssl = works like a charm", or "nginx has better SSL support than Apache". It would not matter whether those were actually correct, just a single article of questionable quality would be sufficient.

vog · on July 13, 2011

... and here it is:

"nginx does not suck at ssl"

http://news.ycombinator.com/item?id=2759596

vog · on July 11, 2011

Why not doing both? First a small article about the surprising phenomenon, which announces a more thorough analysis next week.

That way, it is possible to get some initial feedback and maybe even some good hints that help speeding up the analysis. In the best case, the announced analysis could become a collaboration by multiple authors.

seiji · on July 11, 2011

As Randy says: "The best way to get tech support online isn't to ask for help -- just visit a chat room and declare 'Linux sucks! It can't do X!'"

btilly · on July 11, 2011

That reminds me of the old trick of asking a reasonable question. Then getting a friend to give a wrong answer to that. The real answer is likely to be somewhere in the flood of corrections that you see.

eli · on July 12, 2011

If the config files had been posted, the reader could work out the issues on their own and everybody benefits.

va_coder · on July 12, 2011

I agree. What if we all waited until 2008 for an academic to publish a 30 page paper about Ruby and Python and how they can be useful for building web apps?

vacri · on July 11, 2011

Academics share and discuss findings in casual terms before formal publishing all the time. Two academics meeting over coffee aren't going to demand the rigor you get with a published article.

_delirium · on July 11, 2011

That's true, but the "damage" tends to be limited, because it's shared in person with a handful of people who understand the preliminary nature of the results--- not potentially tens of thousands of people clicking on the front page of Hacker News, who see a very definitive-sounding statement ("Nginx sucks at SSL").

I do think academics overcorrect on this, and should share more early results, possibly via things like blog posts (this is slowly starting to happen). But erring in the opposite direction is also quite common among tech bloggers. In particular, if you're going to publish anything that looks vaguely like a benchmark, it might be worth taking at least a few days to check out possible problems before sending it out into the world (not months or anything, but a few days).

palish · on July 11, 2011

But... that would take work! And he only works four hours a week. </tonguecheek>

antihero · on July 11, 2011

IIRC I get pretty awful AB performance with ssl_session_cache on or off.

That said, whilst RPS is low, whilst I'm hammering it with AB it seems to have little problem being responsive in my browser :S

msumpter · on July 11, 2011

Try adding the -k parameter to ab to use keep alive requests and see if you notice an uptick. If you a generating a new session on each request it doesn't matter if you used the SSL session caching or not in Nginx.

SaltwaterC · on July 11, 2011

Most of the servers suck at SSL without some sort of caching. Also, properly configured nginx setups have a fair share of keep alives.

mmaunder · on July 12, 2011

Thanks! I didn't know nginx sucked at SSL. You may have increased our revenue. Many businesses like us have our conversion pages on SSL. Our front-end server is doing 2000 to 4000 http requests per second and we get over 3 million uniques on the main site where we sell stuff via SSL. If SSL is this slow, it probably impacts performance on our secure pages which affects revenue. Where do I send the beer?

On a 4 core Xeon E5410 using ab -c 50 -n 5000 with 64 bit ubuntu 10.10 and kernel 2.6.35 I get:

For a 43 byte transparent gif image on regular HTTP:

Requests per second: 11703.19 [#/sec] (mean)

Same file via HTTPS with various ssl_session_cache params set:

ssl_session_cache shared:SSL:10m; Requests per second: 180.13 [#/sec] (mean)

ssl_session_cache builtin:1000 shared:SSL:10m; Requests per second: 183.53 [#/sec] (mean)

ssl_session_cache builtin:1000; Requests per second: 182.63 [#/sec] (mean)

No ssl_session_cache: Requests per second: 184.67 [#/sec] (mean)

The cache probably has no effect because each 'ab' request is a new visitor. But I'd guess the first https pageview for any visitor is the most critical pageview of most funnels.

newman314 · on July 12, 2011

Choice of cipher as well as openssl version + features used make a difference too. See http://zombe.es/post/5183420528/accelerated-ssl for some examples.

Use "openssl speed -elapsed" to test performance on your system.

stock_toaster · on July 11, 2011

The post didn't mention if ssl_session_cache was enabled in the nginx config or not. In fact, I didn't see any configs posted. :(

Also, the article author apparently added support to stud (in his own fork) for x-forward-for. I don't think this is required any longer, due to this fairly recent stud commit: https://github.com/bumptech/stud/commit/9d9b52b7d3ce90fa84c6...

seiji · on July 11, 2011

I hear the same performance happens with our without the session cache enabled (for benchmarks). The http(s) benchmarking tools don't resume sessions. It's simulating a horde of new clients who never come back or request other resources.

It would be interesting to see stud with a session cache too.

jamwt · on July 11, 2011

I'm reading over Matt's work carefully, but my initial inclination is not to merge the bulk of this into stud mainline. I'd rather keep stud simple and protocol-naive and have HAProxy do the HTTP work.

jamwt · on July 11, 2011

Which is to say indirectly that I think the right answer is for nginx (and daemons generally) to support the PROXY protocol, or some other agreed-upon standard for a naive upstream proxy to indicate host/port information.

sciurus · on July 12, 2011

Could someone link to a description of the PROXY protocol?

stock_toaster · on July 20, 2011

http://haproxy.1wt.eu/download/1.5/doc/proxy-protocol.txt

stock_toaster · on July 11, 2011

I really agree with this. I think keeping stud as simple as possible is a great goal.

seiji · on July 11, 2011

I'm reading over Matt's work carefully

Thanks for the consideration!

initial inclination is not to merge the bulk of this into stud mainline

I agree. The HTTP stuff is still too integrated. ifdefs are ugly.

The solution is to do what showed up when I was 99% done working on XFF -- the nice PROXY protocol addition. We just need to get PROXY support into nginx now to obviate my XFF machinations.

tptacek · on July 11, 2011

I don't know what the limits are in the nginx HTTP parser Matt's using, so this is probably moot, but code that does things like "realloc(ptr, size + newsz)" or "malloc(size + 1)" expecting things to be fine gives me the howling fantods.

seiji · on July 11, 2011

I don't know what the limits are in the nginx HTTP parser

You're correct in assuming the library enforces its own size limitations. It operates on length of received SSL data which is capped by the static receive buffer at 32k. Nice and tiny.

(Also, you are, of course, painfully correct about lack of bounds checking and lack of return value checking on the malloc/realloc calls. If I ever graduate the branch to production status, the six malloc calls and three realloc calls will be wrapped in proper checks.)

tptacek · on July 11, 2011

For what it's worth, don't check the retval of malloc/realloc/strdup; instead, rig them so they blow up if the allocation fails.

oconnore · on July 12, 2011

Why? Can't you just bail on whatever you are currently doing? How is the entire process compromised by a failed malloc? Resources are limited, sort of by definition, shouldn't good code be able to handle this possibility?

tptacek · on July 12, 2011

If malloc is rigged to explode when it fails, you can't accidentally forget to check; sometimes, malloc failures can end up being exploitable. It's not like most code does anything particularly smart when memory runs out.

neilc · on July 12, 2011

An exception to this is library code, which should typically check for malloc failures and return an error to the client code.

EdiX · on July 12, 2011

Handling an out of memory error in any way other than terminating the entire process is very very hard, because the effects of memory exhaustion are felt by all your threads at the same time usually preceded by massive slowdown due to swapping (which will cause other symptoms if there is any real time constraints put on the process).

I'm not saying there are no cases where recovering from allocation errors would be possible but it's not the general case. It's usually easier to treat any allocation error as a fatal error and insure your programs so they don't run out of memory through other means.

__rkaup__ · on July 11, 2011

I'm quite new to C programming. What's wrong with doing those things?

tptacek · on July 11, 2011

Numbers can't grow indefinitely; they wrap (usually at the size of the register).

_3u10 · on July 11, 2011

"malloc(size+1)" is a sign you may have one off errors in your code. If you need to store a string of size s you need s+1 bytes allocated. The plus one is for null termination. If you want an array of t you can either pass the size around with the array or null terminate the array like strings, but then you can't store any null values in the array.

Also, there's no bounds checking on size so in certain conditions such as a 2GB/4GB allocation you may allocate zero bytes or -2GB bytes.

nginxorg · on July 12, 2011

It would be nice it Matt provided full details on the testbed, including the client. In a test scenario it is very important to understand what gets tested in the end. I liked that "Russia" tag too :)

wumpus · on July 11, 2011

For a different point of view, check out:

http://web.archive.org/web/20090619214443/http://www.o3magaz...

The author benchmarks nginx at 26,590 TPS on a quad-core 2.5 ghz amd system.

mleonhard · on July 12, 2011

I suspect that they are re-using connections in that benchmark. SSL connection setup is CPU intensive. Once a session is set up, an SSL connection uses only slightly more CPU time than an unencrypted connection.

spydum · on July 11, 2011

That benchmark is using an SSL Accelerator card and nginx. Can honestly say I have seen very few people swing for SSL Accelerator cards.

wumpus · on July 12, 2011

The "open source SSL Accelerator" mentioned in the blog posting is a quad-core server running Linux and nginx.

spydum · on July 12, 2011

whoops! my apologies -- the capitalized "SSL Accelerator" set my mind into thinking dedicated hardware device.

cshesse · on July 11, 2011

A friend of mine worked on some large scale SSL deployment, he wrote up the results of his tests here: http://zombe.es/post/5183420528/accelerated-ssl

He's concerned about the raw speed of the SSL calculations, not requests per second, but if you're actually concerned about SSL speed and you have enough requests per second to justify optimizing SSL speed, it could be pretty useful.

piotrSikora · on July 11, 2011

No configs, no methodology, no graphs... Great "benchmark".

wpietri · on July 11, 2011

If only there were some way you could create those things that you want and he didn't care about doing!

damncabbage · on July 11, 2011

Calling out a tool as having some terribly negative characteristic places the burden of proof on he or she doing the calling.

auxbuss · on July 11, 2011

I needed to use SSL on nginx and got great results from following a number of pieces of advice. I jotted down my noted here: http://auxbuss.com/blog/posts/2011_06_28_ssl_session_caching...

It made a significant performance difference to me.

fexl · on July 11, 2011

I've seen good results with the "pound" front-end:

http://www.apsis.ch/pound/

I haven't done extensive benchmarking on it, but very knowledgeable people vouch for it.

getsat · on July 12, 2011

Pound consumes an absurd amount of memory if you have lots of concurrent connections (due to thread stacks).

fexl · on July 12, 2011

Interesting, thanks -- I'll watch for that. Until now I haven't paid much attention to it since I'm not responsible for that part of the configuration.

ominous_prime · on July 12, 2011

I'm not sure why he would be getting numbers that low. The only setup I have at the moment which would give useful numbers for ssl req/sec is a small single core VM, running one nginx worker process, and that pumps out 135 new req/sec. Add a few cores, workers, put it on real hardware, and I don't see how this couldn't push well over 400 req/sec.

This is using nginx strictly as an ssl termination, where I need to do some header manipulation that I couldn't do in stunnel/stud.

ominous_prime · on July 12, 2011

Self reply, since I waited too long to edit:

I remembered I had an older 8 core server sitting unused at the moment. I configured nginx with 8 workers, and ran `ab` against it. From a single (VM) host, I can get 680 connections per second (maxed the cpu on the host running the test). From 4 hosts, each host got > 290 connections per sec, so I got nginx up to over 1190 new connections per second, and can likely push it further.

[EDIT] got it to peak at 1535 requests per second with 4 hosts testing.

sigil · on July 11, 2011

Where's the benchmark code? I'd like to see how ucspi-ssl [1] performs.

[1] http://www.superscript.com/ucspi-ssl/sslserver.html

foobarbazetc · on July 13, 2011

Not to rain on the parade here, but we handle several thousand connections per second on nginx + SSL per 8 core Westmere machine.

The article needs way more detail.

ck2 · on July 12, 2011

If anyone does benchmarking, please include litespeed as I am curious. I suspect it's much faster than nginx at ssl. Even with the connection limit on the free version I suspect it will still be feasible for testing.

nginxorg · on July 14, 2011

more nginx ssl testing http://nginx.org/pipermail/nginx/2011-July/027960.html

schiptsov · on July 12, 2011

Let me guess - it is not nginx code which is slow but one from openssl? ^_^

schiptsov · on July 14, 2011

I won - http://matt.io/technobabble/hivemind_devops_alert:_nginx_doe...

ams6110 · on July 12, 2011

Quoting:

    (on an 8 core server...)
    haproxy direct: 6,000 requests per second
    stunnel -> haproxy: 430 requests per second
    nginx (ssl) -> haproxy: 90 requests per second

Yet Matt Cutts tells us that SSL is not computationally expensive anymore. Based on these results it's still an order of magnitude slower.