Long-term PHP guy (I maintained APC for years, though I've slowly given up on it by now), so I've worked a lot with ~2k-3k requests-per-second PHP websites.
The real trick here is async processing. A lot of the slow bits of PHP code come from people not writing async data patterns.
If you use synchronous calls in PHP - mc::get, mysql, or curl calls - then PHP's performance absolutely sucks.
Node.js automatically trains you around this with its massive use of callbacks for everything - that is the canonical way to do things there - while in PHP, blocking single-threaded calls are what everyone uses.
The most satisfying way to actually get PHP to perform well is to use async PHP with a Future result implementation. Being able to do a get() on a future result was the only sane way to mix async data flows with PHP.
For instance, I had a curl implementation which fetched multiple HTTP requests in parallel and essentially let the UI wait for each web-service call at the HTML block where it was needed.

https://github.com/zynga/zperfmon/blob/master/server/web_ui/...
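A minimal sketch of that pattern using stock curl_multi (not the original code - the class name, keys, and URLs below are made up): every request is kicked off as soon as it is added, and get() only blocks at the point in the page where that particular body is needed.

    <?php
    // Sketch: futures over curl_multi. All transfers progress together;
    // get($key) drives the multi handle until that one response is ready.
    class ParallelHttp
    {
        private $mh;
        private $handles = [];   // key => curl easy handle
        private $results = [];   // key => response body

        public function __construct()
        {
            $this->mh = curl_multi_init();
        }

        public function add(string $key, string $url): void
        {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_multi_add_handle($this->mh, $ch);
            $this->handles[$key] = $ch;
            curl_multi_exec($this->mh, $running);   // start the transfer now
        }

        // Block until $key's response has arrived; other fetches keep going.
        public function get(string $key): string
        {
            while (!isset($this->results[$key])) {
                curl_multi_exec($this->mh, $running);
                while ($info = curl_multi_info_read($this->mh)) {
                    foreach ($this->handles as $k => $ch) {
                        if ($info['handle'] === $ch) {
                            $this->results[$k] = (string) curl_multi_getcontent($ch);
                            curl_multi_remove_handle($this->mh, $ch);
                        }
                    }
                }
                if (!isset($this->results[$key]) && $running > 0) {
                    curl_multi_select($this->mh, 0.1);   // sleep until I/O, don't spin
                }
            }
            return $this->results[$key];
        }
    }

    $http = new ParallelHttp();
    $http->add('profile', 'https://api.example.com/profile');   // hypothetical endpoints
    $http->add('friends', 'https://api.example.com/friends');
    // ... render the parts of the page that don't need them ...
    echo $http->get('profile');   // blocks here only if still in flight
    echo $http->get('friends');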
There was a similar async Memcache implementation, particularly for cache writebacks (memcache NOREPLY), plus memcache multi-get calls to batch key fetches together, and so on.
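With the stock pecl/memcached client that roughly looks like this (a sketch, not the original implementation - server address and key names are made up): getMulti() turns N serial round trips into one, and a separate connection with OPT_NOREPLY handles fire-and-forget writebacks.

    <?php
    // Reads: one batched round trip for many keys.
    $read = new Memcached();
    $read->addServer('127.0.0.1', 11211);
    $values = $read->getMulti(['user:42:profile', 'user:42:friends', 'user:42:score']) ?: [];

    // Writebacks: a NOREPLY connection never waits for the server's ack.
    $write = new Memcached();
    $write->addServer('127.0.0.1', 11211);
    $write->setOption(Memcached::OPT_NOREPLY, true);
    $write->set('user:42:score', 1234, 300);   // fire and forget, 5 minute TTL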
The real issue is that all of this is engineering work on top of the language instead of being built in as the "one true way".
So often I would have to dig in and rewrite massive chunks of PHP code to hide latencies and get near the absolute packet limits of the machines - closer to ~3500-4000 requests per second on a 16-core machine (sigh, all of that might be dead & bit-rotting now).
Something like Gearman queues basically takes the asynchronous processing out of the web layer and into a separate daemon. Things like S3 uploads and FB API calls were shoved into Gearman tasks instead of holding up the web page.
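Roughly like this with the pecl Gearman bindings (a sketch - the job name and payload are invented): the web tier enqueues a background job and returns, and a separate worker daemon does the slow part.

    <?php
    // Web tier: enqueue and move on; the page never waits on S3.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $client->doBackground('upload_to_s3', json_encode([
        'bucket' => 'user-avatars',
        'path'   => '/tmp/avatar-42.png',
    ]));

    // Worker daemon (separate long-running process):
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('upload_to_s3', function (GearmanJob $job) {
        $task = json_decode($job->workload(), true);
        // ... do the actual S3 PUT here, with retries ...
    });
    while ($worker->work()) {
        // keep picking up queued jobs
    }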
Some of the stuff is very design-oriented: in most of my memcache code there are no mc-lock calls at all - all of them are mc-cas calls. A lot of the atomicity is done using add/delete/cas, which involve no sleep timeouts; a bit of it was done with atomic append, increment and decrement as well.
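Something like this lock-free update loop, sketched here with pecl/memcached (the key name and update rule are illustrative): add() wins the race for a missing key, cas() only succeeds if nobody else wrote in between, and there is never a lock key to sleep on.

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    function bump_high_score(Memcached $mc, string $key, int $score): bool
    {
        for ($attempt = 0; $attempt < 10; $attempt++) {
            $res = $mc->get($key, null, Memcached::GET_EXTENDED);   // value + CAS token
            if ($res === false) {
                if ($mc->add($key, $score)) {   // only one racer's add() succeeds
                    return true;
                }
                continue;   // lost the race; re-read and fall through to cas()
            }
            if ($res['value'] >= $score) {
                return true;   // nothing to update
            }
            if ($mc->cas($res['cas'], $key, $score)) {   // fails if someone wrote in between
                return true;
            }
        }
        return false;   // too contended; caller decides what to do
    }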
SQL queries are another place where PHP doing the actual work hurts web apps. A lot of the MySQL/PostgreSQL functionality that runs inside a lock was moved into stored procedures instead of being driven from PHP.
That stored-procedure code ends up horribly written, because you can't parameterize table names or column names in PL/SQL, but it cuts down how much PHP is involved in the backend's locked sections.
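From the PHP side that boils down to a single CALL, sketched here with mysqli (the procedure name and arguments are hypothetical); the row locks are taken and released entirely inside the database.

    <?php
    $db = new mysqli('127.0.0.1', 'game', 'secret', 'game_db');

    $userId   = 42;
    $itemId   = 7;
    $quantity = 1;

    // One round trip; the procedure does the SELECT ... FOR UPDATE / UPDATE
    // dance server-side, so PHP never sits inside the lock.
    $stmt = $db->prepare('CALL award_item(?, ?, ?)');
    $stmt->bind_param('iii', $userId, $itemId, $quantity);
    $stmt->execute();
    $stmt->close();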
Also, a lot of the stats data was pushed into the Apache log files instead of being written out from the PHP code directly with an fwrite().
This uses the apache_note() function in PHP to log stuff after the request is done and the connection is closed; the values end up as %{name}n fields in the access log.
Every single access-log line there has the associated user, the HMAC of the request, and the peak memory usage - all collected at zero latency cost to the actual HTTP call.
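The PHP side is just a few apache_note() calls (the note names and variables here are illustrative); mod_log_config then writes them out via %{name}n after the response has already been sent.

    <?php
    // $userId, $requestBody, $secret come from the application (illustrative).
    // Stamp per-request stats onto the Apache request record; nothing is
    // written to disk from within the PHP request itself.
    apache_note('app_user', (string) $userId);
    apache_note('app_hmac', hash_hmac('sha1', $requestBody, $secret));
    apache_note('peak_mem', (string) memory_get_peak_usage(true));

    // httpd.conf (for reference):
    //   LogFormat "%h %u %t \"%r\" %>s %b %{app_user}n %{app_hmac}n %{peak_mem}n" app_log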
The thing to avoid, though, is pcntl - it absolutely messes up the Apache/FastCGI process-management code.
This is not all of what I've done. I am sorry to say some of my best work in this hasn't been open-sourced & has perhaps been killed since I left Zynga.
PHP backends I built using these methods were handling roughly 6-7 million users a day on 9 web servers (well, we kept 16 running - 8 on each UPS).
Ah, fun times indeed - too bad I didn't make any real money out of all that.