Long term php guy (I maintained APC for years, slowly given up now), so I've wor...

Osiris · on Nov 12, 2013

What are best practices for writing async code in PHP?

gopalv · on Nov 12, 2013

A lot of extensions expose async modes, use them.

On the extension APIs

curl-multi - http://php.net/manual/en/function.curl-multi-select.php

memcached-getdelayed - http://us2.php.net/manual/en/memcached.getdelayed.php

mysqli-reap_async - http://us2.php.net/manual/en/mysqli.reap-async-query.php

postgres-send_query - http://www.php.net/manual/en/function.pg-send-query.php

gearman doBackground - http://www.php.net/manual/en/gearmanclient.dobackground.php

Something like gearman queues basically take the asynchronous processing out of the web layer into a different daemon. There were things like S3 uploads and fb API calls which were shoved into gearman tasks instead of holding up the web page.

Some of the stuff is very design oriented, for instance in most of my memcache code, there are no mc-lock calls at all - all of them are mc-cas calls. A lot of the atomicity is done by using add/delete/cas which involve no sleep timeouts. A bit of it was done using atomic append, increment and decrement as well.

SQL queries are another place where PHP doing actual work sucks for the web apps. A bunch of the mysql/postgresql functionality within a lock is actually moved onto stored procedures, instead of being driven by PHP.

https://github.com/zynga/zperfmon/blob/master/server/schemas...

So the code above is horribly written because you can't parameterize table names or column names in PL/SQL. But that essentially cuts down the involvement PHP has with the backend's locked sections.

Also a lot of the stats data was flooded onto apache log files instead of being written out from the PHP code directly using an fwrite.

https://github.com/zynga/zperfmon/blob/master/client/zperfmo...

This uses apache_note() function in PHP to log stuff after the request is done & the connections are closed. That gets into the log files as %(<name)n fields in the access log.

You can see there that every single access log has an associated user, the HMAC of the request and peak memory usage. All collected at zero latency to the actual HTTP call.

The thing to avoid though is pcntl - it absolutely messes up all of apache/fastcgi process management code.

This is not all of what I've done. I am sorry to say some of my best work in this hasn't been open-sourced & has perhaps been killed since I left Zynga.

PHP backends I built using these methods were handling approx ~6-7 million users a day on 9 web servers (well, we kept 16 running - 8 on each UPS).

Ah, fun times indeed - too bad I didn't make any real money out of all that.