Threads work pretty darn well, actually. Depending on the libraries you use, the GIL may not get in the way (e.g., database libraries, or stuff like lxml that releases the GIL). If you have blocking stuff like I/O (socket or file), then threads help. The only time the GIL really gets in the way is with multiple processors and CPU-bound routines. Even with CPU-bound routines you are often better off with threads, as the machinery shares the CPU transparently and relatively fairly.
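For the I/O case, a minimal sketch (the URLs and worker count are just placeholders) of threads overlapping their waits while the GIL is released:

    import concurrent.futures
    import urllib.request

    URLS = ["http://example.com/"] * 10  # placeholder for slow, I/O-bound resources

    def fetch(url):
        # urlopen blocks on socket I/O; the GIL is released during the
        # wait, so other threads keep running.
        with urllib.request.urlopen(url) as resp:
            return len(resp.read())

    # Ten workers overlap their waits, so wall time is roughly one
    # request's latency rather than ten.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        sizes = list(pool.map(fetch, URLS))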

Threads are a little half-assed, sure, but so are most of the currently offered "replacements" for threads, including cooperative multitasking (which also doesn't do anything with multiple processors!).

I guess that depends on whether you define "high concurrency" as ~50 or more like ~5000.

I'm guessing your "half-assed" comment is in reference to some of the same things David Beazley mentioned in his PyCon talk, but it's worth linking to anyway: http://www.dabeaz.com/GIL/

Basically threads work pretty darn well until you throw some real load at them.


Well, practice shows threads work pretty darn well with real load; there are a huge number of running examples of high-load threaded Python applications. There is the potential for failure due to race conditions, but IMHO it's not as bad as it's often made out to be.

For high concurrency, there is of course the question of what level in the stack you want to handle the concurrency at. There's not much use in actively doing 5000 things at the same time. But you might want to be handling 5000 things at the same time (say, open sockets). The question is how much you have to twist your code around to do this -- with threads you might have worker pools, and you might have to be sure your chunks of work are of reasonable size (and it's often tricky to do some things incrementally). With cooperative concurrency... well, you do all the same things ;) But sometimes when threads are enough you don't have to contort your code, and in some error cases threads will help a lot (like when you happen not to partition your work sufficiently).
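As a minimal sketch of that worker-pool shape (the pool size, queue bound, and handle() are placeholders): a bounded queue feeding a fixed set of threads, which is what keeps the chunks of work at a reasonable size:

    import threading
    import queue  # named Queue in Python 2

    work_q = queue.Queue(maxsize=1000)  # bounded, so producers get backpressure

    def handle(job):
        pass  # placeholder for a reasonably sized chunk of work

    def worker():
        while True:
            job = work_q.get()
            if job is None:      # sentinel: time to shut down
                work_q.task_done()
                return
            handle(job)
            work_q.task_done()

    pool = [threading.Thread(target=worker) for _ in range(50)]
    for t in pool:
        t.start()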

I really just want to get rid of those tweaky code contortions in all cases, so that it's not a matter of different choices with different tradeoffs, but just one really good choice that applies much more widely.


Threads for Python webapps are fine, but they break down when doing things like chat (using long polling or websockets), where you have a lot of open, mostly idle connections. Coroutine libraries like gevent and eventlet let you elegantly use greenlets without contorting your code -- i.e., you get async without callbacks or deferreds -- and you get the benefit of being able to handle thousands of connections in a single Python process. Of course you don't get processor concurrency, but you don't really get that with straight CPython anyway because of the GIL -- and if you're deploying a largish web app, you're going to have a load balancer (haproxy ftw) and can just run one process per core. This also, mostly transparently, lets you scale out to N boxes with M cores per box with very few architecture changes at the inbound-request/web-server level.
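A minimal gevent sketch of that style (the host and connection count are placeholders): each greenlet reads like plain blocking code, and thousands fit in one process:

    from gevent import monkey; monkey.patch_all()  # make socket ops cooperative
    import gevent
    from gevent import socket

    def client(n):
        # Placeholder for a long-lived, mostly idle connection
        # (think long polling): plain blocking-style code, no callbacks.
        s = socket.create_connection(("example.com", 80))
        s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
        data = s.recv(4096)  # "blocks" by yielding to the event loop
        s.close()
        return data

    # Thousands of concurrent greenlets in a single Python process.
    jobs = [gevent.spawn(client, n) for n in range(5000)]
    gevent.joinall(jobs)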

Which brings me to the microprocess / shared module ideas -- RAM is pretty cheap -- 1GB or 2GB per core isn't anything fancy, and if you're using more than a couple of gigs in a Python webapp process, something is probably wrong. As for sharing state between microprocesses, I don't see how that really solves anything -- you're going to have processes on different boxes if you have any sort of traffic or fault-tolerance requirements, and you're going to be putting that shared state into a cache or database somewhere anyway.

Raw compute speed is important to the scipy/numpy crowd, so I think things like Cython and PyPy make a lot of sense there. Webapps, though, are I/O bound: you spend 90% or more of the time it takes to service a request waiting, and waiting is exactly what async is good at. PyPy isn't going to beat CPython by 20x on some Django benchmark, but it will on some compute-intensive ones.
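To make the I/O-bound claim concrete, a toy sketch (the 95ms/5ms split is an assumption, not a measurement):

    import time

    def busy(seconds):
        # Stand-in for actual CPU work (template rendering, etc.).
        end = time.time() + seconds
        while time.time() < end:
            pass

    def handle_request():
        time.sleep(0.095)  # stand-in for DB/API round trips: pure waiting
        busy(0.005)        # ~5ms of real computation

    # ~95% of this ~100ms "request" is spent waiting; an async or
    # greenlet server can service other requests during that wait,
    # and a 20x faster interpreter only shaves the 5ms of compute.
    handle_request()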

So anyway, I think my point is: you can have your cake and eat it too -- coroutines without pain (gevent) and processor-level concurrency (load balancer) for web applications, using off-the-shelf, production-ready technology on commodity hardware...
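For what that deployment shape can look like, a sketch of a gunicorn config (the file name and numbers are placeholders; haproxy sits in front and spreads load across boxes):

    # gunicorn_conf.py -- one gevent-based process per core
    import multiprocessing

    worker_class = "gevent"                # greenlets inside each worker
    workers = multiprocessing.cpu_count()  # one process per core
    worker_connections = 5000              # many mostly-idle connections each
    bind = "127.0.0.1:8000"                # haproxy proxies to this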
