Hacker News

Python's multithreading is insanely inefficient, because of the Guido van Rossum Memorial Boat Anchor. Anything in Python can mess with the innards of anything else at any time, including stuff in other threads. (See "setattr()".) There's no such thing as thread-local data in Python. This implies locking on everything. CPython has one big lock, the infamous Global Interpreter Lock. Some other implementations have more fine-grained locks, but still spend too much time locking and unlocking things. One Python program can thus use at most one CPU for pure-Python code, no matter how many threads it has.

This basic problem has led to a pile of workarounds. First was "multiprocessing", which is a way to call subprocesses in a reasonably convenient fashion. A subprocess has far more overhead than a thread; it has its own Python interpreter (some code may be shared, but the data isn't) and a copy of all the compiled Python code. Launching a subprocess is expensive. So it's not a good way to handle, say, 10,000 remote connections.
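To make the trade-off concrete, here's a minimal multiprocessing sketch (the worker function `square` and the pool size are just illustrative): each worker is a full OS process with its own interpreter, and arguments and results are pickled across the process boundary, which is the overhead being described.

```python
from multiprocessing import Pool

def square(n):
    # Runs in a separate process with its own interpreter and its own GIL,
    # so CPU-bound work can actually use multiple cores.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:           # each worker is a full OS process
        results = pool.map(square, range(8))  # data is pickled to/from workers
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Fine for a handful of CPU-bound workers; the per-process cost is exactly why it doesn't scale to 10,000 connections.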

Now there's "asyncio", which is the descendant of "Twisted Python". That was mostly used as a way for one Python instance to service many low-traffic network connections. The new "asyncio" is apparently more general, but hammering it into the language seems to have created a mess.
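For reference, this is roughly the shape of the modern asyncio model (the handler name and delay are made up for illustration): one thread, one event loop, many concurrent waits.

```python
import asyncio

async def handle(conn_id):
    # Stands in for a low-traffic connection: while this handler awaits I/O,
    # the event loop runs all the others on the same single thread.
    await asyncio.sleep(0.01)
    return conn_id

async def main():
    # One Python instance servicing many connections concurrently.
    return await asyncio.gather(*(handle(i) for i in range(100)))

results = asyncio.run(main())
```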

After the Python 3.x debacle, which essentially forked the language, we don't need this.




> There's no such thing as thread-local data in Python.

There is. threading.local is, in all respects, thread-local data.
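A quick sketch of what that means in practice: attributes set on a threading.local instance are visible only to the thread that set them.

```python
import threading

local = threading.local()
seen = {}

def worker(value):
    local.x = value        # attribute set on this thread's view only
    seen[value] = local.x  # each thread reads back its own value

local.x = "main"
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The main thread's value is untouched by the workers:
print(local.x)  # "main"
```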

> Now there's "asyncio", which is the descendant of "Twisted Python". That was mostly used as a way for one Python instance to service many low-traffic network connections. The new "asyncio" is apparently more general, but hammering it into the language seems to have created a mess.

I think the mess was created before 3.5. Had the whole thing started out with the async keywords, we might have been spared `yield from`, which is a beast in itself, and a lot of the hacky machinery for legacy coroutines. I do think, however, that we can still undo that damage.
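For anyone who hasn't run into it: `yield from` is generator delegation (PEP 380), the construct the pre-3.5 generator-based coroutines were built on. A toy sketch (the function names are made up):

```python
def inner():
    yield 1
    yield 2
    return "done"  # the return value travels up to the delegating generator

def outer():
    # `yield from` transparently forwards next()/send()/throw() to inner(),
    # and captures inner()'s return value -- the trick legacy coroutines used.
    result = yield from inner()
    yield result

print(list(outer()))  # [1, 2, 'done']
```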


> threading.local is, in all respects, thread-local data.

You can still pass data attached to threading.local to another thread: hand another thread a reference to an object stored there, and it can read and mutate it freely. There's no real isolation, so all the locking is still needed.
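A minimal sketch of that first point (the attribute name `cache` is just for illustration): the attribute namespace is per-thread, but the objects stored in it are ordinary shared objects once you pass a reference out.

```python
import threading

local = threading.local()
local.cache = {}  # per-thread attribute, but the dict itself is a plain object

def worker(shared_dict):
    # Passing the underlying object across threads bypasses the
    # "thread-local" label: both threads now mutate the same dict.
    shared_dict["who"] = "worker"

t = threading.Thread(target=worker, args=(local.cache,))
t.start()
t.join()
print(local.cache)  # {'who': 'worker'} -- the "thread-local" data was shared
```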

This is a hard problem. There's real thread-local data in C and C++, but it's not safe. If you pass a pointer to something on a thread's stack to another thread, the address can become invalid, and the other thread will probably crash trying to access it. C++ tries to prevent you from leaking references to the stack, but the protection isn't airtight. In Rust, the compiler knows what's thread-local, as a consequence of the ownership system. Go kind of punts; data can be shared between goroutines, but the memory allocation system is mostly thread-safe. Mostly. Go's maps are not thread-safe, and there's an exploit involving slice descriptor race conditions.


> You can still pass data attached to threading.local to another thread.

You can in most languages. Rust is the only one I know of with enough compile-time information to prevent that.


On what are you basing your statement that multithreading in Python is insanely inefficient? Despite the fact that the GIL prevents multiple threads from running in parallel, using multiple threads can give you a huge boost if IO is your bottleneck.
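To illustrate the IO-bound case (using `time.sleep` as a stand-in for a blocking network call): blocking calls release the GIL, so the waits overlap.

```python
import threading
import time

def fetch(results, i):
    # Stand-in for blocking I/O: sleep releases the GIL, so the other
    # threads run while this one waits.
    time.sleep(0.1)
    results[i] = i

results = {}
start = time.perf_counter()
threads = [threading.Thread(target=fetch, args=(results, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Ten 0.1 s "requests" overlap: total wall time is ~0.1 s, not ~1 s.
```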

I think that it's irresponsible to make a blanket statement like this, because there are many use-cases for multiple threads in Python. Sure, one of the obvious ones (parallel processing) doesn't work, but besides that threads can be extremely useful.

I'm also unclear on what you mean by "no such thing as thread-local data" when there is `threading.local()` that does exactly that.

Lastly, I don't think multiprocessing was created as a workaround for threading per se. Rather, it was a workaround for the global interpreter lock.


And in either case, my view is that concurrency is best done in a language with proper support for it, whether the model is thread-based or process-based.


Not that I disagree with (m)any of the points you raise, but with due respect I think it's a bit off-topic from my comment.

Were it my choice, any time the need for concurrency comes up at my job I'd prefer to use a statically typed, compiled language like C++ or Java (or, once I've familiarized myself with its concurrency model, Rust), and this kind of discussion wouldn't even come up. I like Python as a rapid-prototyping language. For the kinds of numerical computations and data-laden, I/O-bound work I do, I find it sorely lacking, and consider it an unfortunate choice for production work.



