An Inside Look at the (Python) GIL Removal Patch of Lore (dabeaz.blogspot.com)
90 points by mace on Aug 12, 2011 | hide | past | favorite | 17 comments



I think this patch could certainly do with revisiting:

1) Reference counting used atomic operations on Windows but a full pthread mutex on Linux; if that was necessary back when the patch was written, it's definitely not necessary today.

2) Garbage collection could eliminate the reference counting problem entirely.

3) There's been a lot of work on immutable data structures that could address the dictionary locking issue.

I do wonder if this is really a deeper issue - programming isn't nearly so simple once you have multiple threads concurrently modifying your data structures and you have to start locking etc. A lot of the appeal of Python is that it's so simple, and I think threads and locks would greatly affect that.


Yeah, a mutex to protect refcounts? That's ridiculous. (If you want low-overhead locks, David Bacon's thin locks (http://www.research.ibm.com/people/d/dfb/thinlocks-publicati...) are a good place to start.)

I also question the premise offered here that data structures must be thread-safe. It's not something guaranteed by most languages (e.g. Java), so why should Python guarantee it?


"all datastructures would have to be thread-safe is pretty common argument used against usage of native threads (as opposed to green threads) or removal of global locks in various VMs. While it's true that they don't have to be thread-safe from user's point of view, they should be thread-safe in the sense that they cannot get corrupted by race condition in user code in a way that could crash (or deadlock) the VM.


Every Python object has a backing dictionary, so if any object has to be shared between threads in a thread-safe fashion, then dictionaries need to be thread-safe.
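To make the point concrete, here's a minimal demonstration that ordinary attribute access on a Python object really is just a dict operation underneath, which is why concurrent attribute writes become concurrent dict writes:

```python
# An object's attributes live in its __dict__; attribute access
# and dict access are two views of the same storage.
class Point:
    pass

p = Point()
p.x = 1
assert p.__dict__ == {"x": 1}   # attribute writes are dict updates

p.__dict__["y"] = 2
assert p.y == 2                 # and dict updates are attribute writes
```

(This is the simple case; objects using __slots__ or C-level storage are exceptions, but the common case is dict-backed.)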


This is tricky because dictionaries need to call __hash__ which can run arbitrary Python code, so it's non-trivial to make the dictionary threadsafe.


It's perfectly possible and even reasonable to call __hash__ before acquiring a lock on the dictionary (and to cache keys' hashes for data already in the dictionary).

On the other hand, when almost every object is backed by a dictionary, the locking has to be very fast, which seems to me like an almost unsolvable problem.
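A minimal sketch of the hash-before-lock idea, assuming a simple bucket layout (the class and field names here are hypothetical, not from CPython):

```python
import threading

class LockedDict:
    """Compute the key's hash (which may run arbitrary __hash__ code)
    *before* taking the lock, and bucket entries by that cached hash.
    Note that __eq__ still runs under the lock here, which is exactly
    the remaining hole pointed out above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buckets = {}  # cached hash -> list of (key, value) pairs

    def __setitem__(self, key, value):
        h = hash(key)                  # user __hash__ runs outside the lock
        with self._lock:
            pairs = self._buckets.setdefault(h, [])
            for i, (k, _) in enumerate(pairs):
                if k is key or k == key:   # __eq__ under the lock
                    pairs[i] = (key, value)
                    return
            pairs.append((key, value))

    def __getitem__(self, key):
        h = hash(key)
        with self._lock:
            for k, v in self._buckets.get(h, ()):
                if k is key or k == key:
                    return v
        raise KeyError(key)
```

This keeps arbitrary __hash__ code out of the critical section, but as the sibling comment notes, __eq__ is still called while the lock is held, so a hostile or slow __eq__ can still stall (or, if it re-enters the dict, deadlock) other threads.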


__eq__ as well


Just make __hash__() wrap the underlying dict in a thread-safe one that delegates the calls.


Right. I think that backing dictionary could probably be replaced with a more thread-friendly implementation, even if we have to offer reduced guarantees.


Or every Python object's backing dictionary needs to use the special threadsafe implementation.


That's an argument for revamping the backing dictionary, really. If __hash__() is a problem, make it return a thread-safe proxy that delegates to the object's dictionary.


Atomic operations (atomic inc/dec) can still cause major performance issues: depending on the architecture, they can generate a lot of bus traffic and cache invalidation, which can greatly increase memory pressure on previously cached data. A better solution would be a more cache-friendly counter that spreads the count across multiple counters on different cache lines and lazily reconciles them, so you only need to synchronize one out of every n operations. The problem is that this can cause a big increase in memory usage, so it's only suitable for highly contended counters. The ideal situation would be for the interpreter to detect contention and switch to this approach only when necessary. The technique is called a sloppy counter and has been used in Linux SMP kernels.
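The reconciliation idea can be sketched in a few lines of Python (per-thread state stands in for per-core cache lines here; the class name and batch parameter are illustrative, not from any real VM):

```python
import threading

class SloppyCounter:
    """Sketch of a 'sloppy counter': each thread accumulates increments
    privately and only folds them into the shared global every `batch`
    operations, so the contended lock is taken once per batch instead
    of once per increment."""

    def __init__(self, batch=64):
        self._batch = batch
        self._global = 0
        self._global_lock = threading.Lock()
        self._local = threading.local()   # per-thread pending count

    def incr(self):
        pending = getattr(self._local, "pending", 0) + 1
        if pending >= self._batch:
            with self._global_lock:       # synchronize 1 out of n ops
                self._global += pending
            pending = 0
        self._local.pending = pending

    def flush(self):
        # Fold this thread's remainder into the global count.
        pending = getattr(self._local, "pending", 0)
        if pending:
            with self._global_lock:
                self._global += pending
            self._local.pending = 0

    def value(self):
        # Exact only after every thread has flushed; 'sloppy' otherwise.
        with self._global_lock:
            return self._global
```

The trade-off is visible in value(): reads are only approximate until threads flush, which is why the technique suits counters where staleness is tolerable (like refcounts that only matter when they hit zero) rather than ones that must be read exactly at any instant.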


I think garbage collection is an even better answer, because you don't pay a penalty when copying pointers.

I hope someone does a good port of Python/Jython to the Java 7 JVM - it seems the easiest way to get a good garbage collector (and JIT compiler).


Yes, threads are a pain in the ass, but many people appear to enjoy them nonetheless. That's pretty much the entire reason languages have threading support.


I definitely agree. But the Python crowd seems very opposed to threads on many levels, and I think there's a place for a language which deliberately excludes the complexity of threading.


Our opposition mainly comes from the idea that threading is not worth it; the gains in parallelism don't make up for the maintenance and debugging costs.


I like the idea of exploring other models. Shared data and threads is not the only approach to parallelism.

And, usually, when you are stuck on a hard problem, it pays to take a step back and make sure the problem you are failing to solve is one you should be solving at all. The problem we want to solve is not how to get rid of the GIL, nor how to improve Python's performance with threads, but how to use Python more effectively on multi-core/multi-thread architectures and gain performance from that.

This is not a problem only Python has. The machines I work on most of the time (a Core i5 laptop and an Atom netbook) rarely experience loads larger than 2. There are simply not enough threads to keep them busy.

That's not to say they never get slow - they do - but I'd like to emphasize that the limiting factor here is that we are not extracting parallelism from the software already written. We stand to gain a lot from that.



