In the last few hours, this has been silently added to the post, and I cannot po...

old-gregg · on June 29, 2012

> ...the standard model of threading programming that real-world programmers have been using for several decades...

What standard? And whose "real world"? The need for threads has always been controversial even among OS kernel devs. UNIX/Linux/BSDs have twisted and non-trivial threading histories peppered with religious wars similar to this one. And which "several decades" are you talking about?

There is no such thing as a "standard threading model". To some, a thread is just a flavor of fork() with a wrong parameter, and plenty of "real world" programmers continue to believe that kernel-level threads are a hack. And please, do not make it sound like Python threads are useless. Far from it.

Python threads are not what you are used to. That's pretty much TL;DR of your comment.

> ...Python's pragmatism has always appealed to me, so the ivory tower reaction to the practical concerns around the GIL really seem dissonant...

I feel like they are being dragged into it though. The original motivation behind GIL support has always been a pragmatic one: removing GIL will make the entire codebase more complex, harder to hack on and will complicate and slow down the development/maintenance of the libraries. That's pretty pragmatic.

But a fairly vocal group of users started to claim, similarly to you, that programming with threads is supposed to work the way they expect it to work according to a make-believe "threading standard", to which GIL supporters (correctly, IMO) replied that shared memory + locks is not the only/best approach to concurrency. It is easy to be offended by this answer, but that doesn't invalidate their point.

dmbaggett · on June 29, 2012

> Python threads are not what you are used to. That's pretty much TL;DR of your comment.

Correct. I am used to programming language threads that work the way computer scientists and programmers have typically described them -- for example, as in this (I hope uncontroversial) Wikipedia article:

http://en.wikipedia.org/wiki/Thread_(computer_science)

When I say "standard model of threading", I am not talking about nuances of call conventions to the underlying OS thread primitives. I am talking simply about running multiple streams of instructions, bytecodes, or other units of computation in parallel, within a single OS process.

haberman · on June 29, 2012

That Wikipedia article defines threads in terms of operating systems. Only one small part of that article concerns how threads are exposed to programming languages.

You can't talk about running multiple streams of instructions or bytecodes in parallel without talking about the nuances of how they share memory. Semantics of a multithreaded memory model are a highly "opinionated" thing -- there are lots of possible ways to define it, and the definition can have widespread effects on efficiency, ease of programming, and the guarantees that the runtime can provide. For example, an important aspect of a Python memory model would be that no Python program can SEGV the interpreter due to a race condition.

I recommend the following reading to get an appreciation for how much really goes into a memory model and how far from "simple" or "standard" it is:

  http://en.wikipedia.org/wiki/Memory_model_(computing)
  http://en.wikipedia.org/wiki/Java_Memory_Model
  http://www.kernel.org/doc/Documentation/memory-barriers.txt

Python is a lot harder to define a good memory model for than, say, Java, because in Python lists and dictionaries are primitive objects. If you say:

  x['A'] = 1

...that is a single operation that must not corrupt the dictionary, even if multiple concurrent threads are mutating it. In practice, this means that you need to either wrap every such mutation in a lock (which adds a lot of locking overhead) or use lock-free data structures (which are still relatively experimental and architecture-specific).
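
To make that guarantee concrete, here is a minimal sketch, assuming CPython. Several threads store into one shared dictionary with no explicit lock in the Python code, and the dictionary stays intact. The names `fill` and `run_writers` are made up for illustration.

```python
import threading


def fill(shared, thread_id, count):
    # Each thread stores its own keys; the Python code takes no lock.
    for i in range(count):
        shared[(thread_id, i)] = i


def run_writers(n_threads=8, count=10_000):
    # All threads mutate the same dict object concurrently.
    shared = {}
    threads = [
        threading.Thread(target=fill, args=(shared, t, count))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Every store lands and nothing is corrupted, because the interpreter itself keeps each individual dictionary store safe. This says nothing about compound operations that span several steps; those still need coordination in user code.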

dmbaggett · on June 29, 2012

I agree that a so-called dynamic language like Python is at something of a disadvantage because it must make atomicity guarantees that lower-level languages like C need not.

I still don't think it's reasonable to conclude that typical programmers are fine with their threads not really running in parallel, or that the GIL isn't worth bothering to fix, even though fixing it would be hard. In my original post yesterday, I pointed out that as the language footprint has grown, Python's disadvantage in this respect has increased: it is much harder to remove the GIL now than it was in, say, the 1.5 era when there actually was a (problematic) GIL removal patch.

We've gotten way off track, but the original point I was trying to make was that 1) the GIL really is a problem for not-purely-theoretical programs written by competent developers, and 2) that the 2->3 transition, by complicating the language and increasing the workload for the alternative implementations, has made it less likely than ever that the GIL problem would be resolved.

And, indeed, Nick explicitly confirmed this by saying the GIL is basically a dead issue for the CPython devs. His post made many good points about the merits of the 2->3 transition, and in particular pointed out some ways that 3 has reduced work for the alternative implementations, but I remain unconvinced overall. And not out of ignorance or incompetence, as he implied.

haberman · on June 29, 2012

I still think your position is unreasonable, because your inherent assumption is that the GIL is a "problem" that needs a "fix." This terminology is appropriate for a situation where the status quo could be improved without giving up any of the benefits of the current implementation. But this is not the case; removing the GIL in the way you advocate would add CPU and memory overhead that everyone would pay, even in the single-threaded case. And this is to say nothing of the practical problems of maintaining compatibility with existing C extensions.

The GIL is not a bug, it's a threading model. You wish the threading model was something else. You insist on your particular vision of an alternative threading model without acknowledging its downsides. You give no indication that you have actually considered or tried the alternative concurrency models that CPython does support, like multiprocessing, greenlets, or independent processes. You make no objective arguments for why your desired threading model is better than the ones that are currently available, except that you could avoid changing your code. You accuse Python of failing to live up to some accepted standard for what a "thread" should be, when in fact no such standard exists, especially for high-level, dynamically-typed languages like Python. If anything, newer languages are moving away from shared-state concurrency; see Erlang, Go, and Rust.

I don't think you have malicious intentions, but I urge you to reflect on what you are demanding and whether it is reasonable. What may look to you like "obvious" brokenness that demands an "obvious" fix is really a lot less clear-cut than you seem to think it is. I feel for the Python developers who have to deal with this complaining all the time.

comex · on June 30, 2012

To Python-level code, Python's threading model is pretty much exactly the same as that supported in all "fast" languages such as C and Java (even Go, ever pragmatic, has locks). Given that Jython already allows true multithreading and PyPy is trying to emulate it with STM, it's reasonable to see the GIL more as an implementation bug that won't be fixed for practical reasons than as a threading model... even if Python also supports alternate threading models that are perhaps better for most applications anyway (if strictly less powerful).
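
As a rough illustration of the "same model, with locks" point: a read-modify-write such as `value += 1` spans several bytecodes, so Python-level code guards it with a lock much as C or Java code would. This is only a sketch; `Counter` and `count_with_threads` are hypothetical names.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a load, an add and a store;
        # the lock makes the three steps behave as one unit.
        with self._lock:
            self.value += 1


def count_with_threads(n_threads=4, per_thread=50_000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the total is always `n_threads * per_thread`. The GIL only guarantees that the interpreter's own state stays consistent; it does not make multi-step application logic atomic.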

haberman · on June 30, 2012

> To Python-level code, Python's threading model is pretty much exactly the same as that supported in all "fast" languages such as C and Java

Yes, but Python also exposes higher-level operations like table manipulation as language primitives.

> Given that Jython already allows true multithreading

That may be, but as I mentioned this has an inherent cost, both in CPU and in memory. Therefore it is not a strict improvement over CPython, just a different direction.

Flimm · on June 29, 2012

"They should work as advertised, using native OS threading primitives and taking advantage of the native OS thread scheduler."

Just to be clear, Python does use native OS threading primitives, and it does make use of the native OS thread scheduler. Also, Python does support, from a practical standpoint, multithreaded programs, but only if the program is not CPU-bound. I think you could rewrite that paragraph to make the role of the GIL clearer.
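
A small sketch of the non-CPU-bound case, assuming CPython: `time.sleep` stands in for a blocking I/O call, since it releases the GIL while it waits. The function names are invented for the example.

```python
import threading
import time


def blocking_call(seconds):
    # Stand-in for blocking I/O; the GIL is released while sleeping.
    time.sleep(seconds)


def timed_run(n_threads=4, seconds=0.3):
    # Start all threads, wait for them, and return the wall-clock time.
    threads = [
        threading.Thread(target=blocking_call, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Four threads that each block for 0.3 seconds should finish in roughly 0.3 seconds of wall-clock time rather than 1.2, because the waits overlap. Swap the sleep for a pure-Python loop and the overlap largely disappears, which is the CPU-bound case this thread is arguing about.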

dmbaggett · on June 29, 2012

Fair enough; the distinction is subtle. Python uses native system threads and of course such threads are scheduled by the OS. But in practice, the GIL allows only one thread to run at a time unless the GIL is subverted via C extensions or similar. I get why this is, given Python's architecture and the legacy of the GIL (which dates to the late 90s, when multicore machines were relatively rare and expensive).

But it's still not the case that multiple threads "normally" run in parallel, with the OS ensuring fairness, which is what (I think) most programmers would expect threads to do in a general-purpose threaded language.

slurgfest · on June 29, 2012

It simply isn't true that "the GIL allows only one thread to run at a time."

"Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck."

The sky isn't falling, in the worst possible case you can still use Jython or whatever.

dmbaggett · on June 29, 2012

Right, that's why I qualified my statement with "...unless the GIL is subverted via C extensions or similar."

> The sky isn't falling, in the worst possible case you can still use Jython or whatever.

Sigh. I could port to C++ or whatever, too.

slurgfest · on June 29, 2012

The entire interpreter is written in C. Using facilities written in C, as documented, is not "subverting" anything. That's ridiculous hyperbole and it really doesn't help your credibility.

Flimm · on June 29, 2012

"unless the GIL is subverted via C extensions or similar."

I would hardly call I/O with Python built-in functions a subversion of the GIL.

cpeterso · on June 29, 2012

Python could avoid GIL problems with atomicity by implementing actor-based concurrency. Like Erlang or Rust (and unlike Go), actors could send deep copies of objects to each other using channels. Each actor would have its own GIL. To maintain compatibility, C extensions would only run on the primordial actor thread. After some testing, C extensions could opt in to being multi-actor safe.

If Guido doesn't want to implement actor concurrency within the Python interpreter, someone could write their own Python host that allows multiple, independent instances of the Python embedded interpreter. The host would implement a C extension that allows interpreter instances to send deep copies of objects to each other.
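
The message-passing half of that idea can be sketched with the standard library alone. This is a toy, not the proposal itself: the `Actor` class below is hypothetical, and its actors are ordinary threads inside one interpreter, so they still share a single GIL. It only shows the copy-on-send semantics.

```python
import copy
import queue
import threading

# Sentinel that tells an actor to shut down; passed by identity, never copied.
_STOP = object()


class Actor:
    """Hypothetical actor: one thread, one inbox, messages arrive as deep copies."""

    def __init__(self, handler):
        self._inbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Copy on send so sender and receiver never share mutable state.
        self._inbox.put(copy.deepcopy(message))

    def stop(self):
        # Queue the sentinel behind any pending messages, then wait.
        self._inbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._inbox.get()
            if message is _STOP:
                break
            self._handler(message)
```

Because the receiver only ever sees its own copy, it can mutate a message freely without any lock, and the sender's object is untouched. The cost is the copy itself, which is the trade-off the actor approach accepts.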