Python Generators in Depth

haberman · on Feb 26, 2013

Another very nice and mind-bending presentation on coroutines, that eventually takes things so far that he creates a mini "OS" with coroutines as tasks, complete with system calls and a scheduler: http://dabeaz.com/coroutines/

manaskarekar · on Feb 26, 2013

The talk by the same David Beazley on Co-routines: http://blip.tv/pycon-us-videos-2009-2010-2011/a-curious-cour...

The talk from David Mertz on Co-routines: https://www.youtube.com/watch?v=b7R3-_ViNxk

mapleoin · on Feb 26, 2013

Wow that David Beazley presentation is annoying. He spends the first 10 minutes in the introduction, complaining about how he won't have enough time to go through his presentation.

d23 · on Feb 26, 2013

I didn't watch the video, but going through the slides I was absolutely fascinated. I've been craving something new and advanced like this out of python for a while.

webexcess · on Feb 26, 2013

Thank you, I hadn't seen that. I've added that link to the bottom of my article.

beagle3 · on Feb 27, 2013

You use "assert gen.next() is None", but asserts might be dropped by optimizations, see e.g. http://docs.python.org/release/2.5.2/ref/assert.html - and your code will break.

assert should only be used for debugging, and only for things that should never happen. side effects on assert is a no-no. (Also in C)

11001 · on Feb 26, 2013

Thank you, great write up! I'd rather see more posts like this on the front page than reviews of the latest tablets.

Xcelerate · on Feb 26, 2013

Somewhat off-topic, but how are generators different than returning an object with a closed over environment? For instance, his first example in CoffeeScript would be:

    runningAvg = ->
            count = 0
            total = 0

            send: (value) ->
                    total += value
                    count += 1
                    total/count
                
    r = runningAvg()
    console.log r.send 10
    console.log r.send 5
    console.log r.send 0
    console.log r.send 0

thejsjunky · on Feb 26, 2013

That's essentially what they are. What generators are is making this construct a first class citizen in the language which makes it much easier to manage.

Your code as is isn't bad in terms of readability/complexity...but it'd be cleaner as a generator. Then consider how much more complicated it would be if instead your generator needed to alternately return total/count and count/total (or whatever). You'd have to duplicate the two lines that increment total and count (or factor them out), have another send function, and then have code to start swapping @send to the appropriate function...or have some boolean flag to figure out which to do..etc. It gets messy if you introduce even more. Instead with generators you could write something like:

    function mySillyGenerator (c) {
        var count = c || 0;
        var total = 0;
        while (true) {
            count = yield count/total;
            count = yield total/cound
        }
    }

Generators are a way to make stuff like this simple. In JS you can roll your own iterators, but in many cases it will be cleaner just writing a generator.

csense · on Feb 28, 2013

Generators can restore you to any point in the control flow, whereas your closure-based solution (or an object-oriented solution where variables are class fields) always has to start from the top of the send() function. For simple examples, closures can present an easy alternative implementation.

But here's an example where you need to do a lot of extra work with a closure. It reads a file with '#' style comments, separates comment from non-comment strings, and keeps track of a few counters which a caller in the same file can access as global variables (technically "module-level" variables in Python).

    comments = 0
    uncommented_lines = 0
    commented_lines = 0

    def tokenize_comments(src):
        # Point A
        for line in src:
            p = line.find('#')
            if p < 0:
                yield line
                #Point B
                uncommented_lines += 1
            else:
                yield line[:p]
                #Point C
                comments += 1
                yield line[p:]
                #Point D
                commented_lines += 1
            #Point E
        return

Try to write a functionally identical implementation of this code with closures that preserves the behavior of the counters.

The API could be much improved, for example, we might want to wrap this in a class to get rid of those global variables. But that's not the point. The point is that rewriting functionally equivalent code will require you to keep track of whether the code is at Point A, B, C, or D.

I purposely included the counters in this example to make it impossible to unify different control paths, which would make a closure-based implementation a lot easier. For example, if it wasn't for the counters, points B, D, and E would be identical.

_euac · on Feb 26, 2013

Hm, that is an interesting question. In fact, you should be able to always rewrite a generator as an iterator, either by using a class like the one you propose or by using a particular language construct (what Python does with the __iter__/next methods.)

From a conceptual point of view, I think the main difference is where the perceived 'driver' of the action is. At least when I use generators I think of them as streaming building blocks for another, more complex, step being built (and driven) on top of the stream. In other words, I think from the bottom up. It allows me to forget about keeping important state, because that step is going to be taken care "above me". Let me give you a simple example.

Let's say you want to write an interpreter for a simple language. The base building block for the parser will be reading characters from a file ('input'). After that step, comes breaking that stream into tokens (say, at the first whitespace), and after that comes parsing the grammar by identifying what tokens are being returned by the tokenizer. If the tokenizer returns 'exit', then we should exit. You can think of the parser as a function composition like this:

exit_status = interpreter . tokenizer . char_reader (input)

The beauty of this is that the responsibility for each step is clearly defined by the level you are in, and 'input' could be infinite for all we care, the interpreter will still work (think streaming, persistent-connection protocols like XMPP.)

If you go the other way around and write this as an Interpreter class encapsulating a Tokenizer, which in turn encapsulates a CharReader, then each class has to keep more state (the 'input' has to be passed down) and know a lot more about each other. Interpreter needs to tell Tokenizer what to tokenize, and Tokenizer needs to ask CharReader to read.

Well, at least in my mind. Obviously it is completely subjective at this point. At least I like when I can drive all the logic or my (possibly infinite) loop from a single loop statement :)

webexcess · on Feb 27, 2013

Yes, your code is the same as the first example generator with a single yield in a loop.

The same conversion for the later examples with multiple yield statements, try: except: or try: finally: will be much hairier, though.

zeekay · on Feb 26, 2013

Not much different, really. The biggest practical difference is that Python generators implement the iterator protocol which means it's possible to easily iterate over them with for loops, etc.

enjo · on Feb 27, 2013

In that case? Not very much different at all, but that's missing the point. Think of it this way (I apologize, this is going to get obtuse):

Imagine you're a barista. You clock in and start doing barista things. You have three responsibilities. You have to take orders from customers and you have to pour coffee. When there are not customers you have to clean the shop.

The rules:

- You must take at least two orders if there is a line, but no more before you start pouring coffee. You should never have more than two outstanding orders at a time.

- If you have no customers in line and no coffee to pour, you must clean the shop.

Lets do it with generators (again pseudo code):

  function day_of_work() {
    generator take_orders() {
        orders = array()

        while(during_shift()) {
           while(customers_in_line()) {
             if(length(orders) >= 2) {
                yield orders
             } else {
               orders.push(take_customer_order())
             }
           }

           yield orders
        }

    generator clean_shop() {
        unpack_coffee()
        yield
        
        clean_espresso_machine()
        yield

        take_out_trash()
        yield
    }
 
    while(orders = take_orders.next()) {
        if(length(orders) > 0) {
          pour_coffee(orders.pop())
        } else {
          clean_shop.next()
        }
    }

There are lots of cool things going on here.

First notice that these generators not only have variable state, but they also have control flow state and THAT is why they're different. Notice that take_orders will yield control if more than two orders exist in the queue. It will also yield control if no more customers are in line. So our main program loop is really simple. It gets the current set of orders from "take_orders". Once the order queue is full it starts pouring coffee.

Notice what happens when no orders are present. Clean shop is executed once per iteration. It first unpacks the coffee, and then yields control. Now we go back and check for more orders, if we find them we fill them until we don't have any more again. Then it's back to cleaning shop, but we pick back up RIGHT WHERE WE LEFT OFF! So we just start cleaning the espresso machine... this goes on and on until the work day ends at which point take_orders cleanly exists and orders is null. Then we pack up and go home.

Now you might go: Well I can implement that in Javascript! You'd be right! Closures + callbacks make it totally possible. Lets take a stab at it:

  orders = [];
  current_task = -1;

  while(during_work_day()) {
       take_orders(
            orders,
            function(order) { pour_coffee(order); },
            function() { current_task++; clean_shop(current_task); }
       );
  }

  function take_orders(orders, orders_full_callback, no_orders_callback) {
      if(customers_in_line()) {
          if(orders.length > 2) {
              orders_full_callback(orders.pop());
          } else {
              orders.push(take_customer_order());
          }
      } else {
          no_orders_callback();
      }
  }

  function clean_shop(current_task) {
      switch(current_task) {
        case 0:
            unpack_coffee()
            break
        case 1:
            clean_espresso_machine()
            break
        case 2:
            take_out_trash()
            break
        default:
            do_nothing()
      }
  }

So that will work. Every iteration we look for customers, if the queue is full we callback to start pouring coffee. We clean the shop if nothing is going on.

Notice that our callbacks are closures. However, the callbacks do not have execution state. This means we have to manage state externally in the form of current_task.

Now for a trivial example like this, that might not be a big deal. However, imagine that we had a whole list of things to do when there weren't customers. Lets make it even harder, and have those tasks depend on state.

For instance what if we actually had a bunch of separate tasks we had to do:

- unload the truck

- draw something cool on the chalkboard

- clean the shop

What if the order those should be done in depended on state?

Before 1:00 PM the priority is:

#1 draw something cool

#2 unload the truck

#3 clean the shop

After 1:00PM the priority changes:

#1 unload the truck

#2 draw something cool

#3 clean the shop

After 3:00PM the priority changes again!

#1 clean the shop

#2 unload the truck

#3 draw something cool

Now think about how hard this is going to be in Javascript. You're going to have to maintain state for three separate jobs (where am I in that job), but also deal with making sure you're working on them in the right order.

What makes generators so damn handy is that the state is internalized. I only need to change the order I invoke the generators. As long as I keep calling next() until there aren't any items left...they'll just keep churning through whatever they're doing in the first place. They become more expressive as the complexity of the state they are managing increases. Things that are hard to do in languages that don't support them, are often pretty easy with generators. Do some work, yield, do some more work, yield... becomes a very simple pattern to reason about while not having to manage all of the state you have to do in other languages.

I hope that helps clear it up a bit:)

bjudson · on Feb 27, 2013

Great example. Thank you.

nickporter · on Feb 27, 2013

Unrelated, but I find that the technique used to blur the logo is really cool! He uses two images, one normal and the other blurred. Both images are using fixed positioning. The normal image is set as the `body` background, and the other is set as the background of the content panels (`div.bgover`). So, when you scroll over the panels, the blurred logo is layered over the normal logo, giving it that cool blurred transparency effect.

webexcess · on Feb 27, 2013

I stole that from here http://meyerweb.com/eric/css/edge/complexspiral/glassy.html

nickporter · on Feb 27, 2013

Very cool, thanks.

hadem · on Feb 27, 2013

I found the logo to be incredibly distracting. I left the site nearly immediately after I started to scroll the page.

pekk · on Feb 26, 2013

These are great. If only one could write async network code with these that wasn't vastly more complicated than the equivalent gevent code.

masklinn · on Feb 26, 2013

Tulip coroutines[0] look rather good. As with C#'s async/await you have to propagate the "asyncness" to the event loop (using `yield from` with a future or another coroutine), which is not necessary in gevent, but that should be the extent of it.

[0] http://www.python.org/dev/peps/pep-3156/#coroutines-and-the-...

pekk · on Feb 27, 2013

The implementation described in that PEP is massive. Seems pretty hard to explain...

Marazan · on Feb 27, 2013

That link to Generator Tricks for System Programmers ( http://www.dabeaz.com/generators/index.html ) has been enlightening. For the first time I've totally and fully groked why generators are good at a _practical_ rather than just theoretical level.

wookietrader · on Feb 26, 2013

If they could only be pickled...

webexcess · on Feb 26, 2013

That's a feature!

Seriously though, not being able to define a serialization method can be a limitation. However, it is always possible to turn a generator into an iterator class with whatever methods you like, but the code may end up looking totally different.

wookietrader · on Feb 26, 2013

To me it only looks like a limitation.

So yeah, you can write a custom class. But you will need to add a custom "save state" and "load state" in your class' __iter__ method. So all the benefits go away.