Co-routines as an alternative to state machines (2009)

near · on May 1, 2014

At one point I was attempting to simulate multiple processors running in parallel. It turns out this behavior can get really finely grained.

So a processor executes instructions, and an instruction/opcode is made up of several cycles. Say for an "INCrement" instruction, there would be a cycle to fetch the opcode prefix, another to fetch the location to increment, another to read the value at that address, another to increment the value, another to write the value back to that address.

There can arise a case where one CPU is writing to memory right as another is reading that same address. And depending on whether you synchronize the two CPUs between opcodes or between cycles, the one reading can end up seeing a different value. This can, in certain cases, break emulated software if not done correctly.

It gets even hairier when you factor in that each cycle consists of several clock ticks. There is time required to hold a value on a bus before a read or write is acknowledged.

So as I continued to try and increase the precision, I found myself nesting state machines within state machines. Each CPU core had one for the instruction table to select the active instruction. And then each instruction had one for the active cycle. And then each cycle had one for the active clock. And then there was also an interrupt-driven DMA unit that could trigger within cycles that had to be accounted for. It was just too complex to collapse into one giant, 100,000 case state machine inside of a single function. Imagine trying to implement an x86 CPU inside one switch table inside one function.

So you would have to traverse through 3-4 state machines to do something as trivial as "increment an integer", and then return all the way back up the stack frame and switch to the other CPU. The code ended up being around 90% state machine maintenance to 10% actual simulation code. And it was painfully slow.

This led me to the idea of having a separate cooperative thread (coroutine+stack frame) for each CPU. Whenever one would read from or write to something the other CPU could see, it would make sure it was 'behind in time' to the other CPU. Otherwise, it would switch the stack frame and resume where the other thread left off. When that CPU stopped, it would resume right at our first CPU's pause. Very highly reciprocal.

The end result was a code reduction from around 350KB for a CPU core to around 35KB for the exact same code. And thanks to what I'll call "just in time switching", it was possible to run one CPU through hundreds of instructions when it wasn't communicating with the other, greatly reducing host CPU cache thrashing. I ended up with a huge speed bonus to boot.

The one problematic area ended up being serialization. Basically, this model moves the state machine into the host CPU's stack frames. But you can't write that out portably, so it becomes a real problem to capture an exact point in your program, write it out to disk, and then resume at that exact point in a future run. That took a long time to solve, and required some serious trickery involving "checkpoints" for serialization. So it's something to keep in mind if you want to use coroutines/cooperative threads and also want to serialize/unserialize things.

What was refreshing was that the core logic for switching between two threads is surprisingly simple. For x86, it is simply:

    thread_switch(to = ecx; from = edx):
    mov [edx],esp; mov esp,[ecx]; pop eax
    mov [edx+ 4],ebp; mov [edx+ 8],esi; mov [edx+12],edi; mov [edx+16],ebx
    mov ebp,[ecx+ 4]; mov esi,[ecx+ 8]; mov edi,[ecx+12]; mov ebx,[ecx+16]
    jmp eax

Basically: save the old stack pointer and volatile registers, swap to the new stack pointer, restore the volatile registers and return. The function design is the program-equivalent of a palindrome.

(push/pop tends to be slower on Athlon chips than indirect memory accesses; and getting the return address into eax instead of relying on ret allows x86 CPUs to start caching the new instructions quicker.)

So, like everyone else who has discovered this concept, I of course wrote my own library for it in C89, which can be downloaded here: http://byuu.org/programming/libco/

Instead of trying like other libraries to build in complicated schedulers and hundreds of functions, mine is just four functions taking zero to two arguments each. No data structures, no #defines. The idea was for the user to write their own scheduling system that works best for their intended use case, instead of trying to be one size fits all.

userbinator · on May 2, 2014

Very interesting. If you're going for size instead of speed, then you can do that thread switch in 5 instructions:

    pusha
    mov [edx], esp
    mov esp, [ecx]
    popa
    ret

(This might be actually be faster too on recent CPUs.)

near · on May 2, 2014

That will definitely work, and be more portable to esoteric ABIs and calling conventions. But it was more than 3x as slow on the Pentium 4, Athlon 64 and Core 2 Duo E6600. I haven't benchmarked since then. But you're pushing and restoring a whole bunch of volatile registers in vain.

Another fun detail, I tried using xchg r32,m32 to swap the stack pointer out in one instruction. Turns out that on the Pentium 4 (and probably others), the instruction is implemented in microcode now. Plus it's a guaranteed atomic operation. The benchmark I wrote ran at least 30x slower than with two mov instructions. I was absolutely blown away by that. People used that all the time in the 8086/80286 days to save a bit on code size (a much bigger deal back then.) Yet that same code, run today, can end up being substantially slower. Not knowing what opcodes will become slower in the future becomes a fairly compelling argument against writing inline assembly for speed.

ARM has nice register lists that you can use to mask out the volatile regs. So an optimal implementation is something like:

    push {r4-r11}
    stmia r1, {sp,lr}
    ldmia r0, {sp,lr}
    pop {r4-r11}
    bx lr

Moving on to amd64 ... Microsoft ignored the SystemV ABI (rbp,rbx,r12-r15 are non-volatile) and instead made xmm6-xmm15 non-volatile as well. This makes it more than twice as slow to perform a safe thread switch there. Even their own fibers implementation ignores xmm6-xmm15, unless a special flag is used.

Probably the strangest was the SPARC CPU. It has register windows for fast register saves/restores on leaf functions. Pretend your 16 regs were a block of memory. It gave you 16 blocks of that memory, and you could change one value to move to a new block of memory. When attempting threading, you couldn't know if you would recurse enough to exhaust this window. So you had no choice but to save and restore every single register window. Context switching became immensely demanding. So much so that GCC offered a compilation flag to not use register windows in binaries it produced.

The choice of volatile vs non-volatile is really fascinating. The less non-volatile you have, the faster both cooperative and preemptive task switching is. But it also means you have less registers that remain safe to use after function calls.

There's also caller vs callee non-volatility: either the caller has to back up the regs it thinks the callee will trample (or all of them); or the callee has to back up the regs it knows it will trample (but may end up backing up regs the caller doesn't actually care about.)

stuaxo · on May 1, 2014

Very nice !

Is it worth having the code on github where people can do pull requests, track issues etc ?

It seems like this could use a bit more exposure.

near · on May 2, 2014

I've never had any luck getting exposure to my code. I'm not good at marketing. If anyone wanted to contribute, we could set one up. I'm always happy for more ASM backends. The setjmp implementation abuses the internal state of jmpbuf, but tends to work everywhere. Yet native ASM backends tend to be twice as fast on average.

Cooperative threading is also really niche. I personally think it's an incredibly transformative model for a lot of tasks. But when preemptive threading came around, everyone abandoned it as old hat.

I also wrote a toolkit abstraction layer based around C++11 (full support for lambda callbacks.) 100% encapsulated using private implementation, so it doesn't leak platform headers into your global namespace at all. Amazingly consistent API, even moreso than Qt. It has backends for Win32, Cocoa, GTK+ and Qt. Full support for layouts and auto-resizing windows. Goes to the trouble of doing things that are insanely hard on specific toolkits sometimes (try hiding a menu item on Windows: it's not supported. So you have to destroy the entire menu and rebuild it without adding that one item. This wrapper does that for you transparently. Or try working with frame geometry on Linux. The toolkits and WMs will fight you to the death, yet it's a common idiom on Windows.) By targeting all APIs, I had to target the least common denominator, so don't expect a web browser widget or a Ribbon or a floating dock. But the end result is that with around 100KB of wrapper code, you can build the exact same app on Windows, OS X, Linux and BSD; and it will be 100% native. Unlike Qt or GTK+, you won't need 40MB of run-time DLLs to distribute it on Windows. And since it's so much smaller, it's far less buggy than Qt tends to be. It also insulates you from having to learn/write Objective-C for Cocoa, C for GTK+, moc and qmake for Qt, and message loops for Windows. And the best part is, if a new killer API emerges, it's small enough that in two weeks you could write a new wrapper and run your apps on a new platform. Try porting Qt to a new target.

I also wrote a lighter-weight version of SDL. It gives you a raw video buffer, audio sample writer, and input manager for keyboards, mice and gamepads+rumble. There's about 30 API drivers in there (OpenGL, Direct3D, X-video, XAudio2, DirectSound, ALSA, Pulse, OSS, XInput, RawInput, Carbon, amusingly SDL itself, etc.) Meant to integrate into existing applications rather than manage a window, so it binds a child window inside your own app for rendering, and does hardware scaling + multi-pass pixel shaders where available. Doesn't try and implement drawing/scaling functions, image conversion, window management, music file playback, etc. Extremely low level, so that adding a driver takes around 10 minutes if you know the API for it.

My current ambition is to design a low-level, minimalist, object-oriented programming language. Statically typed, strongly typed, with absolutely zero undefined behavior, as close to LL(1) and context-free as possible (must be fully parseable with recursive descent), etc. The goal would be to allow compilation to native binaries (initially via conversion to C, later via LLVM backend if it gains any traction), or to run inside of an existing application as a scripting language. Low-level enough for C-level performance, yet safe enough to be truly portable, yet simple enough to be easily embeddable in other languages. A very lofty goal, but it'll be a fun learning experience if nothing else.

userbinator · on May 1, 2014

A few observations:

* I've found that state machines are much easier to understand when they consist of actual gotos, with labels for each state; the current state being where execution is.

* Coroutines are one of those things that I think are easier to understand and write in Asm - a yield looks much like a call but you pop the return address off the stack and save it somewhere, so the next call basically becomes a jump to the resumption point.

* Yielding between coroutines is very similar to switching between threads.

* It helps a lot when understanding coroutines if you forget any existing notions of what functions are and have to do, and just focus on how the execution flows.

groovy2shoes · on May 1, 2014

I agree with all of these points.

As much as Dijkstra liked to cry foul on goto, there are some instances where they really can make imperative code more readable [1]. State machines are a fine example of this. In languages that lack goto but do have tail call optimization (such as many functional languages), you can safely substitute mutual recursion. Then functions become your states, and transitions are tail calls. Lambda is, after all, the ultimate goto [2].

Edit: forgot the links -_-

[1]: http://web.archive.org/web/20090320002214/http://www.ecn.pur...

[2]: http://dspace.mit.edu/bitstream/handle/1721.1/5753/AIM-443.p...

ufo · on May 1, 2014

Since this is the internet, I would like to nitpick and point out that Dijkstra never said that gotos are always evil. From his considered harmful letter:

> In [2] Guiseppe Jacopini seems to have proved the (logical) superfluousness of the go to statement. The exercise to translate an arbitrary flow diagram more or less mechanically into a jump-less one, however, is not to be recommended. Then the resulting flow diagram cannot be expected to be more transparent than the original one.

The important thing is to make sure that the state of the program can be modeled by something as static as possible and as close to the source code as possible. Ideally you should be able to tell what is going on just by looking at what line of code you are at and you want to avoid those times where the only way to understand a program is to go back to "main" and keep track of the whole code path that was taken to reach the point you are currently at.

So a state machine with gotos is fine since the state machine states are strongly correlated to the source lines of code. On the other hand, an unstructured program translated to a goto-less one via the introduction of lots of "flag" variables is just as hard to understand as the version using gotos.

---

By the way, the "GOTO considered harmful" letter is smaller than your average blogpost amd is very easy to understand. I highly recommend that everyone should go ahead and read it if you haven't done so already :)

http://www.u.arizona.edu/~rubinson/copyright_violations/Go_T...

groovy2shoes · on May 1, 2014

> Since this is the internet, I would like to nitpick and point out that Dijkstra never said that gotos are always evil.

To further nitpick your nitpick: I never claimed Dijkstra did say that gotos are always evil. I only said that he cried foul on goto, which he did:

> More recently I discovered why the use of the go to statement has such disastrous effects, and I became convinced that the go to statement should be abolished from all "higher level" programming languages (i.e. everything except, perhaps, plain machine code).

As an aside, the Jacopini paper [1] doesn't really seem to have anything to do with eliminating jumps. In fact, all of their example normalizations given in figures 20-22 clearly contain jumps.

[1]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.119...

Edit: After further reading the Jacopini paper, I think I see what Dijkstra was hinting at: abstracting the jumps into 2 basic operations which are less powerful than a generic jump -- a conditional jump and an iterative jump. Jumps remain in the normalizations but are more restricted, and in some sense "abstracted away". However, the paper does explicitly say that not all flow diagrams can be decomposed this way.

mtdewcmu · on May 1, 2014

I suspect that the programming landscape has changed so much that the goto debate of the 70s is no longer relevant. People used to use gotos because they were necessary; naturally, bad code got written; a theory emerged that goto caused bad code. Fast forward to the present. Gotos are so feared that most programmers have never even seen one. Naturally, bad code still gets written. What we now know for absolute certain is that it's possible to write any kind of code without gotos (including awful code). If you wanted to revive goto today, you'd have to force people to use it, because no one would know what to do with it. There's a goto-free solution for everything. Bad code would still get written, but I doubt it would look like the bad code of the 70s.

edoloughlin · on May 1, 2014

On the other hand, an unstructured program translated to a goto-less one via the introduction of lots of "flag" variables is just as hard to understand as the version using gotos.

Thanks for this (also to the parent). I've just realised that goto has been so purged from my consciousness that my code is full of the 'flag' pattern. I haven't even considered goto as part of any language I've used in the last 20+ years. I guess I have some assumption-questioning to do.

ufo · on May 1, 2014

Replacing flags with break or return statements can often be a good way of making things a bit clearer (of course, this is a bit subjective). As a rule of thumb, gotos that jump forward are not as bad as gotos that jump backwards.

mtdewcmu · on May 2, 2014

You can get the same effect as goto without using the dreaded goto statement by combining a while loop and a large switch. Instead of 'goto nextstate', you just assign the state to the variable that controls the switch and fall through the loop one more time. A switch is basically syntactic sugar for a jump table.* This seems to be a fairly standard pattern when implementing, say, a virtual machine.

* Of course, with optimization, the switch may turn into something completely different.

groovy2shoes · on May 2, 2014

ufo had already mentioned this technique a couple levels up. Though it's equivalent, I think it's worse than just using gotos.

Suddenly you have extra mutable state to worry about: the "next" state and a loop control variable. At that point, you've removed a layer of abstraction and introduced complexity. You've said "let the next state be foo, then repeat this loop, then if the next state is foo, then enter state foo" when you really meant: "go to state foo".

goto in this case is practically declarative, insofar as a state machine diagram is declarative -- code structured in this way is a one-dimensional representation of the diagram.

mtdewcmu · on May 2, 2014

Though it's equivalent, I think it's worse than just using gotos.

I'm not sure I could judge what it would look like just using gotos, because I've never seen it that way. I think in virtual machine implementations, this may actually be the most rational structure for solving the problem at hand.

I agree with your general point, though. When you remove something essential from a programming language, you are forced to reinvent it; actually, more complex version(s) of it.

groovy2shoes · on May 2, 2014

Using gotos, it would look something like this:

    state1:
      /* do some stuff */
      if (/* something or other */)
        goto state2;
      else
        goto state3;
    state2:
      /* do some stuff */
      goto state1;
    state3:
       ...

The labels represent states, and the gotos represent transitions.

You're right that this isn't particularly good for implementing virtual machines, but for more "fixed" types of state machines this can be a pretty clear way of representing them in code. Mike Pall wrote a good overview of various virtual machine techniques here: http://lua-users.org/lists/lua-l/2011-02/msg00742.html (while he knocks the performance of the switch-based approach, I'd like to note that it's the most portable).

> When you remove something essential from a programming language, you are forced to reinvent it; actually, more complex version(s) of it.

That's a good way of phrasing it. It almost sounds like a lemma to Greenspun's Tenth Rule :)

nitrogen · on May 2, 2014

A switch-based state machine can be as simple as something like this:

    enum { STATE0, STATE1, STATE2, STATE3 } state = STATE0;
    while(1) {
        switch(state) {
            case STATE0:
                if(input1) {
                    state = STATE1;
                } else if(input2) {
                    state = STATE2;
                }
                break;

            case STATE1:
                // etc.
        }
    }

groovy2shoes · on May 2, 2014

break statements are goto statements with implicit labels! To really get rid of gotos, you'll need to have a loop control variable:

  ...
  int done = 0;
  while (!done) {
    ...
    done = 1;
    ...
  }

Yes, it works fine and it's not terribly compelex. The version with goto is simpler because it's a more direct translation of the corresponding diagram.

dllthomas · on May 2, 2014

Aren't loops and ifs goto statements with implicit labels?

groovy2shoes · on May 2, 2014

Heh, I suppose you're right :)

mtdewcmu · on May 5, 2014

If you want to find all the implicit gotos, just disassemble the compiled code. Your program will be littered with jmp instructions and the conditional versions of jmp. There's no getting away from them.

groovy2shoes · on May 6, 2014

Yup, I know. I just got a little carried away :)

My main point was similar to something you'd said earlier:

> What we now know for absolute certain is that it's possible to write any kind of code without gotos (including awful code).

I meant to point out the inverse, that it's possible to write good code even with gotos -- but it takes the same amount of care and diligence as writing good code in general. I don't mean to advocate the use of gotos, rather the elimination of the paranoia surrounding them.

nitrogen · on May 2, 2014

Just in case it's not clear, the break statements in my example are to break out of the switch(), not the while(). C's switch statements are like a computed goto.

groovy2shoes · on May 2, 2014

Ah, right. I forgot about break inside switch. Thanks.

You're right about the switch statements being like a computed goto, albeit a delimited, forward-only one. I still think it's silly to bend over backwards to avoid using goto in places where goto is a natural fit.

skrebbel · on May 1, 2014

(shameless plug)

> I've found that state machines are much easier to understand when they consist of actual gotos, with labels for each state

I completely agree, so I made a C# library that makes this easy: https://github.com/eteeselink/YieldMachine

The underlying trick is not entirely unlike the coroutine trick in this article. The idea of YieldMachine is that even though you have gotos and very clear state machine code, you don't have to keep a thread alive just to sit and wait in a current state, like you would have to if you just have a bunch of ifs and gotos in plain old code.

The big disadvantage is that unfortunately there's no simple/DRY way to ask the state machine in which state it is, because in C#, goto labels aren't first-class citizens.

to3m · on May 1, 2014

I think ideally a yield should be very much more than just very similar to switching to another thread? It really ought to switch to a whole nother stack entirely, along the lines of Win32 fibers. Otherwise, you can't have a coroutine that calls a function that does the yielding for it.

If you do it properly, I suspect the correct calling convention for the yield function is undecidable. Which might be why C# doesn't do this...

ufo · on May 1, 2014

I am a bit confused here. Threads should already have separate stacks don't they?

As for the calling a function that does the yielding for you, people tend to call that "stackful" coroutines. The biggest obstacle with doing stackful coroutines is that you need to have separate stacks for the coroutines. In particular, your coroutines cannot use the main stack. This imposes some restrictions in how the runtime is implemented.

You might want to check out the "Revisiting Coroutines" paper from the people who implemented coroutines in Lua. It gives a good overview in the different kinds of coroutines: http://www.inf.puc-rio.br/~roberto/cvpub.html

userbinator · on May 1, 2014

Actually if you are mindful of saving and restoring the right state at the right points, and restrict the "yield usage level", you don't need anything more than one stack that all the coroutines share at different times. I've written code (parser-related) that involves a few coroutines manipulating some shared data on the stack ("shared local variables", if that makes any sense). This sort of thing is almost impossible to implement (efficiently) in an HLL, but pretty easy in Asm.

near · on May 1, 2014

If you wanted two separate threads to share data on the stack of one thread, why not just pass the second thread a pointer to the first thread's actual stack data, instead of relying on stack-relative addressing?

For your ASM example, instead of relying on RSP, keep the first thread's stack address in RBP during execution of the second thread. Then the second thread can have its own entirely separate stack via its own RSP value.

userbinator · on May 2, 2014

> why not just pass the second thread a pointer to the first thread's actual stack data, instead of relying on stack-relative addressing?

That would probably be the only way to do it in a high-level language, but otherwise there really isn't a need to, just like I didn't see a need to setup another stack. I also was not thinking in terms of threads - it was like "swap between these blocks of code", and there were more than 2 of them. Being able to think about and do things like this is why I'm using Asm in the first place - I could restrict myself to writing code using the more limiting conventions of HLLs, almost like a compiler would generate, but I feel that rather defeats the point of using Asm (why not just use a compiler?) Not to say that I don't appreciate using a thread abstraction when it's appropriate, but at this level there are no threads, no functions, no strict relationships between calls and returns, nothing other than a series of instructions.

The "fast-stack-switch" using registers is also something I remember doing before, though it wasn't on x86.

near · on May 1, 2014

> Threads should already have separate stacks don't they?

There are two type of threads: cooperative and preemptive/parallel. Coroutines with stacks represent cooperative threading. There's only ever one actual hardware thread, so there's never any race conditions or locks. Depending on whether you have one core that has an interrupt-driven kernel switching contexts for you (preemptive), or a multi-core system that's actually running two threads at the same time; the latter model protects you against a full application/system stall if one thread stops responding (hi, OS9), and it also allows scalability for parallel tasks.

> As for the calling a function that does the yielding for you, people tend to call that "stackful" coroutines.

People have a million names for it, unfortunately. It's a simple yet immensely powerful concept, although it's not popular. So people keep reinventing it (due to not knowing it exists already) and coming up with their own names for it. You'll also hear them called cothreads, fibers, green threads, etc.

But at the end of the day, they're all variant names for cooperative threads.

> This imposes some restrictions in how the runtime is implemented.

Getting access to memory for a new stack isn't a problem in practice. Outside of tiny DSPs with dedicated call stacks (eg NEC 7725), I've yet to encounter a processor where you couldn't just allocate a new block of heap memory and use it for another thread's stack space. x86, amd64, ppc32, ppc64, arm, mips, sparc, etc.

Now you can claim that the new memory won't automatically grow at exhuastion. Well, you can mmap things and catch exceptions to expand it. But really, the default these days is 512K - 1M for your main stack thread anyway in C; and they generally only reserve a max of 8M short of special compiler flags. Just allocate a large block for your extra threads and you'll be fine.

I've also heard of some tricks with setjmp/longjmp that involve subdividing the main stack, but there's really no need for that at all.

A runtime can't safely use purely stack-relative addressing anyway, since there's no telling how much of an offset there is due to non-runtime function recursion already when the runtime functions are invoked. Unless it's a very high-level language, in which case there's a myriad of better ways to handle this anyway.

betterunix · on May 1, 2014

Actually, I think coroutines are easier to understand in higher-level languages that support call/cc -- Scheme, for example. Yield is just another continuation, no different from return, throw, restart, etc.

groovy2shoes · on May 1, 2014

I agree and disagree. Coroutines can be thought of as a degenerate case of continuations (sometimes called "single-shot continuations"). I contend that understanding continuations is at least as hard as understanding coroutines, but that if you understand one concept the other follows easily. So for people already comfortable with continuations, I agree, otherwise I disagree (in fact, I had a hard time grokking continuations until I thought of them as generalized coroutines).

pdq · on April 30, 2014

Rob Pike used coroutines for the Go Template implementation, instead of a standard lexer/parser:

Video: https://www.youtube.com/watch?v=HxaD_trXwRE

Slides: http://cuddle.googlecode.com/hg/talk/lex.html

ysleepy · on May 1, 2014

OMG that code looks horrible.

Kind of disappointing how he not uses proper parser generators and grammars.

I avoid imerative almost always if I can go declarative.

vidarh · on May 1, 2014

Very few production compilers uses "proper parser generators and grammars" for good reasons. He's even listed some of the reasons on one of the earliest slides.

Basically, the tools are generally not good enough.

happy4crazy · on April 30, 2014

Neat, I was just wondering about this. I'm using the same idea with core.async channels in ClojureScript to parse the BitTorrent peer protocol[0], but I wasn't sure how to describe it.

[0] https://github.com/happy4crazy/ittybit/blob/master/src/ittyb...

juliangamble · on April 30, 2014

You can read more about the State Machines of core.async here: http://hueypetersen.com/posts/2013/08/02/the-state-machines-...

jacquesm · on April 30, 2014

Statemachines are an excellent way to get deterministic behavior of complex systems. Doing this using co-routines would be a lot harder, if it can be done at all once you reach a certain level of complexity.

I've tried to do something very similar to this in a library I wrote two years ago, in the end we reverted back to state machines because they were so much easier to tame. They're boring, but sometimes boring is good.

kerneis · on May 1, 2014

Coroutines and state machines are indeed equivalent. Here is an example of how to (and good reasons why) translate mechanically one into the other:

http://gabriel.kerneis.info/research/files/kerneis-boutier-2...

More context in my work about Continuation-Passing C: http://gabriel.kerneis.info/research/

drudru · on May 1, 2014

Reading your papers now. Great stuff.

munro · on May 1, 2014

The blame is being placed on the wrong thing. It's not the state graph model that is the issue, but modeling it with object oriented constructs. The article actually also agrees that the the concept is succinctly modeled with the state graph diagram, and yet the object oriented version is hard to understand [1].

It's like saying Hashmaps are hard to use, when using List operations to manipulate a map stored as a list of tuples.

What the author needs is a library that can model state graphs in a much more readable way. I developed a state graph library for JavaScript when creating a turn-based game [2], which helped clean up my code immensely. There also appears to be a fairly popular Python library, with a nice looking API [3].

[1] "it’s generally difficult to understand what’s going on in the code without having some sort of a state diagram in front of your eyes." [2] https://github.com/munro/stategraph [3] https://github.com/jtushman/state_machine

userbinator · on May 1, 2014

There is nothing "object oriented" about the example, except for the fact that the code is in a class.

The "proper object oriented way" would be the State design pattern, using separate classes for each state and replacing the if/else with a virtual call, which I agree would be even more obfuscation of the control flow:

http://en.wikipedia.org/wiki/State_pattern

No, I don't think the answer is to make yet another library, if it isn't one that encodes the state graph as a table.

munro · on May 1, 2014

Yep! I completely agree, there is nothing object oriented about the example. So why use constructs meant for describing objected oriented code? Though, the fact that someone defined a pattern for designing state machines on top of object oriented constructs is interesting, thanks for bringing that up.

The other option is build a model for talking about state machines on top of Python's features (like decorators, meta classes, even coroutines!) than strict OOP to feel more like idiomatic Python, than say Java. And when someone has written the code already, I say why reinvent the wheel! :)

Too · on May 1, 2014

Classes are a way to encapsulate state, you don't have to use them together with oo. How else would you describe the example in python without using object oriented constructs? Global variables or passing around a dictionary with all the things set up in init?

munro · on May 1, 2014

A distraction from the main point, being: state graphs rock! And deserve a high level way of talking about them in any language of choice, which most languages don't include in the stdlib. So one may have to roll their own/find a library. And that the article confused code that's "difficult to understand" with state graphs being bad.

> Classes are a way to encapsulate state, you don't have to use them together with oo.

Lexical closures are another way to encapsulate state, modules are another, then there are monads! People have thought of lots of cool ways to encapsulate state.

> How else would you describe the example in python without using object oriented constructs? Global variables or passing around a dictionary with all the things set up in init?

I would build the example on state graph terminology, instead of on top of Python's class directly. Though, as userbinator pointed out, someone has described an OOP pattern for talking about state graphs [1].

Unfortunately there is no built in way for talking about state graphs in Python. To build that, Python has lots of fun features I can leverage! I wouldn't artificially limit myself. But first I would definitely research if someone has written a good library.

[1] http://en.wikipedia.org/wiki/State_pattern

JonnieCache · on May 1, 2014

In ruby we have the very mature and featureful state_machine gem:

https://github.com/pluginaweek/state_machine

I recently found out you can even choose to define states using arbitrary objects/functions, eg. date ranges. Sounds a bit crazy but I can see it solving certain problems very neatly.

dimatura · on May 1, 2014

This is very much the same reasoning behind protothreads[1]. As a bonus, the implementation of protothreads uses a neat trick reminiscent of Duff's device [2].

[1] http://dunkels.com/adam/pt/

[2] http://en.wikipedia.org/wiki/Duff's_device

hadoukenio · on April 30, 2014

The great thing about state machines is that they are easy to debug. Once you have co-routines firing off co-routines, you're in co-routine spaghetti. In fact, I'll coin the term now:

  co-routines = "asynchronous gotos".

tlarkworthy · on April 30, 2014

I wrote a state machine abstraction for Firebase [1], then implemented cross tree transactions on top, which was the formally verified to be exploit free. I think co-routines are a bit too general to make formal verification straight forward.

[1] http://tomlarkworthy.github.io/

oggy · on May 1, 2014

That looks cool. Could you explain a bit more about how the formal verification part worked? What kind of prover do you use? Are the HSMs the specification, and you want to make sure that the rules behave as specified by the HSMs, or do you specify some higher level properties on the HSMs? I couldn't figure these out scanning through your website.

tlarkworthy · on May 1, 2014

The HSMs are converted to Firebase security rules. The theorem proover is used to verify somewhat arbitrary higher level properties

So for item trade* I was specifying the that no new items could be introduced (dupe bugs) or removed from the system, and that a player could never get stuck in a state they could never leave (deadlock). The HSM abstraction is just a useful notation for developing such protocols, though the formal verification actually operated on the Firebase security rule level.

* https://github.com/tomlarkworthy/firesafe/wiki/Send-Item

EDIT: reply to oggy's child question as I can't reply underneath.

The actual spec prooving bit is secret sauce territory until I work out what the future holds for this project.

oggy · on May 1, 2014

Thanks for the info. What theorem prover did you use and how do you convert the security rules to a format that the prover understands?

erichocean · on May 1, 2014

That is a seriously cool project!

I wish more people took advantage of both HSMs and theorem provers, and you've done both. Nice. :)

tlarkworthy · on May 1, 2014

thanks a lot. Yeah, its a brutal bug exterminator combo, and somewhat easier to wrap your head around that coq or similar.

I am trying to find an organic use of the H in HSM for server side stuff, but struggling to think of examples that are not contrived if you have any.

jeremiep · on April 30, 2014

Debugging ease is highly dependant on the implementation of either state machines or co-routines, especially if there are heavy compile-time transformations of the code structure involved.

In either case, if the implementation supports the composition of state machines or co-routines, its going to be easy to reason about.

eliben · on April 30, 2014

Can you clarify why state machines are easier to debug?

xsmasher · on May 1, 2014

When I'm writing a state machine I make sure to add a method like machine.changeState( newState, "reason");

Now you can add a dlog inside changeState to see a history of state transitions and why they were made; useful for figuring out why you got where.

That's not something inherent to state machines, though.

hadoukenio · on April 30, 2014

I've worked on large state machines in the past, which were rock solid and easy to work with. As the code was built using state transition tables, state machines were automatic. I would hate to think of how to juggle these systems using co-routines.

eliben · on April 30, 2014

I'm not sure you're answering my question, though. You made a claim that state machines are easy to debug and coroutines are not. What makes you think they are not? I'm really just curious.

FWIW I've done significanly more work with state machines in the past, and I find them reasonable to debug, though not as easy as straight line code because in the latter the full state history (execution stack) is always known, while in state machines you usually need to do extra work to trace how the machine got to some state.

hadoukenio · on April 30, 2014

Sorry I forgot my why. Probably because by using state transition tables the system's whereabouts is afforded salience. Compare this to a co-routines being sprinkled throughout a large code base, essentially creating asynchronous goto-hell.

lmm · on May 1, 2014

An explicit state machine makes it very easy to observe a) what the current state is b) what the possible states are.

shoover · on May 1, 2014

Good laughs chasing links:

- Simon Chatham: "PuTTY is a Win32 Telnet and SSH client. The SSH protocol code contains real-life use of this coroutine trick. As far as I know, this is the worst piece of C hackery ever seen in serious production code." http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

- Tom Duff: "Yeah. I knew this. In my piece about The Device, I mention something about a similar trick for interrupt-driven state machines that is too horrible to go into. This is what I was referring to." http://brainwagon.org/2005/03/05/coroutines-in-c/#comment-18...

joefkelley · on May 1, 2014

This is cool for this example, but it seems like for a non-trivial state machine (I'm thinking TCP open/close or similar) the co-routine code could get very messy. It's really a small subset of state machines for which this is a good idea, or maybe even possible.

olegp · on May 1, 2014

Co-routines are great! Have been using them as part of Common Node (https://github.com/olegp/common-node) via node-fibers for a number of projects. Both stability and performance have been admirable.

For example at StartHQ (https://starthq.com) we have a framework that automatically fetches data about web apps from a number of sources, some as frequently as once every few minutes (blog RSS feeds) for all the 2K+ apps. At the same time we've had as many as a few hundred concurrent visitors on site at peak, all on an EC2 micro instance, with no noticeable impact on responsiveness.

mtdewcmu · on May 1, 2014

Theoretically speaking, any machine that keeps a fixed (finite) amount of state is a finite state machine. Once you recognize that, you will realize that you have written lots of them without even trying. If a problem inherently requires a fixed amount of state, writing a finite state machine is virtually inevitable. So it's not really a choice of whether to write a finite state machine; it's a choice of how you want it to look; it's a question of style. The state machine style assigns a name to each state, and makes the transitions very obvious. Co-routines can be finite state machines, quite clearly. You could argue that they do not follow the state machine style, though.

_qc3o · on April 30, 2014

Yup, the connection between state machines and co-routines is pretty nice. Instead of state and a bunch of goto's you get a linear flow interrupted with 'yield' statements or whatever the equivalent is in your co-routine library.

tree_of_item · on April 30, 2014

Coroutines as an encoding of state machines...?

Fasebook · on May 1, 2014

Pretty much, which basically gives state to stateless system..