I think Rust's approach here makes a lot more sense for the language than Go/CML...

jholman · on Aug 11, 2016

I think there's little or no evidence that "you can recover most of the M:N ergonomics over time via async/await style syntax", despite a decade or so of attempts. I think there's an underlying semantic concern that seems unsugarable.

Munificent's http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y... expresses the problem eloquently.

None of that means you're wrong about async making "a lot more sense for the language", though.

pcwalton · on Aug 11, 2016

> Munificent's http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y.... expresses the problem eloquently.

I have several issues with that blog post.

1. The performance concerns are glossed over. The conclusion is "threads are superior", but that ignores the reason why we want async I/O in the first place: performance.

2. "What if the new place you want to call it is blue? You’ll have to turn it red." is false. There's a way to convert blocking code to async-friendly code: use a thread pool. This is in fact what cgo does internally—see the next point.

3. You always have red functions and blue functions if you have an FFI. But this isn't a big deal because synchronous functions are easily convertible to asynchronous functions (via a thread pool) and asynchronous functions are easily convertible into synchronous functions (via just blocking). So the supposedly big semantic problem just boils down into a question of making sure you remember to switch into the other mode when necessary (which you can always easily do without restructuring your program). This is something that the language can do for you automatically—proof is that Go does it!—and I would like to see type system or other static approaches to doing this in Rust.

gpderetta · on Aug 11, 2016

This is currently hotly debated in the C++ committee. Some people want shallow C#, python style generators, while other want proper stackful coroutines a-la Lua (full disclosure: I'm on this group). A third group is trying to mediate and trying to come up with an hybrid stackful model that can be optimized as well the async/await model at least in some cases (I.e. full cps transform and fallback to a cactus stack when invoking non transformable functions).

wahern · on Aug 13, 2016

I've been writing async I/O networking software for about 15 years now. Early on most of that was in C, now it's split about 50/50 between C and Lua. Most of my C I/O code is still in C because I prefer my libraries to be reuseable outside of Lua or any particular event loop, and they often are. Lua's coroutines are usually higher up the stack, juggling more abstract state; and I use them for more than asynchronous I/O or asynchronous tasks.

The thing about async/await is that in a language like C, I can already accomplish much of that with tricks like Duff's Device and macros. It has its limitations, but IME they're not much more onerous than the limitations of async/await, especially in the context of a language lacking GC. I have to manually keep state off the stack (or otherwise copy/restore it), but you do that anyhow when you don't have lexical closures and GC, and often even when you do.

The beautiful thing about coroutines in Lua is that it's based on a threading model, but not one bound to the C stack or a kernel thread, which are completely orthogonal concerns left to the application to deal with or not deal with. And it does this while preserving functions as first-class objects. Neither callers nor callees need to know anything about coroutines. That kind of composability makes coroutines useful and convenient for many more things than simulating green threading or managing CPU parallelism. Among other things, it means I can mix-and-match functional and imperative styles according to the problem, and not whether it will be convenient to then make use of coroutines. It means that I have a single, natural call stack--not an implicit stack and an explicit stack. async/await and futures unify your data stack, but you're still manually managing the call stack through syntactic devices or otherwise formalized calling conventions. However heavily sugared, it will hinder the design of your software no less than if you had to manually manage the data stack, too.

Coroutines that aren't stackful aren't nearly as powerful in terms of problem solving. Without them being stackful, it's a horribly leaky abstraction for non-trivial uses. Most people would agree that the C preprocessor is a mess, and that functions as first-class objects are powerful. So modern languages strive to create templating systems that allow you to construct _real_ functions that are indistinguishable from any another function. But then they introduce monstrosities like futures or async/await, that beautiful symmetry is broken. It's like bringing back C's macro preprocessor--now you have regular functions and these weird things with different syntactic and control follow semantics, whether you wanted it or not. The decision is no longer yours, which means you're bending to the language's deficiencies.

Why even bother with such half-baked solutions? In almost every case it's utterly transparent that these solutions exist for the benefit of the compiler and runtime author, usually because of intentional or unintentional technical debt--a direct or indirect dependency on the C or kernel stack. For C++ it's understandably a difficult dilemma, but for every other language it's a total cop-out.

Then these solutions are sold to the public by prettifying the implementations with fancy terminology and beguiling examples showing how they can be used to implement async I/O or parallel CPU jobs. But few, if any, language features are so narrowly tailored to such specific use cases. Why? Because languages are supposed to provide simple building blocks that compose as seamlessly as possible at a much higher level of abstraction than, e.g., a slightly nicer way to implement HTTP long-polling servers. Such contrivances are as removed from the basic simplicity of the function as C's macros are from first-class functions. In both cases you can implement solutions for a certain subset of problems that superficially look convenient and nice; but in the real world the limitations become swiftly apparent, and you realize a lot of effort was spent in design and implementation for little real-world gain.

With Lua's coroutines, I can implement a futures pattern easily when it's appropriate, and it will be more powerful because the futures themselves can make use of coroutines both internally and externally. But in my use of coroutines in Lua futures are rarely the most natural design pattern. Sometime you want a full-blown CPS solution, sometimes you simply want to be able to arbitrarily swap producer/consumer control flow, for example in a lexer. Often you want a mixture of all of these. Coroutines--stackful coroutines--provide all that and more, seamlessly.

Futures only look nice and elegant in contrast to event loop oriented, callback-style programming. But that's a really, really low bar. Please aim higher, people!

domenicd · on Aug 13, 2016

> Why even bother with such half-baked solutions? In almost every case it's utterly transparent that these solutions exist for the benefit of the compiler and runtime author, usually because of intentional or unintentional technical debt--a direct or indirect dependency on the C or kernel stack.

There is a significant faction of language designers that disagree, and think that keeping coroutines shallow is important for developers writing and reading the code. This post from Dave Herman<(involved in JavaScript and Rust) sums it up: http://calculist.org/blog/2011/12/14/why-coroutines-wont-wor.... (The comment from Tom van Cutsem is also a good rephrasing: http://disq.us/p/9jcee9.) Note that the argument is not as applicable in languages with racy multithreading (like C/C++ or Java).

I don't think it's necessarily a knockout argument, but it at least helps me sleep with what we've chosen for JavaScript.

gpderetta · on Aug 13, 2016

I never understood this argument, it feels to me a post hoc rationalisation. Yes, with toplevel only yield you know all suspension points, but doesn't really buy you anything, as calling an unknown function can potentially mutate any object possibly by invoking user callbacks or other generators. If the function behaviour is well documented, then whether it is a suspension point would be as well.

domenicd · on Aug 14, 2016

The difference is that when you call a function, you can easily know what will happen: the function will execute. You can use your knowledge about the function you are calling (and the functions it calls, etc.) to ensure it does not violate any invariants you set up.

Whereas, if you have an implicit yield point that goes back to the event loop, the event loop can run arbitrary other tasks---not ones you can locally reason about or predict, but simply those that are ready to be executed.

gpderetta · on Aug 15, 2016

But if you know what a function does and call you would also know whether it is an implicit yield point or not, right?

Also, I don't know about JS, but many event loops can be reentered recursively, so even with top level only yield all bets are off.

wahern · on Aug 15, 2016

Was that a correct link? I feel like it only re-iterates my points.

My point is that thinking about coroutines in the context of green threading is totally the wrong way to think about it. That you can implement something approximating green threads with coroutines is a testament to the power of coroutines, but it's hardly the defining the feature for them.

And coroutines are not sufficient to implement green threading. You still need a way to have multiple outstanding I/O requests. That could have been done with other mechanisms. User code could wrap such mechanisms with coroutines and fashion an event loop if they desired, and no doubt most would have done that. But by leaving that up to the application people could experiment with patterns for addressing concerns regarding concurrency. And I would also note that concurrency problems related to order of operations hardly go away with futures, the preferred solution in JavaScript, or async/await. Stackful coroutines can theoretically be worse when callees can yield from any expression, but don't forget that the real problem is shared mutable state, which you're passing or otherwise making available in equal measures for each option. For that and other reasons the distinction with futures and async/await is, I think, not very meaningful.

For similar reasons, Rust's failed experiment with green threading is not an argument against the practicality of stackful coroutines. Quite the contrary--it's an example of why it's more important to focus on the abstraction and preserving the power of that abstraction than to tailor the solution for specific scenarios. Rust could easily have had stackful coroutines with zero performance cost and negligible cost to the language design. But instead they focused on coroutines as a roundabout way to ease the burden of async I/O, and, worse, they tried to refactor Rust's entire standard I/O library to work with that green threading model. It was a destined for failure, and for good reason.

When discussing coroutines, my favorite example is something like libexpat. libexpat was early on in the history of XML the most widely used XML parser. But it was a push parser. Push parsers are easier to implement and often more performant, but they're more difficult to use. All of those qualities stem from push parsers literally and figuratively pushing the burden onto the application for solving issues related to state management and buffering.

You couldn't easily refactor libexpat into a pull parser because token production relied on automatic variable stack state and internal stack structures. You'd have to copy and buffer everything it produced. No wonder so many people either forked libexpat or just reinvented the wheel.

If C had stackful coroutines, it would be _trivial_ to make libexpat a pull parser. Heck, trivial undersells how simple and elegant it would be. In that context coroutines could have provided the best of every and all worlds, for both libexapt developer(s) and direct and indirect users.

The narrative around coroutines has been poisoned by this narrow focus on async I/O and similar contemporary problems, and conflation and equivocation with fibers, kernel threads, and the low-level details of platforms' C ABIs. It's created lost opportunities for providing stronger language abstractions.

Perl 6 committed a similar sin, IMO. MoarVM technically can support stackful coroutines, but it doesn't because they're only implemented to support Perl's gather/take control structure. That coroutines could have easily been used to implement gather/take entirely within application code, but not vice-versa, should have been a strong hint that coroutines were the stronger abstraction, and gather/take should have been defined as standard API utilizing proper stackful coroutines.

Note to future language designers: coroutines are not about async I/O. Coroutines are not about green threads. Coroutines are not about map/reduce. Don't conflate the means with the ends. Stackful coroutines can be used to implement all of those and more because they're the better abstraction. Lua's stackful coroutines can be used to easily implement green-threading like async I/O, or non-trivial map/reduce patterns, not because the Lua authors bent over backwards to make that possible, but because they preserved their abstractive power; because they modified both the language language design and implementation so stackful coroutines didn't come with needless caveats.

soulbadguy · on Aug 11, 2016

> You can recover most of the M:N ergonomics over time via async/await style syntax.

While i agree with the tradeoff made by rust (although i think the approach used by C++ coroutine is better), i don't think that having async/await syntax give you "most" of the ergonomics of the Go M:N model .

The main advantage of the go model is that both asynchronous and synchronous operations are identical, with async/await you still need to model the async operation and the sync operation with different types. Not having to decide(and design) upfront which part of your computation is async and which is not is what makes go so attractives

pcwalton · on Aug 11, 2016

> although i think the approach used by C++ coroutine is better

How?

> The main advantage of the go model is that both asynchronous and synchronous operations are identical, with async/await you still need to model the async operation and the sync operation with different types.

It's more like "everything is synchronous" in the Go model. Semantically, Go doesn't have async I/O at all. It has a userspace M:N implementation of synchronous I/O. You can get the same effect in any other language by just not using async I/O.

> Not having to decide(and design) upfront which part of your computation is async and which is not is what makes go so attractives

That doesn't make sense to me. If async I/O is important in your app, why not just make your entire app use async I/O?

skybrian · on Aug 11, 2016

Defaults matter. If some people use async I/O and others don't then you get a mess when they want to share reusable libraries. It's similar to the mess you get when there is more than one string type.

I think the "what color is your function" problem could be mostly solved by making async/await the default function type - that is, most callback functions should be allowed to suspend execution by waiting on a Future.

Then you could have special-purpose "atomic" functions that are occasionally useful for pure functional code.

(Unfortunately, the default has to be the opposite in browser-based languages due to performance concerns.)

pcwalton · on Aug 11, 2016

> (Unfortunately, the default has to be the opposite in browser-based languages due to performance concerns.)

Also in Rust. Most apps that aren't servers don't want async I/O, and it causes a lot of problems when you need high-performance FFI. For example, in Servo async-everywhere would be a big problem for WebRender, which needs to be able to call OpenGL extremely quickly.

> Defaults matter. If some people use async I/O and others don't then you get a mess when they want to share reusable libraries. It's similar to the mess you get when there is more than one string type.

Given that having both is necessary for Rust (maybe a necessary evil), I think the right approach is to make jumping from one mode to the other painless. For sync-to-async, it needs to be easy to block on the result; for async-to-sync, it needs to be easy and fast to offload to a thread pool. If we can make it really easy to switch from one mode to the other, then most of the really hairy problems go away.

zeeboo · on Aug 12, 2016

Easy and error prone.

Sometimes libraries pretend to be async and accidentally are sync. Consider some library that in the normal case just does some pure computation, but logs to syslog or something on some error condition. If you use that library in an async context, it could work fine most of the time, until you hit some unexpected situation where it happens to make a network request to some syslog daemon and blocks up your worker thread. The same thing can happen with mutexes, or many other common blocking operations.

It's also the case that often async libraries depend on some sync library and so they have their own worker pool. You can easily have many libraries with their own worker pools all using more resources then they need.

You also have to worry about if your functions do any of these transformations under the hood. For example, if you have some async worker that delegates some sync task to the worker pool, and that sync task happens to use some async function and blocks on it, and that async function ALSO has a sync task and attempts to delegate it to a worker pool, and that worker pool is bounded, then you have just opened yourself up to a deadlock under high load that you probably won't find under normal operation.

On top of all that, debugging is usually much harder in these environments because you have to inspect the runtime memory state that depends on the specific internal details of the libraries being used instead of a bunch of stacks. It's extremely hard to diagnose deadlocks or stalls in these environments. It's non-trivial to provide a good debugging experience that doesn't cause extra load in production environments.

These issues are all real things that I have hit in production with Twisted. A static type system could help all these things, but I think it requires buy in from every library you might use, transitively.

pcwalton · on Aug 12, 2016

> It's also the case that often async libraries depend on some sync library and so they have their own worker pool. You can easily have many libraries with their own worker pools all using more resources then they need.

The Rust story will not be complete without a canonical single implementation of a thread pool that everybody doing async I/O uses for blocking tasks.

> For example, if you have some async worker that delegates some sync task to the worker pool, and that sync task happens to use some async function and blocks on it, and that async function ALSO has a sync task and attempts to delegate it to a worker pool, and that worker pool is bounded

I think the solution here is "don't have strictly bounded worker pools". This is what cgo does, I believe.

> It's non-trivial to provide a good debugging experience that doesn't cause extra load in production environments.

But this is the exact same problem that any M:N system will have. So I don't see any special problem for Rust's system here.

zeeboo · on Aug 12, 2016

I hope your optimism that a single kind of thread pool will service all applications is well founded. It seems like people would want to specialize them much like they want to specialize their memory allocators. The Rust team has a really great track record of innovation and technical excellence so I look forward to the design that will accomodate that and hope the ecosystem buys off on it.

Go does limit the number of threads and will crash the process if it goes over. It's also very rare to have CGo call back into Go multiple times versus libraries juggling adapters between async and sync in my experience. It's also easy to have your library have a limiter on the number of CGo calls you make, but less easy to limit the number of tasks you throw into a thread pool because you don't have the option to block. (edit: I think you can just store the closure on the thread pool and schedule it eventually at the cost of writing your own scheduler and perhaps requiring allocations?) I have a feeling that a similar crashing solution won't work in the Rust community, and what to do when the limits are hit will be punted upstream. My main point is that there are many subtle details in solving the "colored function" problem.

I don't think all M:N systems have the debuggability problem becausd the runtime has a single canonical representation of what is running: the stack traces. Since the entire ecosystem bought into the runtime, you don't have any fracturing of representations. If you're optimisitc that the entire ecosystem will buy into whatever mechanism you have to do async work, then this can be solved, but I believe that's already not the case (people hand code state machines) and is unlikely to stay the case as time goes on.

Matthias247 · on Aug 12, 2016

> Given that having both is necessary for Rust (maybe a necessary evil), I think the right approach is to make jumping from one mode to the other painless. For sync-to-async, it needs to be easy to block on the result; for async-to-sync, it needs to be easy and fast to offload to a thread pool. If we can make it really easy to switch from one mode to the other, then most of the really hairy problems go away.

That really sounds like the Task<T> type from C# TPL, which can also be used in sync and async environments and was probably also designed to be a generic solution. While it basically works there's a bigger number of pitfalls associated with that model. E.g. you can synchronously block on .Result from some tasks (that will be fulfilled by other threads), but not from others (which would be fulfilled by the same thread, because that causes a deadlock). In the Task+Multithreading world there's also always the question where continuations (attached with .ContinueWith, .then, ...) run. E.g. synchronously in the thread that fulfills the promise, asynchronously in a new eventloop iteration (but for which EventLoop?), in an explicitly specified scheduler, etc. C# uses TaskScheduler and SynchronizationContext variables for that. But as they are only partially known and even behave somewhat different for await Task and Task.ContinueWith there's quite some room for confusion.

bmurphy1976 · on Aug 12, 2016

> Also in Rust. Most apps that aren't servers don't want async I/O, and it causes a lot of problems when you need high-performance FFI. For example, in Servo async-everywhere would be a big problem for WebRender, which needs to be able to call OpenGL extremely quickly.

I don't understand why this is the case. Since async/await allows the compiler to transform the code into a state machine, why would it be not be able to optimize this?

pcwalton · on Aug 12, 2016

Because async-everywhere usually means pushing blocking FFI calls over to a thread pool, which would be unacceptably slow for e.g. OpenGL.

bmurphy1976 · on Aug 12, 2016

"Usually" is a funny word, but I see what you are saying.

I guess if you are in a single-threaded event loop scenario (ala nodejs) you could get away with it somewhat, but as soon as you introduced multiple threads of execution all bets are off.

That's unfortunate.

We have a fairly large codebase written in .NET. We wouldn't mind porting it to CoreCLR, but we'd have to migrate everything to async/await. The red/blue nature of method signatures makes this look like almost a complete rewrite. Given the difficulty of this migration so far, and the nature of the change, it's certainly caused us to explore other options and we've already rewritten a significant chunk of the code in Go.

Moving a large codebase from a single color model to a dual color model really sucks. I hope Rust can lock this down sooner rather than later otherwise a lot of people are going to feel pain down the road.

The good news is you have a good static type system. I cannot even begin to imagine migrating a Python codebase to async/await...

soulbadguy · on Aug 12, 2016

> How ? In term of allocation : When the future uses some variable present on the current function stack you have two options

1 - Waiting for the future to complete before exiting the current function (which essentially is blocking)

2 - Allocating the closure on th heap (allocation + deallocation)

In a language with coroutine support , we have a third alternative. Instead of block or allocating memory, it's possible to just suspend the current function (no heap allocation necessary, no blocking), and resume when the future completes.

In term of context switching speed : the cost of moving the state machine is essentially the cost of a double dispatch (probably double dispatch plus virtual function call), switching coroutines is closer to the cost of a function call ( i think it's cheaper than a normal function call, but tha becomes too technical)

kbenson · on Aug 12, 2016

I just watched (mostly) the CppCon talk you posted elsewhere. The coroutine approach is really interesting, but I'm confused as to how it's different. According to a source I found[1], the way coroutines are implemented is that a new stack is created on the heap and it moves back and forth between that. Isn't that the same case here? Is the compiler level implementation(as opposed to boost, as in the linked reference) different in some way?

1: http://stackoverflow.com/questions/121757/how-do-you-impleme...

zzzcpan · on Aug 11, 2016

> The main advantage of the go model is that both asynchronous and synchronous operations are identical

There is a bit of misunderstanding on your part. There are no asynchronous operations in that go model, everything is synchronous. There is no event loop underneath, despite what some people claim. And this is absolutely not an advantage in a shared memory environment. Instead it forces you to do synchronization to access memory, while different-looking asynchronous operations do not precisely because they are different-looking, otherwise it would be impossible to know what is running when and you would have to think about synchronization too.

It is not a secret that shared memory multithreading, while useful for parallelism, is the worst possible model for concurrency and this includes all flavors of coroutines as well.

soulbadguy · on Aug 12, 2016

> There are no asynchronous operations in that go model, everything is synchronous.

You are the second person saying this, and i have to admit i am really confused by this statement. From my understanding, golang does not expose an asyncio interface, but this doesn't meant that golang runtime doesnt perform io operations asynchronously. So golang expose async operation through a synchronious interface, which is what most language construct (C# async/await, F# async monad etc..) try to emulate.

From https://morsmachine.dk/go-scheduler , when a goroutine performs a blocking operation, the local scheduler the remaining goroutine are migrated to another OS thread, allowing them to continue their execution while the blocking operation completes asynchroniously.

pcwalton · on Aug 12, 2016

> From my understanding, golang does not expose an asyncio interface, but this doesn't meant that golang runtime doesnt perform io operations asynchronously.

This is also true for an OS kernel.

soulbadguy · on Aug 12, 2016

I don't understand the point your are making

dbaupp · on Aug 12, 2016

The blocking (i.e. synchronous) syscalls for IO are doing asynchronous things internally, they just wait for the event to finish before returning. The kernel can even do similar scheduling things to a language runtime, e.g. when a thread does a call to a blocking read on a socket, the thread can be switched out until the socket actually has data, allowing other code to run on that core.

soulbadguy · on Aug 12, 2016

I am still unclear on how does that correlate to the orignal discussion. The point we were debating is whether or not Golang does async io. I am not sure why the kernel behavior is important here

dbaupp · on Aug 12, 2016

The Go programming model is synchronous: the code executes an IO operation and that thread of execution halts until the operation completes. The Go runtime is implemented using asynchronous IO internally, and manages scheduling between threads itself, but that is an implementation detail.

This is exactly the same as using normal blocking operations with OS threads. That programming model is synchronous: the code executes an IO operation and that thread of execution halts until the operation completes. The kernel is implemented using asynchronous IO internally, and manages scheduling between threads itself, but that is an implementation detail.

The original point is Go's programming model is the same as OS-level blocking IO, the fact the runtime is implemented in user-space on top of async IO is an implementation detail that doesn't change the surface behaviour of the code. One could substitute OS threads and normal OS blocking calls for goroutines and the runtime's IO operations and code would behave essentially identically, just possibly with different performance characteristics.