> although i think the approach used by C++ coroutine is better How? > The main ...

skybrian · on Aug 11, 2016

Defaults matter. If some people use async I/O and others don't then you get a mess when they want to share reusable libraries. It's similar to the mess you get when there is more than one string type.

I think the "what color is your function" problem could be mostly solved by making async/await the default function type - that is, most callback functions should be allowed to suspend execution by waiting on a Future.

Then you could have special-purpose "atomic" functions that are occasionally useful for pure functional code.

(Unfortunately, the default has to be the opposite in browser-based languages due to performance concerns.)

pcwalton · on Aug 11, 2016

> (Unfortunately, the default has to be the opposite in browser-based languages due to performance concerns.)

Also in Rust. Most apps that aren't servers don't want async I/O, and it causes a lot of problems when you need high-performance FFI. For example, in Servo async-everywhere would be a big problem for WebRender, which needs to be able to call OpenGL extremely quickly.

> Defaults matter. If some people use async I/O and others don't then you get a mess when they want to share reusable libraries. It's similar to the mess you get when there is more than one string type.

Given that having both is necessary for Rust (maybe a necessary evil), I think the right approach is to make jumping from one mode to the other painless. For sync-to-async, it needs to be easy to block on the result; for async-to-sync, it needs to be easy and fast to offload to a thread pool. If we can make it really easy to switch from one mode to the other, then most of the really hairy problems go away.

zeeboo · on Aug 12, 2016

Easy and error prone.

Sometimes libraries pretend to be async and accidentally are sync. Consider some library that in the normal case just does some pure computation, but logs to syslog or something on some error condition. If you use that library in an async context, it could work fine most of the time, until you hit some unexpected situation where it happens to make a network request to some syslog daemon and blocks up your worker thread. The same thing can happen with mutexes, or many other common blocking operations.

It's also the case that often async libraries depend on some sync library and so they have their own worker pool. You can easily have many libraries with their own worker pools all using more resources then they need.

You also have to worry about if your functions do any of these transformations under the hood. For example, if you have some async worker that delegates some sync task to the worker pool, and that sync task happens to use some async function and blocks on it, and that async function ALSO has a sync task and attempts to delegate it to a worker pool, and that worker pool is bounded, then you have just opened yourself up to a deadlock under high load that you probably won't find under normal operation.

On top of all that, debugging is usually much harder in these environments because you have to inspect the runtime memory state that depends on the specific internal details of the libraries being used instead of a bunch of stacks. It's extremely hard to diagnose deadlocks or stalls in these environments. It's non-trivial to provide a good debugging experience that doesn't cause extra load in production environments.

These issues are all real things that I have hit in production with Twisted. A static type system could help all these things, but I think it requires buy in from every library you might use, transitively.

pcwalton · on Aug 12, 2016

> It's also the case that often async libraries depend on some sync library and so they have their own worker pool. You can easily have many libraries with their own worker pools all using more resources then they need.

The Rust story will not be complete without a canonical single implementation of a thread pool that everybody doing async I/O uses for blocking tasks.

> For example, if you have some async worker that delegates some sync task to the worker pool, and that sync task happens to use some async function and blocks on it, and that async function ALSO has a sync task and attempts to delegate it to a worker pool, and that worker pool is bounded

I think the solution here is "don't have strictly bounded worker pools". This is what cgo does, I believe.

> It's non-trivial to provide a good debugging experience that doesn't cause extra load in production environments.

But this is the exact same problem that any M:N system will have. So I don't see any special problem for Rust's system here.

zeeboo · on Aug 12, 2016

I hope your optimism that a single kind of thread pool will service all applications is well founded. It seems like people would want to specialize them much like they want to specialize their memory allocators. The Rust team has a really great track record of innovation and technical excellence so I look forward to the design that will accomodate that and hope the ecosystem buys off on it.

Go does limit the number of threads and will crash the process if it goes over. It's also very rare to have CGo call back into Go multiple times versus libraries juggling adapters between async and sync in my experience. It's also easy to have your library have a limiter on the number of CGo calls you make, but less easy to limit the number of tasks you throw into a thread pool because you don't have the option to block. (edit: I think you can just store the closure on the thread pool and schedule it eventually at the cost of writing your own scheduler and perhaps requiring allocations?) I have a feeling that a similar crashing solution won't work in the Rust community, and what to do when the limits are hit will be punted upstream. My main point is that there are many subtle details in solving the "colored function" problem.

I don't think all M:N systems have the debuggability problem becausd the runtime has a single canonical representation of what is running: the stack traces. Since the entire ecosystem bought into the runtime, you don't have any fracturing of representations. If you're optimisitc that the entire ecosystem will buy into whatever mechanism you have to do async work, then this can be solved, but I believe that's already not the case (people hand code state machines) and is unlikely to stay the case as time goes on.

Matthias247 · on Aug 12, 2016

> Given that having both is necessary for Rust (maybe a necessary evil), I think the right approach is to make jumping from one mode to the other painless. For sync-to-async, it needs to be easy to block on the result; for async-to-sync, it needs to be easy and fast to offload to a thread pool. If we can make it really easy to switch from one mode to the other, then most of the really hairy problems go away.

That really sounds like the Task<T> type from C# TPL, which can also be used in sync and async environments and was probably also designed to be a generic solution. While it basically works there's a bigger number of pitfalls associated with that model. E.g. you can synchronously block on .Result from some tasks (that will be fulfilled by other threads), but not from others (which would be fulfilled by the same thread, because that causes a deadlock). In the Task+Multithreading world there's also always the question where continuations (attached with .ContinueWith, .then, ...) run. E.g. synchronously in the thread that fulfills the promise, asynchronously in a new eventloop iteration (but for which EventLoop?), in an explicitly specified scheduler, etc. C# uses TaskScheduler and SynchronizationContext variables for that. But as they are only partially known and even behave somewhat different for await Task and Task.ContinueWith there's quite some room for confusion.

bmurphy1976 · on Aug 12, 2016

> Also in Rust. Most apps that aren't servers don't want async I/O, and it causes a lot of problems when you need high-performance FFI. For example, in Servo async-everywhere would be a big problem for WebRender, which needs to be able to call OpenGL extremely quickly.

I don't understand why this is the case. Since async/await allows the compiler to transform the code into a state machine, why would it be not be able to optimize this?

pcwalton · on Aug 12, 2016

Because async-everywhere usually means pushing blocking FFI calls over to a thread pool, which would be unacceptably slow for e.g. OpenGL.

bmurphy1976 · on Aug 12, 2016

"Usually" is a funny word, but I see what you are saying.

I guess if you are in a single-threaded event loop scenario (ala nodejs) you could get away with it somewhat, but as soon as you introduced multiple threads of execution all bets are off.

That's unfortunate.

We have a fairly large codebase written in .NET. We wouldn't mind porting it to CoreCLR, but we'd have to migrate everything to async/await. The red/blue nature of method signatures makes this look like almost a complete rewrite. Given the difficulty of this migration so far, and the nature of the change, it's certainly caused us to explore other options and we've already rewritten a significant chunk of the code in Go.

Moving a large codebase from a single color model to a dual color model really sucks. I hope Rust can lock this down sooner rather than later otherwise a lot of people are going to feel pain down the road.

The good news is you have a good static type system. I cannot even begin to imagine migrating a Python codebase to async/await...

soulbadguy · on Aug 12, 2016

> How ? In term of allocation : When the future uses some variable present on the current function stack you have two options

1 - Waiting for the future to complete before exiting the current function (which essentially is blocking)

2 - Allocating the closure on th heap (allocation + deallocation)

In a language with coroutine support , we have a third alternative. Instead of block or allocating memory, it's possible to just suspend the current function (no heap allocation necessary, no blocking), and resume when the future completes.

In term of context switching speed : the cost of moving the state machine is essentially the cost of a double dispatch (probably double dispatch plus virtual function call), switching coroutines is closer to the cost of a function call ( i think it's cheaper than a normal function call, but tha becomes too technical)

kbenson · on Aug 12, 2016

I just watched (mostly) the CppCon talk you posted elsewhere. The coroutine approach is really interesting, but I'm confused as to how it's different. According to a source I found[1], the way coroutines are implemented is that a new stack is created on the heap and it moves back and forth between that. Isn't that the same case here? Is the compiler level implementation(as opposed to boost, as in the linked reference) different in some way?

1: http://stackoverflow.com/questions/121757/how-do-you-impleme...