
So, as someone who has been working heavily with coroutines and continuations for decades in a number of different languages across the gamut of programming paradigms, I don't really understand why these runtimes aren't "interoperable", and am hoping I just have a different idea of what that word means than the people who talk about them in the context of Rust.

Like, right now I maintain a large almost-entirely-asynchronous C++ codebase using their new C++20 co_await monstrosity, and while I find the abstraction ridiculously wide and a bit obtuse, I have never had trouble "interoperating" different "runtimes" and I am not even sure how one could screw it up in a way to break that... unless maybe these "executors" are some attempt to build some kind of pseudo-thread, but I guess I just feel like that's so "amateur hour" that I would hope Rust didn't do that (right?).

So, let's say you are executing inside of a coroutine (context is unspecified as it doesn't matter). When this coroutine ends it will transfer control to a continuation it was given. It now wants to block on a socket, maybe managed by Runtime A (say, Boost ASIO). That involves giving a continuation of this coroutine past the point of the transfer of control to Runtime A which will be executed by Runtime A.

Now, after Runtime A calls me--maybe on some background I/O thread--I decide I would prefer my task to be executing in Runtime B. I do this sometimes because I might have a bit of computation to do but I don't want to block an I/O thread, so I would prefer to be executing inside of a thread pool designed for slow background execution.

In this case, I simply await Runtime B (which in this case happens to be my lightweight queue scheduler). I don't use any special syntax for this because all of these runtimes fully interoperate: I used await to wait for the socket operation and now I use await to wait until I can be scheduled. The way these control transfers work is also identical: I pass a continuation of myself after the point of the await to the scheduler which will call it when I can be scheduled.

Now remember, at the beginning of this I was noting that something unspecified had called me. That is ostensibly a Runtime C here (maybe I was waiting for a callback from libwebrtc--which maintains its own runloop--because I asked it to update some ICE parameter, which it does asynchronously). It doesn't matter what it was, because now that "already happened": that event occurred and the continuation I provided was already executed and has long since completed and returned as I went on immediately to pass a continuation to someone else rather than blocking.

Is this somehow not how Rust works? Is await some kind of magic "sticky" mechanism that requires the rest of this execution happen in the context of the "same" runtime which is executing the current task? I have seen people try to do that--I am looking at you, Facebook Folly--but, in my experience, attempts to do that are painfully slow as they require extra state and cause the moral equivalent of a heavyweight context switch for every call as you drag in a scheduler in places where you didn't need a scheduler.

But, even when people do that, I have still never had an issue making them interoperate with other runtimes, so that can't be the issue at its core. I guess I should stare at the key place where the wording in this article just feels weird?... to me, I/O and computation are fairly disjoint, and so I can't imagine why you would ever want to have your I/O scheduler do "double-duty" to also handle "task queues". When I/O completes it completes: that doesn't involve a "queue". If you want to be part of a queue, you can await a queue slot. But it sounds like tokio is doing both? Why?



You can interchange async implementations in rust if you like, much like you can in C++ or other languages.

What becomes hard though is grappling with what that means:

- the stdlib doesn't know about async, so there are a variety of async stdlibs that may or may not be tightly coupled to an implementation.

- different runtimes may choose different threading models. Some may be single threaded-ish, some may be across threads. You could treat it all like it's across threads, but this does mean that there's another detail you need to consider when you're setting up your data.

- I/O scheduling mixed with task scheduling is a choice of how an async stdlib is configured. There are advantages to having them coupled, in that the runtime can check for I/O completions as it cycles through the tasks, or put them all on a single-threaded queue, etc... There are lots of patterns here, each with its own tradeoffs


They are interoperable in the most basic mechanism of futures: every executor can spawn tasks composed of any futures (just like co_await in C++ is interoperable)

But they aren't interoperable in practice because they offer different APIs

In some cases this is fixable (for example, the Rust ecosystem needs to standardize some async abstractions, because currently every executor defines its own trait for async reading, for example); in other cases it represents a genuine limitation of a given executor (for example, some embedded executors can only spawn a single task, and you achieve concurrency by using future combinators)


OK, so the version of "interoperable" you seem to be using sounds like "swappable", which isn't really a property I have ever cared much about. Like, if I have code that is using ASIO's task abstraction and other code using cppcoro's and other code using my own scheduler and still other code wired up over some callback setup, I would have just used "interoperable" to mean I can await whatever I want whenever I want without complex glue code, as--at the end of the day--I am merely passing a continuation for my function to someone who will call it later. I mean, of course the APIs aren't the same: in one case I am awaiting sockets and in another case I am awaiting queue slots and in another case I am awaiting random asynchronous events, but I am able to do all of it from a single asynchronous function as they are all "interoperable". It just sounds from these articles that Rust can't even do that.


No, Rust is fully interoperable in the sense you care about. But in the Rust ecosystem there's a desire for writing libraries that can run in any executor, to avoid picking a winner.

Right now what most libraries do is to write code paths for working with tokio, with async-std, etc. This is not sustainable. If we had generic APIs we could just code against that.

Anyway, the biggest source of contention is that the networking APIs of Tokio and async-std are different. But there's no fundamental reason for this difference, and there's hope that it will eventually be possible to bring a common API to the stdlib.


FWIW, the word "interoperate" fundamentally -- just taking it as inter- -operate -- means separate things being able to work together. If you can replace one thing for another thing they aren't "interoperable", they are "interchangeable".

Regardless, everyone else in this thread -- including people who seem to know what they are talking about -- seem to be defending the other normal usage of the word by talking about supposed issues with running multiple executors at once and bouncing between them.

Are you sure I can have a single async function which can in one statement await tokio and in the very next statement of that very same function await async-std without having to jump through some gnarly hoops?

https://news.ycombinator.com/item?id=24675155

^ Here is someone -- though from like two years ago -- asking this very specific narrow question and getting back a number of responses that claim this isn't possible (and so these systems are not only not interchangeable but also not interoperable).

(That said, there is one person on that thread who disagrees, but other people seem to disagree with them and the only link to any documentation provided -- but which was notably from someone else and so might simply have been the wrong reference -- is about a bunch of third-party glue.)


I think the issue in question is more mundane: if someone publishes an async database client library, it's currently hardwired to a specific async runtime, so you cannot easily use it if you are not already using that runtime. The common async abstraction being worked on sets out to solve that.


I mean, I would hope "easily" would happen because I can always just use two async runtimes... if they were "interoperable". I can quite easily have a number of separate I/O abstractions and schedulers all happening at the same time in C++, for example, and I never think much about it: I just co_await and it, well, waits.


>hardwired to a specific async runtime

With version constraints, right? IIRC you can end up with multiple versions of multiple async runtimes in a project. I think it'd be better to only have a single one hardwired to the compiler like python's asyncio, even if it likewise sucked.


Rust libraries can have implementations for each async runtime and then you can pick between them using features. For example, when using sqlx with tokio I would have this in my Cargo.toml:

    [dependencies]
    sqlx = { version = "0.5", features = [ "runtime-tokio-rustls", "sqlite", "migrate" ] }
But I could also use async-std with:

    [dependencies]
    sqlx = { version = "0.5", features = [ "runtime-async-std-rustls", "sqlite", "migrate" ] }
So you should be able to get all your deps on a single runtime.


In practice both major runtimes have long-term stability guarantees (e.g. Tokio has committed to maintaining 1.0 for at least 5 years), so if you use libraries compatible with Tokio 1.0 then you're unlikely to have issues with this for some time.


Tokio 1 is the only async runtime used in production at scale, there's very little reason to use anything else. So you can seek out libraries that use tokio 1 and ignore anything else.


I haven't used much Python recently, but iirc you can just import your own runtimes, too. Twisted, gevent, that sort of thing. Having some sort of sane defaults bundled gives you a really nice baseline for interop, but doesn't preclude you from picking things that fit your use case better.

Definitely feels like one place where Rust kinda dropped the ball, at least from a user perspective in $CURRENTYEAR.


True, what I meant is that asyncio is part of the interpreter, so you are very unlikely to have trouble with incompatible versions of asyncio. Twisted and gevent don't use async/await, but there's Trio, which does and is saner than asyncio; thankfully, library authors aren't forcing its usage. It's also possible to write libraries that use async but let you bring your own runtime (Trio or asyncio) with AnyIO.


It's not possible to have multiple 1.x.y versions of the same crate in your project, so you would need a really old library that depends on Tokio 0.2.x for that to happen. This isn't something that normally comes up in practice.


You might think of Rust's async paradigm as "half a continuation, turned upside down". With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe. Most languages with continuations manage this by "pausing" your function and keeping its stack frame around, which, in the general case, means your function's stack frame has to be heap-allocated, which is basically the language itself giving you a "pseudo-thread". You eventually get control back with the same stack frame, and as far as the language is concerned, how you get back there is none of your concern; that's its job.

In Rust's polling-based model, there's no "magic" saving of stack frames. You get some space to store state, but the runtime has to manage that memory itself. You can use the language to express "this is the next thing to call", but when you spawn an async I/O task and yield to it, you've already returned from your own function to the runtime, and it's the runtime's job to call your function again with the state it had stashed away. You then jump over the steps in your function that have already been handled and call into the next thing. It gets a bit more involved due to various bits of syntactic sugar, but that's the basic model. It's operating at a lower level of abstraction than many languages' coroutines or call/cc, which gives you the flexibility to customize the behavior to meet specific needs.

A runtime for generic desktop/server apps may maintain a thread pool and call back into your code on one of those threads. In WebAssembly, execution is single-threaded, but JavaScript promises may call into your runtime, and you have to dispatch that to the right Rust future. On embedded platforms, the data structures that the desktop/server runtime uses may simply not be suitable (e.g. because you have no general-purpose heap allocator), so you need to use a different approach with more constraints.

Interoperability between these runtimes is possible. The key is that you need a task that's running on one runtime to be able to spawn a task on the other, with part of that task's job being to notify the first runtime that it's time to poll the "parent" task again. The mechanics vary depending on how each runtime handles task spawning.

As I understand it (from having skimmed some articles a while back), C++'s co_await isn't really all that different. Since we don't have the executors proposal as part of the standard yet, it's still a "bring-your-own runtime" sort of approach, with some kind of glue required at the boundaries between runtimes. Depending on which "flavor" of C++ coroutines you're using (e.g. push-based vs. pull-based), that interop might be easier than Rust's at the cost of other tradeoffs (e.g. more heap allocations).


> With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe.

I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".

Instead, in a "traditional" coroutine, a continuation-passing transform is implemented in the compiler that changes -- in the best case of having this wrapped up in a Monad (which Rust could really use support for right about now) -- "do A and then B" into "do A while telling A to call the continuation of B when it is done, and otherwise immediately return". B isn't a "runtime" and isn't the "language"; you could argue B is an "executor" but it is unique to every call.

So if you want a no-op A it would be "call the continuation it is passed, immediately". This would result in behavior identical to the original synchronous function: we call A, which does whatever it wanted to do (in this case nothing) and then it chains through to B. As the call to the continuation is in tail position for this case, the resulting behavior should work out to being nearly identical (like the CPU won't be able to branch predict this as efficiently, but it will have similar overhead).

In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.

This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.

Like: just writing normal synchronous code also involves heap allocations as you have to allocate the stack space for the next frame every call. You can elide that in many cases by pre-allocating a bunch of memory for the stack, but a sufficiently-deep call stack will overflow the memory you allocated and break in some potentially-catastrophic manner. It is a fiction that you can write essentially anything of consequence without either heap allocations or some fuzzy understanding by the developer of how hard they can push it until it breaks.


> I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".

Syntactically, many languages represent the operation of calling into the next continuation as a regular return (for green threads) or a regular function call (call/cc), but there's always some degree of runtime magic involved in the generated code. For instance, rather than just incrementing or decrementing the stack pointer, you've got to potentially set it to point into a totally different runtime-allocated stack. In principle, that can probably be implemented as just special-case code generation rather than an actual call into the runtime's routines, but that still leaves the need to clean up the current task's stack after it returns (or does a tail call into another stack), which will be either an explicit runtime call or rely on the runtime's garbage collector.

The real magic, though, isn't so much in the user-written continuations as it is on "async blocking" calls for things like I/O.

> In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.

This is precisely what Rust's async runtime libraries are. They provide the event loop/callback mechanisms, which are necessary for truly async code. (Otherwise, what is there to wait for?) You can totally write and call an async function in Rust that doesn't use a runtime, but there's no way for it to "asynchronously block"; you'd just poll it and get back, "Yep, I'm done; here's my result."

> This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.

It's not just the depth of the stack; it's that, once you yield, another task may take over the thread's flow control entirely. Let's say that function A spawns a coroutine B without waiting for it to finish. Now let's imagine that B allocates space on the same thread stack that A was using (on top of A's stack frame) and then yields. At the yield point, something (e.g. the runtime) has to say, "OK, B is stuck, so what do we run next on the thread?" Eventually, it's A's turn to finish running, and it returns. If it does this naïvely, it'll rewind the stack pointer, dropping the stack frames for both A and B. But B isn't done running yet; it's just blocked, so now we've got a problem because its stack just got clobbered. To avoid this, languages that use "stackful" coroutines have to allocate coroutines' stacks on the heap in many cases because the traditional single-stack model isn't just running out of space; it totally breaks down.

Rust uses stackless coroutines, which impose some restrictions on how the coroutine is structured (mostly involving unbounded recursion) so that the state the task has to store between yield points has a fixed size.


When you follow the restrictions on stackless coroutines, the coroutine can, as you mentioned, elide stack allocations for the "child" coroutines. If you want to make a call that can't have its stack allocation elided, you explicitly tell the runtime to spawn it as a top-level task.



