I’ve claimed a few times that our futures library provides a zero-cost abstraction, in that it compiles to something very close to the state machine code you’d write by hand. To make that a bit more concrete:
- None of the future combinators impose any allocation. When we do things like chain uses of and_then, we aren't allocating at all; instead we're building up one big enum that represents the state machine. (There is one allocation needed per “task”, which usually works out to one per connection.)
- When an event arrives, only one dynamic dispatch is required.
- There are essentially no imposed synchronization costs; if you want to associate data that lives on your event loop and access it in a single-threaded way from futures, we give you the tools to do so.
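To make the "big enum" point concrete, here is a hand-rolled sketch of the kind of state machine a combinator chain might compile down to. All names here (`ConnState`, `Step`, `poll`) are hypothetical illustrations, not the futures crate's actual API; the point is that the whole machine is a single inline value with no heap allocation per stage.

```rust
// One enum variant per stage of a hypothetical read -> parse -> respond
// chain. The whole machine lives inline as a single value.
enum Step<T> {
    Pending,
    Ready(T),
}

enum ConnState {
    Reading { buf: String },
    Parsing { request: String },
    Responding { reply: String },
    Done,
}

impl ConnState {
    // Drive the machine one step; a real executor would call this
    // each time an I/O event arrives.
    fn poll(&mut self) -> Step<String> {
        match std::mem::replace(self, ConnState::Done) {
            ConnState::Reading { buf } => {
                *self = ConnState::Parsing { request: buf };
                Step::Pending
            }
            ConnState::Parsing { request } => {
                *self = ConnState::Responding {
                    reply: format!("200 OK: {}", request),
                };
                Step::Pending
            }
            ConnState::Responding { reply } => {
                *self = ConnState::Done;
                Step::Ready(reply)
            }
            ConnState::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    let mut conn = ConnState::Reading { buf: "GET /".to_string() };
    loop {
        if let Step::Ready(reply) = conn.poll() {
            assert_eq!(reply, "200 OK: GET /");
            break;
        }
    }
}
```

The size of the enum is known at compile time, which is why chaining combinators grows a type rather than allocating.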
This sounds quite badass and awesome. I'm not sure what other language implementations take this approach, but this is clearly an extremely beautiful, powerful, and novel (to me at least!) concept. Before reading this, I thought Rust was great. This takes it to the next level, though.
Suffice it to say, similar magic tricks have already been at work to make handling errors not soak up a lot of space, and it'd work well with something like this, I think.
In Rust it's frequently the case that slow compilations are dominated by generating and optimizing LLVM IR. The codegen step often takes a while simply because we're generating so much IR.
Rust takes an approach with generic functions called monomorphization, which means that we generate a new version of each function for each set of generics it's instantiated with. This means that a future of a String will generate entirely different code from a future of an integer. This allows generics to be a zero-cost abstraction, because code is optimized as if you had substituted all the generics by hand.
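A minimal sketch of monomorphization: one generic function, used at two types, compiles to two separate specialized functions. The function name is made up for illustration.

```rust
// A generic function: the compiler emits separate machine code for each
// concrete type it is instantiated with (monomorphization).
fn pair_up<T: Clone>(x: T) -> (T, T) {
    (x.clone(), x)
}

fn main() {
    // These two calls compile to two distinct functions, one specialized
    // for i32 and one for String, each optimized independently.
    let ints = pair_up(7);
    let strings = pair_up(String::from("hi"));
    assert_eq!(ints, (7, 7));
    assert_eq!(strings, (String::from("hi"), String::from("hi")));
}
```

Each extra instantiation is more LLVM IR to generate and optimize, which is exactly where the compile-time cost comes from.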
Putting all that together, highly generic programs will generally trend towards higher compile times. With all the generics in play, there tends to be a lot of monomorphization which causes quite a lot of LLVM IR to get generated.
As with many aspects of Rust, however, you have a choice! Rust supports what we call "trait objects" which is a way to take a future and put it behind an allocation with a vtable (virtual dispatch). This forces the compiler to generate code immediately when a trait object is created, rather than down the line when something is monomorphized.
Put another way, you've got control over compile times if you're using futures. If you're taking a future generically and that takes too long to compile, you can instead take a trait object (or quickly convert it to a trait object). This will help cut down on the amount of code getting monomorphized.
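The trade-off can be sketched with a trait other than Future to keep the example self-contained; the mechanism is the same. Both function names below are hypothetical.

```rust
use std::fmt::Display;

// Generic version: monomorphized for every concrete T it's called with,
// which costs compile time but allows inlining.
fn describe_generic<T: Display>(x: T) -> String {
    format!("value: {}", x)
}

// Trait-object version: compiled exactly once; calls go through a vtable
// behind the Box, trading one dynamic dispatch for less generated code.
fn describe_boxed(x: Box<dyn Display>) -> String {
    format!("value: {}", x)
}

fn main() {
    assert_eq!(describe_generic(42), "value: 42");
    assert_eq!(describe_boxed(Box::new(42)), "value: 42");
}
```

Converting a generic value into a `Box<dyn Trait>` early in a call chain is the "quickly convert it to a trait object" option: everything downstream of the box is compiled only once.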
So in general futures shouldn't make compilation worse. You'll occasionally have a choice between performance (no boxes) and compile times (boxing), but that's basically already how things work in Rust today.
Do note that with MIR the focus is polymorphic optimizations - reducing the LLVM IR for all monomorphizations of a generic function, at once.
Unlike C++ templates, Rust enforces a single definition with uniform semantics, for a generic type/function (specialization going through the existing trait static dispatch mechanism), so we can take advantage of that to reduce compile times.
Hm. I've heard arguments that C# or Java is slow for multiple reasons, but never because of the minuscule overhead of a virtual method dispatch when using objects behind interfaces (kind of similar to trait objects).
It's interesting that this is seen as significant here. Are we dealing with much shorter timescales, or just being eager to optimise everything?
Virtual dispatch per se is not terribly slow, as long as the branch is predictable by the CPU. The problem is that virtual dispatch prevents the sort of aggressive inlining and interprocedural optimizations that C++ compilers are known for. C# and Java JITs get around that via runtime analysis and speculative inlining, but that is done at runtime and eats away some of the precious little time available for optimizations.
The cost of a branch misprediction is tens of CPU cycles, on a clock measured in gigahertz (10^9 cycles per second).
The time to turn around a web request is, if you're very lucky and have done the work, dominated by getting a value from an in-memory cache at multiple milliseconds, i.e. on the order of 10^-3 seconds.
If you're not lucky, it's 10s or 100s of milliseconds to generate the response.
That makes the second duration roughly 10^5 to 10^6 times longer. I would not sweat the first one.
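The back-of-the-envelope arithmetic above can be checked directly. The figures here are assumed ballparks, not measurements: a 3 GHz clock, ~15 cycles per misprediction, and a 10 ms response.

```rust
fn main() {
    // Assumed ballpark figures (not measurements).
    let clock_hz: f64 = 3.0e9;            // 3 GHz clock
    let misprediction_cycles: f64 = 15.0; // ~15 cycles per misprediction
    let response_secs: f64 = 10.0e-3;     // 10 ms response time

    let misprediction_secs = misprediction_cycles / clock_hz; // ~5 ns
    let ratio = response_secs / misprediction_secs;

    // The response is on the order of a million mispredictions long.
    assert!(ratio > 1.0e5 && ratio < 1.0e7);
    println!("one response ~= {:.1e} mispredictions", ratio);
}
```

With these numbers the ratio works out to about 2 x 10^6, which is why a handful of mispredicted virtual calls per request is noise.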
As an example, real-time systems are often a giant ball of messy asynchronous code and state machines. Futures can help with that, although lately I have found that sometimes the best, cleanest way to implement a state machine is to make it explicit.
How much do you attribute that to the benefit of creating a high barrier to entry for modifying that code? Could this be summarized as: code that inexperienced devs can't understand, stays performant because they can't figure out how to change it?
I assume at least part of it is that "zero-cost abstractions" is a fairly objective and boolean metric to calculate. "Is this performance impact significant enough to worry about?" would probably result in a lot more bikeshedding.
A tremendous amount of effort has gone into the CLR towards optimizing interface dispatch, because at one time it was slow. Interface dispatches are cached at the call site to avoid real virtual (vtable) dispatch, just like a Smalltalk or JavaScript VM would.
Yeah, using closures in zero-cost abstractions (like iterator or now future adaptors) does make compilation slow. LLVM isn't used to this kind of code, and takes time optimizing it.
However, now that we have MIR we can optimize this on the Rust side first, before the type info has been lost. This was one of the reasons MIR was added :)
I'm... curious about why you think LLVM isn't used to the sort of code that results from these sort of zero-cost abstractions: C++ code is also very aggressive about templates that inline away to nothing, and modern C++ especially has many similar abstractions to Rust (cf. closures, and the range proposal). Rust isn't special in that particular regard.
Maybe you mean LLVM is very low-level and thus it sees everything in the abstractions, which obviously takes time to sort through (and is especially wasteful to do multiple times for different monomorphisations). Thus, using MIR to optimise at a higher level can simplify code more efficiently before it hits LLVM.
I believe he meant lots of closures. Which is not the norm in C or C++, but is the norm in Rust because they are zero-cost.
Without knowing anything about LLVM's internals, I would assume it doesn't anticipate so much closure chaining, and therefore doesn't leverage the fact that they are so easy to inline.
Lambdas in C++ are almost identical to Rust ones (each closure is a unique unnameable struct containing the captures, with an overloaded operator()); in fact, the current Rust scheme was explicitly inspired by C++11 closures. Historically (C++98), it's true that not much code used closures, because they didn't exist at a language level, but the modern language uses them much more aggressively, even pushing hard on making them more and more flexible (e.g. auto/generic closures in C++14). For instance, sort[1] can take a closure, which is more efficient than passing a function pointer and easier to use than a manually implemented custom type.
Also, closures are just structs of captures paired with functions that take them as an argument, so if the compiler can handle those, it can handle closures.
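That desugaring can be written out by hand. The names below (`AddN`, `call`) are illustrative only; the real compiler-generated struct is unnameable.

```rust
// Hand-desugaring of a closure: a struct holding the captures,
// plus a call method taking it by reference.
struct AddN {
    n: i32, // the captured variable
}

impl AddN {
    fn call(&self, x: i32) -> i32 {
        x + self.n
    }
}

fn main() {
    let n = 10;

    // The real closure...
    let add_n = |x: i32| x + n;

    // ...and its rough struct equivalent behave identically.
    let desugared = AddN { n };
    assert_eq!(add_n(5), desugared.call(5));
    assert_eq!(desugared.call(5), 15);
}
```

Since `AddN::call` is an ordinary method on a concrete type, the optimizer can inline it like any other function, which is why closures cost nothing.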
Lambdas in modern idiomatic C++ are also essentially zero-cost, or at least they're supposed to be - they're not heap-allocated, and are strong candidates for inlining.