I'm interested in things like how the intermediate representation allows easier optimisations, the control flow graph optimisation algorithms, and the general optimisation passes. Can anyone suggest a good resource where I can learn this stuff well enough to implement it? Where do the people working on this compiler tech get their background/base knowledge from?
Generally speaking, the Rust language is quite rich, so it's difficult for the computer to reason about performance, but easy for it to reason about the high-level semantics. On the other end, the LLVM 'assembly' is very primitive, so it's easy to reason about performance, but hard to transform in a way that provably preserves the desired high-level semantics.
An intermediate language allows you to describe the high-level semantics in a more primitive form, so it is easier to both manipulate and reason about while preserving those semantics. In this case (as far as I understand it), MIR needs to be as primitive as possible while still modeling Rust's scoping, safety, and type rules. This layering is a design pattern that occurs quite frequently in compilers; a rough sketch of what such a lowering looks like is below.
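To make that concrete, here's a hand-written sketch of how a small Rust function might lower to a MIR-like form. This is schematic, not actual rustc output: named variables become numbered locals, control flow becomes explicit basic blocks, and `_0` is the return slot.

    // Source:
    fn square_if_positive(x: i32) -> i32 {
        if x > 0 { x * x } else { 0 }
    }

    // Schematic MIR-like lowering (illustrative only):
    // fn square_if_positive(_1: i32) -> i32 {
    //     bb0: {
    //         _2 = Gt(_1, const 0);                     // evaluate the condition
    //         switchInt(_2) -> [true: bb1, false: bb2]; // explicit branch
    //     }
    //     bb1: { _0 = Mul(_1, _1); goto -> bb3; }
    //     bb2: { _0 = const 0;     goto -> bb3; }
    //     bb3: { return; }
    // }

Everything is still typed and scoped the way Rust demands, but there are no nested expressions or implicit control flow left to get in the way of analysis.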
"but easy for it to reason about the high-level semantics."
Is it? Does it make it possible to reason about things such as whether two mappings can be fused, for example? I keep wondering about the possibility of very-high-level algebraic transformations of programs (although admittedly, this could be shifted into some kind of CASE tool instead).
It makes it easy to verify that the high-level semantics are only those allowed by the language. It's fairly easy to look at an AST and tell if it is a legal construct in your language.
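As a toy illustration (a made-up mini-language, nothing to do with rustc's actual AST), legality checking on a tree is a local, recursive matter:

    // "Legal" here just means no Add applied to a Bool.
    #[derive(Clone, Copy, PartialEq)]
    enum Ty { Int, Bool }

    enum Expr {
        Int(i64),
        Bool(bool),
        Add(Box<Expr>, Box<Expr>),
    }

    fn type_of(e: &Expr) -> Option<Ty> {
        match e {
            Expr::Int(_) => Some(Ty::Int),
            Expr::Bool(_) => Some(Ty::Bool),
            Expr::Add(a, b) => {
                if type_of(a)? == Ty::Int && type_of(b)? == Ty::Int {
                    Some(Ty::Int)
                } else {
                    None // illegal construct caught directly on the AST
                }
            }
        }
    }

    fn main() {
        let ok = Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Int(2)));
        let bad = Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Bool(true)));
        assert!(type_of(&ok).is_some());
        assert!(type_of(&bad).is_none());
    }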
> Is it? Does it make it possible to reason about things such as whether two mappings can be fused, for example?
It makes it possible to know that you are "fusing mappings." In assembly, you would have a hard time knowing that this is what you were doing, since it expresses the computation at too fine a granularity to recover that intent.
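For instance (a sketch in today's Rust, not anything MIR actually does), the equivalence you'd want to exploit is easy to state while the "two maps" structure is still visible:

    fn main() {
        let v = vec![1, 2, 3];
        // Two separate mappings...
        let a: Vec<i32> = v.iter().map(|x| x + 1).map(|x| x * 2).collect();
        // ...are equivalent to one fused mapping. At this level the
        // rewrite is easy to justify; once compiled down to assembly,
        // the "two maps" structure is gone and the transformation is
        // much harder to recognize.
        let b: Vec<i32> = v.iter().map(|x| (x + 1) * 2).collect();
        assert_eq!(a, b);
    }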
Interesting. From what I understand, lifetime resolution is pretty deterministic: mallocs always happen at the beginning of a lifetime and frees always happen at the end of it. Would it be possible during the MIR stage to perform analysis and optimization to modify lifetimes? (e.g. array preallocation, chunked frees, etc.)
Ownership and borrowing are actually two complementary systems, not one. Because of the ownership/initialization rules Rust has, we know that mallocs occur when the object is constructed and that the data is freed when it goes out of scope. But for the borrow checker, dynamic memory has really been abstracted away - the borrow checker is guaranteeing the validity of pointers into the stack just as much as pointers into the heap. In fact, a language could have ownership without having borrowing (at one point, Rust did).
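For example (a minimal sketch), the borrow checker rejects this even though no heap allocation is involved anywhere:

    fn main() {
        let r;
        {
            let x = 5; // a plain stack value, no malloc in sight
            r = &x;    // error[E0597]: `x` does not live long enough
        }
        println!("{}", r);
    }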
The main thing performing borrow checking on MIR allows for is actually a more expressive lifetime system, particularly what are called single entry, multiple exit (SEME) or non-lexical lifetimes - that is, knowing that in one branch a borrow lasts longer than in another branch. The classic example of how this is useful is that the naive way to write "get or insert" for a hashmap is a borrow checker error today. With non-lexical lifetimes it won't be an error, because the compiler will know that you only perform the "insert" when you didn't get a reference back from the "get."
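A minimal sketch of that example (hypothetical function names; note that the trickier variant that returns the reference out of both arms needs more than non-lexical lifetimes alone):

    use std::collections::HashMap;

    fn report_or_insert(map: &mut HashMap<u32, String>, k: u32) {
        // Lexical borrow checking treats the shared borrow from `get`
        // as alive for the whole match, so the `insert` below conflicts
        // with it. Non-lexical lifetimes end the borrow at its last use
        // (the Some arm), and this compiles.
        match map.get(&k) {
            Some(v) => println!("already present: {}", v),
            None => {
                map.insert(k, String::from("default"));
            }
        }
    }

    fn main() {
        let mut map = HashMap::new();
        report_or_insert(&mut map, 1);
        report_or_insert(&mut map, 1); // second call hits the Some arm
    }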
It's possible that, because MIR carries stronger guarantees than LLVM IR can normally assume, there will be optimization passes that make sense on it - reordering code paths the way you were suggesting, for example - but I don't know.
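One illustration of the kind of guarantee that could be exploited (illustrative reasoning only, not a description of an actual rustc pass): Rust's rules say a `&mut` can't alias anything else, which licenses rewrites a C compiler couldn't make without aliasing information.

    // Because `a: &mut i32` and `b: &i32` are guaranteed not to alias,
    // an optimizer may assume `*b` is unchanged by stores through `a`
    // and fold the two loads of `*b` into one.
    fn example(a: &mut i32, b: &i32) -> i32 {
        *a = 1;
        let x = *b; // cannot have been changed by the store to *a
        *a = 2;
        let y = *b; // optimizer may reuse `x` here
        x + y
    }

    fn main() {
        let mut n = 0;
        let m = 7;
        println!("{}", example(&mut n, &m)); // prints 14
    }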