What most compiler tutorials are missing is coverage of intermediate representat...

westoncb · on March 9, 2020

The AST is an intermediate format. Is it important to have an intermediate language? Otherwise why does the AST not qualify?

My understanding is that something like LLVM’s IR is part of how it decouples the compiler frontend from the backend, but that it probably wouldn’t be desirable for a single-language compiler. But maybe I’m mistaken, or you are referring to something else :)

unlinked_dll · on March 9, 2020

Not OP, and I agree: "IR" is just the data structure that holds the AST, or a human-readable version of it.

That said, I think what they're getting at is that the interesting bits of modern compilers are the transforms between ASTs that lower one IR to another, either to perform some kind of optimization or to replace an abstraction with an implementation before it gets to codegen.

For example, if you have generators in your language, it's pretty easy to see how to turn a "yield" statement into an AST node. But to actually make the system work, you'll probably need a compiler pass over your AST that transforms coroutine definitions into subroutine definitions, plus a state machine to represent the execution context and a constructor/destructor for the state.
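To make that concrete, here is a rough sketch of what such a lowering pass might emit, written out by hand in C++. The generator `count_up`, the struct `CountUpFrame`, and the helper `drain` are all made up for illustration; a real compiler would produce something similar in its own IR rather than in source form.

```cpp
#include <cassert>
#include <vector>

// Source-level generator a hypothetical language might accept:
//
//   generator count_up(int lo, int hi) {
//       for (int i = lo; i < hi; ++i)
//           yield i;
//   }
//
// Lowered form: locals and parameters become fields of a frame object,
// and the resume point becomes an explicit state tag.
struct CountUpFrame {
    enum class State { Start, Suspended, Done };

    State state;
    int i;   // loop variable hoisted out of the body
    int hi;  // parameter kept alive across suspensions

    // "Constructor for the state": captures the call arguments.
    CountUpFrame(int lo, int hi_) : state(State::Start), i(lo), hi(hi_) {}

    // Runs the body up to the next yield. Returns false once finished.
    bool next(int& out) {
        switch (state) {
        case State::Start:
            state = State::Suspended;
            break;
        case State::Suspended:
            ++i;  // the loop increment that follows the yield
            break;
        case State::Done:
            return false;
        }
        if (i < hi) {
            out = i;  // the yield itself
            return true;
        }
        state = State::Done;
        return false;
    }
};

// Drives a frame to completion, the way a for-each loop over the
// generator would after lowering.
std::vector<int> drain(CountUpFrame& frame) {
    std::vector<int> values;
    int value = 0;
    while (frame.next(value)) values.push_back(value);
    assert(frame.state == CountUpFrame::State::Done);
    return values;
}
```

The frame here only holds plain ints, so the implicit destructor is enough. With locals that own resources, the pass would also have to emit cleanup for whichever state the frame was abandoned in.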

The same goes for all interesting language concepts: in the compiler, the interesting bit is the pass that transforms the top-level AST/IR into a more explicit IR, and so on through the pipeline until you get to codegen. That pipeline is as complex as everything else these days when it needs to be fast.

The example talks about this but doesn't dive too deep.

agumonkey · on March 9, 2020

I don't really agree on this one. By IR, people mean a middle ground between different semantics. Parsing gets you an abstract structure of the semantic world you typed in. An IR would be something that bridges the gap between those concepts and register machines, or whatever.
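As a rough illustration of that gap, here is a sketch that flattens a tiny expression tree into three-address code over virtual registers. Three-address code is just one common choice of IR, and `Expr`, `lit`, `bin`, and `lower` are names invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression AST: an integer literal or a binary operation.
struct Expr {
    char op;    // '\0' for a literal, otherwise an operator such as '+' or '*'
    int value;  // only meaningful when op == '\0'
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> lit(int v) {
    return std::unique_ptr<Expr>(new Expr{'\0', v, nullptr, nullptr});
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::unique_ptr<Expr>(new Expr{op, 0, std::move(l), std::move(r)});
}

// Appends one instruction per AST node to `code` and returns the name of
// the virtual register holding the result. Each instruction defines exactly
// one register, so the instruction index doubles as the register number.
std::string lower(const Expr& e, std::vector<std::string>& code) {
    if (e.op == '\0') {
        std::string dst = "t" + std::to_string(code.size());
        code.push_back(dst + " = " + std::to_string(e.value));
        return dst;
    }
    assert(e.lhs && e.rhs);  // binary nodes must have both operands
    std::string a = lower(*e.lhs, code);
    std::string b = lower(*e.rhs, code);
    std::string dst = "t" + std::to_string(code.size());
    code.push_back(dst + " = " + a + " " + e.op + " " + b);
    return dst;
}
```

Lowering the tree for `1 + 2 * 3` this way gives five instructions ending in `t4 = t0 + t3`. The nesting of the tree is gone, and what is left is a flat sequence much closer to what a register machine executes.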

dataflow · on March 9, 2020

> Otherwise why does the AST not qualify?

I'm sure someone will disagree with me, but to me, the point of IR is that it's a concretization, not an abstraction. Consider that your source code can look like

  int foo(void);

or

  class C;

or

  template<class T> struct vector;

then you see that all of these translate into an AST, but don't result in any code getting generated. Conversely, given a template definition for vector, vector<int> and vector<string> would result in multiple intermediate representations for the exact same chunk of AST.
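A small sketch of both halves of that point, using a made-up template `Box` in place of a real `vector` (the helper function names are also just for illustration):

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Declarations only: each one parses into AST nodes, but none of them
// asks the compiler to emit any code.
int foo(void);
class C;

// One template definition, i.e. a single chunk of AST.
template <class T>
struct Box {
    T value;
    const T& get() const { return value; }
};

// Explicit instantiations: Box<int>::get and Box<std::string>::get are
// generated separately from that same definition.
template struct Box<int>;
template struct Box<std::string>;

// True when the two instantiations are distinct types.
bool instantiations_differ() {
    return typeid(Box<int>) != typeid(Box<std::string>);
}

// Uses each instantiation through the same source-level member function.
void use_both() {
    Box<int> a{42};
    Box<std::string> b{"hi"};
    assert(a.get() == 42);
    assert(b.get() == "hi");
}
```

Compiling this and listing the object file's symbols should show two separate `get` functions and nothing for `foo` or `C`, though the exact output depends on the toolchain.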

Calling the parsed representations of these "intermediate representations" when they don't correspond to any generated code would render the term practically useless. You might as well call the source code itself IR at that point and claim IR has no value.