An ultra-simplified example of a modern compiler written in JavaScript (github.com/jamiebuilds)
93 points by alokrai on March 9, 2020 | 27 comments



9/10 times when I see a "compiler" on GitHub, it's just an interpreter, or the compilation phase is completely missing.

In this case I believe the "compiler" is an "AST printer".


Agreed.

I've written a couple of interpreters now, and even a simple maths compiler that generated assembly-language output.

One of the things that made me postpone these projects for years was seeing so many parser/lexer examples that just evaluated simple expressions such as "3 + 4" or "3 + 4 * 5". Going from there to a real language with functions, conditionals, etc., is a big step with no guidance.

Still, this is a nice project, and it is well documented even if it is "small".


The last stage does change the AST into code. I guess "transpiler" might be more apt than "compiler".

For an input of "(add 2 (subtract 4 2))" you get an output of "add(2, subtract(4, 2));"
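
The code-generation stage for that is tiny. Here's a sketch of the idea (my own illustration, assuming a hypothetical AST of CallExpression and NumberLiteral nodes, not the project's actual code):

  // Walk the AST bottom-up and print each node as JS source.
  function codegen(node) {
    switch (node.type) {
      case 'NumberLiteral':
        return node.value;
      case 'CallExpression':
        return node.name + '(' + node.params.map(codegen).join(', ') + ')';
      default:
        throw new TypeError('Unknown node type: ' + node.type);
    }
  }

  const ast = {
    type: 'CallExpression', name: 'add', params: [
      { type: 'NumberLiteral', value: '2' },
      { type: 'CallExpression', name: 'subtract', params: [
        { type: 'NumberLiteral', value: '4' },
        { type: 'NumberLiteral', value: '2' },
      ] },
    ],
  };
  console.log(codegen(ast) + ';'); // add(2, subtract(4, 2));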


Indeed. There's not even a meaningful line between compiler and transpiler. All compilers take code in one format and output it in another, and there are plenty of widely accepted compilers that output C code or assembly, rather than machine code.


As far as I can tell, you are saying "just an interpreter" and referring to this as an "AST printer" because the output language is not machine code.

I'm not sure why that would be so significant. Don't get me wrong, I'm sure targeting some real machine code presents plenty of unique difficulties—but the interesting thing about compilation to me (and I'd wager a significant proportion of other readers here) is effective ways of structuring translation processes.

In the case of this project, the goal is to be a minimal demonstration, distilling the essence of how a compiler is structured. The fact that the output language is not machine code doesn't affect the essential structure of a compiler (as far as I know).


What most compiler tutorials are missing is coverage of intermediate representations. There are a billion parsing and calculator tutorials, but not nearly as many that show the process of turning, say, C-style code into some sort of intermediate format before emitting machine code.
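
For instance, lowering an expression tree into a flat three-address-style IR, the step most tutorials skip, is just a recursive walk that spills temporaries. A rough sketch, with made-up node shapes and instruction syntax:

  // Flatten { op, left, right } / { value } trees into
  // three-address instructions, one fresh temporary per result.
  function lower(node, ir, fresh) {
    if ('value' in node) return String(node.value);
    const a = lower(node.left, ir, fresh);
    const b = lower(node.right, ir, fresh);
    const t = 't' + fresh.n++;
    ir.push(t + ' = ' + a + ' ' + node.op + ' ' + b);
    return t;
  }

  const ir = [];
  lower({ op: '+',
          left: { value: 3 },
          right: { op: '*', left: { value: 4 }, right: { value: 5 } } },
        ir, { n: 0 });
  console.log(ir.join('\n')); // t0 = 4 * 5
                              // t1 = 3 + t0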


The AST is an intermediate format. Is it important to have an intermediate language? Otherwise, why does the AST not qualify?

My understanding is that having something like LLVM’s IR, for instance, is part of how it decouples the compiler frontend from the backend, but that probably wouldn’t be desirable for a single-language compiler. But maybe I’m mistaken, or you are referring to something else :)


Not OP, but I agree: "IR" is just the data structure that holds the AST, or a human-readable version of it.

That said, I think what they're getting at is that the interesting bits of modern compilers are the transforms between ASTs: lowering from one IR to another, either to perform some kind of optimization or to replace an abstraction with its implementation before it gets to codegen.

For example, if you have generators in your language, it's pretty easy to see how to turn a "yield" statement into an AST node. But to actually make the system work, you'll probably need a compiler pass over your AST that transforms each coroutine definition into a subroutine definition, plus a state machine to represent the execution context and a constructor/destructor for that state.
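
To make that concrete, here's roughly the shape such a pass targets for `function* f() { yield 1; yield 2; }` (a hand-written illustration of the lowering, not any real compiler's output):

  // The generator body becomes an explicit state machine;
  // `state` is the execution context that survives across calls.
  function f() {
    let state = 0;
    return {
      next() {
        switch (state) {
          case 0: state = 1; return { value: 1, done: false };
          case 1: state = 2; return { value: 2, done: false };
          default: return { value: undefined, done: true };
        }
      },
    };
  }

  const it = f();
  console.log(it.next().value); // 1
  console.log(it.next().value); // 2
  console.log(it.next().done);  // true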

The same goes for all interesting language concepts: in the compiler, the interesting bit is the pass that transforms the top-level AST/IR into more explicit IR, and the trip through the pipeline to get to codegen. Which is as complex as everything else these days when it needs to be fast.

The example talks about this but doesn't dive too deep.


Not really agreeing on this one. By "IR", people mean a middle ground between various semantics. Parsing gets you an abstract structure of the semantic world you typed in. An IR would be something bridging the gap between those concepts and register machines, or whatever.


> Otherwise why does the AST not qualify?

I'm sure someone will disagree with me, but to me, the point of IR is that it's a concretization, not an abstraction. Consider that your source code can look like

  int foo(void);
or

  class C;
or

  template<class T> struct vector;
then you see that all of these translate into an AST, but don't result in any code getting generated. Conversely, given a template definition for vector, vector<int> and vector<string> would result in multiple intermediate representations for the exact same chunk of AST.

Calling the parsed representations of these "intermediate representations" when they're not corresponding to generated code would render the term practically useless. You might as well call the source code itself IR at that point and claim IR has no value.


That's not the point. A compiler has many phases; tokenizing/parsing/building the AST is just the first one, and for many languages the easiest of them all, especially if performance is not a concern.


Transpiler: Lisp to C.


For the legendary ASCII ISA.


If anyone is curious, I threw together a similar (although not commented yet) example of how I would write this using parser combinators in Haskell: https://gist.github.com/neilparikh/b8a9c758ebbbb5a3ee9b2ed99...

The JS example wrote everything from scratch, so to do the same, I wrote my own parser combinator library (based on https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf), but you could use an existing library, and get rid of `Parser.hs`.
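
For anyone who hasn't seen the style: the whole trick is that a parser is just a function from input to (result, remaining input), and combinators glue such functions together. A toy version in JavaScript for comparison (my own sketch, not the gist's code):

  // A parser: string -> [result, rest] on success, null on failure.
  const satisfy = pred => s =>
    s.length && pred(s[0]) ? [s[0], s.slice(1)] : null;
  const char = c => satisfy(x => x === c);
  const many1 = p => s => {
    let r, out = '', rest = s;
    while ((r = p(rest))) { out += r[0]; rest = r[1]; }
    return out ? [out, rest] : null;
  };
  const number = many1(satisfy(x => x >= '0' && x <= '9'));

  console.log(number('42 rest')); // [ '42', ' rest' ]
  console.log(char('(')('(add')); // [ '(', 'add' ]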


For those interested in this approach, one of the paper's authors (Hutton) has a great video explaining functional parsing: https://youtu.be/dDtZLm7HIJs


There was a person who began rewriting this using Reason [1]. Sadly, he never got to complete it.

I began tinkering with F# recently and found that ML languages are perfect for parsing. Is there a version of this in F# or OCaml?

[1] https://medium.com/@javierwchavarri/building-the-super-tiny-...



I wrote one in Haskell (https://news.ycombinator.com/item?id=22523088), but I think it might be understandable for others, since I don't use many Haskell-isms other than do-notation and typeclasses.


Maybe interesting for a comparison: 21 years ago I wrote my own tiny demo compiler in JS, much inspired by Niklaus Wirth.

https://www.masswerk.at/demospace/eal/eal.htm

See the page source for the code. (Probably fun: old syntax for both HTML and JS, pre-ECMA JavaScript [meaning the semicolon isn't a delimiter but a separator], and even the indentation style is outdated. Mind that back then, even support for JS object literals wasn't guaranteed.)

Syntax, etc: https://www.masswerk.at/demospace/eal/syntax.htm

Byte code reference: https://www.masswerk.at/demospace/eal/pcode.htm


Reminds me of this old screencast about writing a compiler in Ruby:

https://www.destroyallsoftware.com/screencasts/catalog/a-com...


I want to dispute the 'modern' claim.

I don't know my dates well enough. But is there anything in here that isn't at least 50 years old?

The high-level is tokenize->parse->ast_transform->codegen.

The low-level looks like `char = input[++current];`
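
For what it's worth, that cursor-advance loop is exactly the style hand-written lexers have used from the beginning. A stripped-down sketch of it (my paraphrase, not the repo's code):

  // Classic single-pass scanner: advance a cursor one character
  // at a time and batch characters into tokens.
  function tokenize(input) {
    const tokens = [];
    let current = 0;
    while (current < input.length) {
      let char = input[current];
      if (/\s/.test(char)) { current++; continue; }
      if (char === '(' || char === ')') {
        tokens.push({ type: 'paren', value: char });
        current++;
        continue;
      }
      if (/[0-9]/.test(char)) {
        let value = '';
        while (current < input.length && /[0-9]/.test(char)) {
          value += char;
          char = input[++current];
        }
        tokens.push({ type: 'number', value });
        continue;
      }
      throw new TypeError('Unexpected character: ' + char);
    }
    return tokens;
  }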


What does “modern” mean in this context?


I don't know, but I don't think I want any part of it. A compiler is a very foundational piece of software where performance is important: you run it many times, often on large inputs. It should be written in a language with a very high performance ceiling and quick startup time, imho.


web scale


Pretty cool. Very well documented too (I only read through a few hundred lines before bookmarking it for later).


Wouldn't this be closer to a "transpiler"?


Does this parse in linear time?



