
Yet another “compiling” course that puts all the emphasis on parsing.

Rule of thumb: parsing/lexing shouldn’t take more than 10% of your compiler course.



This attitude bugs me a lot. It seems really common, especially in more recent texts about language design and implementation, that parsing is heavily de-emphasized to the point where practically nobody talks about it. See Essentials of Programming Languages by Friedman & Wand, the relevant sections in SICP, Programming Languages: Application & Interpretation (which goes so far as to call it a distraction).

I get that parsing is more of an implementation detail and doesn't really belong to the space-brained realm of language design per se, but it's a bit annoying that most texts refuse to give any space to the topic, and rely on your language being S-expression based or assume you're going to use a parser generator. Like, in the real world, even if one will never actually implement a fully-fledged programming language, you're still probably going to have to parse things sometimes. I would love a book that goes into detail about different parsing techniques and considers best practices and patterns and tradeoffs/design considerations -- would pay good money for that

It reminds me somewhat of the situation in analysis, where there are lots of theorems that aren't written down anywhere because literally every book states them as "easy" exercises. Maybe I'm looking in the wrong places, but I can't find much in the way of concrete guidance on implementing parsers. I'm aware of the beautiful series on parsing theory by Aho & Ullman ("The Theory of Parsing, Translation, and Compiling"), but those are more focused on theory rather than implementation


On the other hand, historically (and as the parent you're replying to points out), many compiler texts have spent a MAJORITY of their time on parsing, and rush through the actual interesting parts of compilation.

> I would love a book that goes into detail about different parsing techniques and considers best practices and patterns and tradeoffs/design considerations -- would pay good money for that

Terence Parr's "Language Implementation Patterns" spends quite a bit of time on parsing, and parse tree->AST conversions.


Thanks for pointing that one out -- I had written that one off before as an ANTLR book but looks like it covers more material than I gave it credit for


> Like, in the real world, even if one will never actually implement a fully-fledged programming language, you're still probably going to have to parse things sometimes.

That is definitely true, but in practice there isn't much to say about it, because sophisticated parsers turn out not to be particularly important; it works out better overall to design simple grammars, and then the parsing is easy.

- If you're a beginner, you'll write a recursive descent parser, because that's the simplest technique, and it lets you focus on your project instead of a new, unfamiliar tool.

- If you're writing a domain-specific language, or a config format, or something of that nature, you'll use whichever parser generator integrates most conveniently into your workflow, and you'll design your grammar around whatever its manual tells you to do.

- If you're writing a full-scale language compiler, you'll go back to recursive descent, because that offers the easiest way to recover from errors and report informative messages. Maybe you'll throw in precedence-climbing for operators.
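For anyone who hasn't seen the combination, here is a minimal sketch of recursive descent with precedence climbing for binary operators. The grammar (integers, parentheses, `+ - * / ^`), the tuple-shaped AST, and every name in it are assumptions made up for illustration, not taken from any particular compiler.

```python
import re

# Hypothetical toy grammar: integers, parentheses, and five binary operators.
TOKEN = re.compile(r"\d+|[-+*/^()]")

# operator -> (precedence, right_associative)
BINARY = {
    "+": (1, False),
    "-": (1, False),
    "*": (2, False),
    "/": (2, False),
    "^": (3, True),
}


def tokenize(src):
    """Split the source into number and punctuation tokens."""
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_primary(self):
        """Plain recursive descent: a number or a parenthesized expression."""
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.parse_expr(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def parse_expr(self, min_prec):
        """Precedence climbing: consume operators binding at least as tightly as min_prec."""
        left = self.parse_primary()
        while True:
            op = self.peek()
            if op not in BINARY or BINARY[op][0] < min_prec:
                return left
            prec, right_assoc = BINARY[op]
            self.advance()
            # Left-associative operators require a strictly tighter right operand.
            right = self.parse_expr(prec if right_assoc else prec + 1)
            left = (op, left, right)


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.parse_expr(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With this sketch, `parse("1 + 2 * 3")` yields `("+", 1, ("*", 2, 3))`. Because every error is raised from a specific function at a specific token, it is easy to see where better messages or recovery logic would go, which is the usual argument for hand-written parsers.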

> I would love a book that goes into detail about different parsing techniques and considers best practices and patterns and tradeoffs/design considerations -- would pay good money for that

I would also read such a book, but it would be more of a book about parser generators than a book about parsers.


Almost all real-world projects that are language-like or compiler-like will need a parser. A much smaller fraction of them will need register allocation, instruction selection, optimization, code generation, etc.

For every big, deep, native code compiler, there are a hundred template languages, config files, report generators, etc. all of which are real programs providing real value for actual people.

Emphasizing parsing provides the most value for the greatest number of people. The folks that do end up needing more back end depth will still have the resources available to learn it.


Contrarian take: lots of people doing parsing has, on the whole, highly negative value, and template languages and config files are a prime example of this.

Everybody and their dog thinks it necessary to inflict some new sub-par language on us when in about 99.9% of cases they should just either have stuck to s-expressions or some suitable subset of a popular programming or existing config language with a relatively sane syntax (blaze/bazel did that right, cmake did that very very wrong).

When was the last time you looked at some config file and thought, wow I'm so glad they didn't use toml or python or whatever, but instead made up some completely new syntax nothing in the world apart from this tool itself can parse and that I can't programmatically manipulate?

When was the last time you thought, wow I am so glad that someone invented a new templating language that creates some new injection vulnerabilities, because no one apart from the lisp people ever seems to have worked out that if you want to interpolate into something tree shaped, you should have a tree-based interpolation syntax? Because although sexps and quasiquote solve this very nicely and concisely, everyone else still seems to love string-bashing plus some ad-hoc "escaping" system for this. And one reason for this is of course precisely the enormous abundance of idiotic config languages that can't be easily manipulated as anything other than opaque strings.
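To make the contrast concrete, here is a rough Python analogue (not quasiquote itself, just a sketch of the same idea). The `(tag, attrs, children)` tuple shape and the function names are invented for illustration, and tag and attribute names are assumed to be trusted. User data only ever enters the tree as a leaf, so escaping happens once, in the serializer, rather than at every call site.

```python
from html import escape


def render(node):
    """Serialize a tree of (tag, attrs, children) tuples and plain strings to HTML."""
    if isinstance(node, str):
        # Text leaves are always escaped; callers cannot forget to do it.
        return escape(node)
    tag, attrs, children = node
    attr_text = "".join(
        f' {name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


def greeting_by_strings(name):
    # String-bashing: the value is spliced into markup verbatim.
    return "<p>Hello, " + name + "</p>"


def greeting_by_tree(name):
    # Tree-based interpolation: the value can only ever be a text node.
    return ("p", {}, ["Hello, ", name])


hostile = "<script>alert(1)</script>"
unsafe_html = greeting_by_strings(hostile)
safe_html = render(greeting_by_tree(hostile))
```

In the string version the hostile value becomes live markup; in the tree version it stays text no matter what it contains.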

[Edit: if you do create a new config file language, pretty please provide some means to directly query and losslessly manipulate it; for the lossless part you will either need to have first class comments unambiguously attached to a particular syntactic construct and agreed upon deterministic formatting or IDE-style complexity, the first one is probably a better idea]


Do you have a ‘best of’ list of resources for someone interested in back-end topics?


I wouldn't consider myself any kind of authority on "best of", but I like the Dragon Book, and Engineering a Compiler. I've heard good things about Appel's Modern Compiler Implementation.


Parsing takes a weekend. The rest takes a year to get a rudimentary compiler working.


I disagree.

As opposed to most compiler articles, this one actually covers code generation for every section of its chapters, which is really great.

I also like that every chapter focuses on a specific feature and describes how to implement it end to end: lexical/syntactic parsing, AST, and x86_64 generation.

Great series!


On the other hand, parsing text could easily be a very valuable course on its own. You just have to not keep it restricted to programming languages, and include the knowledge created in this century.


anything better you'd recommend?


parsing is cool



