I started working through Crafting Interpreters, building up a language's syntax and grammar from scratch. A lot of work and 75 pages of lex/parse logic later, we finally have an AST... one that we can debug and inspect by looking directly at its sexp representation.
It was the ah-ha moment for me... why not express the source code directly as that AST? Most languages require a lot of ceremony and custom rules just to get to this point. Sexps are a step ahead (inherently simpler) since they already parse as an unambiguous tree structure. It's hard to unsee - reading any non-Lisp language now feels like an extra layer of complexity hiding the real logic.
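To make that concrete, here's a rough sketch of what you get for free in any R7RS Scheme or Racket: the standard `read` procedure already hands back the nested list you would otherwise spend a hand-written parser building (the `square` example is just an illustration, not from the book).

    ;; `read` returns the source as a nested list: the parse tree for free.
    (define ast
      (read (open-input-string "(define (square x) (* x x))")))

    ast          ; => (define (square x) (* x x))
    (car ast)    ; => define
    (caddr ast)  ; => (* x x)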
Much of the complexity and error reporting that exists in the lexer or parser in a non-Lisp language just gets kicked down the road to a later phase in a Lisp.
Sure, s-exprs are much easier to parse. But the compiler or runtime still needs to report an error when you have an s-expr that is syntactically valid but semantically wrong, like these (a sketch of that later-phase check follows the examples):
    (let ())
    (1 + 2)
    (define)
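For instance, a minimal sketch of the kind of arity check a later phase still has to perform (the `check-define` helper is hypothetical, not taken from any particular implementation): the reader happily accepts `(define)`, so something downstream must reject it.

    ;; Hypothetical later-phase check: the reader accepted (define),
    ;; but the expander/compiler must still reject it as malformed.
    (define (check-define form)
      (unless (and (pair? form)
                   (eq? (car form) 'define)
                   (>= (length form) 3))
        (error "malformed define:" form)))

    (check-define '(define x 1))  ; ok
    (check-define '(define))      ; error: malformed define: (define)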
Kicking that down the road is a feature because it lets macros operate at a point in time before that validation has occurred. This means they can accept as input s-exprs that are not semantically valid but will become so after macro expansion.
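A small illustration, sketched with portable syntax-rules: an infix macro whose input, taken literally, is exactly the kind of "semantically wrong" form from above. `(1 + 2)` on its own would try to apply 1 as a procedure, yet it becomes perfectly valid once the macro has rearranged it.

    ;; (infix (1 + 2)) receives a form that is not valid Scheme on its own;
    ;; after expansion it is just (+ 1 2).
    (define-syntax infix
      (syntax-rules ()
        ((_ (a op b)) (op (infix a) (infix b)))
        ((_ x) x)))

    (infix (1 + 2))        ; => 3
    (infix ((1 + 2) * 4))  ; => 12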
But it can also be a bug, because it means later phases in the compiler and runtime have to do more sanity checking, and program validation is woven throughout the entire system. It also makes the definition of what "valid" code is fuzzier for human readers.
> later phases in the compiler and runtime have to do more sanity checking
But they always have to do all the sanity checking they need, because earlier compiler stages might introduce errors of their own, or propagate errors they neglected to check for.
> program validation is woven throughout the entire system
Also normal and unavoidable.
Insofar as processing has logical phases and layers, validation aligns with those layers: the compiler driver ensures that input files can be read and have the proper text encoding, the more language-specific lexer detects mismatched delimiters and unrecognized keywords, and so on. Combining phases, e.g. building a symbol table on the fly to detect undeclared identifiers before parsing is complete, is a deliberate choice that improves performance but increases complication.
> because earlier compiler stages might introduce errors of their own, or propagate errors they neglected to check for.
Static analyzers for IDEs need to handle erroneous code in later phases (for example, being able to partially type check code that contains syntax errors). But, in general, I haven't seen a lot of compiler code that redundantly performs the same validation that was already done in earlier phases. The last thing you want to be doing during optimization and code generation is re-implementing your language's type checker.
Those rules do help reduce runtime surprises though, to be fair. It's not like they exist for no purpose. They directly represent the language designer making decisions about what counts as a valid representation in that language. Rule #1 of building robust systems is making invalid state unrepresentable, and that's exactly what a lot of languages aim to do.
Note that this approach has been reinvented with great industry success (definitions may differ) at least twice - once in XML and another time with the god-forsaken abomination that is YAML, both times without the lisp engine running in the background, which is what actually makes working with ASTs a reasonable proposition. And I’m not what you could call a lisp fan.