Clearly one character lookahead is not sufficient, because e.g. '0. toString()' ...

klodolph · on Sept 23, 2020

> There's no question that the lexer could in principle disambiguate with unbounded lookahead, but it would be a bit hacky, as you'd effectively be implementing part of the parser in the lexer (by attempting to figure out if it was a method call, which is really the parser's job).

This is actually not hacky. It's just a rule that the "." cannot be followed by [ \t]*\w, which is a simple negative lookahead assertion. Replace \w with whatever you use at the start of identifiers.
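
As a rough sketch of that rule (not taken from any real engine), the negative lookahead can be written directly into the regular expression for a numeric literal. The names `NUMBER` and `lexNumber` are hypothetical, `[A-Za-z_$]` is an assumed stand-in for "start of identifier", and exponents such as `1.e5` are deliberately left out:

```javascript
// Hypothetical lexer rule: a "." after the digits is part of the number only
// if it is NOT followed by optional blanks and an identifier-start character.
const NUMBER = /^\d+(?:\.(?![ \t]*[A-Za-z_$])\d*)?/;

// Returns the numeric literal at the start of src, or null if there is none.
function lexNumber(src) {
  const m = NUMBER.exec(src);
  return m ? m[0] : null;
}

console.log(lexNumber("0.5"));           // "0.5"
console.log(lexNumber("0. + 1"));        // "0."
console.log(lexNumber("0.toString()"));  // "0"  (the "." is left for the parser)
console.log(lexNumber("0. toString()")); // "0"  (blanks are skipped by the lookahead)
```

Because the lookahead skips `[ \t]*`, this variant is the unbounded one; dropping the `[ \t]*` gives the single-character version.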

It is extremely common for languages to have corner cases like this in the lexer to make the language more usable. For example, consider the rules in JavaScript or Go concerning where you can put line breaks. Or the rules for JavaScript concerning regular expression literals, which must be disambiguated from division.

> So basically, you could easily write a parser that allowed '0.toString()', but you'd either have to piece numeric literals together in the parser or add nasty hacks to the lexer.

This is factually incorrect. As I explained, you would only need one character of lookahead. There is no need to parse "0. toString()" successfully. If you wanted to parse "0. toString()" correctly, you could use unbounded lookahead, which is fairly simple in practice (speaking as a sometime parser writer). I don't get why you say it is hacky; this is all just a bunch of regular expression stuff (in the traditional sense of "regular").

foldr · on Sept 24, 2020

> If you wanted to parse "0. toString()" correctly, you could use unbounded lookahead

Right, which is what I said. If you agree that unbounded lookahead is required then we don't really disagree, except on the somewhat subjective question of how 'hacky' that is.

If I understand correctly, you suggest that unbounded lookahead could be avoided by allowing '0.toString()' but not '0. toString()', while still allowing both '(0).toString()' and '(0). toString()' and both 'foo.bar' and 'foo. bar'. That would produce highly counterintuitive results in some instances:

    Parsed as one expression:
    {}.
      foo

    Parsed as two statements:
    0.
      toString()
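
The asymmetry can be reproduced with a toy tokenizer that applies the one-character rule. This is only a sketch: `tokenize` and `IDENT_START` are made-up names, the identifier classes are simplified, and automatic semicolon insertion is not modelled, so it shows only whether a "." member-access token is produced.

```javascript
// Toy tokenizer (hypothetical, not a real JavaScript lexer). After digits, a
// "." is left as its own token only when the very next character starts an
// identifier; otherwise it is absorbed into the numeric literal.
const IDENT_START = /[A-Za-z_$]/;

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) { i++; continue; }            // skip whitespace
    if (/\d/.test(ch)) {
      let j = i;
      while (j < src.length && /\d/.test(src[j])) j++;
      // One character of lookahead past the ".".
      if (src[j] === "." && !IDENT_START.test(src[j + 1] || "")) {
        j++;
        while (j < src.length && /\d/.test(src[j])) j++;
      }
      tokens.push(src.slice(i, j));
      i = j;
      continue;
    }
    if (IDENT_START.test(ch)) {
      let j = i;
      while (j < src.length && /[\w$]/.test(src[j])) j++;
      tokens.push(src.slice(i, j));
      i = j;
      continue;
    }
    tokens.push(ch);                                  // single-character punctuation
    i++;
  }
  return tokens;
}

console.log(tokenize("foo.\n  bar"));      // [ 'foo', '.', 'bar' ]
console.log(tokenize("0.toString()"));     // [ '0', '.', 'toString', '(', ')' ]
console.log(tokenize("0.\n  toString()")); // [ '0.', 'toString', '(', ')' ]
```

In the last case the "." has been swallowed by the literal, so nothing connects `0.` to `toString()`.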

But again, it is really a subjective judgment. Obviously you could modify JavaScript in this way, and on that point there is no disagreement.