Very interesting read. I’ve written a number of syntax highlighters and have run into nearly all of these problems. I like the way CodeMirror currently does it (I can basically define my own parser in code and just pass along the info CM needs), but I will give Lezer a try. I’ve found that declarative forms like the ones used by Sublime and TextMate get you 80% of what you need in 1/10 of the time, and then it takes 20/10 of the time to get that last 20%. One thing I do hope is that the new CodeMirror will be written in TypeScript, which would make it self-documenting. The current code base feels a bit outdated, though it is so small and relatively well done that it’s still a joy to use.
I believe it will be in TypeScript. If you look at his last couple of posts, he mentions building a documentation generator for TypeScript code specifically for this project.
”In a backtracking system, you never know when you've definitely parsed a piece of content—later input might require you to backtrack again
[…]
A GLR parser can split its parse stack and run both sides alongside each other for a while until it becomes clear which one works out.”
So, depth-first vs. breadth-first, where the former is a bad idea performance-wise because the ‘better’ parses may only be found late in the game, so that very few branches can be cut early on?
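To make the depth-first vs. breadth-first contrast concrete, here is a toy sketch. It is not a GLR parser and has nothing to do with how Lezer is actually implemented; the "grammar" is just a hypothetical list of slots, each with alternative literals, and the function names are made up for illustration. The first matcher commits to one alternative and backtracks on failure; the second keeps every live position and advances them all in lockstep.

```python
from typing import Sequence

# Hypothetical toy "grammar": a sequence of slots, each a list of alternative literals.
Grammar = Sequence[Sequence[str]]


def match_backtracking(grammar: Grammar, text: str, slot: int = 0, pos: int = 0) -> bool:
    """Depth-first: pick one alternative, recurse, and undo the choice on failure."""
    if slot == len(grammar):
        return pos == len(text)
    for alt in grammar[slot]:
        if text.startswith(alt, pos):
            # Nothing consumed so far is final until the whole match succeeds.
            if match_backtracking(grammar, text, slot + 1, pos + len(alt)):
                return True
    return False


def match_parallel(grammar: Grammar, text: str) -> bool:
    """Breadth-first: keep every surviving position and advance them side by side."""
    heads = {0}  # input positions reached by at least one surviving alternative
    for alts in grammar:
        heads = {
            pos + len(alt)
            for pos in heads
            for alt in alts
            if text.startswith(alt, pos)
        }
        if not heads:  # every branch died, so we can stop early
            return False
    return len(text) in heads


grammar = [["a", "ab"], ["b", "bc"], ["c"]]
print(match_backtracking(grammar, "abcc"), match_parallel(grammar, "abcc"))
```

On the input `"abcc"` the backtracking version first tries `a`, `b`, `c`, fails because one character is left over, and only then retries with `bc`. The parallel version carries both possibilities forward and never revisits input it has already passed.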