Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I might be the only other person to solve multipattern regex searching. It’s easy conceptually, and fiendishly tricksy when you try to get the tests to pass. You want to take advantage of commonality between patterns but the matches must still be independent. The engine I wrote uses a classical interpreter backend, optimized well enough to beat TRE and the old PCRE handily. I’d read about the bit parallel algorithms and play with them a bit, but the pattern sets we’d throw at any given BP algorithm would break it, as any given BP algorithm comes with several caveats.

The genius of the HyperScan team is that they threw damn near every algorithm at their pattern sets, including some they invented, and divvied up the patterns into subsets that fit the particular algorithms well. Usually getting the tests to pass with one backend is the last act of any regex author—once it ain’t broke, you’re too exhausted to fix it. Contemplating the complexity that comes with a multi-algorithm backend makes me want to cry. But HyperScan did it.

So, to put it in perspective with linear algebra, imagine a library that first spent time analyzing given matrices, and then split them into submatrices and operated on them with different algorithms according to the particular patterns detected therein. That’s kind of insane to contemplate in a linear algebra—it’s really not a domain that compares well at all—but it’s how HyperScan works... and that ignores all the crazy ISA tricks it used.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: