I thought I had seen it before. Verified you can actually see the whole chapter in a nicer (book vs web) layout in Amazon's "Look inside" feature since it's at the beginning of the book.
These posts are great for history, but regular expression implementation has moved on considerably from early Thompson implementations whether backtracking or NFAs. There is a considerable body of literature about regex implementation, including many quite convincing implementations and alternate formulations, some of which weren't even done by people at, or from, Bell Labs!
We seem to have something of an Eternal September of regex implementation knowledge (abetted by Russ Cox's amazingly selective bibliography in his posts introducing RE2).
Skimming over it, the implementation looks inefficient:
int matchstar(int c, char *regexp, char *text)
{
    do {    /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}
That can be used to match short strings, but not to grep through a filesystem. A good regex matcher has runtime O(N*M) where N is the length of the regex (typically very short) and M is the length of the scanned text.
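For contrast, here is a minimal sketch of how the same feature set (literal characters, `.`, and `c*`) can be matched in O(N*M) time without backtracking, using dynamic programming over (text suffix, pattern suffix) pairs; a Thompson NFA simulation achieves the same bound. The function name `match_dp`, the `MAXPAT` cap, and all other details are my own illustration, not code from the book:

```c
#include <string.h>

/* Hypothetical O(N*M) matcher for literals, '.', and 'c*'.
 * dp[j] answers "does re[j..] match the current text suffix?".
 * Each of the (M+1)*(N+1) cells is filled exactly once, so no
 * input can trigger the exponential blowup of backtracking. */
enum { MAXPAT = 256 };                     /* cap on pattern length */

int match_dp(const char *re, const char *text)
{
    size_t n = strlen(text), m = strlen(re);
    unsigned char prev[MAXPAT + 1], dp[MAXPAT + 1];

    if (m > MAXPAT)
        return -1;                         /* pattern too long for this sketch */

    /* Row for the empty text: only a chain of trailing "c*" units matches. */
    prev[m] = 1;
    for (size_t j = m; j-- > 0; )
        prev[j] = (j + 1 < m && re[j + 1] == '*') ? prev[j + 2] : 0;

    for (size_t i = n; i-- > 0; ) {        /* text suffixes, right to left */
        dp[m] = 0;                         /* nonempty text vs. empty pattern */
        for (size_t j = m; j-- > 0; ) {
            int first = (re[j] == text[i] || re[j] == '.');
            if (j + 1 < m && re[j + 1] == '*')
                /* skip the starred unit, or consume one char and stay on it */
                dp[j] = dp[j + 2] || (first && prev[j]);
            else
                dp[j] = first && prev[j + 1];
        }
        memcpy(prev, dp, m + 1);
    }
    return prev[0];
}
```

This anchors at both ends; to search instead (as grep does), wrap the pattern in `.*` on both sides, which keeps the O(N*M) bound.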
A surprising number of regex tools used in the wild employ a backtracking algorithm like this one. Some popular features such as lookaheads, lookbehinds, and backreferences can't always be matched using a finite automaton.
Seeing code like this I can understand why the pioneers thought C code could be pretty.
But now all I see there is something more akin to a demonstration of ice carving with chainsaws than something that should be used in a production system.
We software types must be masochists. It took Rob Pike an hour in the comfort of his private office to write this piece of code. How is this _at all_ like a typical software interview?
Have you seen that used for interviewing? Yes, some people can do it in a 1-hour interview, but wouldn't you rather ask something that can't be done from memory?
I know engineers who have asked it, yes. It really seems to be a test for "who was paying attention to that one lecture in CS205 near the end of the semester."
Wow. That's quite a tricky test, too -- if you study Thompson's original paper you can see plenty of pitfalls. E.g. his main code would loop forever on a star of a star, and he solves this in an appendix with a preprocessor to rewrite regexes of that kind, only it looked like that rewriting would blow up the size of the regex exponentially in the length of the stack of stars. No hire!
I have seen this one go by, which, from the perspective of someone who has been doing regex implementation for 14 years, more or less, is fairly amusing. It seems unfair, as someone aware of the potential pitfalls of backtracking would presumably feel obligated to build an automata-based solution; someone also aware of the potential pitfalls of DFAs (failure to determinize quickly on some nasty patterns) might feel obligated to build a Thompson or Glushkov NFA solution as well.
It's quite possible you might turn someone competent into a gibbering wreck (they are asking me to implement RE2 or Hyperscan in an hour!) :-)
I used to use this exact problem years ago when interviewing candidates at Google. At that time I thought it was a great question, but now I'm not that sure anymore if this approach to interviewing is the best way to assess candidates.
2013: https://news.ycombinator.com/item?id=5672875
2011: https://news.ycombinator.com/item?id=2723366
Maybe others?
p.s. these links are just for curious readers; reposts are ok after a year or so—see https://news.ycombinator.com/newsfaq.html.