This is an old, famous paper that got a lot of arguments wrong and a lot right.
The famous result is the old computed goto vs. switch argument, which basically comes down to this: switch adds bounds-checking overhead, because the compiler generally has to range-check the opcode before indexing the jump table. He didn't get that, and neither did most commenters. He also cemented the poor wording "threaded interpreter" and "indirect branching" for different concepts.
It's also a totally unimportant factor in making interpreters fast.
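To make the switch vs. computed goto difference concrete, here is a minimal sketch, not taken from the paper. The four-op bytecode and the names `run_switch` and `run_goto` are made up for illustration, and the second loop relies on the GNU C "labels as values" extension, so it assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical four-op stack bytecode, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };
enum { STACK_MAX = 16 };

/* Portable loop. For a dense switch the compiler typically builds a jump
 * table and, since any opcode value must be handled, range-checks the
 * opcode before indexing it. */
long run_switch(const unsigned char *pc) {
    long stack[STACK_MAX];
    size_t sp = 0;
    for (;;) {
        switch (*pc++) {
        case OP_PUSH: assert(sp < STACK_MAX); stack[sp++] = *pc++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1; /* unknown opcode */
        }
    }
}

/* GNU C loop. The opcode indexes the label table directly with no range
 * check, so a corrupt opcode is undefined behavior here. */
long run_goto(const unsigned char *pc) {
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[STACK_MAX];
    size_t sp = 0;
#define NEXT() goto *labels[*pc++]
    NEXT();
do_push: assert(sp < STACK_MAX); stack[sp++] = *pc++; NEXT();
do_add:  sp--; stack[sp - 1] += stack[sp]; NEXT();
do_mul:  sp--; stack[sp - 1] *= stack[sp]; NEXT();
do_halt: return stack[sp - 1];
#undef NEXT
}
```

Both loops run the same bytecode, for example `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` for (2 + 3) * 4. The only structural differences are the missing range check and the per-handler jump in the second one.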
Indirect call overhead is significant and measurable. JIT compilers made that obsolete.
He didn't get the case against perl. perl has a huge op and data overhead, and is way too dynamic. But its dispatch loop is better than the others, as every perl op already returns the pointer to the next op. There's no dispatch.
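A rough sketch of that "op returns the next op" loop shape, in C. This is not perl's actual source. The `Op` and `VM` structs and every function name here are hypothetical, chosen only to show the idea.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Op Op;
typedef struct VM { long stack[16]; size_t sp; } VM;

/* Hypothetical op node: the handler pointer lives in the op itself. */
struct Op {
    Op *(*fn)(VM *vm, Op *self); /* runs the op, returns the next op */
    Op *next;                    /* default successor */
    long arg;                    /* immediate operand, if any */
};

static Op *op_push(VM *vm, Op *self) {
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = self->arg;
    return self->next;
}

static Op *op_add(VM *vm, Op *self) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
    return self->next;
}

/* The whole run loop: no opcode fetch, no decode, no table lookup. */
long run_ops(VM *vm, Op *op) {
    while (op)
        op = op->fn(vm, op);
    return vm->stack[vm->sp - 1];
}

/* Builds push 2; push 40; add and runs it. */
long demo_op_chain(void) {
    Op add = { op_add, NULL, 0 };
    Op p40 = { op_push, &add, 40 };
    Op p2  = { op_push, &p40, 2 };
    VM vm  = { {0}, 0 };
    return run_ops(&vm, &p2);
}
```

A branching op would simply return something other than `self->next`. What remains per op is one indirect call, and the cost then sits in the size of the op nodes and the data they touch.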
Much better interpreter optimization papers came later with Lua, reiterating old Lisp results. Note: not LuaJIT. Lua already had a perfect interpreter loop. Lua uses the Lisp architecture of small tagged dynamic data, plus a tiny opset in which each op, with its 2-3 arguments, is efficiently stored in a single word. Python with its linearizing CFG optimizer tried to mimic it a bit, but is still too slow. Many one-word primitives allow fast stack allocation and updates, and one-word ops allow fast icache throughput.
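As an illustration of "one op in a single word", here is a sketch of a 32-bit three-operand instruction. The field widths (6-bit opcode, 8-bit A, 9-bit B and C) are loosely modeled on Lua 5.1, but treat the exact layout, the `OP_ADD` value and all helper names as assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Instr; /* one instruction = one 32-bit word */

/* Assumed field widths, low bits first: opcode, A, C, B. */
enum { OP_BITS = 6, A_BITS = 8, C_BITS = 9, B_BITS = 9 };
enum { A_SHIFT = OP_BITS, C_SHIFT = A_SHIFT + A_BITS, B_SHIFT = C_SHIFT + C_BITS };

enum { OP_ADD = 1 }; /* hypothetical opcode number */

static inline Instr encode(unsigned op, unsigned a, unsigned b, unsigned c) {
    assert(op < (1u << OP_BITS) && a < (1u << A_BITS));
    assert(b < (1u << B_BITS) && c < (1u << C_BITS));
    return (Instr)op | ((Instr)a << A_SHIFT)
         | ((Instr)c << C_SHIFT) | ((Instr)b << B_SHIFT);
}

static inline unsigned get_op(Instr i) { return i & ((1u << OP_BITS) - 1); }
static inline unsigned get_a(Instr i)  { return (i >> A_SHIFT) & ((1u << A_BITS) - 1); }
static inline unsigned get_b(Instr i)  { return (i >> B_SHIFT) & ((1u << B_BITS) - 1); }
static inline unsigned get_c(Instr i)  { return (i >> C_SHIFT) & ((1u << C_BITS) - 1); }

/* Register-machine step for the one op in this sketch: R[A] = R[B] + R[C]. */
static inline void step(long *reg, Instr i) {
    if (get_op(i) == OP_ADD)
        reg[get_a(i)] = reg[get_b(i)] + reg[get_c(i)];
}
```

One fetch brings in the opcode and all three operands, and decoding is a few shifts and masks. A stack bytecode would need several separate instructions for the same add.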