If all of this is happening in its own thread (and there's probably one thread per connection), why add the overhead and complexity of something like this?
I've looked at his code, and that is not how it works. It uses boost::asio: you specify how many io_service threads to run, so it is not one thread per connection. You could easily have 10,000 connections spread across 4 threads, for example. The threads that handle the connections are the same threads that run the callbacks, so you wouldn't want them blocked by callbacks that take a long time to run. You'd want to pass those operations off to a different thread pool, which is exactly why you'd want to be able to do what I suggested.
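Here's a minimal sketch of that hand-off pattern in plain Boost.Asio, to make it concrete. The pool sizes and the fake slow job are invented for illustration; this is not the framework's actual code, just the general shape of it:

    // Sketch: a small I/O pool multiplexing connections, plus a
    // separate worker pool for slow callbacks. Illustrative only.
    #include <boost/asio.hpp>
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        boost::asio::io_service io;       // threads that own the connections
        boost::asio::io_service workers;  // separate pool for slow work

        // work guards keep run() from returning while the queues are empty
        boost::asio::io_service::work io_work(io);
        boost::asio::io_service::work worker_work(workers);

        // 4 I/O threads can multiplex thousands of sockets between them
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) threads.emplace_back([&] { io.run(); });
        for (int i = 0; i < 8; ++i) threads.emplace_back([&] { workers.run(); });

        // Imagine this lambda runs on an I/O thread when a request arrives:
        io.post([&] {
            // hand the expensive part to the worker pool so the I/O
            // thread goes straight back to servicing other connections
            workers.post([&] {
                std::this_thread::sleep_for(std::chrono::seconds(2)); // "slow query"
                // hop back onto an I/O thread to write the response
                io.post([] { std::cout << "response sent\n"; });
            });
        });

        std::this_thread::sleep_for(std::chrono::seconds(3));
        io.stop();
        workers.stop();
        for (auto& t : threads) t.join();
    }

The key point is the last io.post: the response is written from an I/O thread again, so the slow work never stalls the threads that are servicing the other connections.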
Also, you said this adds complexity and overhead. I dispute that it adds complexity: for most people it's the difference between "return stuff" and "res.send(stuff)". And I wouldn't assume it adds any overhead either. If you disagree, let me know once you've read the code and understand how boost::asio works.
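To show what I mean by that difference, here's a toy sketch of the two handler shapes. Request and Response are made-up types, not the project's real API; the point is only the shape:

    #include <iostream>
    #include <memory>
    #include <string>

    struct Request { std::string path; };

    struct Response {
        void send(const std::string& body) { std::cout << body << "\n"; }
    };

    // "return stuff": the response is the return value, so the I/O
    // thread is tied up until all the work is done.
    std::string handle_sync(const Request& req) {
        return "hello from " + req.path;
    }

    // "res.send(stuff)": the handler gets a response object it can
    // hand to another thread and complete later.
    void handle_async(const Request& req, std::shared_ptr<Response> res) {
        // in real code this lambda would be posted to a worker pool
        auto finish = [res, path = req.path] { res->send("hello from " + path); };
        finish();
    }

    int main() {
        Request req{"/index"};
        std::cout << handle_sync(req) << "\n";
        handle_async(req, std::make_shared<Response>());
    }

The second form is barely more to type, but it decouples "handler returned" from "response sent", which is what lets you move slow work off the I/O threads.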
It's important to note that requiring 10,000+ real threads would be a huge performance limitation. At that scale you end up with a lot of overhead from context switching.
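To put rough numbers on it (assuming Linux defaults): each pthread reserves a stack, 8 MB of virtual address space by default, so 10,000 threads reserve on the order of 80 GB of address space and, more importantly, give the kernel 10,000 schedulable entities to switch between. A 4-thread event loop handling the same 10,000 connections avoids nearly all of that.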
The difference is a big reason why Apache 2.2 would slow to a crawl and eat up gigabytes of RAM at 100% processor utilization on the same load that Nginx could handle with 10 MB of RAM and 20% processor utilization. [1] (I understand more recent versions of Apache now support polling [2].)