The last multithreaded C++ app I wrote did high-speed deep packet analysis. For our purposes, we could parse packets in separate threads, but the threads needed to consult and update shared data structures to keep track of stream state. That's awkward to do across separate processes, and the workload doesn't lend itself to a GPU.
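
To make the shape of that concrete, here is a rough sketch, not the actual code: worker threads each take a slice of the packets and update one shared stream table behind a mutex. The names `Packet`, `StreamState`, `StreamTable`, and `analyze` are made up for illustration, a `stream_id` integer stands in for a real flow key, and a single lock is assumed only because it is the simplest thing that works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified packet: a real one would carry headers and payload.
struct Packet {
    std::uint32_t stream_id;    // stands in for a real flow key (e.g. a 5-tuple)
    std::uint32_t payload_len;  // payload size in bytes
};

// Hypothetical per-stream state that every worker reads and updates.
struct StreamState {
    std::uint64_t packets = 0;
    std::uint64_t bytes = 0;
};

// Shared table guarded by one mutex (assumption: simplest possible locking).
class StreamTable {
public:
    void record(const Packet& p) {
        std::lock_guard<std::mutex> lock(mu_);
        StreamState& s = streams_[p.stream_id];  // creates the entry on first use
        ++s.packets;
        s.bytes += p.payload_len;
    }

    // Returns a copy of the state, or a zeroed state for an unknown stream.
    StreamState get(std::uint32_t id) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = streams_.find(id);
        if (it == streams_.end()) {
            return StreamState{};
        }
        return it->second;
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<std::uint32_t, StreamState> streams_;
};

// Worker t handles packets t, t + num_threads, t + 2 * num_threads, ...
void analyze(const std::vector<Packet>& packets, StreamTable& table,
             std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&packets, &table, t, num_threads] {
            for (std::size_t i = t; i < packets.size(); i += num_threads) {
                // Per-packet parsing would go here, outside the lock.
                table.record(packets[i]);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Calling `analyze(packets, table, 4)` and then `table.get(id)` gives the totals for a stream. With threads the table is just an object in the same address space; with processes you would need shared memory or message passing to get the same effect. A single mutex would likely become a bottleneck at high packet rates, so a real system would probably shard the table or use finer-grained locks.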