Take a look at the DiamondQueue, which combines a Demultiplexer and a Multiplexer: thread A dispatches a batch of tasks to a fixed set of worker threads (W1, W2, W3, etc.), and those worker threads then use a Multiplexer to deliver the results to another thread B. Thread A and thread B can also be the same thread. There is an example here: https://www.coralblocks.com/index.php/the-diamond-queue-demu...
The DiamondQueue should soon be available for free in the CoralQueue project on GitHub.
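For readers who want to see the shape of that pattern, here is a rough sketch built only from JDK queues. It is not the CoralQueue API and makes no claim about how DiamondQueue is implemented; the class name `DiamondSketch`, the method `sumOfSquares`, the queue capacity of 1024, and the negative-number poison pill are all made up for illustration. The caller plays both thread A (fan-out) and thread B (fan-in).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of the diamond shape with plain JDK queues (not CoralQueue).
class DiamondSketch {

    // The caller acts as thread A (dispatch) and thread B (collect).
    // Each worker squares the numbers it receives; the caller sums the results.
    static long sumOfSquares(int workers, int tasks) {
        List<BlockingQueue<Integer>> inboxes = new ArrayList<>();
        BlockingQueue<Long> results = new ArrayBlockingQueue<>(1024); // fan-in queue
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            BlockingQueue<Integer> inbox = new ArrayBlockingQueue<>(1024); // one inbox per worker
            inboxes.add(inbox);
            Thread t = new Thread(() -> {
                try {
                    // A negative value is the (assumed) poison pill that stops the worker.
                    for (int n = inbox.take(); n >= 0; n = inbox.take()) {
                        results.put((long) n * n);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        long sum = 0;
        int received = 0;
        try {
            for (int i = 1; i <= tasks; i++) {
                BlockingQueue<Integer> target = inboxes.get(i % workers); // round-robin fan-out
                // Never block on a full inbox: drain results instead, so workers keep moving.
                while (!target.offer(i)) {
                    Long r = results.poll();
                    if (r != null) { sum += r; received++; }
                }
            }
            while (received < tasks) { sum += results.take(); received++; }
            for (BlockingQueue<Integer> inbox : inboxes) inbox.put(-1); // stop the workers
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sum;
    }
}
```

Calling `DiamondSketch.sumOfSquares(4, 100)` fans 100 tasks out to four workers and returns 338350. A purpose-built queue would replace the `ArrayBlockingQueue` instances with lock-free structures; the blocking queues here only keep the sketch short.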
> any recommendations for a low latency work queue (within a JVM)?
I toyed around with the ring buffer pattern a decade ago, creating a unicast one (using CAS on entries, and eventually a logarithmic scan for the next readable entry rather than brute-force scanning them all), but I'm not sure its latency is much better than that of a regular ThreadPoolExecutor (the throughput could be better, though).
Latency also depends on whether it spins or blocks when waiting for a slot to read or write.
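To make the spin-versus-block trade-off concrete, here is a minimal sketch. It is a single-producer/single-consumer ring, which is deliberately simpler than the CAS-per-entry unicast design described above, and every name in it (`SpscRing`, `takeSpinning`, `takeParking`, the 1 µs park interval) is hypothetical. It assumes Java 9+ for `Thread.onSpinWait()`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical SPSC ring buffer: exactly one producer thread and one consumer thread.
class SpscRing<T> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next sequence to read
    private final AtomicLong tail = new AtomicLong(); // next sequence to write

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer side: returns false when the ring is full.
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false;
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer side: returns null when the ring is empty.
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) return null;
        int index = (int) (h & mask);
        T value = (T) slots[index];
        slots[index] = null;
        head.set(h + 1); // volatile write hands the slot back to the producer
        return value;
    }

    // Busy-spin: shortest wake-up delay, but occupies a core while idle.
    T takeSpinning() {
        T v;
        while ((v = poll()) == null) Thread.onSpinWait();
        return v;
    }

    // Park briefly between polls: frees the core, at the cost of wake-up delay.
    T takeParking() {
        T v;
        while ((v = poll()) == null) LockSupport.parkNanos(1_000L);
        return v;
    }
}
```

The two `take` variants differ only in what the consumer does while the ring is empty, which is the point: the data structure can stay the same while the wait strategy decides how much latency is paid on wake-up.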
Yeah, the modern JVM is a true miracle, and you can be 5x more productive (and safe!) compared to C/C++.
Do you have any recommendations for a low latency work queue (within a JVM)?
I want to dispatch millions of microsecond-scale tasks per second to worker cores.
I am on a CPU with a massive cache, so memory latency hasn't raised its ugly head yet.
EDIT: not LMAX please...