I had a similar result with storage I/O to NVMe SSDs. In my benchmarks of 4k random-access I/O at about 2.5M IOPS, io_uring was slightly slower than my optimised Linux thread pool, despite the syscall overhead in the thread pool version being measurable.
io_uring was only a little slower, and it has some advantages for adaptive performance: Linux doesn't expose some of the information that's useful for this to userspace, so a userspace thread pool has to estimate it with some lag (see Go's scheduler). Still, I was hoping it would be significantly faster. Then again, it was good to have an alternative to validate the thread pool design.
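For readers unfamiliar with the thread pool approach, here is a minimal sketch of its general shape: a fixed set of worker threads, each issuing blocking 4k `pread` calls at random block-aligned offsets. This is not the benchmarked implementation, which isn't shown. The names `random_reads`, `run_pool` and `make_test_file` are hypothetical, and the sketch assumes a Unix system, reads through the page cache (no `O_DIRECT`), and uses Python purely for brevity, so it says nothing about achievable IOPS.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # 4k reads, matching the I/O size mentioned above


def random_reads(fd, file_size, count, seed):
    """Issue `count` blocking 4k preads at random block-aligned offsets."""
    rng = random.Random(seed)
    blocks = file_size // BLOCK
    done = 0
    for _ in range(count):
        offset = rng.randrange(blocks) * BLOCK
        data = os.pread(fd, BLOCK, offset)  # one syscall per read
        if len(data) == BLOCK:
            done += 1
    return done


def run_pool(path, threads=4, reads_per_thread=1000):
    """Spread the reads over a fixed pool of threads; return reads completed."""
    file_size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            futures = [
                pool.submit(random_reads, fd, file_size, reads_per_thread, seed)
                for seed in range(threads)
            ]
            return sum(f.result() for f in futures)
    finally:
        os.close(fd)


def make_test_file(blocks=256):
    """Create a scratch file of `blocks` 4k blocks (illustration only)."""
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(os.urandom(blocks * BLOCK))
    f.close()
    return f.name
```

Each read here costs one syscall, which is the overhead referred to above. An io_uring version would instead batch many submissions into a single syscall, or into none with submission queue polling.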