I know you can't use vpp/dpdk, but would you expect similar performance improvements, or maybe even easier ones, if your network stack were in user space? Have you considered any hardware offloads, like TSO?
Edit: just noticed someone else asked the dpdk one, so feel free to ignore that.