There are no ACKs inherent in the UDP protocol, so "UDP offload" is not where the savings are.
There are ACKs in the QUIC protocol, though, and they are carried in UDP datagrams that have to make their way all the way up to userland to be processed. That is the crux of the issue.
What is needed is for QUIC offload to be invented/supported by HW so that most of the high-frequency/tiny-packet processing happens there, just as it does today with TCP offload. TCP large-send and large-receive offload are what is responsible for all the CPU savings: the application deals in sends and receives of 64 KB or larger, and the segmentation and receive coalescing all happen in hardware before an interrupt is even generated to involve the kernel, let alone userland.
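To put rough numbers on the offload point above, here is a back-of-the-envelope sketch. The figures are my assumptions, not measurements: a 1500-byte MTU with 40 bytes of IPv4+TCP headers (so 1460 bytes of payload per TCP segment), an illustrative 1200 bytes of payload per QUIC datagram, and a 10 Gbit/s link. The function names are made up for this example.

```python
def packets_per_buffer(buffer_bytes: int, payload_per_packet: int) -> int:
    """Wire packets needed to carry one application buffer (ceiling division)."""
    return -(-buffer_bytes // payload_per_packet)


def units_per_second(link_bits_per_second: int, unit_bytes: int) -> float:
    """How many units of unit_bytes cross a saturated link each second."""
    return link_bits_per_second / 8 / unit_bytes


BUFFER_BYTES = 64 * 1024        # the 64 KB send/receive size from the comment
TCP_PAYLOAD = 1460              # assumed: 1500-byte MTU minus 40 header bytes
QUIC_PAYLOAD = 1200             # assumed: illustrative QUIC datagram payload
LINK_BPS = 10_000_000_000       # assumed: 10 Gbit/s link

if __name__ == "__main__":
    print("TCP segments per 64 KB buffer:",
          packets_per_buffer(BUFFER_BYTES, TCP_PAYLOAD))
    print("QUIC datagrams per 64 KB buffer:",
          packets_per_buffer(BUFFER_BYTES, QUIC_PAYLOAD))
    # With offload, software is involved roughly once per 64 KB buffer.
    print("64 KB buffers per second:",
          round(units_per_second(LINK_BPS, BUFFER_BYTES)))
    # Without offload, software is involved roughly once per datagram.
    print("QUIC datagrams per second:",
          round(units_per_second(LINK_BPS, QUIC_PAYLOAD)))
```

Under these assumptions, the gap between the last two numbers is the per-packet work that hardware absorbs for TCP and that userland currently has to do for QUIC.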