For the curious, this is fresh from VDD24[1]. The 94x is mildly contested since ...

For the curious, this is fresh from VDD24[1]. The 94x is mildly contested since IIRC how you write the C code really matters (e.g. scalar vs autovectorized). But even with vectorized C code the speed up was something like >10x, which is still great but not as fun to say.

One clarification: this is an optimization in dav1d, not FFmpeg. FFmpeg uses dav1d, so it can take advantage of this, but so can other non-FFmpeg programs that use dav1d. If you do any video handling consider adding dav1d to your arsenal!

There’s currently a call for RISC-V (64-bit) optimizations in dav1d. If you want to dip your toes in RISC-V optimizations and assembly this is a great opportunity. We need more!

[1]: https://www.videolan.org/videolan/events/vdd24/