This is a trade-off with any emulation, really. The more hardware-accurate your emulation gets, the more state you have to keep track of in virtual registers, and all that extra bookkeeping adds up in a hurry.
Native virtualization dodges a lot of that performance penalty by using the hardware features directly where possible. But that's only really feasible on systems that have nearly identical hardware; x86 is pretty standardized at this point and virtualizes quite well. ARM is much more varied and harder to virtualize, even on other ARM platforms due to differing feature sets.
That's not what's going on in this case: this is plain binary translation, not cycle-accurate emulation. QEMU is also super slow on same-architecture (x-to-x) translation, something like an 8x to 80x slowdown. Its translator is inefficient: for the sake of portability, the source native code is first translated to an IR, and the IR is then compiled to target native code with limited optimisation. It also emulates all floating-point instructions in software.
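To make the two-stage pipeline concrete, here is a toy sketch in Python. It is not QEMU's code and does not use QEMU's real IR: the guest instruction tuples, the micro-op names, and the functions `guest_to_ir`, `ir_to_host`, and `translate_block` are all made up for illustration, and "host code" here is just generated Python source.

```python
# Toy model of translate-via-IR. Everything here is hypothetical:
# guest instructions are tuples such as ("addi", "r0", 5) or ("add", "r0", "r1").

def guest_to_ir(insn):
    """Front end: lower one guest instruction to generic IR micro-ops."""
    op = insn[0]
    if op == "addi":
        _, reg, imm = insn
        return [("ld_reg", "t0", reg),
                ("movi", "t1", imm),
                ("add", "t0", "t0", "t1"),
                ("st_reg", reg, "t0")]
    if op == "add":
        _, dst, src = insn
        return [("ld_reg", "t0", dst),
                ("ld_reg", "t1", src),
                ("add", "t0", "t0", "t1"),
                ("st_reg", dst, "t0")]
    raise ValueError(f"unknown guest instruction: {insn!r}")

def ir_to_host(ir_ops):
    """Back end: emit 'host code' (Python source here) for a list of micro-ops."""
    lines = ["def block(regs):"]
    for op in ir_ops:
        kind = op[0]
        if kind == "ld_reg":
            lines.append(f"    {op[1]} = regs[{op[2]!r}]")
        elif kind == "movi":
            lines.append(f"    {op[1]} = {op[2]}")
        elif kind == "add":
            # Assume 32-bit guest registers, so wrap the result.
            lines.append(f"    {op[1]} = ({op[2]} + {op[3]}) & 0xFFFFFFFF")
        elif kind == "st_reg":
            lines.append(f"    regs[{op[1]!r}] = {op[2]}")
        else:
            raise ValueError(f"unknown IR op: {op!r}")
    lines.append("    return regs")
    namespace = {}
    exec("\n".join(lines), namespace)  # "compile" the generated host code
    return namespace["block"]

def translate_block(guest_insns):
    """Translate a straight-line guest block; returns (ir, host_function)."""
    ir = []
    for insn in guest_insns:
        ir.extend(guest_to_ir(insn))
    return ir, ir_to_host(ir)

regs = {"r0": 1, "r1": 2}
ir, block = translate_block([("addi", "r0", 5), ("add", "r0", "r1")])
block(regs)
print(len(ir), regs["r0"])
```

In this sketch two guest instructions expand to eight micro-ops, and `r0` is stored back and immediately reloaded between them because each instruction is lowered in isolation. A real translator removes some of that redundancy, but this is the general kind of overhead that going through a portable IR with limited optimisation leaves behind.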