Autovectorization for SIMD is, well, not great. Its worst problems show up when you run it against code that's already been SIMDed (it tends to mess it up), which doesn't apply here. But it also fails a lot, and it relies on a lot of memory aliasing info. That's why it works well in Fortran, which lets the compiler assume far more about aliasing than C does.
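To make the aliasing point concrete, here's a minimal C sketch (the function names are made up for illustration). Both loops do the same thing, but only the second one promises the compiler that the pointers don't overlap. Whether a given compiler actually vectorizes either one depends on the compiler and flags, so treat this as an illustration of the information gap, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: dst and src might overlap, so the compiler has to
   prove they don't, emit a runtime overlap check, or stay scalar. */
void scale_may_alias(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Same loop, but `restrict` (C99) promises no overlap. This is roughly the
   assumption a Fortran compiler gets to make about array arguments. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Calling `scale_no_alias` with overlapping buffers is undefined behavior, which is exactly the kind of rule a safe bytecode probably couldn't adopt.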

I think a reasonable portable bytecode would have stricter memory rules than C, leaving the optimizer even fewer aliasing assumptions to work with, and so would be harder to optimize like this.

So that's why I proposed having variants, but you could also invent some abstract vector operations and then scalarize them if they're not available. That's how shader languages do it.
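Here's a rough sketch of the "abstract vector operation, scalarized when unavailable" idea in C. The `vec4f` type and `vec4f_add` are hypothetical names I'm inventing for illustration; the SSE intrinsics are real, and the `__SSE__` check assumes a GCC- or Clang-style compiler. A real bytecode would make this choice in its JIT or translator, not in the preprocessor.

```c
#include <stddef.h>

/* Hypothetical abstract 4-lane float vector. */
typedef struct { float lane[4]; } vec4f;

#if defined(__SSE__)
#include <xmmintrin.h>

/* Hardware path: one SSE add covers all four lanes. */
static vec4f vec4f_add(vec4f a, vec4f b) {
    vec4f r;
    _mm_storeu_ps(r.lane,
                  _mm_add_ps(_mm_loadu_ps(a.lane), _mm_loadu_ps(b.lane)));
    return r;
}
#else

/* Scalarized fallback: same semantics, one lane at a time. */
static vec4f vec4f_add(vec4f a, vec4f b) {
    vec4f r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
#endif
```

Either way the caller just sees `vec4f_add`, so the code using it doesn't need a separate variant per target.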