Yes, younger devs who grew up on the myth that C and C++ are always fast missed the days when inline Assembly outnumbered the pure C and C++ code.
I have seen MS-DOS applications that effectively used C as a macro assembler: only the data structures and high-level logic were C, used much like macro assembler macros.
> Yes, younger devs who grew up on the myth that C and C++ are always fast missed the days when inline Assembly outnumbered the pure C and C++ code.
And still is in VLC. (Okay, maybe not higher, but they do use a crapton of assembly in their decoders, and it does speed them up by a factor of 10 or so today.)
Video decoding has always been a prime example for SIMD stuff. However, I wonder how much of that code the VLC devs could wipe out if hardware video decoding were available everywhere.
Compilers beat hand-written Assembly in the general use cases.
Now, beating it in special use cases, like using vector instructions to process video decoding streams in parallel, is another matter.
It is no accident that after all the efforts improving Java and .NET JITs for auto-vectorization across various vendors, both platforms now expose SIMD intrinsics as well.
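For anyone who has not seen what "explicit vector code" looks like in practice, here is a minimal sketch in C, assuming an x86 target with SSE2 and falling back to a plain loop elsewhere. The function name `add_rows_saturating` and the pixel-row scenario are made up for illustration; this is not taken from VLC or any real decoder.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for a row of 8-bit samples. */
void add_rows_saturating(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process 16 samples per iteration; unaligned loads keep the sketch simple. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One instruction performs 16 unsigned saturating byte additions. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}
```

An auto-vectorizer may or may not recognize the clamping pattern in the scalar loop; the intrinsic version simply states the intent outright, which is roughly the argument for exposing such intrinsics in managed runtimes too.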
The choices and resulting codegen are fairly different. Only one of them works "properly" as of today. Though I'm open to being proven wrong once Panama vectors get stable in Java land.
They will only be out of preview when Valhalla ships, as per the roadmap.
Then there is the whole issue of when they will reach other implementations beyond OpenJDK, especially a very important alternative implementation running on many phones across the globe.
Nevertheless, the need to be explicitly allowed to write vector code is there.
Thanks for the explanation. Aside from vectorization, is there anything else that handwritten assembly could be better at, assuming modern CPUs and modern compilers?
Sure, people who are good at assembly can often do register allocation and instruction selection better for small snippets of code. Or optimize based on guarantees the compiler can’t see or know about.
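One concrete case of a guarantee the compiler cannot see is pointer aliasing. The sketch below is in C rather than assembly and the function names are invented for illustration; it only shows the kind of knowledge an assembly programmer carries in their head, here handed to the compiler through `restrict`.

```c
#include <stddef.h>

/* The compiler must assume a store to dst[i] could overwrite *scale or a
 * later src element, so it may reload *scale on every iteration or add
 * runtime overlap checks before vectorizing. */
void scale_may_alias(float *dst, const float *src, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* restrict is a promise from the programmer that the three pointers do not
 * overlap. The compiler is then free to keep *scale in a register and
 * vectorize without checks. Breaking the promise is undefined behavior. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

Both versions compute the same result for non-overlapping buffers; whether the generated code actually differs depends on the compiler and flags, so it is worth checking the disassembly rather than assuming.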