Now, instead of hand-written vector instructions, look at programs transliterated line by line, in a literal style, from the same original into different programming languages:
The benchmarks game uses BenchExec to take 'care of important low-level details for accurate, precise, and reproducible measurements' ….
BenchExec uses the cgroups feature of the Linux kernel to correctly handle groups of processes and uses Linux user namespaces to create a container that restricts interference of [each program] with the benchmarking host.
It's not an issue of warmup time; it's an issue of JIT compilation.
On my server (AMD EPYC 7252):
1) The base time of the Java program from the repo is 3.23s, which is ~2x worse than the one on the linked page, so I assume my CPU is about 2x slower and the corresponding best C++ result would be ~450ms.
2) If you time it from inside the Java program, you get 3.17s (so about 60ms of overhead).
3) But if you run it 10 times inside the same Java program, the time drops to 1570ms.
It's still much slower than the C++ version, but it lands between Rust and Go. And this isn't me optimizing anything; it's only measuring things correctly.
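A minimal sketch of the in-process measurement described in points 2) and 3), assuming a hypothetical `workload()` method stands in for the actual benchmark body (which isn't reproduced in this thread). It times each run separately with `System.nanoTime()` inside one JVM, so the first run includes interpretation and JIT compilation while later runs mostly reflect compiled code. The class name `WarmupTiming`, the `blackhole` field and the loop count are illustrative assumptions, not taken from the repo.

```java
import java.util.function.LongSupplier;

final class WarmupTiming {
    // Written after timing so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    // Hypothetical stand-in for the real benchmark body; returns a checksum.
    static long workload() {
        long acc = 0;
        for (int i = 0; i < 50_000_000; i++) {
            acc += (i * 31L) ^ (acc >>> 7);
        }
        return acc;
    }

    // Runs `work` `runs` times in the same JVM and records each run's
    // wall-clock duration in nanoseconds.
    static long[] timeRuns(int runs, LongSupplier work) {
        long[] nanos = new long[runs];
        long sink = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += work.getAsLong();
            nanos[r] = System.nanoTime() - start;
        }
        blackhole = sink; // keep the accumulated result observable
        return nanos;
    }

    public static void main(String[] args) {
        int runs = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        long[] nanos = timeRuns(runs, WarmupTiming::workload);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("run %d: %d ms%n", r + 1, nanos[r] / 1_000_000);
        }
    }
}
```

Comparing the first printed run against the later ones shows how much of a single cold run is JIT warmup rather than steady-state speed. A loop like this is only a rough check; JMH is the usual tool for careful JVM microbenchmarks.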
Update: running the vectorized version of the Java code from the same repo brings the runtime to 392ms, which is literally the fastest of all solutions, including C++.
Update 2: I ran the C++ version on the same hardware and it takes 400ms, so I think it's fair to say C++ and vectorized Java are on par (and given the "allows vectorization" comment in the C++ code, I assume that's the best one can get out of it).
2001 "Mastering ENVY/Developer".
https://www.google.com/books/edition/Mastering_ENVY_Develope...