Interesting. I've used this same file for a test on:
- i7-7920HQ, macOS
  Rust: 90ms (rustc 1.47.0 (18bf6b4f0 2020-10-07))
  C++: 340ms (Apple clang version 12.0.0 (clang-1200.0.32.2))
- Xeon(R) Gold 6130 (Skylake, Ubuntu 20.04)
  Rust: 76ms (rustc 1.47.0)
  C++: 60ms (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)), -O3, no PGO
Seems quite CPU- and compiler-dependent. Odd that the C++ results on macOS were so horrible.
And thanks - I've updated the post to include this more reproducible benchmark, with both macOS and Linux results.
One note: I modified the single-threaded version to use walkdir, but that shouldn't affect the time in a major way. The macOS timings were about the same.
And yes, I agree about the top N part. I deliberately tried to remain "algorithm-compatible" with the C++ example; there are lots of tricks to use to speed this up more. Getting rid of the line-at-a-time processing would be a good start, for example - it results in an unnecessary double-scan of the input.
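To make the double-scan point concrete, here is a rough sketch of counting words in a single pass over the raw bytes, with no intermediate split into lines. It is not the code from the post or from any of the versions in this thread: the function name `count_words` is made up, and the word definition (a maximal run of ASCII letters, lowercased) is an assumption chosen to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical helper: bump the count for `word`, allocating a new key
/// only the first time that word is seen.
fn bump(counts: &mut HashMap<Vec<u8>, usize>, word: &[u8]) {
    if let Some(n) = counts.get_mut(word) {
        *n += 1;
    } else {
        counts.insert(word.to_vec(), 1);
    }
}

/// Count words in one pass over the input bytes.
/// Assumption: a "word" is a maximal run of ASCII letters, lowercased.
/// Newlines are treated like any other separator, so the input is never
/// scanned once for line breaks and then again for word breaks.
fn count_words(input: &[u8]) -> HashMap<Vec<u8>, usize> {
    let mut counts: HashMap<Vec<u8>, usize> = HashMap::new();
    let mut word: Vec<u8> = Vec::new(); // reused buffer for the current word

    for &b in input {
        if b.is_ascii_alphabetic() {
            word.push(b.to_ascii_lowercase());
        } else if !word.is_empty() {
            bump(&mut counts, &word);
            word.clear();
        }
    }
    // Flush a trailing word if the input does not end with a separator.
    if !word.is_empty() {
        bump(&mut counts, &word);
    }
    counts
}
```

You would feed it the whole file at once, e.g. the result of `std::fs::read(path)`, and then do the top-N selection on the resulting map. Whether this actually beats the line-based version will depend on the compiler and CPU, as the numbers above suggest.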
Cool. And, yeah, mmap()ing the whole file would also auto-reject non-regular files at the OS layer, fixing that TOCTTOU issue. I almost did that, but I'm sure there are various other tricks, too.
I mostly thought Nim deserved to be seen, and then it happened to also be faster, perhaps giving folks a slight Bayesian update on their presumptions about performance. :-)
Would you be willing to test my implementation[0] also? It'd be interesting to see the overhead of the (overkill) string normalization I do, along with the much reduced pressure on the allocator compared to yours.
Well, I got 93 ms on the same Tale Of Two Cities file, 2x slower than dga's version and 18x slower than my last 5.0 ms Nim version (done to properly avoid hanging forever if someone does a "mknod foo.txt p"), mentioned elsewhere in this thread (https://news.ycombinator.com/item?id=24822429).
Really, though, all the code for all 6 versions (two Nim, one C, two Rust, one C++), as well as the input file, is available to all, so you can double-check for yourself. As dga mentions in his updated blog post, there is a lot of compiler/CPU sensitivity.