glibc qsort's implementation is in libc.so, not in the header. GCC doesn't have anything to work with.
It's also an apples-to-oranges comparison, since std::sort and qsort implement different algorithms.
A lot of std::sort's performance is actually from using the version without any callbacks. If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback. Theoretically the compiler should make the two cases the same, but apparently GCC is too dumb (that's not a slight on GCC; I think people expect too much from compilers):
external_sort is just std::sort hidden behind an extern function implemented in a separate .o file. Those benchmarks are from sorting 1MB of random and already-sorted data (as indicated in the names). I think it's important to test such cases, because often online people benchmark code which is written all in a single file, whereas real-life C++ projects are usually organized in such a way that every little class is in its own little file, which gets compiled into a separated object file, and then it all gets linked together without LTO. And then those same people go on to claim performance benefits of their language without actually using the setup which enables those benefits, which IMO is a bit dishonest.
When I drill further down into everything I want to drill into, maybe I'll publish the source for the benchmarks somewhere.
> If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback.
This is not true. `std::sort`'s default comparator is a `std::less` object. The advantage comes from using a stateless callback functor object. If you pass a capture-less lambda instead of a function pointer, you can reap the same benefits as using the default comparator. Even if that capture-less lambda just forwards to a specific free function anyway.
In short, `std::sort(beg, end, [](auto x, auto y) { return foo(x,y); })` can be faster than `std::sort(beg, end, &foo)`.
It's also an apples-to-oranges comparison, since std::sort and qsort implement different algorithms.
A lot of std::sort's performance is actually from using the version without any callbacks. If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback. Theoretically the compiler should make the two cases the same, but apparently GCC is too dumb (that's not a slight on GCC; I think people expect too much from compilers):
external_sort is just std::sort hidden behind an extern function implemented in a separate .o file. Those benchmarks are from sorting 1MB of random and already-sorted data (as indicated in the names). I think it's important to test such cases, because often online people benchmark code which is written all in a single file, whereas real-life C++ projects are usually organized in such a way that every little class is in its own little file, which gets compiled into a separated object file, and then it all gets linked together without LTO. And then those same people go on to claim performance benefits of their language without actually using the setup which enables those benefits, which IMO is a bit dishonest.When I drill further down into everything I want to drill into, maybe I'll publish the source for the benchmarks somewhere.