Not to mention that both C++ and Rust can specialise algorithms and containers f...

dagw · on Aug 9, 2022

It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.

This was equally true back when C vs Fortran was the big debate, and something not easily captured in benchmarks. C, as written by an expert in high performance C, was equally fast as Fortran written by an expert in high performance Fortran. C, as written by a domain expert with limited programming skills, was often very much slower than Fortran written by a domain expert with limited programming skills.

mumblemumble · on Aug 9, 2022

This actually reminds me a bit of an old competition between two Microsoft MVPs comparing C++ and C#, where they went back and forth optimizing their respective versions of a model program, and discussing the optimizations they made.

The gist, as I recall it was: the initial, idiomatic, written-for-maintainability version of the C# program was significantly faster than the C++ equivalent. Up until the end, the C# version also generally needed to go through less heroics to keep up with the C++ version. Eventually, the final C++ version did end up being faster than the fastest C# one, but, considering what needed to be done to get there, it was a decidedly Pyrrhic victory.

One huge mitigating factor, though, is that the model program was doing something business-y. I doubt C++ would have had such a hard time of it if it had been a number crunching or systems program.

water8 · on Aug 10, 2022

C# is closer to Java than C++. It’s a garbage collector language and nowhere near the performance of C++

mumblemumble · on Aug 13, 2022

So, one of the things they discovered as part of the back and forth was that C#'s generational garbage collector was actually an advantage. Because it made finding memory for a new object allocation O(1), while for C++ it was O(N).

That observation was actually key to the C++ version ultimately producing the fastest version. Chen replaced malloc() with an implementation that was tailored to the problem in question.

I guess the thing that I always find lacking in these discussions is a cost/benefit analysis. Yes, C++ will let you do things like that, and they will absolutely allow you to wring every last drop of performance out of what you're doing.

But, if you aren't in a situation where optimizing to that extent is cost-effective, and you're working in a business domain where frequent heap allocation of short-lived objects is what you do, so that idiomatic, standard C++'s default way of doing things is known to generally be not significantly better, and often slower, than some of the faster GC languages, then it's just possible that that you should go for the pragmatic option.

woodruffw · on Aug 9, 2022

Precisely. This is perhaps the strangest part of the original post: C++ has the same performance advantages as Rust! It has them not because it's more safe (although it is, in some regards), but because it allows programmers to express behaviors that the compiler can reason about statically.

marcosdumay · on Aug 9, 2022

C++ has a few of the performance advantages that Rust has.

And many that are negated due to a bad std API.

Ygg2 · on Aug 9, 2022

Interesting. Can you list an example of C++ type system allowing optimization Rust system doesn't?

tialaramex · on Aug 9, 2022

Rust's assumption is that it's the compiler's job to reject all wrong programs (usually with a helpful diagnostic). In C++ the assumption is that it's the compiler's job to permit all correct programs.

You obviously ideally want both, but that's not actually possible when you have a language this powerful. So, Rest's choice means sometimes (more rarely these days but it can happen) you will write a program that is correct, but the compiler doesn't believe you and rejects your program, you will need to alter it, perhaps after alterations it's actually nicer, but equally perhaps you feel this made it uglier or slower, nevertheless you have no choice in Rust (well, you could try waiting a few years, the compiler gets smarter)

However the C++ choice means sometimes (maybe even often) you will write a program that isn't correct and the compiler gives you no indication whatsoever that there's a problem, you get an executable or object file or whatever out, but what it does is completely arbitrary. Maybe it works how you expected... until it doesn't.

The magic phrase in the C++ standard is "Ill-formed, no diagnostic required". For example suppose you try to sort some floats in C++ 20. That's ill-formed (floats aren't in fact Totally Ordered but the function signature says you promise they are) and no diagnostic is required for... whatever it is your program now does. Maybe it crashes, maybe it works fine, not their problem, good luck with that.

Now, probably if all your floats are like boring normal finite reals like -2.5 or something this will work fine, there's no practical reason it wouldn't, but who knows, the C++ language denies all responsibility. So it gets to be very "optimal" here since it can do whatever it wants and it's your fault.

proto_lambda · on Aug 9, 2022

To expand on your float sorting example, sorting a slice[1] in Rust requires the element type to implement the Ord trait, i.e. be totally ordered. Trying to sort a slice of floats will result in a compiler error, even though it might be totally fine as long as all your floats are "ordinary".

Instead, to sort a slice of floats, you have to explicitly specify what would happen for the non-ordinary cases; e.g. by using `.sort_by(f32::total_cmp)`, where f32::total_cmp()[2] is one possible interpretation of a total ordering of floats. This requires writing more code even for cases where it would be completely unnecessary.

[1]: https://doc.rust-lang.org/std/primitive.slice.html#method.so... [2]: https://doc.rust-lang.org/std/primitive.f32.html#method.tota...

Ygg2 · on Aug 9, 2022

So rather than introducing a hard to detect bug (with NaN, Inf, -Inf), Rust makes me think about it and not just let whoever worked on compiler decide.

How is this a negative? I'd rather program fail at compile than runtime, and rather it fail loudly than quietly.

Also Rust doesn't prevent you from making optimal ordering, just a tinge more verbose.

alpaca128 · on Aug 9, 2022

I also like this priority in Rust, which constantly makes me wonder why the developers allowed shadowing. It has already caused runtime bugs for me while the compiler didn't even throw a warning about it, and as Rust is otherwise so strict about making possible mistakes like this explicit it's definitely not the first cause I consider when debugging.

proto_lambda · on Aug 9, 2022

While I think shadowing is great for code readability and I've never encountered a bug caused by it, you can always make sure clippy doesn't let you do it by putting a `#![deny(clippy::shadow_reuse, clippy::shadow_same, clippy::shadow_unrelated)]` at the top level of your crate.

tialaramex · on Aug 9, 2022

Like proto I've never had this happen, even though I was initially sceptical until I found myself writing stuff like (real examples more complicated hence decision to break them down)

  let geese = something(lots_of_birds).bunch().of_chained().functions();

  let geese = geese.somehow().just().count_them(); // We don't actually need geese, just #

Could you name that first variable something else? Yeah. But, it's geese, it's not the number of geese, it's a different type, but it is just geese, that's the right name for it. OK, maybe rename the second variable? But number_of_geese is a stupid variable name, I would push back on a patch which tried to name a variable that because it's stupid. n_geese isn't stupid, but it is ugly and Rust is OK with me just naming it geese, so, geese it is.

However, if you do run into trouble those Clippy rules can save you. You probably will find you don't want them all (or perhaps any of them) at deny, but Rust is content for you to decide you only want a warning (which you can then suppress where appropriate) and importantly these are three rules, you might well decide you only hate shadow_same or shadow_reuse or something. Here's the link specifically for shadow_reuse as an example:

https://rust-lang.github.io/rust-clippy/master/#shadow_reuse

proto_lambda · on Aug 9, 2022

> I'd rather program fail at compile than runtime, and rather it fail loudly than quietly.

I agree! I was just illustrating the kind of tradeoff that has to be made for that to be possible.

afiori · on Aug 9, 2022

I don't think you could do [0] in Rust, but would be very interested in finding out otherwise.

[0] https://capnproto.org/news/2015-03-02-security-advisory-and-...

proto_lambda · on Aug 9, 2022

Not on stable, but you can on nightly (though it's still quite wonky): https://play.rust-lang.org/?version=nightly&mode=debug&editi...

whimsicalism · on Aug 9, 2022

I don't think they suggested this.

tick_tock_tick · on Aug 9, 2022

template<size_t n> struct example { int e[n]; };

GrumpySloth · on Aug 9, 2022

Rust:

  struct Example<const N: usize> {
      pub e: [i32; N],
  }

smitty1e · on Aug 9, 2022

One cannot imagine C++ failing to emulate Rust's borrow checker by the end of the decade, even if compiler support is required.

Has C++ ever failed to snarf a feature?

tialaramex · on Aug 9, 2022

Safety features. The committee are, perhaps unconsciously, biased against safety on the presumption (seen in many comments here on HN) that safer has to mean lower performance.

But part of the impetus for Carbon is that WG21 (the C++ Standards Committee) rejected proposals that C++ should focus on better performance and safety. So maybe performance is no longer important either. What's left?

Where they've taken things which might appear on the surface to be modelled on a safer Rust feature, usually the committee insists they be made unsafe. For example suppose I call a Rust function which might return a char or might not, it returns Option<char> and if I'm an idiot and I try to treat that as a char, it doesn't type check because it isn't one, I need to say what I'm going to do when it isn't or else that won't compile.

You can write that in modern C++... except it can automatically try to take the char (which isn't there) out of the empty optional structure and that's Undefined Behaviour. So whereas the Rust prevents programmers from making easy mistakes, the C++ turns those into unexploded bombs throughout your code.

bluGill · on Aug 9, 2022

Many on the C++ committee are interested in the borrow checking, but are not sure how to make it work in C++. The hard part is they cannot break compatibility with code that is legal with previous versions of C++. If there is even one pathological case where the borrow checker will reject code that doesn't have a memory leak then they will not accept it, and require whoever proposes this borrow checker to prove the absence of such a thing. (note if it rejects code that worked until the leaks mean you run out of memory they will accept that). I don't know if such a thing even exists, but if it does I'm confident that in Rust it is new code that you can write differently to avoid the bug, while with C++ that may be a very massive effort to figure out 25 year old code nobody understands anymore before you can rewrite it.

One obvious corner case: It is very common to allocate a buffer at startup and let the system clean it up when the program exits. (often this is embedded cases where the only way for the program to exit is power off). I don't know how you do this in rust (if you can - I'm not a rust expert)

dwattttt · on Aug 9, 2022

Box::leak. Assuming the object you're leaking contains no references, it can then live for 'static (the lifetime of the process)

alpaca128 · on Aug 9, 2022

> allocate a buffer at startup and let the system clean it up when the program exits

This is possible with the lazy_static library or (not yet in stable Rust) OnceCell. It allows you to allocate & initialize any datastructure once during runtime and get global read-only access.

GuB-42 · on Aug 9, 2022

I don't know Rust but I know C++.

And C++ has the potential to be faster than C, mostly thanks to metaprogramming (templates, ...). It is horrible if you have to do it, but if you are just using the standard library, you don't have to feel the pain but still take advantage of it. That's how algorithms are implemented. Because so much is known at compile time, optimizers can do a lot.

The reason C++ is generally regarded as slower is that C++ programmers tend to create objects on the heap all the time because constructors and destructors make it easy. Modern C++ also discourages raw pointers and so you get references counters all over the place, essentially turning C++ into a garbage collected language. I am not saying it is bad, but it certainly impacts performance.

But if you manage your memory in C++ just as you do in C, keeping track of all your buffers reusing them, and not using more than necessary, I can easily see C++ beat C.

mort96 · on Aug 9, 2022

> Modern C++ also discourages raw pointers and so you get references counters all over the place, essentially turning C++ into a garbage collected language.

This doesn't match my experience. It's true that modern C++ discourages owning raw pointers, but the solution is usually unique_ptr, not shared_ptr. Truly shared ownership is actually pretty uncommon IME, usually you can have one entity which obviously "owns" the object and then any other reference to it can be non-owning.

It's also worth noting that with std::move, actually changing the refcount of a share_ptr can be pretty rare even if you do have shared ownership.

bregma · on Aug 9, 2022

This is not my experience. Most developers are just not very good at what they do, and the go-to smart pointers for not-very-good C++ developers is std:shared_ptr<T>.

saidinesh5 · on Aug 9, 2022

This has been my experience as well - especially when C++11 came out. I have seen codebases where it has been "use std::shared_ptr for everything, becuase it is safer if/when we use threads". I know that doesn't make sense, but it just was the attitude back then.

Tbh, Back then, I didn't see a problem with it. Once i started chasing down weird bugs where objects aren't freed properly because no one knew which objects own what, I have been very cautious.

bluGill · on Aug 9, 2022

We had a few developers like that here when C++11 was introduced, but a few people gave them the smack down and now we rarely see shared pointers.

mort96 · on Aug 9, 2022

Hmm, that might be. Most of the C++ I've seen has been in LLVM, Google projects, projects where I'm the only developer or projects where I laid the groundwork which other people build upon, so I'm probably not mainly looking at the kind of code bases you're talking about.

lupire · on Aug 9, 2022

OK but those people aren't going to magically write safe performant C either.

jstimpfle · on Aug 9, 2022

unique_ptr is pretty bad for performance as well. It is more complicated to use compared to raw pointers and encourages an OOP object-per-object piecemal code and data architecture. I've never seen a C++ program making use of unique_ptr that didn't give a strong smell of enterprise programming.

mort96 · on Aug 9, 2022

There's nothing more complicated about using unique_ptr than a raw pointer, it just expresses who's responsible for calling `delete` explicitly in code rather than implicitly through program flow.

jstimpfle · on Aug 9, 2022

There's nothing complicated? You have to 1) #include <memory> 2) Write "std::unique_ptr<My_Foo_Type> foo" instead of just "My_Foo_Type *foo" in every definition. 3) Are required to define My_Foo_Type as a class with a separate deleter, or provide a deleter template argument at each declaration. 4a) write "foo.get()" in various places instead of just "foo". or 4b) lend around the unique_ptr in various places, breaking modularization and increasing build times. 5) Be stuck with a non-POD type that you can't just memcpy() around. 6) enjoy the worse runtime because your program has just been artifically compartmentalized even more!

Sometimes you C++ guys are just blinded by the tale of "zero-cost abstractions".

unique_ptr, like the idea of RAII in general, binds together what should be separate. Data schemas and physical layout on the one hand, and memory and lifetime management on the other hand. What you get as a result is what you deserve: The idea of "more safe and maintainable" where the "more" isn't added to the equivalent non-RAII program. No, it is added to the more convoluted, less understandable, and thus inherently less safe and maintainable program. Who knows what the bottom line is (in my experience often safety is a bit better but I pray for you if you need to debug a problem, and maintainability is much worse), but out of interest in my own sanity I know my preference.

usefulcat · on Aug 10, 2022

I really don’t see what the big deal is? Generally the only time you should be returning or passing around a unique_ptr is when you’re actually transferring ownership of the referenced object. Otherwise just dereference it and pass around a reference to the underlying object.

kaba0 · on Aug 10, 2022

> Data schemas and physical layout on the one hand, and memory and lifetime management on the other hand

How are they separate? Like, that’s what Rust does pretty explicitly, with great results.

jstimpfle · on Aug 10, 2022

I'm not following, what is Rust doing exactly? Coupling schema / layout with lifetime management? If that's what you mean I would like to disagree about the "great results" because of a gut feeling, and possibly the disagreement could in theory be justified with build times, or viewpoints on maintainability or whatever. But unfortunately I have no basis for doing so. I don't understand Rust well. And have very little experience, expect failing at compiling some projects and their 500 dependencies a couple times...

bregma · on Aug 9, 2022

Use correctly std::unique_ptr<T> has no measurable impact on performance compared with the equivalent non-smart-pointer code. You use std::unique_ptr<T> to indicate ownership, and pass raw pointers around to indicate non-ownership. That approach has the strong smell of a good programmer using the right tool for the job, especially considering the job is to communicate intent to the future reader.

It's like the classic argument against using exceptions: compared with the traditional C method of completely ignoring error conditions and not checking status, they're much slower.

account42 · on Aug 9, 2022

> Use correctly std::unique_ptr<T> has no measurable impact on performance compared with the equivalent non-smart-pointer code.

One wart of unique_ptr (and other smart pointers) is that it cannot be passed in a register when used as a function parameter, at least with the System V ABI used on Linux.

Also, the caller is responsible for destruction and there is no way to specify that a function always "consumes" a unique_ptr so the compiler cannot eliminate the destructor code: https://godbolt.org/z/sz79GoETv

Of course if the compiler can inline the call or at least controls both and can clone the function with a custom calling convention then that doesn't have to be a problem. But it still sucks that even something as seemingly simple as a tiny wrapper around a pointer does come with a cost.

jstimpfle · on Aug 9, 2022

That's the point. As a rule of thumb, fine-grained ownership is a very bad idea. It makes your program into a mess, which will be slow and make your program hard to understand. The slow part applies in any case, whether you have to suffer it in code (as you do with C) or not (as in many other languages that allow you to make even more of a mess).

As a C programmer, I try to avoid tracking ownership in separate struct member fields. I try to make central data structures that keep care of the tracking. Cleaning up shouldn't happen pointer-by-pointer. Usually a much bigger context has a shared lifetime, so there is no point in splitting stuff up in individually tracked "objects". Instead you just track a bigger block of memory.

anonymoushn · on Aug 9, 2022

Are the answers to this stackoverflow question incorrect? https://stackoverflow.com/questions/58339165/why-can-a-t-be-...

vitus · on Aug 9, 2022

> unique_ptr is pretty bad for performance as well.

Do you mean in terms of cache locality because it's heap-allocated instead of stack-allocated, or are you actually commenting on the overhead of copying some extra ints and invoking the destructor?

Because it's certainly correct that just because you can use a unique_ptr, doesn't mean you should. ("A std::unique_ptr is used for expressing transfer of ownership. If you never pass ownership elsewhere, the std::unique_ptr abstraction is rarely necessary or appropriate." - https://abseil.io/tips/187)

a_t48 · on Aug 9, 2022

Safety is a good reason. I like protection against leaks and use after free. If I’m already allocating I’m not going to worry about the little bit of extra performance cost the abstraction might have.

vitus · on Aug 9, 2022

To be clear: I'm not advocating for the use of `new` / `delete` over unique_ptr. But if you're creating a local object that never has to leave the current scope (or a member variable that's never moved in or out), there's no benefit to using a unique_ptr instead of creating the object directly on the stack or inline as part of your class, where the object's lifetime is bound to the scope, and your destructor is automatically run when its containing scope is cleaned up.

As an added bonus, you don't actually have to do a separate heap allocation.

a_t48 · on Aug 10, 2022

I agree! You should use a regular object if possible, I’d never suggest otherwise. The rare exceptions I’ve run into are annoying initialization order issues (usually code that I didn’t have the time/knowledge/political credits to refactor) and large arrays that blow the stack.

pjmlp · on Aug 9, 2022

As of C++17 not so horrible, and C++2x versions even less so, unless one has some strange fetisch for SFINAE and tag dispatch.

Since 1993, I never saw any need to keep bothering with C other than having it imposed on me, C++ had enough C89 subset on it, if I ever miss coding like C and its warts.

Nowadays that compatibility is up to C11 subset.

flohofwoe · on Aug 9, 2022

> Nowadays that compatibility is up to C11 subset.

Not true unfortunately, the "C subset" is still stuck at something that can at best be called a fork of "C95" which was then developed into a "bastard language" that resembled C on the surface, but isn't actually C (e.g. the incomplete designated init support in C++20 is the best example of this half-assed "looks like C, but isn't actually C" philosophy).

pjmlp · on Aug 9, 2022

That is why it is called a subset, duh!

jstimpfle · on Aug 9, 2022

> It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.

On the other hand, empirically, it is not unusual to see straightforward C programs being dramatically faster than comparable C++ programs written in enterprise style, and to also build much faster.

> Last but not least, you would have to be a masochist to write heavily multi-threaded code in C

You have to be a masochist to write heavily multi-threaded code that uses a lot of ad-hoc synchronization with mutexes and atomics. As it turns out, for many many tasks, it's also a spectacular bad way to go about parallelization, because mutexes are the _opposite_ of parallelization.

As a rule of thumb, do coarse-grained concurrency. Install a few queues, come up with a job system, and it won't be hard to get parallization right in plain C at all. Writing in C is often a good idea because what's a bad idea to do on hardware coincedes pretty well with what is painful to write.

bluGill · on Aug 9, 2022

> On the other hand, empirically, it is not unusual to see straightforward C programs being dramatically faster than comparable C++ programs written in enterprise style, and to also build much faster.

Your only comparison cases are cases where the code in question was re-written in C. This most likely means that everyone already knew it was slow and so the re-write also fixed the fundamental problems. If the code had been rewritten in C++ it would also be faster - and since C++ allows some optimizations C doesn't it would be even faster. (it is known that if you switch from gcc to g++ your code often will run faster if it compiles)

There is a reason for enterprise style C++. Most of the time it is still fast enough, and it is a lot more maintainable.

jstimpfle · on Aug 9, 2022

> it is known that if you switch from gcc to g++ your code often will run faster if it compiles)

I've never heard such a claim, can you back it up? And what does it say about the language?

> and it is a lot more maintainable

If you equate "maintainable" = readable, I've never once seen maintainable enterprise code. Everything is a convoluted mess that never gets anything done. Probably I haven't worked at the best shops, but then again, where are those? And why doesn't the language help mediocre programmers to write maintainable code?

I suspect that maintainability is almost exclusively a function of experience, not the programming language used. Experienced programmers do seem to agree that C-style C++ or even plain C is the way to go.

bluGill · on Aug 10, 2022

https://www.codeproject.com/questions/445035/c-vs-cplusplus-... has a long discussion. The short answer is C++ has stricter aliasing rules, and so the compiler can apply more optimization. This of course assumes that your C code is also valid C++ code (C is no a pure subset of C++), and you don't have those aliases - those apply to a lot of C programs but not all.

> And what does it say about the language?

C++ has a stronger type system. This is already known. You avoid a few bugs in C++ because of this. The type system isn't nearly as strong as Haskell.

> I've never once seen maintainable enterprise code. Everything is a convoluted mess that never gets anything done

Two sides of the same coin. While the code is convoluted, it often is doing a lot of things in a generic way. More straightforward code is possible, but only by created a lot more code, and quantity is itself convolution.

> And why doesn't the language help mediocre programmers to write maintainable code?

It does. However you have to be careful here. C++ is often used for very large problems that are also complex. I would never use python for something that is over 100,000 lines of code as you can't change anything anymore for fear that some case isn't covered in the tests and so you won't see that syntax error until months later. I maintain 15 million lines of C++ (and this isn't the largest C++ codebase I know of).

Not, I'm not arguing that C++ is a great language. It has a lot of inconsistencies, and foot guns. However it is still the best language I know for very large, very complex programs. (Note that I do not know ADA or Rust, two that often come up in context of very large, very complex programs. I would not be surprised if they are better. That C++ is better known that others is itself an advantage to C++)

> I suspect that maintainability is almost exclusively a function of experience, not the programming language used.

Sort of. As I said before languages like Python are out of the running for very large programs because they are not compiled and so you can get runtime errors. There are also intentionally impossible to write languages that we can throw out even sooner. However there are for sure other languages that can play in the very large program space. So long as we limit ourself to languages that play in the very large program space, experience is the largest factor.

flohofwoe · on Aug 9, 2022

> become dramatically faster when rewritten in a more modern language

IME that's mostly a myth though. A C compiler will stamp out a specialized version just as well if it can see all the relevant function bodies (either via inlining or LTO).

"Zero cost abstraction" isn't just a C++ thing, it happens mostly in the language agnostic optimizer passes. For instance the reason why std::sort() shows up faster in benchmarks than C's qsort() is simply because std::sort() implementation is all inline template code, not because of some magic performance-enhancing qualities of the C++ template system.

planede · on Aug 9, 2022

inlining only goes so far. You won't get full of qsort to be inlined, and if it's not inlined, it needs to be at least cloned to be on par with std::sort, so the comparator function could get const-propagated.

AFAIK out of the major compilers, gcc has the most aggressive cloning, but it's still nowhere near to const propagate the comparator from qsort. With std::sort with a stateless comparator function object (such as std::less, which is the default), you get this for free*.

* of course this is not entirely free, as this is more prone to code bloat. But, you can always type-erase the comparator, and use a function pointer, or std::function, if this ever becomes a problem. But you can't convince a C compiler to const propagate the comparator in qsort all the way through, if the optimizer chooses that it doesn't worth it.

GrumpySloth · on Aug 9, 2022

glibc qsort's implementation is in libc.so, not in the header. GCC doesn't have anything to work with.

It's also an apples-to-oranges comparison, since std::sort and qsort implement different algorithms.

A lot of std::sort's performance is actually from using the version without any callbacks. If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback. Theoretically the compiler should make the two cases the same, but apparently GCC is too dumb (that's not a slight on GCC; I think people expect too much from compilers):

  ------------------------------------------------------------------------
  Benchmark                              Time             CPU   Iterations
  ------------------------------------------------------------------------
  std_sort_random                 52881299 ns     52873089 ns           14
  std_sort_with_callback_random   63319633 ns     63307876 ns           11
  qsort_random                   106803314 ns    106784567 ns            7
  external_sort_random            97642851 ns     97640888 ns            7
  std_sort_sorted                  8433311 ns      8432564 ns           82
  std_sort_with_callback_sorted   13868016 ns     13865170 ns           50
  qsort_sorted                    28098439 ns     28093720 ns           26
  external_sort_sorted            33629020 ns     33628108 ns           21

external_sort is just std::sort hidden behind an extern function implemented in a separate .o file. Those benchmarks are from sorting 1MB of random and already-sorted data (as indicated in the names). I think it's important to test such cases, because often online people benchmark code which is written all in a single file, whereas real-life C++ projects are usually organized in such a way that every little class is in its own little file, which gets compiled into a separated object file, and then it all gets linked together without LTO. And then those same people go on to claim performance benefits of their language without actually using the setup which enables those benefits, which IMO is a bit dishonest.

When I drill further down into everything I want to drill into, maybe I'll publish the source for the benchmarks somewhere.

planede · on Aug 23, 2022

> If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback.

This is not true. `std::sort`'s default comparator is a `std::less` object. The advantage comes from using a stateless callback functor object. If you pass a capture-less lambda instead of a function pointer, you can reap the same benefits as using the default comparator. Even if that capture-less lambda just forwards to a specific free function anyway.

In short, `std::sort(beg, end, [](auto x, auto y) { return foo(x,y); })` can be faster than `std::sort(beg, end, &foo)`.

enqk · on Aug 9, 2022

check this out: https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/li...

temac · on Aug 9, 2022

Interesting but I'm not sure about the relevancy to the above comment.

On a sidenote, it has weird claims:

> obviating the need for ownership type systems or other compiler approaches to fixing the type-safety of use-after-frees. This means that we need one heap per type, and be 100% strict about it.

enqk · on Aug 9, 2022

I guess I have to quote from it, then:

---

C lets you do most of what C++ can if you rely on always_inline. This didn't used to be the case, but modern C compilers will meat-grind the code with repeated application of the following things:

- Inlining any always_inline call except if it's recursive or the function uses some very weird features that libpas doesn't use (like goto pointer).

- Copy-propagating the values from the callsite into the function that uses the value.

Consequently, passing a function pointer (or struct of function pointers), where the pointer points to an always_inline function and the callee is always_inline results in specialization akin to template monomorphization.

This works to any depth; the compiler won't be satisfied until there are no more always_inline function calls. This fortuitous development in compilers allowed me to write very nice template code in C. Libpas achieves templates in C using config structs that contain function pointers -- sometimes to always_inline functions (when we want specialization and inlining) and sometimes to out-of-line functions (when we want specialization but not inlining). Additionally, the C template style allows us to have true polymorphic functions. Lots of libpas slow paths are huge and not at all hot. We don't want that code specialized for every config. Luckily, this works just fine in C templates -- those polymorphic functions just pass around a pointer to the config they are using, and dynamically load and call things in that config, almost exactly the same way that the specialized code would do. This saves a lot of code size versus C++ templates.

camel-cdr · on Aug 9, 2022

qsort only isn't inline, because libcs don't supply an inline definition. If you write your own qsort, then you'll see it getting inlined and/or function cloned for different types.

camel-cdr · on Aug 9, 2022

The only real difference between qsort and std::sort in terms of code generation, is that for std::sort the default assumption is to function clone and for qsort it is to generate the full slow function. Now the compiler will in most cases detect that qsort can be cloned or inlined, but sometimes it might decide not to and the fallback is, in most cases slower then the C++ fallback.

PS.: I'm just annoyed that my generic C hashtable that is written in a qsort style doesn't get function copied/inlined when it's used for more than one type.

account42 · on Aug 9, 2022

You can't just ignore that qsort is implemented in a different library just because it could be implemented inline in the header.

JonChesterfield · on Aug 9, 2022

It could and it should, but function specialisation without inlining is still on the wish list for llvm last I checked.

cmeacham98 · on Aug 9, 2022

I totally agree with you, but are we really expecting "a typical PC" to have 10+ threads?

aerxes · on Aug 9, 2022

Gonna beat a dead horse here, but >50% of PCs that are surveyed by Steam have 12 threads or more.

That’s PCs that have steam installed at all.

Intel’s bare minimum current-gen i3 processor has 12 threads. That’s the absolute cheapest desktop-level processor you can get.

Your phone probably has 6 cores (though not 12 threads).

So yes, if you’re writing code for desktop hardware, it’s safe to assume you have at least 8 threads. Maybe you don’t want to consume all of them, but it’s better to let the OS handle scheduling.

https://www.techspot.com/article/2363-multi-core-cpu/

jcelerier · on Aug 9, 2022

Gaming is very much not representative. There's roughly 120M active steam users, vs. ~1.4 billion windows installs.

If I look around me, for instance in my whole family we're two with Steam installed but ever household has a desktop or a laptop (and generally a 7-8 years old cheap entry-level 350€ one, you'd be hard-pressed to find even a quad-core in there)

rascul · on Aug 9, 2022

> and generally a 7-8 years old cheap entry-level 350€ one, you'd be hard-pressed to find even a quad-core in there

My $400 laptop from 2014 has a quad core processor.

aerxes · on Aug 9, 2022

I don’t know dude, if you want to write software for the worst performers instead of commodity hardware that’s up to you

Just that single core systems are dying and probably won’t come back. Even Raspberry Pi’s are quad core now.

jcelerier · on Aug 9, 2022

> I don’t know dude, if you want to write software for the worst performers instead of commodity hardware that’s up to you

I want to write software that people, most people, not only those with SV salaries able to buy a computer every year, can use.

Here's the current best seller laptop in Amazon in my country: https://www.amazon.fr/Dell-Inspiron-i5-1135G7-Ordinateur-por...

It's half past 2022 and the most sold laptop here in France, 7th-ranked in GDP, has 8 gigabytes of RAM and 4 cores. This is what the real world looks like. (and just a year ago it was still 4GB of RAM iirc)

The second best sale is a gamer PC which is a bit better. https://www.amazon.fr/MILUI-Modulaire-Colosseum-Livraison-30...

The third https://www.amazon.fr/Ordinateur-Portable-Celeron-Resolution... comes with a CPU that does not support hyperthreading (so 4C/4T) and does not even support friggin AVX and goes to SSE4.2 at most.

That does not mean not making use of multiple cores of course, but a software should still be able to work on a single-core. Right now we only have certifications such as https://www.blauer-engel.de/en/productworld/resources-and-en... (see https://www.umwelt-campus.de/en/research/projekte/green-soft... for the methodology) but hopefully in a few years we can start making it first heavily discouraged and over time less and less viable to create resource-wasting software - in any case this is a thing I am asking of the people whom I vote for :-)

mwcampbell · on Aug 9, 2022

Thank you! Please keep pushing such certifications until they become regulations that, like GDPR, even we American developers cannot ignore. Then I can make a strong business case to move away from Electron in the product I'm currently working on.

Edit to add:

Related to your links to best-selling computers, I've been thinking about downgrading to a low-spec PC as my daily driver, and using a remote machine for the times that I truly need something powerful for a big compile or the like. That would force me to feel the users' pain. But how far should I go? Taken to the extreme, I could use a machine with a spinning rust hard drive (not SSD) and the bare minimum system requirements for Windows 10 or 11, and keep all the crapware on it to more accurately reflect the typical user's environment. But then, maybe I'd just be hurting myself for no benefit, since the pressure to value developer productivity over runtime efficiency would not actually go away in the absence of regulations.

aerxes · on Aug 9, 2022

I’m not advocating making software multithreaded only, since obviously that doesn’t make sense.

But, in many modern languages (including c++) multi threading 1. Doesn’t significantly detract from the performance of single core systems 2. Can massively improve the performance of multi core systems, even with 2 cores or more.

For appropriate applications, the memory overhead and the cost of the bootstrapping code for instantiating a worker thread should be dwarfed by the time of actually computing the task (we’re talking about actions 100ms or longer). Not using multiple threads when you could reasonably half or quarter that time (without needing to drop support for single-core systems) is just foolish. If you’re that worried about single core performance then maintain two code paths, but at least recognize that the majority of commodity systems sold today, including the ones you listed, have multiple threads available to them to do the work that have the most painful wait times.

jcelerier · on Aug 9, 2022

> Related to your links to best-selling computers, I've been thinking about downgrading to a low-spec PC as my daily driver,

my rule of thumb for the software I develop is - on my desktop computer (2016 intel 6900k, still plenty powerful) - there mustn't be any slowness / lag in any user interaction when built at -O0 with -fsanitize=address. This has ensured so far that said software had correct performance on optimized builds on a Raspberry Pi 3 in ARMv7 mode.

msla · on Aug 9, 2022

> Please keep pushing such certifications until they become regulations that, like GDPR, even we American developers cannot ignore.

People are apparently surprised at how easy it is to ignore the GDPR:

https://web.archive.org/web/20200813235643/http://slawsonand...

> Article 3(2), a new feature of the GDPR, creates extraterritorial jurisdiction over companies that have nothing but an internet presence in the EU and offer goods or services to EU residents[1]. While the GDPR requires these companies[2] to follow its data processing rules, it leaves the question of enforcement unanswered. Regulations that cannot be enforced do little to protect the personal data of EU citizens.

> This article discusses how U.S. law affects the enforcement of Article 3(2). In reality, enforcing the GDPR on U.S. companies may be almost impossible. First, the U.S. prohibits enforcing of foreign-country fines. Thus, the EU enforcement power of fines for noncompliance is negligible. Second, enforcing the GDPR through the designated representative can be easily circumvented. Finally, a private lawsuit brought by in the EU may be impossible to enforce under U.S. law.

[snip]

> Currently, there is a hole in the GDPR wall that protects European Union personal data. Even with extraterritorial jurisdiction over U.S. companies with only an internet presence in the EU, the GDPR gives little in the way of tools to enforce it. Fines from supervisory authorities would be stopped by the prohibition on enforcing foreign fines. The company can evade enforcement through a representative simply by not designating one. Finally, private actions may be stalled on issues of personal jurisdiction. If a U.S. company completely disregards the GDPR while targeting customers in the EU, it can use the personal data of EU citizens without much fear of the consequences. While the extraterritorial jurisdiction created by Article 3(2) may have seemed like a good way to solve the problem of foreign companies who do not have a physical presence in the EU, it turns out to be practically useless.

"Patching" that hole seems to require either action on the American side or, perhaps, a return to old-fashioned impressment or similar projection of Majestic European Power to Benighted Lands Beyond the Ocean Sea. /s

afiori · on Aug 9, 2022

The EU can fine US companies the same as it can fine most other extraterritorial companies, that is only if the other country allows it. The EU is not going to start an armed invasion over a GDPR violation.

Still big multinational companies will have international branches (Google, Amazon, Microsoft, ...) that can easily be fined in their host countries.

account42 · on Aug 9, 2022

The EU can also prevent companies from doing business in the EU if they don't follow the local laws. No need for an armed invasion if the EU can block all transfers from EU banks for anything related to your company.

afiori · on Aug 9, 2022

I think GP was referring to enforcing GDPR against companies that do not do business in the EU (no employment, no sales, no bank account, no revenue, no taxes, etc.).

For example, a company like Digital Ocean might have no assets of any kind in the EU (assuming that they don't own their European datacenters), so the EU cannot force them to pay a fine nor seize their assets; the EU could technically sanction them by stopping EU datacenter providers (like AWS-Germany) from renting compute to Digital Ocean, but maybe not for something like a GDPR violation.

rowanG077 · on Aug 9, 2022

You should always write software for the worst performers. Unless you have a very good reason not to. Writing for the top performers is how we got into the silly mess where computers from 30 years ago have much higher ux then now.

aerxes · on Aug 9, 2022

If we were arguing about designing vehicle safety testing suites for the worst performers (a very real problem that we have right now) we wouldn’t even be having this conversation.

Writing multithreaded applications increases the performance ceiling. If an application can’t take use of multiple threads, but is written in a multi-threaded way, there’s no harm done. It simply runs the multi threaded code in a single threaded way (think of ParArray) with a bit of overhead incurred for “becoming multithreaded”.

Reasoning out of adding multithreaded support for long running actions because “most systems can’t take use of the extra threads” is just irrational, especially since most modern commodity systems could have a linear improvement with the additional threads.

The single core systems are barely hurt by the memory overhead involved with provisioning CORE_NUM of worker threads. But the multi core systems can take massive advantages from it.

rowanG077 · on Aug 9, 2022

I don't disagree with your specific point here, it's easy to dynamically allocate threads based on system cores. But I disagree that you should write your code for a medium speced system.

aerxes · on Aug 9, 2022

That’s what debate’s about. I do recognize that caring about single threaded workloads and performance do contribute to snappier UI (and backwards compatibility).

rahen · on Aug 9, 2022

Writing for the top performers is the main reason why computers turn into e-waste. It's Wirth law applied.

I wish more developers would work on RPIs for the same reason.

aerxes · on Aug 9, 2022

The point that I’m making is that you can write multi threaded applications without dropping support for single core systems.

cmeacham98 · on Aug 9, 2022

This article doesn't say that, what it actually says is "Over 70% of Steam users have a CPU with 4 or more cores."

Steam doesn't even measure publicize information about threads on the survey, which makes it near impossible to check because not that long ago Intel locked out hyperthreading/SMT on their low/mid-grade CPUs.

Additionally, and more importantly: the Steam hardware survey _obviously_ doesn't represent the average consumer PC.

aerxes · on Aug 9, 2022

The data that it presents shows that >50% of PCs surveyed have 6 cores or more.

rob74 · on Aug 9, 2022

>50% of PCs which have Steam installed, i.e. are used for gaming. So the statistic is not really representative of all PCs out there...

loup-vaillant · on Aug 9, 2022

The fact remains that virtually all systems except perhaps old low-end phones now have more than one thread. Not going multi-thread for anything that makes the user wait leaves significant performance on the table.

Low end systems (4 threads or less) have less potential, but they also have the most need for speed, making multi-threading quite important. And high-end systems have more threads, so going multi-thread makes a bigger difference.

jiggawatts · on Aug 9, 2022

My laptop has 8 cores and 16 threads.

I'm about to buy a PC with 16 cores and 32 threads for "normal" money.

The AMD EPYC server CPUs scale to dual sockets with 64-cores each, for a whopping 256 hardware threads in a single box. That's not some sort of esoteric hyper-scale configuration, but completely ordinary off-the-shelf stuff you can get in large quantities from mainstream vendors.

A single-threaded application on a server like that will use between 0.5% to about 50% of the total available performance, depending on where its bottleneck is. It will never reach 100%!

This matters to things like CLI tools, batch jobs, and the like, many of which are written in C, especially in the Linux world. A case-in-point that demonstrates how much performance has been left on the table is ripgrep, which is a multi-threaded Rust replacement for grep.

piperswe · on Aug 9, 2022

My Ryzen 5 3600X (a years-old mid-range CPU) has 6 cores and 12 threads, I wouldn't say that's atypical.

Asraelite · on Aug 9, 2022

Today, it's debatable, but if we're talking about programming languages for the future then the future is what's relevant. I don't think it will be long before 50+ thread CPUs are common. Multithreading won't be a nice-to-have feature, it will be a necessity.

aldanor · on Aug 9, 2022

Given that many phones these days have half a dozen cores...

woodruffw · on Aug 9, 2022

Why not? I just opened htop on my desktop, and I have 789 threads running over 117 processes, with 12 logical cores.

mardifoufs · on Aug 9, 2022

> I totally agree with you, but are we really expecting "a typical PC" to have 10+ threads?

I think you are mixing hyperthreading, or SMT with regular "software" threading

astrange · on Aug 9, 2022

They're mixing parallelism and concurrency. (nb: I might be abusing these terms too)

Parallelism aka CPU-bound tasks are limited by the number of cores you have. Concurrency aka IO-bound tasks are not, because they're usually not all runnable at once. It can be faster to go concurrent even on a single core because you can overlap IOs, but it'll use more memory and other resources.

Also, "going faster" isn't always a good thing. If you're a low priority system task, you don't want to consume all the system resources because the user's apps might need them. Or the the user doesn't want the fans to turn on, or it's a passive cooled system that shouldn't get too hot, etc.

And for both of them, it not only makes it easier to write bugs in unsafe languages, but in safe languages you can easily accidentally make things slower instead of faster just because it's complicated.

Jtsummers · on Aug 9, 2022

Rob Pike tried to create a distinction here: https://go.dev/talks/2012/waza.slide#1

Using his distinction, concurrency isn't about IO-boundedness (though that's a common use-case for it), but instead is about composing multiple processes (generic sense). They may or may not be running in parallel (truly running at the same time).

On a unix shell this would be an example of concurrency, which may or may not be parallel:

  $ cat a-file | sort | uniq | wc

Each process may run at the literal same time (parallelism), but they don't have to, and on a single core machine would not be executing simultaneously.

This hasn't fully caught on, though.

naniwaduni · on Aug 9, 2022

Yes, not catching on is a common problem when you invent distinctions not borne out by existing practice.

yvdriess · on Aug 9, 2022

A succinct way to distinguish both is to focus on what problem they solve:

> Concurrency is concerned about correctness, parallelism concerned about performance.

Concurrency is concerned about keeping things correct[1] when multiple things are happening at once and sharing resources. The reason why those problems arise might be for performance reasons, e.g. multiplexing IO over different threads. As such, performance is still a concern. But, your solution space still involves the thread and IO resources, and how they interleave.

Parallelism is in a different solution space: you are looking at the work space (e.g. iteration space) of the problem domain and designing your algorithm to be logically sub-dividable to get the maximum parallel speedup (T_1 / T_inf). Now, a runtime or scheduler will have to do the dirty work of mapping the logical subdivisions to hardware execution units, and that scheduler program is of course full of concurrency concerns.

[1] For the sake of pedantry: yes, parallelism is sometimes also used to deal with correctness concerns: e.g. do the calculation on three systems and see if the results agree.

nickysielicki · on Aug 9, 2022

If not today, definitely tomorrow.

Nursie · on Aug 9, 2022

> it's just that C developers avoid them.

I'm not sure it's fair to say C developers avoid hash tables - I've worked on several projects with hash-table implementations in them.

The 'problem' if there is one, is that such things are rarely picked up from any sort of standard library, and are instead implemented in each project.

I'm also not really sure what the problem is with 'resorting' to void*, it's part of the language. It's not 'safe' in that the compiler won't catch your errors if you stuff any old thing in there, but that's C.

> you would have to be a masochist to write heavily multi-threaded code in C

pthreads makes it relatively straightforward. I've seen (and written) fairly sophisticated thread-pool implementations around them.

weatherlite · on Aug 9, 2022

C noob here. Why isn't a hash table implementation merged into the c standard library? Is it because the stdlib has to be as thin as possible for some performance reason or something?

Nursie · on Aug 10, 2022

I did C for about 15 years....

Yeah C doesn't really go in for that sort of thing. The standard library tends to be much more about some minimal support for strings and interfaces to OS features like files, signals, memory allocation etc. It doesn't really provide much in the way of building blocks to be reused by application developers.

The recommendation out there on the net seems to be to look at Glib, which is used by gtk, for that sort of thing.

Another good alternative might be the NSPR - https://firefox-source-docs.mozilla.org/nspr/reference/index...

I used this way back in 2001-3 for a multi-platform project because it provides some good platform abstractions, and it looks like it has a hash-table implementation in amongst its other features.

weatherlite · on Aug 10, 2022

How was doing C - is it a rewarding career? What did u move to? Sorry for randomly asking this I'm contemplating moving from Ruby/Go to C because doing web for so long gets old...I'm not feeling like I'm deepening my knowledge anymore.

Nursie · on Aug 10, 2022

Honestly I'm happier where I am now, which is generally writing http APIs and cryptography related code in Java (with bits of Go and python thrown in).

Development in C is slow, fraught with unexpected pitfalls and you end up writing an awful lot of stuff from scratch. While this is satisfying in some ways, I find the modern paradigms of the languages I work in now to be more fulfilling - yes you throw a lot of existing components together, but you also get to deliver functionality much more frequently.

There are also a lot of very old-school C shops out there, that don't believe in modern things like CI/CD, git, even automated testing. I'm sure there are good ones too, but there are a lot of dinosaurs in that arena. One of the last ones I contracted for (for about three weeks until I quit) responded to my usual first-day question of "OK, how do we build this product?" with "Oh don't worry about that, we can sort that out later" and later never came.

That all said - I really enjoyed working on a couple of embedded devices. Seeing what you can achieve with 128kB of SRAM and 256kB of flash is challenging, and since I was a kid I've enjoyed making the computer do stuff. With embedded devices that have buzzers, leds, little displays etc, you get immediate feedback. And having to think more about things like memory allocation strategies does influence your thinking in (I think) a good way. You can definitely gain some deep knowledge!

weatherlite · on Aug 10, 2022

Do you think experience holds better the lower you go down the stack? Part of my frustration with web development - especially front end, is knowledge decays very fast there. I'm fine with learning new stuff but relearning the same thing all the time and losing my edge is a big annoyance. So part of my wanting to move lower down the stack is my belief that my knowledge and experience will hold up better there. So I'm considering either that or moving to backend work writing something like Java which I also perceive to be a very good investment. .

camel-cdr · on Aug 9, 2022

"void* and function pointers" behaves essentially the same as templates, assuming the compiler inlines or function clones the function called with constant expression arguments.

account42 · on Aug 9, 2022

All equivalent code has the same performance with a sufficiently smart compiler. In practice, compilers are often not sufficiently smart.

pjmlp · on Aug 9, 2022

Without the related typechecking.

jimbob45 · on Aug 9, 2022

What did you mean when you said C hashtables can’t be generic? Is the (void*) not an adequate solution?

woodruffw · on Aug 9, 2022

It depends on what you're trying to do. In general, marshaling everything through void pointers is possible, but it'll cost you in terms of both bug surface (it's much easier to make mistakes when you've opted out of C's weak types) and performance (you now have at least one mandatory pointer indirection, which is especially egregious when your underlying value type is something simple like an integer).

Anything you can do in C++, you can do in C. But C++ compilers will generally optimize semantically equivalent code better than C compilers will, because C++ gives the compiler more freedom.

enqk · on Aug 9, 2022

Another perfectly good solution is to treat a C hashtable as a sort of acceleration index. That C hashtable then, rather than holding pointers simply holds `hashkey -> i` where i is the position into a typed array.

I.e. your data structure is like this:

  generic_hashtable Hash; // (hash->int)
  Foo *values; // where my typed data is.

BenFrantzDale · on Aug 9, 2022

Using void* means the compiler (almost certainly?) can’t see through it to optimize. More importantly, it looses you type safety and the self-documentation that flat_hash_map<K, V> gives you.

marcosdumay · on Aug 9, 2022

There is no obvious way to get a hash from (void*).

And yes, C developers also tend to avoid ordered containers.