> Any C alternative will be expected to be on par with C in performance. The pro...

jiggawatts · on Aug 9, 2022

Not to mention that both C++ and Rust can specialise algorithms and containers for specific types, whereas in C most developers resort to void* and function pointers. It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.

For example, typical C programs also don't use hashtables even when this makes the most sense, causing weird performance cliffs due to O(n^2) algorithms all over the place. Why not hashtables? Because they're not generic, so they're a pain to use. Not impossible of course, it's just that C developers avoid them.

Similarly, "strongly typed" containers full of simple struct types enable compiler auto-vectorisation that's often unavailable in C for the same kind of reason.

Last but not least, you would have to be a masochist to write heavily multi-threaded code in C... so hardly anybody does. These days, that's throwing away over 90% of the computer power in a typical PC, let alone a server.

dagw · on Aug 9, 2022

It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.

This was equally true back when C vs Fortran was the big debate, and something not easily captured in benchmarks. C, as written by an expert in high performance C, was equally fast as Fortran written by an expert in high performance Fortran. C, as written by a domain expert with limited programming skills, was often very much slower than Fortran written by a domain expert with limited programming skills.

mumblemumble · on Aug 9, 2022

This actually reminds me a bit of an old competition between two Microsoft MVPs comparing C++ and C#, where they went back and forth optimizing their respective versions of a model program, and discussing the optimizations they made.

The gist, as I recall it was: the initial, idiomatic, written-for-maintainability version of the C# program was significantly faster than the C++ equivalent. Up until the end, the C# version also generally needed to go through less heroics to keep up with the C++ version. Eventually, the final C++ version did end up being faster than the fastest C# one, but, considering what needed to be done to get there, it was a decidedly Pyrrhic victory.

One huge mitigating factor, though, is that the model program was doing something business-y. I doubt C++ would have had such a hard time of it if it had been a number crunching or systems program.

water8 · on Aug 10, 2022

C# is closer to Java than C++. It’s a garbage collector language and nowhere near the performance of C++

mumblemumble · on Aug 13, 2022

So, one of the things they discovered as part of the back and forth was that C#'s generational garbage collector was actually an advantage. Because it made finding memory for a new object allocation O(1), while for C++ it was O(N).

That observation was actually key to the C++ version ultimately producing the fastest version. Chen replaced malloc() with an implementation that was tailored to the problem in question.

I guess the thing that I always find lacking in these discussions is a cost/benefit analysis. Yes, C++ will let you do things like that, and they will absolutely allow you to wring every last drop of performance out of what you're doing.

But, if you aren't in a situation where optimizing to that extent is cost-effective, and you're working in a business domain where frequent heap allocation of short-lived objects is what you do, so that idiomatic, standard C++'s default way of doing things is known to generally be not significantly better, and often slower, than some of the faster GC languages, then it's just possible that that you should go for the pragmatic option.

woodruffw · on Aug 9, 2022

Precisely. This is perhaps the strangest part of the original post: C++ has the same performance advantages as Rust! It has them not because it's more safe (although it is, in some regards), but because it allows programmers to express behaviors that the compiler can reason about statically.

marcosdumay · on Aug 9, 2022

C++ has a few of the performance advantages that Rust has.

And many that are negated due to a bad std API.

Ygg2 · on Aug 9, 2022

Interesting. Can you list an example of C++ type system allowing optimization Rust system doesn't?

tialaramex · on Aug 9, 2022

Rust's assumption is that it's the compiler's job to reject all wrong programs (usually with a helpful diagnostic). In C++ the assumption is that it's the compiler's job to permit all correct programs.

You obviously ideally want both, but that's not actually possible when you have a language this powerful. So, Rest's choice means sometimes (more rarely these days but it can happen) you will write a program that is correct, but the compiler doesn't believe you and rejects your program, you will need to alter it, perhaps after alterations it's actually nicer, but equally perhaps you feel this made it uglier or slower, nevertheless you have no choice in Rust (well, you could try waiting a few years, the compiler gets smarter)

However the C++ choice means sometimes (maybe even often) you will write a program that isn't correct and the compiler gives you no indication whatsoever that there's a problem, you get an executable or object file or whatever out, but what it does is completely arbitrary. Maybe it works how you expected... until it doesn't.

The magic phrase in the C++ standard is "Ill-formed, no diagnostic required". For example suppose you try to sort some floats in C++ 20. That's ill-formed (floats aren't in fact Totally Ordered but the function signature says you promise they are) and no diagnostic is required for... whatever it is your program now does. Maybe it crashes, maybe it works fine, not their problem, good luck with that.

Now, probably if all your floats are like boring normal finite reals like -2.5 or something this will work fine, there's no practical reason it wouldn't, but who knows, the C++ language denies all responsibility. So it gets to be very "optimal" here since it can do whatever it wants and it's your fault.

proto_lambda · on Aug 9, 2022

To expand on your float sorting example, sorting a slice[1] in Rust requires the element type to implement the Ord trait, i.e. be totally ordered. Trying to sort a slice of floats will result in a compiler error, even though it might be totally fine as long as all your floats are "ordinary".

Instead, to sort a slice of floats, you have to explicitly specify what would happen for the non-ordinary cases; e.g. by using `.sort_by(f32::total_cmp)`, where f32::total_cmp()[2] is one possible interpretation of a total ordering of floats. This requires writing more code even for cases where it would be completely unnecessary.

[1]: https://doc.rust-lang.org/std/primitive.slice.html#method.so... [2]: https://doc.rust-lang.org/std/primitive.f32.html#method.tota...

Ygg2 · on Aug 9, 2022

So rather than introducing a hard to detect bug (with NaN, Inf, -Inf), Rust makes me think about it and not just let whoever worked on compiler decide.

How is this a negative? I'd rather program fail at compile than runtime, and rather it fail loudly than quietly.

Also Rust doesn't prevent you from making optimal ordering, just a tinge more verbose.

alpaca128 · on Aug 9, 2022

I also like this priority in Rust, which constantly makes me wonder why the developers allowed shadowing. It has already caused runtime bugs for me while the compiler didn't even throw a warning about it, and as Rust is otherwise so strict about making possible mistakes like this explicit it's definitely not the first cause I consider when debugging.

proto_lambda · on Aug 9, 2022

While I think shadowing is great for code readability and I've never encountered a bug caused by it, you can always make sure clippy doesn't let you do it by putting a `#![deny(clippy::shadow_reuse, clippy::shadow_same, clippy::shadow_unrelated)]` at the top level of your crate.

tialaramex · on Aug 9, 2022

Like proto I've never had this happen, even though I was initially sceptical until I found myself writing stuff like (real examples more complicated hence decision to break them down)

  let geese = something(lots_of_birds).bunch().of_chained().functions();

  let geese = geese.somehow().just().count_them(); // We don't actually need geese, just #

Could you name that first variable something else? Yeah. But, it's geese, it's not the number of geese, it's a different type, but it is just geese, that's the right name for it. OK, maybe rename the second variable? But number_of_geese is a stupid variable name, I would push back on a patch which tried to name a variable that because it's stupid. n_geese isn't stupid, but it is ugly and Rust is OK with me just naming it geese, so, geese it is.

However, if you do run into trouble those Clippy rules can save you. You probably will find you don't want them all (or perhaps any of them) at deny, but Rust is content for you to decide you only want a warning (which you can then suppress where appropriate) and importantly these are three rules, you might well decide you only hate shadow_same or shadow_reuse or something. Here's the link specifically for shadow_reuse as an example:

https://rust-lang.github.io/rust-clippy/master/#shadow_reuse

proto_lambda · on Aug 9, 2022

> I'd rather program fail at compile than runtime, and rather it fail loudly than quietly.

I agree! I was just illustrating the kind of tradeoff that has to be made for that to be possible.

afiori · on Aug 9, 2022

I don't think you could do [0] in Rust, but would be very interested in finding out otherwise.

[0] https://capnproto.org/news/2015-03-02-security-advisory-and-...

proto_lambda · on Aug 9, 2022

Not on stable, but you can on nightly (though it's still quite wonky): https://play.rust-lang.org/?version=nightly&mode=debug&editi...

whimsicalism · on Aug 9, 2022

I don't think they suggested this.

tick_tock_tick · on Aug 9, 2022

template<size_t n> struct example { int e[n]; };

GrumpySloth · on Aug 9, 2022

Rust:

  struct Example<const N: usize> {
      pub e: [i32; N],
  }

smitty1e · on Aug 9, 2022

One cannot imagine C++ failing to emulate Rust's borrow checker by the end of the decade, even if compiler support is required.

Has C++ ever failed to snarf a feature?

tialaramex · on Aug 9, 2022

Safety features. The committee are, perhaps unconsciously, biased against safety on the presumption (seen in many comments here on HN) that safer has to mean lower performance.

But part of the impetus for Carbon is that WG21 (the C++ Standards Committee) rejected proposals that C++ should focus on better performance and safety. So maybe performance is no longer important either. What's left?

Where they've taken things which might appear on the surface to be modelled on a safer Rust feature, usually the committee insists they be made unsafe. For example suppose I call a Rust function which might return a char or might not, it returns Option<char> and if I'm an idiot and I try to treat that as a char, it doesn't type check because it isn't one, I need to say what I'm going to do when it isn't or else that won't compile.

You can write that in modern C++... except it can automatically try to take the char (which isn't there) out of the empty optional structure and that's Undefined Behaviour. So whereas the Rust prevents programmers from making easy mistakes, the C++ turns those into unexploded bombs throughout your code.

bluGill · on Aug 9, 2022

Many on the C++ committee are interested in the borrow checking, but are not sure how to make it work in C++. The hard part is they cannot break compatibility with code that is legal with previous versions of C++. If there is even one pathological case where the borrow checker will reject code that doesn't have a memory leak then they will not accept it, and require whoever proposes this borrow checker to prove the absence of such a thing. (note if it rejects code that worked until the leaks mean you run out of memory they will accept that). I don't know if such a thing even exists, but if it does I'm confident that in Rust it is new code that you can write differently to avoid the bug, while with C++ that may be a very massive effort to figure out 25 year old code nobody understands anymore before you can rewrite it.

One obvious corner case: It is very common to allocate a buffer at startup and let the system clean it up when the program exits. (often this is embedded cases where the only way for the program to exit is power off). I don't know how you do this in rust (if you can - I'm not a rust expert)

dwattttt · on Aug 9, 2022

Box::leak. Assuming the object you're leaking contains no references, it can then live for 'static (the lifetime of the process)

alpaca128 · on Aug 9, 2022

> allocate a buffer at startup and let the system clean it up when the program exits

This is possible with the lazy_static library or (not yet in stable Rust) OnceCell. It allows you to allocate & initialize any datastructure once during runtime and get global read-only access.

GuB-42 · on Aug 9, 2022

I don't know Rust but I know C++.

And C++ has the potential to be faster than C, mostly thanks to metaprogramming (templates, ...). It is horrible if you have to do it, but if you are just using the standard library, you don't have to feel the pain but still take advantage of it. That's how algorithms are implemented. Because so much is known at compile time, optimizers can do a lot.

The reason C++ is generally regarded as slower is that C++ programmers tend to create objects on the heap all the time because constructors and destructors make it easy. Modern C++ also discourages raw pointers and so you get references counters all over the place, essentially turning C++ into a garbage collected language. I am not saying it is bad, but it certainly impacts performance.

But if you manage your memory in C++ just as you do in C, keeping track of all your buffers reusing them, and not using more than necessary, I can easily see C++ beat C.

mort96 · on Aug 9, 2022

> Modern C++ also discourages raw pointers and so you get references counters all over the place, essentially turning C++ into a garbage collected language.

This doesn't match my experience. It's true that modern C++ discourages owning raw pointers, but the solution is usually unique_ptr, not shared_ptr. Truly shared ownership is actually pretty uncommon IME, usually you can have one entity which obviously "owns" the object and then any other reference to it can be non-owning.

It's also worth noting that with std::move, actually changing the refcount of a share_ptr can be pretty rare even if you do have shared ownership.

bregma · on Aug 9, 2022

This is not my experience. Most developers are just not very good at what they do, and the go-to smart pointers for not-very-good C++ developers is std:shared_ptr<T>.

saidinesh5 · on Aug 9, 2022

This has been my experience as well - especially when C++11 came out. I have seen codebases where it has been "use std::shared_ptr for everything, becuase it is safer if/when we use threads". I know that doesn't make sense, but it just was the attitude back then.

Tbh, Back then, I didn't see a problem with it. Once i started chasing down weird bugs where objects aren't freed properly because no one knew which objects own what, I have been very cautious.

bluGill · on Aug 9, 2022

We had a few developers like that here when C++11 was introduced, but a few people gave them the smack down and now we rarely see shared pointers.

mort96 · on Aug 9, 2022

Hmm, that might be. Most of the C++ I've seen has been in LLVM, Google projects, projects where I'm the only developer or projects where I laid the groundwork which other people build upon, so I'm probably not mainly looking at the kind of code bases you're talking about.

lupire · on Aug 9, 2022

OK but those people aren't going to magically write safe performant C either.

jstimpfle · on Aug 9, 2022

unique_ptr is pretty bad for performance as well. It is more complicated to use compared to raw pointers and encourages an OOP object-per-object piecemal code and data architecture. I've never seen a C++ program making use of unique_ptr that didn't give a strong smell of enterprise programming.

mort96 · on Aug 9, 2022

There's nothing more complicated about using unique_ptr than a raw pointer, it just expresses who's responsible for calling `delete` explicitly in code rather than implicitly through program flow.

jstimpfle · on Aug 9, 2022

There's nothing complicated? You have to 1) #include <memory> 2) Write "std::unique_ptr<My_Foo_Type> foo" instead of just "My_Foo_Type *foo" in every definition. 3) Are required to define My_Foo_Type as a class with a separate deleter, or provide a deleter template argument at each declaration. 4a) write "foo.get()" in various places instead of just "foo". or 4b) lend around the unique_ptr in various places, breaking modularization and increasing build times. 5) Be stuck with a non-POD type that you can't just memcpy() around. 6) enjoy the worse runtime because your program has just been artifically compartmentalized even more!

Sometimes you C++ guys are just blinded by the tale of "zero-cost abstractions".

unique_ptr, like the idea of RAII in general, binds together what should be separate. Data schemas and physical layout on the one hand, and memory and lifetime management on the other hand. What you get as a result is what you deserve: The idea of "more safe and maintainable" where the "more" isn't added to the equivalent non-RAII program. No, it is added to the more convoluted, less understandable, and thus inherently less safe and maintainable program. Who knows what the bottom line is (in my experience often safety is a bit better but I pray for you if you need to debug a problem, and maintainability is much worse), but out of interest in my own sanity I know my preference.

usefulcat · on Aug 10, 2022

I really don’t see what the big deal is? Generally the only time you should be returning or passing around a unique_ptr is when you’re actually transferring ownership of the referenced object. Otherwise just dereference it and pass around a reference to the underlying object.

kaba0 · on Aug 10, 2022

> Data schemas and physical layout on the one hand, and memory and lifetime management on the other hand

How are they separate? Like, that’s what Rust does pretty explicitly, with great results.

jstimpfle · on Aug 10, 2022

I'm not following, what is Rust doing exactly? Coupling schema / layout with lifetime management? If that's what you mean I would like to disagree about the "great results" because of a gut feeling, and possibly the disagreement could in theory be justified with build times, or viewpoints on maintainability or whatever. But unfortunately I have no basis for doing so. I don't understand Rust well. And have very little experience, expect failing at compiling some projects and their 500 dependencies a couple times...

bregma · on Aug 9, 2022

Use correctly std::unique_ptr<T> has no measurable impact on performance compared with the equivalent non-smart-pointer code. You use std::unique_ptr<T> to indicate ownership, and pass raw pointers around to indicate non-ownership. That approach has the strong smell of a good programmer using the right tool for the job, especially considering the job is to communicate intent to the future reader.

It's like the classic argument against using exceptions: compared with the traditional C method of completely ignoring error conditions and not checking status, they're much slower.

account42 · on Aug 9, 2022

> Use correctly std::unique_ptr<T> has no measurable impact on performance compared with the equivalent non-smart-pointer code.

One wart of unique_ptr (and other smart pointers) is that it cannot be passed in a register when used as a function parameter, at least with the System V ABI used on Linux.

Also, the caller is responsible for destruction and there is no way to specify that a function always "consumes" a unique_ptr so the compiler cannot eliminate the destructor code: https://godbolt.org/z/sz79GoETv

Of course if the compiler can inline the call or at least controls both and can clone the function with a custom calling convention then that doesn't have to be a problem. But it still sucks that even something as seemingly simple as a tiny wrapper around a pointer does come with a cost.

jstimpfle · on Aug 9, 2022

That's the point. As a rule of thumb, fine-grained ownership is a very bad idea. It makes your program into a mess, which will be slow and make your program hard to understand. The slow part applies in any case, whether you have to suffer it in code (as you do with C) or not (as in many other languages that allow you to make even more of a mess).

As a C programmer, I try to avoid tracking ownership in separate struct member fields. I try to make central data structures that keep care of the tracking. Cleaning up shouldn't happen pointer-by-pointer. Usually a much bigger context has a shared lifetime, so there is no point in splitting stuff up in individually tracked "objects". Instead you just track a bigger block of memory.

anonymoushn · on Aug 9, 2022

Are the answers to this stackoverflow question incorrect? https://stackoverflow.com/questions/58339165/why-can-a-t-be-...

vitus · on Aug 9, 2022

> unique_ptr is pretty bad for performance as well.

Do you mean in terms of cache locality because it's heap-allocated instead of stack-allocated, or are you actually commenting on the overhead of copying some extra ints and invoking the destructor?

Because it's certainly correct that just because you can use a unique_ptr, doesn't mean you should. ("A std::unique_ptr is used for expressing transfer of ownership. If you never pass ownership elsewhere, the std::unique_ptr abstraction is rarely necessary or appropriate." - https://abseil.io/tips/187)

a_t48 · on Aug 9, 2022

Safety is a good reason. I like protection against leaks and use after free. If I’m already allocating I’m not going to worry about the little bit of extra performance cost the abstraction might have.

vitus · on Aug 9, 2022

To be clear: I'm not advocating for the use of `new` / `delete` over unique_ptr. But if you're creating a local object that never has to leave the current scope (or a member variable that's never moved in or out), there's no benefit to using a unique_ptr instead of creating the object directly on the stack or inline as part of your class, where the object's lifetime is bound to the scope, and your destructor is automatically run when its containing scope is cleaned up.

As an added bonus, you don't actually have to do a separate heap allocation.

a_t48 · on Aug 10, 2022

I agree! You should use a regular object if possible, I’d never suggest otherwise. The rare exceptions I’ve run into are annoying initialization order issues (usually code that I didn’t have the time/knowledge/political credits to refactor) and large arrays that blow the stack.

pjmlp · on Aug 9, 2022

As of C++17 not so horrible, and C++2x versions even less so, unless one has some strange fetisch for SFINAE and tag dispatch.

Since 1993, I never saw any need to keep bothering with C other than having it imposed on me, C++ had enough C89 subset on it, if I ever miss coding like C and its warts.

Nowadays that compatibility is up to C11 subset.

flohofwoe · on Aug 9, 2022

> Nowadays that compatibility is up to C11 subset.

Not true unfortunately, the "C subset" is still stuck at something that can at best be called a fork of "C95" which was then developed into a "bastard language" that resembled C on the surface, but isn't actually C (e.g. the incomplete designated init support in C++20 is the best example of this half-assed "looks like C, but isn't actually C" philosophy).

pjmlp · on Aug 9, 2022

That is why it is called a subset, duh!

jstimpfle · on Aug 9, 2022

> It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.

On the other hand, empirically, it is not unusual to see straightforward C programs being dramatically faster than comparable C++ programs written in enterprise style, and to also build much faster.

> Last but not least, you would have to be a masochist to write heavily multi-threaded code in C

You have to be a masochist to write heavily multi-threaded code that uses a lot of ad-hoc synchronization with mutexes and atomics. As it turns out, for many many tasks, it's also a spectacular bad way to go about parallelization, because mutexes are the _opposite_ of parallelization.

As a rule of thumb, do coarse-grained concurrency. Install a few queues, come up with a job system, and it won't be hard to get parallization right in plain C at all. Writing in C is often a good idea because what's a bad idea to do on hardware coincedes pretty well with what is painful to write.

bluGill · on Aug 9, 2022

> On the other hand, empirically, it is not unusual to see straightforward C programs being dramatically faster than comparable C++ programs written in enterprise style, and to also build much faster.

Your only comparison cases are cases where the code in question was re-written in C. This most likely means that everyone already knew it was slow and so the re-write also fixed the fundamental problems. If the code had been rewritten in C++ it would also be faster - and since C++ allows some optimizations C doesn't it would be even faster. (it is known that if you switch from gcc to g++ your code often will run faster if it compiles)

There is a reason for enterprise style C++. Most of the time it is still fast enough, and it is a lot more maintainable.

jstimpfle · on Aug 9, 2022

> it is known that if you switch from gcc to g++ your code often will run faster if it compiles)

I've never heard such a claim, can you back it up? And what does it say about the language?

> and it is a lot more maintainable

If you equate "maintainable" = readable, I've never once seen maintainable enterprise code. Everything is a convoluted mess that never gets anything done. Probably I haven't worked at the best shops, but then again, where are those? And why doesn't the language help mediocre programmers to write maintainable code?

I suspect that maintainability is almost exclusively a function of experience, not the programming language used. Experienced programmers do seem to agree that C-style C++ or even plain C is the way to go.

bluGill · on Aug 10, 2022

https://www.codeproject.com/questions/445035/c-vs-cplusplus-... has a long discussion. The short answer is C++ has stricter aliasing rules, and so the compiler can apply more optimization. This of course assumes that your C code is also valid C++ code (C is no a pure subset of C++), and you don't have those aliases - those apply to a lot of C programs but not all.

> And what does it say about the language?

C++ has a stronger type system. This is already known. You avoid a few bugs in C++ because of this. The type system isn't nearly as strong as Haskell.

> I've never once seen maintainable enterprise code. Everything is a convoluted mess that never gets anything done

Two sides of the same coin. While the code is convoluted, it often is doing a lot of things in a generic way. More straightforward code is possible, but only by created a lot more code, and quantity is itself convolution.

> And why doesn't the language help mediocre programmers to write maintainable code?

It does. However you have to be careful here. C++ is often used for very large problems that are also complex. I would never use python for something that is over 100,000 lines of code as you can't change anything anymore for fear that some case isn't covered in the tests and so you won't see that syntax error until months later. I maintain 15 million lines of C++ (and this isn't the largest C++ codebase I know of).

Not, I'm not arguing that C++ is a great language. It has a lot of inconsistencies, and foot guns. However it is still the best language I know for very large, very complex programs. (Note that I do not know ADA or Rust, two that often come up in context of very large, very complex programs. I would not be surprised if they are better. That C++ is better known that others is itself an advantage to C++)

> I suspect that maintainability is almost exclusively a function of experience, not the programming language used.

Sort of. As I said before languages like Python are out of the running for very large programs because they are not compiled and so you can get runtime errors. There are also intentionally impossible to write languages that we can throw out even sooner. However there are for sure other languages that can play in the very large program space. So long as we limit ourself to languages that play in the very large program space, experience is the largest factor.

flohofwoe · on Aug 9, 2022

> become dramatically faster when rewritten in a more modern language

IME that's mostly a myth though. A C compiler will stamp out a specialized version just as well if it can see all the relevant function bodies (either via inlining or LTO).

"Zero cost abstraction" isn't just a C++ thing, it happens mostly in the language agnostic optimizer passes. For instance the reason why std::sort() shows up faster in benchmarks than C's qsort() is simply because std::sort() implementation is all inline template code, not because of some magic performance-enhancing qualities of the C++ template system.

planede · on Aug 9, 2022

inlining only goes so far. You won't get full of qsort to be inlined, and if it's not inlined, it needs to be at least cloned to be on par with std::sort, so the comparator function could get const-propagated.

AFAIK out of the major compilers, gcc has the most aggressive cloning, but it's still nowhere near to const propagate the comparator from qsort. With std::sort with a stateless comparator function object (such as std::less, which is the default), you get this for free*.

* of course this is not entirely free, as this is more prone to code bloat. But, you can always type-erase the comparator, and use a function pointer, or std::function, if this ever becomes a problem. But you can't convince a C compiler to const propagate the comparator in qsort all the way through, if the optimizer chooses that it doesn't worth it.

GrumpySloth · on Aug 9, 2022

glibc qsort's implementation is in libc.so, not in the header. GCC doesn't have anything to work with.

It's also an apples-to-oranges comparison, since std::sort and qsort implement different algorithms.

A lot of std::sort's performance is actually from using the version without any callbacks. If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback. Theoretically the compiler should make the two cases the same, but apparently GCC is too dumb (that's not a slight on GCC; I think people expect too much from compilers):

  ------------------------------------------------------------------------
  Benchmark                              Time             CPU   Iterations
  ------------------------------------------------------------------------
  std_sort_random                 52881299 ns     52873089 ns           14
  std_sort_with_callback_random   63319633 ns     63307876 ns           11
  qsort_random                   106803314 ns    106784567 ns            7
  external_sort_random            97642851 ns     97640888 ns            7
  std_sort_sorted                  8433311 ns      8432564 ns           82
  std_sort_with_callback_sorted   13868016 ns     13865170 ns           50
  qsort_sorted                    28098439 ns     28093720 ns           26
  external_sort_sorted            33629020 ns     33628108 ns           21

external_sort is just std::sort hidden behind an extern function implemented in a separate .o file. Those benchmarks are from sorting 1MB of random and already-sorted data (as indicated in the names). I think it's important to test such cases, because often online people benchmark code which is written all in a single file, whereas real-life C++ projects are usually organized in such a way that every little class is in its own little file, which gets compiled into a separated object file, and then it all gets linked together without LTO. And then those same people go on to claim performance benefits of their language without actually using the setup which enables those benefits, which IMO is a bit dishonest.

When I drill further down into everything I want to drill into, maybe I'll publish the source for the benchmarks somewhere.

planede · on Aug 23, 2022

> If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback.

This is not true. `std::sort`'s default comparator is a `std::less` object. The advantage comes from using a stateless callback functor object. If you pass a capture-less lambda instead of a function pointer, you can reap the same benefits as using the default comparator. Even if that capture-less lambda just forwards to a specific free function anyway.

In short, `std::sort(beg, end, [](auto x, auto y) { return foo(x,y); })` can be faster than `std::sort(beg, end, &foo)`.

enqk · on Aug 9, 2022

check this out: https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/li...

temac · on Aug 9, 2022

Interesting but I'm not sure about the relevancy to the above comment.

On a sidenote, it has weird claims:

> obviating the need for ownership type systems or other compiler approaches to fixing the type-safety of use-after-frees. This means that we need one heap per type, and be 100% strict about it.

enqk · on Aug 9, 2022

I guess I have to quote from it, then:

---

C lets you do most of what C++ can if you rely on always_inline. This didn't used to be the case, but modern C compilers will meat-grind the code with repeated application of the following things:

- Inlining any always_inline call except if it's recursive or the function uses some very weird features that libpas doesn't use (like goto pointer).

- Copy-propagating the values from the callsite into the function that uses the value.

Consequently, passing a function pointer (or struct of function pointers), where the pointer points to an always_inline function and the callee is always_inline results in specialization akin to template monomorphization.

This works to any depth; the compiler won't be satisfied until there are no more always_inline function calls. This fortuitous development in compilers allowed me to write very nice template code in C. Libpas achieves templates in C using config structs that contain function pointers -- sometimes to always_inline functions (when we want specialization and inlining) and sometimes to out-of-line functions (when we want specialization but not inlining). Additionally, the C template style allows us to have true polymorphic functions. Lots of libpas slow paths are huge and not at all hot. We don't want that code specialized for every config. Luckily, this works just fine in C templates -- those polymorphic functions just pass around a pointer to the config they are using, and dynamically load and call things in that config, almost exactly the same way that the specialized code would do. This saves a lot of code size versus C++ templates.

camel-cdr · on Aug 9, 2022

qsort only isn't inline, because libcs don't supply an inline definition. If you write your own qsort, then you'll see it getting inlined and/or function cloned for different types.

camel-cdr · on Aug 9, 2022

The only real difference between qsort and std::sort in terms of code generation, is that for std::sort the default assumption is to function clone and for qsort it is to generate the full slow function. Now the compiler will in most cases detect that qsort can be cloned or inlined, but sometimes it might decide not to and the fallback is, in most cases slower then the C++ fallback.

PS.: I'm just annoyed that my generic C hashtable that is written in a qsort style doesn't get function copied/inlined when it's used for more than one type.

account42 · on Aug 9, 2022

You can't just ignore that qsort is implemented in a different library just because it could be implemented inline in the header.

JonChesterfield · on Aug 9, 2022

It could and it should, but function specialisation without inlining is still on the wish list for llvm last I checked.

cmeacham98 · on Aug 9, 2022

I totally agree with you, but are we really expecting "a typical PC" to have 10+ threads?

aerxes · on Aug 9, 2022

Gonna beat a dead horse here, but >50% of PCs that are surveyed by Steam have 12 threads or more.

That’s PCs that have steam installed at all.

Intel’s bare minimum current-gen i3 processor has 12 threads. That’s the absolute cheapest desktop-level processor you can get.

Your phone probably has 6 cores (though not 12 threads).

So yes, if you’re writing code for desktop hardware, it’s safe to assume you have at least 8 threads. Maybe you don’t want to consume all of them, but it’s better to let the OS handle scheduling.

https://www.techspot.com/article/2363-multi-core-cpu/

jcelerier · on Aug 9, 2022

Gaming is very much not representative. There's roughly 120M active steam users, vs. ~1.4 billion windows installs.

If I look around me, for instance in my whole family we're two with Steam installed but ever household has a desktop or a laptop (and generally a 7-8 years old cheap entry-level 350€ one, you'd be hard-pressed to find even a quad-core in there)

rascul · on Aug 9, 2022

> and generally a 7-8 years old cheap entry-level 350€ one, you'd be hard-pressed to find even a quad-core in there

My $400 laptop from 2014 has a quad core processor.

aerxes · on Aug 9, 2022

I don’t know dude, if you want to write software for the worst performers instead of commodity hardware that’s up to you

Just that single core systems are dying and probably won’t come back. Even Raspberry Pi’s are quad core now.

jcelerier · on Aug 9, 2022

> I don’t know dude, if you want to write software for the worst performers instead of commodity hardware that’s up to you

I want to write software that people, most people, not only those with SV salaries able to buy a computer every year, can use.

Here's the current best seller laptop in Amazon in my country: https://www.amazon.fr/Dell-Inspiron-i5-1135G7-Ordinateur-por...

It's half past 2022 and the most sold laptop here in France, 7th-ranked in GDP, has 8 gigabytes of RAM and 4 cores. This is what the real world looks like. (and just a year ago it was still 4GB of RAM iirc)

The second best sale is a gamer PC which is a bit better. https://www.amazon.fr/MILUI-Modulaire-Colosseum-Livraison-30...

The third https://www.amazon.fr/Ordinateur-Portable-Celeron-Resolution... comes with a CPU that does not support hyperthreading (so 4C/4T) and does not even support friggin AVX and goes to SSE4.2 at most.

That does not mean not making use of multiple cores of course, but a software should still be able to work on a single-core. Right now we only have certifications such as https://www.blauer-engel.de/en/productworld/resources-and-en... (see https://www.umwelt-campus.de/en/research/projekte/green-soft... for the methodology) but hopefully in a few years we can start making it first heavily discouraged and over time less and less viable to create resource-wasting software - in any case this is a thing I am asking of the people whom I vote for :-)

mwcampbell · on Aug 9, 2022

Thank you! Please keep pushing such certifications until they become regulations that, like GDPR, even we American developers cannot ignore. Then I can make a strong business case to move away from Electron in the product I'm currently working on.

Edit to add:

Related to your links to best-selling computers, I've been thinking about downgrading to a low-spec PC as my daily driver, and using a remote machine for the times that I truly need something powerful for a big compile or the like. That would force me to feel the users' pain. But how far should I go? Taken to the extreme, I could use a machine with a spinning rust hard drive (not SSD) and the bare minimum system requirements for Windows 10 or 11, and keep all the crapware on it to more accurately reflect the typical user's environment. But then, maybe I'd just be hurting myself for no benefit, since the pressure to value developer productivity over runtime efficiency would not actually go away in the absence of regulations.

aerxes · on Aug 9, 2022

I’m not advocating making software multithreaded only, since obviously that doesn’t make sense.

But, in many modern languages (including c++) multi threading 1. Doesn’t significantly detract from the performance of single core systems 2. Can massively improve the performance of multi core systems, even with 2 cores or more.

For appropriate applications, the memory overhead and the cost of the bootstrapping code for instantiating a worker thread should be dwarfed by the time of actually computing the task (we’re talking about actions 100ms or longer). Not using multiple threads when you could reasonably half or quarter that time (without needing to drop support for single-core systems) is just foolish. If you’re that worried about single core performance then maintain two code paths, but at least recognize that the majority of commodity systems sold today, including the ones you listed, have multiple threads available to them to do the work that have the most painful wait times.

jcelerier · on Aug 9, 2022

> Related to your links to best-selling computers, I've been thinking about downgrading to a low-spec PC as my daily driver,

my rule of thumb for the software I develop is - on my desktop computer (2016 intel 6900k, still plenty powerful) - there mustn't be any slowness / lag in any user interaction when built at -O0 with -fsanitize=address. This has ensured so far that said software had correct performance on optimized builds on a Raspberry Pi 3 in ARMv7 mode.

msla · on Aug 9, 2022

> Please keep pushing such certifications until they become regulations that, like GDPR, even we American developers cannot ignore.

People are apparently surprised at how easy it is to ignore the GDPR:

https://web.archive.org/web/20200813235643/http://slawsonand...

> Article 3(2), a new feature of the GDPR, creates extraterritorial jurisdiction over companies that have nothing but an internet presence in the EU and offer goods or services to EU residents[1]. While the GDPR requires these companies[2] to follow its data processing rules, it leaves the question of enforcement unanswered. Regulations that cannot be enforced do little to protect the personal data of EU citizens.

> This article discusses how U.S. law affects the enforcement of Article 3(2). In reality, enforcing the GDPR on U.S. companies may be almost impossible. First, the U.S. prohibits enforcing of foreign-country fines. Thus, the EU enforcement power of fines for noncompliance is negligible. Second, enforcing the GDPR through the designated representative can be easily circumvented. Finally, a private lawsuit brought by in the EU may be impossible to enforce under U.S. law.

[snip]

> Currently, there is a hole in the GDPR wall that protects European Union personal data. Even with extraterritorial jurisdiction over U.S. companies with only an internet presence in the EU, the GDPR gives little in the way of tools to enforce it. Fines from supervisory authorities would be stopped by the prohibition on enforcing foreign fines. The company can evade enforcement through a representative simply by not designating one. Finally, private actions may be stalled on issues of personal jurisdiction. If a U.S. company completely disregards the GDPR while targeting customers in the EU, it can use the personal data of EU citizens without much fear of the consequences. While the extraterritorial jurisdiction created by Article 3(2) may have seemed like a good way to solve the problem of foreign companies who do not have a physical presence in the EU, it turns out to be practically useless.

"Patching" that hole seems to require either action on the American side or, perhaps, a return to old-fashioned impressment or similar projection of Majestic European Power to Benighted Lands Beyond the Ocean Sea. /s

afiori · on Aug 9, 2022

The EU can fine US companies the same as it can fine most other extraterritorial companies, that is only if the other country allows it. The EU is not going to start an armed invasion over a GDPR violation.

Still big multinational companies will have international branches (Google, Amazon, Microsoft, ...) that can easily be fined in their host countries.

account42 · on Aug 9, 2022

The EU can also prevent companies from doing business in the EU if they don't follow the local laws. No need for an armed invasion if the EU can block all transfers from EU banks for anything related to your company.

afiori · on Aug 9, 2022

I think GP was referring to enforcing GDPR against companies that do not do business in the EU (no employment, no sales, no bank account, no revenue, no taxes, etc.).

For example, a company like Digital Ocean might have no assets of any kind in the EU (assuming that they don't own their European datacenters), so the EU cannot force them to pay a fine nor seize their assets; the EU could technically sanction them by stopping EU datacenter providers (like AWS-Germany) from renting compute to Digital Ocean, but maybe not for something like a GDPR violation.

rowanG077 · on Aug 9, 2022

You should always write software for the worst performers. Unless you have a very good reason not to. Writing for the top performers is how we got into the silly mess where computers from 30 years ago have much higher ux then now.

aerxes · on Aug 9, 2022

If we were arguing about designing vehicle safety testing suites for the worst performers (a very real problem that we have right now) we wouldn’t even be having this conversation.

Writing multithreaded applications increases the performance ceiling. If an application can’t take use of multiple threads, but is written in a multi-threaded way, there’s no harm done. It simply runs the multi threaded code in a single threaded way (think of ParArray) with a bit of overhead incurred for “becoming multithreaded”.

Reasoning out of adding multithreaded support for long running actions because “most systems can’t take use of the extra threads” is just irrational, especially since most modern commodity systems could have a linear improvement with the additional threads.

The single core systems are barely hurt by the memory overhead involved with provisioning CORE_NUM of worker threads. But the multi core systems can take massive advantages from it.

rowanG077 · on Aug 9, 2022

I don't disagree with your specific point here, it's easy to dynamically allocate threads based on system cores. But I disagree that you should write your code for a medium speced system.

aerxes · on Aug 9, 2022

That’s what debate’s about. I do recognize that caring about single threaded workloads and performance do contribute to snappier UI (and backwards compatibility).

rahen · on Aug 9, 2022

Writing for the top performers is the main reason why computers turn into e-waste. It's Wirth law applied.

I wish more developers would work on RPIs for the same reason.

aerxes · on Aug 9, 2022

The point that I’m making is that you can write multi threaded applications without dropping support for single core systems.

cmeacham98 · on Aug 9, 2022

This article doesn't say that, what it actually says is "Over 70% of Steam users have a CPU with 4 or more cores."

Steam doesn't even measure publicize information about threads on the survey, which makes it near impossible to check because not that long ago Intel locked out hyperthreading/SMT on their low/mid-grade CPUs.

Additionally, and more importantly: the Steam hardware survey _obviously_ doesn't represent the average consumer PC.

aerxes · on Aug 9, 2022

The data that it presents shows that >50% of PCs surveyed have 6 cores or more.

rob74 · on Aug 9, 2022

>50% of PCs which have Steam installed, i.e. are used for gaming. So the statistic is not really representative of all PCs out there...

loup-vaillant · on Aug 9, 2022

The fact remains that virtually all systems except perhaps old low-end phones now have more than one thread. Not going multi-thread for anything that makes the user wait leaves significant performance on the table.

Low end systems (4 threads or less) have less potential, but they also have the most need for speed, making multi-threading quite important. And high-end systems have more threads, so going multi-thread makes a bigger difference.

jiggawatts · on Aug 9, 2022

My laptop has 8 cores and 16 threads.

I'm about to buy a PC with 16 cores and 32 threads for "normal" money.

The AMD EPYC server CPUs scale to dual sockets with 64-cores each, for a whopping 256 hardware threads in a single box. That's not some sort of esoteric hyper-scale configuration, but completely ordinary off-the-shelf stuff you can get in large quantities from mainstream vendors.

A single-threaded application on a server like that will use between 0.5% to about 50% of the total available performance, depending on where its bottleneck is. It will never reach 100%!

This matters to things like CLI tools, batch jobs, and the like, many of which are written in C, especially in the Linux world. A case-in-point that demonstrates how much performance has been left on the table is ripgrep, which is a multi-threaded Rust replacement for grep.

piperswe · on Aug 9, 2022

My Ryzen 5 3600X (a years-old mid-range CPU) has 6 cores and 12 threads, I wouldn't say that's atypical.

Asraelite · on Aug 9, 2022

Today, it's debatable, but if we're talking about programming languages for the future then the future is what's relevant. I don't think it will be long before 50+ thread CPUs are common. Multithreading won't be a nice-to-have feature, it will be a necessity.

aldanor · on Aug 9, 2022

Given that many phones these days have half a dozen cores...

woodruffw · on Aug 9, 2022

Why not? I just opened htop on my desktop, and I have 789 threads running over 117 processes, with 12 logical cores.

mardifoufs · on Aug 9, 2022

> I totally agree with you, but are we really expecting "a typical PC" to have 10+ threads?

I think you are mixing hyperthreading, or SMT with regular "software" threading

astrange · on Aug 9, 2022

They're mixing parallelism and concurrency. (nb: I might be abusing these terms too)

Parallelism aka CPU-bound tasks are limited by the number of cores you have. Concurrency aka IO-bound tasks are not, because they're usually not all runnable at once. It can be faster to go concurrent even on a single core because you can overlap IOs, but it'll use more memory and other resources.

Also, "going faster" isn't always a good thing. If you're a low priority system task, you don't want to consume all the system resources because the user's apps might need them. Or the the user doesn't want the fans to turn on, or it's a passive cooled system that shouldn't get too hot, etc.

And for both of them, it not only makes it easier to write bugs in unsafe languages, but in safe languages you can easily accidentally make things slower instead of faster just because it's complicated.

Jtsummers · on Aug 9, 2022

Rob Pike tried to create a distinction here: https://go.dev/talks/2012/waza.slide#1

Using his distinction, concurrency isn't about IO-boundedness (though that's a common use-case for it), but instead is about composing multiple processes (generic sense). They may or may not be running in parallel (truly running at the same time).

On a unix shell this would be an example of concurrency, which may or may not be parallel:

  $ cat a-file | sort | uniq | wc

Each process may run at the literal same time (parallelism), but they don't have to, and on a single core machine would not be executing simultaneously.

This hasn't fully caught on, though.

naniwaduni · on Aug 9, 2022

Yes, not catching on is a common problem when you invent distinctions not borne out by existing practice.

yvdriess · on Aug 9, 2022

A succinct way to distinguish both is to focus on what problem they solve:

> Concurrency is concerned about correctness, parallelism concerned about performance.

Concurrency is concerned about keeping things correct[1] when multiple things are happening at once and sharing resources. The reason why those problems arise might be for performance reasons, e.g. multiplexing IO over different threads. As such, performance is still a concern. But, your solution space still involves the thread and IO resources, and how they interleave.

Parallelism is in a different solution space: you are looking at the work space (e.g. iteration space) of the problem domain and designing your algorithm to be logically sub-dividable to get the maximum parallel speedup (T_1 / T_inf). Now, a runtime or scheduler will have to do the dirty work of mapping the logical subdivisions to hardware execution units, and that scheduler program is of course full of concurrency concerns.

[1] For the sake of pedantry: yes, parallelism is sometimes also used to deal with correctness concerns: e.g. do the calculation on three systems and see if the results agree.

nickysielicki · on Aug 9, 2022

If not today, definitely tomorrow.

Nursie · on Aug 9, 2022

> it's just that C developers avoid them.

I'm not sure it's fair to say C developers avoid hash tables - I've worked on several projects with hash-table implementations in them.

The 'problem' if there is one, is that such things are rarely picked up from any sort of standard library, and are instead implemented in each project.

I'm also not really sure what the problem is with 'resorting' to void*, it's part of the language. It's not 'safe' in that the compiler won't catch your errors if you stuff any old thing in there, but that's C.

> you would have to be a masochist to write heavily multi-threaded code in C

pthreads makes it relatively straightforward. I've seen (and written) fairly sophisticated thread-pool implementations around them.

weatherlite · on Aug 9, 2022

C noob here. Why isn't a hash table implementation merged into the c standard library? Is it because the stdlib has to be as thin as possible for some performance reason or something?

Nursie · on Aug 10, 2022

I did C for about 15 years....

Yeah C doesn't really go in for that sort of thing. The standard library tends to be much more about some minimal support for strings and interfaces to OS features like files, signals, memory allocation etc. It doesn't really provide much in the way of building blocks to be reused by application developers.

The recommendation out there on the net seems to be to look at Glib, which is used by gtk, for that sort of thing.

Another good alternative might be the NSPR - https://firefox-source-docs.mozilla.org/nspr/reference/index...

I used this way back in 2001-3 for a multi-platform project because it provides some good platform abstractions, and it looks like it has a hash-table implementation in amongst its other features.

weatherlite · on Aug 10, 2022

How was doing C - is it a rewarding career? What did u move to? Sorry for randomly asking this I'm contemplating moving from Ruby/Go to C because doing web for so long gets old...I'm not feeling like I'm deepening my knowledge anymore.

Nursie · on Aug 10, 2022

Honestly I'm happier where I am now, which is generally writing http APIs and cryptography related code in Java (with bits of Go and python thrown in).

Development in C is slow, fraught with unexpected pitfalls and you end up writing an awful lot of stuff from scratch. While this is satisfying in some ways, I find the modern paradigms of the languages I work in now to be more fulfilling - yes you throw a lot of existing components together, but you also get to deliver functionality much more frequently.

There are also a lot of very old-school C shops out there, that don't believe in modern things like CI/CD, git, even automated testing. I'm sure there are good ones too, but there are a lot of dinosaurs in that arena. One of the last ones I contracted for (for about three weeks until I quit) responded to my usual first-day question of "OK, how do we build this product?" with "Oh don't worry about that, we can sort that out later" and later never came.

That all said - I really enjoyed working on a couple of embedded devices. Seeing what you can achieve with 128kB of SRAM and 256kB of flash is challenging, and since I was a kid I've enjoyed making the computer do stuff. With embedded devices that have buzzers, leds, little displays etc, you get immediate feedback. And having to think more about things like memory allocation strategies does influence your thinking in (I think) a good way. You can definitely gain some deep knowledge!

weatherlite · on Aug 10, 2022

Do you think experience holds better the lower you go down the stack? Part of my frustration with web development - especially front end, is knowledge decays very fast there. I'm fine with learning new stuff but relearning the same thing all the time and losing my edge is a big annoyance. So part of my wanting to move lower down the stack is my belief that my knowledge and experience will hold up better there. So I'm considering either that or moving to backend work writing something like Java which I also perceive to be a very good investment. .

camel-cdr · on Aug 9, 2022

"void* and function pointers" behaves essentially the same as templates, assuming the compiler inlines or function clones the function called with constant expression arguments.

account42 · on Aug 9, 2022

All equivalent code has the same performance with a sufficiently smart compiler. In practice, compilers are often not sufficiently smart.

pjmlp · on Aug 9, 2022

Without the related typechecking.

jimbob45 · on Aug 9, 2022

What did you mean when you said C hashtables can’t be generic? Is the (void*) not an adequate solution?

woodruffw · on Aug 9, 2022

It depends on what you're trying to do. In general, marshaling everything through void pointers is possible, but it'll cost you in terms of both bug surface (it's much easier to make mistakes when you've opted out of C's weak types) and performance (you now have at least one mandatory pointer indirection, which is especially egregious when your underlying value type is something simple like an integer).

Anything you can do in C++, you can do in C. But C++ compilers will generally optimize semantically equivalent code better than C compilers will, because C++ gives the compiler more freedom.

enqk · on Aug 9, 2022

Another perfectly good solution is to treat a C hashtable as a sort of acceleration index. That C hashtable then, rather than holding pointers simply holds `hashkey -> i` where i is the position into a typed array.

I.e. your data structure is like this:

  generic_hashtable Hash; // (hash->int)
  Foo *values; // where my typed data is.

BenFrantzDale · on Aug 9, 2022

Using void* means the compiler (almost certainly?) can’t see through it to optimize. More importantly, it looses you type safety and the self-documentation that flat_hash_map<K, V> gives you.

marcosdumay · on Aug 9, 2022

There is no obvious way to get a hash from (void*).

And yes, C developers also tend to avoid ordered containers.

cesarb · on Aug 9, 2022

My favorite example of a safety feature in Rust which also improves runtime performance is string slices, which are implemented as a pair of a pointer to the first character and the length (used for bounds checking). Not only does this avoid having to scan for the NUL terminator to find the string length (that is, O(1) instead of O(N), which can make the difference between an O(N) and O(N^2) algorithm), but also it allows taking substrings without copying or modifying the original string (also helped by the borrow checker allowing the programmer to safely omit defensive copies of strings).

alcover · on Aug 9, 2022

Made this C lib for exactly the reasons you mention. I love C but got mad at the constant bounds errors and copying.

https://github.com/alcover/buffet

WalterBright · on Aug 9, 2022

C has a huge performance burden from 0 terminated strings. Programs are constantly running strlen() or equivalent to get the length. Length-delineated strings, like what D has, are an order of magnitude faster.

EdSchouten · on Aug 9, 2022

Exactly! And because you can't create a substring without mutating the original value, you see that C code often needs to resort to unnecessary copying as well.

jstimpfle · on Aug 9, 2022

No. You can't create a zero-terminated substring other than a proper suffix or a buffer copy. But that's not really a surprise right?

Well then, don't use zero-terminated strings for proper string processing. You don't have to use zero-termination, even when some programmers in the 70s and 80s were convinced enough of it that abominations like strtok() landed in the standard.

xigoi · on Aug 9, 2022

>You don't have to use zero-termination

You can choose between zero-termination and having to convert strings back and forth when using idiomatic libraries.

jstimpfle · on Aug 10, 2022

Idiomatic is the wrong word here, because it's certainly not idiomatic to do extra allocations when unneeded. Most APIs let you give the length explicitly if it makes any sense. A not very well-known fact is that even printf format strings let you do printf("%.*s\n", 3, "Hello") which only prints "Hel\n". This is in POSIX C, just not sure when it was standardized.

jstimpfle · on Aug 9, 2022

_There are_ programs that are constantly running strlen(). C strings are the default builtin string representation that has an acceptable tradeoff for performance vs space and simplicity for where they are used: Mostly in string literals, which are expected to be small strings. Mostly for printf() and friends. Zero-terminated strings are space efficient and don't allow bike shedding like length-prefixed strings do. And don't get us started about allocation strategies.

"A magnitude faster" for doing what? Typical usages of zero-terminated strings are performance-uncritical. And note that zero-terminated doesn't preclude using separate length fields.

Sane programs use store length of strings explicitly where strings get longer and/or performance is a concern, just as it is the case with other types of arrays.

WalterBright · on Aug 9, 2022

> that has an acceptable tradeoff for performance vs space and simplicity for where they are used

Is it? I've been programming strings for 45 years now. Including on 8 and 10 bit machines. All that space efficiency goes out the window when one wants a subset of a string that isn't a common tail.

The simplicity goes out the window as soon as you want a substring that isn't a common tail. Now you have memory allocation to deal with.

The performance goes out the window because now the entire string contents has to be loaded into the cache to determine its length.

> length-prefixed

Are worse. Which is why I didn't mention them.

> Sane programs use store length

Meaning they become length-delineated programs, except it's done manually, tediously, and error-prone.

Whenever I review C code, the first thing I look at are the strlen/strncpy/str** sequences. It's almost always got a bug in it, an off-by-one error.

jstimpfle · on Aug 9, 2022

Again, I'm not saying you should represent substrings, or strings in general for that matter, as zero terminated strings, and I'm not saying use zero terminated strings for anything longer than a couple bytes.

No, I recommend everyone to use whatever fits the situation best. It might be a 2 byte start index and a 1 byte length fields that expresses the length as a multiple of 12 bytes. It might be rope data structure. Or it might be whatever. "String" is not a super well defined thing, and I don't understand why everybody is so super concerned about a canonical string data type. String data types are for scripting languages. 99% of my usage of string (literals) is just printf and opening files, and C does these just fine.

Zero terminated strings are only a default thing for string literals that does indeed bring a little bit of simplicity and convenience (no need for a builtin string type and the associated bike shedding, and only need to pass a single pointer to functions like printf).

> Meaning they become length-delineated programs, except it's done manually, tediously, and error-prone.

Not sure when is the last time I found it "manually, tediously, and error-prone". There are very rare cases where I have to construct zero-terminated strings from code, or need to strlen() something because of an API. And even when these cases occur they don't bother me at all. Stuff just works for me generally and I'm moving on. I have probably 500 stupid bugs unrelated to string handling before I once forget a zero terminator, and when that one time happens I just fix it and move on. On the plus side, given that we're in C where there are no slice types, zero-terminated strings spare me to pass extra length values for format strings or filepaths.

Sometimes I envision being able to use slices but I have some concerns if that would be an actual improvement. Importantly it should be about arrays and not just about strings. Strings are arrays, they aren't special.

I think a good design for slices could be one whose length can never be accessed by the programmer, but which can be used for automated bounds checks. Keeping size/capacity/offset and 43 cursors into whatever buffers separate is actually correct in my view from a modularization standpoint, because "String <-> Index/Size/Offset etc." isn't a 1:1 relationship.

> Whenever I review C code, the first thing I look at are the strlen/strncpy/str* sequences. It's almost always got a bug in it, an off-by-one error.

You will have to look quite a bit to find strlen() or strncpy() in my code. I'm not advocating for them, and not advocating to build serious string processing on top of zero-terminated strings.

WalterBright · on Aug 9, 2022

D doesn't have a builtin string type. A string in D is an array of characters. All arrays are length delineated.

> You will have to look quite a bit to find strlen() or strncpy() in my code. I'm not advocating for them, and not advocating to build serious string processing on top of zero-terminated strings.

Rolling your own string mechanism is simply not a strength of C. The downside of rolling your own is it is incompatible with everyone else's notion of how to avoid using 0 termination.

jstimpfle · on Aug 9, 2022

I haven't even suggested to roll your own "string" type. Not more than rolling any other type of array or slice. In my programs I normally do not define a "string" type. Not a central one at least. Zero-terminated strings work just fine for the quick printf() or fopen().

Instead, I might have many string-ish types. A type to hold strings in the UI (may include layout information!), a type of string slice that points into some binary buffer, a rope string type to use in my editor, a fixed-size string as part of some message payload, a string-builder string that tries to be fast without imposing a fixed length... Again, there is little point in an "optimized" generic string type for systems programming, because... generic and optimized is a contradiction.

WalterBright · on Aug 9, 2022

Any length delineated string you're using, and you did say you were using length delineation, suffers from the problem of not being compatible with any other C code. There's a good reason operating system API calls tend to use 0 terminated strings.

If you want to do a quick debug printf() on it, well, you could use %.*s, but it's awkward and ugly (I speak from lots of experience). Otherwise, you gotta append the zero.

I'm not a C newbie. I've been programming C for 40 years now. I've written 2 professional C compilers, the most recent one I finished this year. When I started D, a major priority was doing strings a better way, as C ranks among the most inconvenient string processing languages :-)

jstimpfle · on Aug 9, 2022

Sure, I know who you are but I hold opinions too :-)

I don't care about having to provide zero-terminated strings to OS and POSIX APIs, because somehow I almost always have the zero already. Maybe I'm a magician.

Sometimes I have not, but >99% of what I give to printf is actually "text", and that pretty much always has the zero anyway. It's a C convention, you might not like it, but I don't sweat it.

If I want to "print", or rather "write", something other than a zero-terminated string, which is normally "binary data", I use... fwrite() or something analogous.

> C ranks among the most inconvenient string processing languages

I've written my share of parser and interpreters (including also a dysfunctional toy compiler with x64 assembler backend, but doesn't matter here), so I'm not entirely a stranger to this game either.

I find parsing strings in C is extremely _easy_, and I find it in fact easier than say in Python where going through a stream of characters one-by-one feels surprisingly unpythonic.

Writing a robust, human-friendly parser with good error reporting and some nice recovery attributes is on the harder side, but that has nothing to do with C strings. A string input for the average parser isn't even required, you just read char by char, frankly I don't understand what you're doing that is hard about it. It doesn't matter one bit if there's a zero at the end or not.

WalterBright · on Aug 9, 2022

The inconvenience and inefficiency is apparent when building functions to do things like break up a path & filename & extension into components and reassemble them. You wind up, for each function, dealing with 0 termination or length, separately allocated or not, tracking who owns the memory, etc. There's just no satisfying set of choices. Maybe you've found an elegant solution that never does a defensive copy, never leaks memory, etc., but I never have, and I've never seen anyone else manage it, either.

jstimpfle · on Aug 9, 2022

I agree filepath related tasks are ugly. But there are a number of reasons for that that aren't related to zero termination. First, there is syntax & semantics of filepaths. Strings (whatever kind, just thinking about their monoidic structure) are a convenient user interface for specifying filepath constants, but they're annoying to construct from, and disassemble into, filepath components programmatically (relative to how easy I think it should be). Because of complicated syntax and especially semantics of components and paths, there are a lot of pitfalls. Filepath handling is most conveniently done in the shell, where also nobody has any illusion about it being fragile.

Second, you're talking about memory allocation, and this is arguably orthogonal to the string representations we're discussing here. Whether you make a copy or not for example totally depends on your specific situation. The same considerations arise for any array or slice type.

Third, again, you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution. I could even agree that format strings should have better standardized support for explicit length, but it's really not a pain point for me. I'm only stating that zero-terminated is an acceptable default for string literals, and I want to stress this with another example: Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators? This argument can also extend to runtime debugging somewhat.

WalterBright · on Aug 10, 2022

> memory allocation, and this is arguably orthogonal to the string representations

A substringz cannot be produced from a stringz without doing an allocation.

> you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution

Right, I can. And it's an ongoing nuisance in C to do so, because it doesn't have proper abstractions to build new types with. Even worse, if I switch my stringz to length delimited, and then pass it to fopen() which wants a stringz, I have to convert my length delimited string to stringz even though it is already a stringz. Because my length delimited API has no mechanism to say it also is 0 terminated.

You wind up with two string representations in your code, and then what? Have each string function come in a pair?

Believe me, I've done this stuff, I've thought about it a lot, and there is no happy solution. It annoys me enough that C is just not a tool I want to reach for anymore. I'm just tired of ugly, buggy C string code.

The good news is there is a fix, and I've proposed it, but it gets zero traction:

https://www.digitalmars.com/articles/C-biggest-mistake.html

jstimpfle · on Aug 10, 2022

> You wind up with two string representations in your code, and then what? Have each string function come in a pair?

As said, I don't think this is the end of the world, and I'm likely to add a number of other string representations. While it happens rarely, I don't worry about formatting a temporary string for an API into a temporary before calling it. Because most "string" things are small and dispensable. Zero-terminated strings are the cheap plastic solution that just works for submitting string-literals to printf, and that just works to view directly in a binary. And they're compatible with length delineated in the sense that you can supply a (cheap plastic) zero-terminated string to a (more serious) length delineated API. Also the other way, many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.

> The good news is there is a fix, and I've proposed it, but it gets zero traction

I'm aware of this and I like it ("fat pointers") but I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.

WalterBright · on Aug 10, 2022

> many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.

I'm sorry, I just have to say "no thanks" to that. I don't really want each string function to test the length and run strlen if it isn't there.

By now, the D community has 20 years experience with length as part of the string type. Nobody wants to go back to the C way. It's probably the most unambiguously successful and undisputed feature of D. C code that gets converted to D gets scrubbed of the stringz code, and the result is cleaner and faster.

D still interfaces with C and C strings. The conversion is done as the last step before calling the C function. (There's a clever way to add a 0 that only rarely requires an allocation.) Any C strings returned get immediately converted with the slice idiom:

    string s = p[0 .. strlen(p)];

> I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.

I bet you would like it! (Another problem with a separate length field is there's no obvious connection between it and the string - which is another source of bugs.)

WalterBright · on Aug 10, 2022

> Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators?

Not perceptibly better. And yeah, I do look at binary dumps now and then, after all, I wrote the code that generates ELF, OMF, MachO, and MSCOFF object file formats, and librarians for them :-)

jstimpfle · on Aug 10, 2022

I wrote simple ELF and PE/COFF writers too, but independently of that, zero terminators are what lets you find strings in a binary. And what allows the "strings" program to function. It simply couldn't work with without those terminators.

Similarly, the text we're exchanging consists of words and sentences that are terminated using not zero bytes, but other terminators. I'm very happy that they're not length delineated.

WalterBright · on Aug 10, 2022

> It simply couldn't work with without those terminators.

Yeah, it will. For a related example, I use `grep` all the time to find strings in source code. Source code is not 0 terminated. It works fine.

jstimpfle · on Aug 10, 2022

I use "grep -w foo" (or something like "grep '\<foo\>'"), because when I look for "foo" I don't want "bazfoobar". grep -w only works because the end of words is signaled in-band (surrounding / terminating words with whitespace).

kaba0 · on Aug 10, 2022

Zero-terminated strings was a bad decision even back then, let alone now. They make vectorization very painful, and you just needlessly have to iterate over strings at every use-site.

jstimpfle · on Aug 10, 2022

Except nobody cares about vectorization of your printf("Hello, World\n") or other 12-character strings. Vectorization here would in fact be a waste of build time as well as object output size, and the runtime performance would be not measureably different, possibly even slower in some cases. It's a total waste.

When you're processing actual buffers full of text or binary data, and performance matters, of course you are not advised to use an in-band signaled sentinel like zero-terminator is. Use an explicit length for those cases.

pjmlp · on Aug 9, 2022

Additionally, back in the 8 and 16 bit home computer days, C programms were full of inline Assembly, because C compilers generated garbage machine code.

It was 40 years of throwing UB optimizations at it, that made its current performance come true.

This is nothing special, any language can eventually reach that reality with similar investment.

logicchains · on Aug 9, 2022

Rust does runtime bounds checks on array access, which the compiler can't elide in non-trivial cases (e.g. binary search), so if you want to write such algorithms as fast as in C, you need to use a lot of unsafe array indexing.

kaba0 · on Aug 10, 2022

I would be interested in how big of a difference does it make — these branches will always go one way, so they are trivial to branch predict. Other than very slightly increasing the code size (which can have a big effect if the i cache is no longer sufficient), I fail to see it doing too badly as is.

bitcharmer · on Aug 9, 2022

You might be referring to slices. Arrays in rust have to have a known length at compile time.

masklinn · on Aug 9, 2022

Rustc may still need to do runtime bounds check if it can’t conclude that a dynamically computed index is in-bounds.

The length is known statically, but whether the index is in-bounds may not be.

The binary search GP talks about is exactly one such case, go on Godbolt, write up a simple binary search (with a static size so you get code), and you’ll see that the compiler checks against the literal array size on every iteration.

bitcharmer · on Aug 9, 2022

I have no idea how I totally missed that from the GP's comment. Thanks for correcting me.

chaeronanaut · on Aug 9, 2022

this pretty much summarises my opinion - one nitpick - i assume you meant "omit bounds and other checks", not "emit bounds and other checks" which seems to mean the opposite of what you're intending

zozbot234 · on Aug 9, 2022

Rust does emit bounds and other checks, though. Optimization passes can usually clear some of them away, but you'd need to check the assembly output to be sure.

kitd · on Aug 9, 2022

Yes, that's "omit".

"Emit" means "to send out", eg "emit a strange noise", "emit radiation".

Shish2k · on Aug 9, 2022

It is specifically both:

- trying to access an arbitrary element in a slice, the compiler will emit bounds checks (`if index > len: panic()`) to avoid an uncontrolled out-of-bounds memory access — https://godbolt.org/z/cbY5ebzvK (note how if you comment out the assert, the code barely changes, because the compiler is adding an invisible assert of its own)

- if the compiler can infer that `index` will always be less than `len`, then it will omit the bounds check — https://godbolt.org/z/TTashYnjd

woodruffw · on Aug 9, 2022

Yes, thank you! That's an embarrassing typo!

(And thanks to the other person as well, who presumably deleted the same comment after seeing yours.)

YetAnotherNick · on Aug 9, 2022

But not always. Some algorithms cannot be written in a way that would remove run time bound checking from rust e.g. KMP.

mlindner · on Aug 9, 2022

There are in fact ways that idiomatic Rust can be even faster than idiomatic C.

pkrumins · on Aug 9, 2022

[flagged]

dang · on Aug 9, 2022

Would you please stop trolling HN? You've been here for 15 years. You have a distinguished history of writing good articles. You didn't use to post crap comments like this or https://news.ycombinator.com/item?id=32387218 and we need you to stop it. If you don't we will have to ban you, which I would hate to do.

If you have some substantive critique to make about Rust in appropriate contexts, that's fine of course. But this sort of programming language flamewar is definitely not fine. It leads to lame, dumb threads that we're trying to avoid here.

https://news.ycombinator.com/newsguidelines.html

pkrumins · on Aug 13, 2022

Dang, I apologize for these comments. Every time I tried Rust, I just couldn’t get anything done in a reasonable amount of time, and I started hating this language. I will refrain from commenting on Rust posts from now on.

dang · on Aug 13, 2022

Thanks! I appreciate the reply.

hobofan · on Aug 9, 2022

Why do you keep posting this? One would think by the 10th heavily downvoted comment in the same vein you would have figured out by now that the flaming is neither clever nor appreciated.