Hacker News

The original ANSI C committee had no idea about modern optimization pipelines. If people had continually pushed back against undefined behavior back then, there's a good chance that by 2017 the result would have been that C would be dead, replaced by a language that allows for modern optimization techniques.


As others have said, if C were dead by now that would be great.

But I'm pretty skeptical, to be honest. I've been working on one or another high-performance C or C++ program for most of the past 20 years. I can't ever remember getting a really substantial speed improvement from upgrading the compiler for an important platform, because anything the old compiler did badly that really hurt program performance had already been avoided, either before or after seeing it in profiling results. I'm sure that if you took C code developed on a modern compiler and compiled it with a 90s compiler, it would be slower. But I doubt the software ecosystem would actually be drastically slower if optimizer technology hadn't advanced significantly since 1995, and everything had been developed under that constraint. And I don't think every single advance in optimization since 1995 is dependent on degenerate transformations of undefined behavior.


You're the last person I'd have expected to make that sound like a bad thing.

While benchmark games played a part in the modern ‘undefined behavior’, I'm not so sure that it would have made much difference to adoption of the language. Consider Linus Torvalds's well-known rant as a point in the opposite direction.

In the universe where ‘undefined behaviour’ had been clearly specified as ‘what you write is what you get’, C might have gone to consistent use of optimization-enabling annotations, following ‘const’ (and later ‘restrict’). Along those lines, ‘volatile’ was ANSI's other big mistake, as creating it broke existing code; I now think that should have been the default, with fragile-optimizable status being explicitly indicated by extending the use of the ‘register’ keyword.


It's not just benchmark games. It's people's real-world code.

Look at how often folks on HN complain that the Web platform is useless because JS is slow. C without optimizations is just as slow if not slower. Compiler optimizations are so good that people don't realize how much they rely on them.

Linus was wrong when he complained about the compiler treating null dereference as undefined behavior. This is a very important optimization, because it allows the compiler to omit a lot of useless code in libraries. Oftentimes, the easiest way to prove that "obviously" dead code can never be executed is to observe that the only way it could be executed is for null to be dereferenced.

Opt-in optimization keywords wouldn't scale to the sheer number of optimizations a modern compiler framework performs. Restrict hasn't been a success because it's too subtle of an invariant. It's the kind of thing compiler authors, not application developers, should be thinking about.


This is an important point, and it's why the "C should die" crowd is hard to take seriously. They've even started labeling people who use C as somehow morally suspect, as if we're bad people for choosing to use an unsafe language. We're knowingly putting people in danger! Right.

It's strange that the word "unsafe" has tainted people's thoughts so dramatically. Like calling torrenting music "piracy."


I'm not endorsing C. Don't use C for anything you need to be secure.


I need Emacs to be secure. It's written in C. It interfaces with the internet.

Ditto for Bitcoin. It's the basis of a new financial system. The core software is written in C++.

Same for Linux. C.

Prejudice generally isn't helpful, and it's a bit strange that you can recognize C's merits while also decrying it.


I haven't been "recognizing C's merits" either. In fact, the real reason behind this problem is that C is not type safe, so optimizations (such as the one in this very article!) that are perfectly fine to do in type-safe languages are not possible to do in C without aggressively "exploiting" undefined behavior.


If one doesn't follow the High Integrity, CERT, or MISRA standards, validated with tools like LDRA.

Or, at the very minimum, compiling with warnings enabled as errors, with a continuous build that breaks on static-analyser errors, then yes.


> Look at how often folks on HN complain that the Web platform is useless because JS is slow.

I haven't actually seen anyone complaining about that. Do you have any links?

There are some specific complaints like: JS can't do 64-bit arithmetic or SIMD; but that's only really needed for games and scientific computing, which don't need to use JS. Or that JS is single-threaded; that's a fundamental feature of its design, nothing to do with optimisation.

> C without optimisations is just as slow if not slower.

Nobody's talking about taking away all optimisations, just not trying to do extreme optimisations that exploit undefined behavior (or rather, assume it can never occur).

Plenty of C compilers worked that way in the 90s and performance was perfectly acceptable (on hardware with a fraction of the speed and memory of today's computers and phones).

Modern C++ probably relies on a higher level of optimisation, but that's another story.


> Nobody's talking about taking away all optimisations, just not trying to do extreme optimisations that exploit undefined behavior (or rather, assume it can never occur).

Those "extreme" optimizations are usually just surprising behavior that emerges from perfectly reasonable optimizations. For example, assuming that code that follows null dereference is dead is important.

> Plenty of C compilers worked that way in the 90s and performance was perfectly acceptable (on hardware with a fraction of the speed and memory of today's computers and phones).

You can get that experience with GCC -O0. Do you want to run code like that? I don't, and neither do customers.

People who don't work on modern compilers often think that there is a subset of "simple" optimizations that catches "obvious" things, and optimizations beyond that are weird esoteric things nobody cares about. That isn't how it works. Tons of "obvious" optimizations require assumptions about undefined behavior.


If I could get that on -O1 or -O2, or maybe -Os, I would very happily do so. (But I don't actually know what optimisations those entail without poring over the manuals.)

You're implying that 90s compilers had no optimisations, which is incorrect.

Why is this a hot topic now, and why was it not a hot topic 10 or 15 years ago?

I suggest that something changed in the interim, and that what changed is the addition of dangerous optimisations. I'm not sure where it all started, but strict aliasing in GCC is a potential candidate.

As others have pointed out, GCC and Clang seem to have by far the most horror stories, even though they don't actually generate the fastest code. I imagine that's mostly because GCC and Clang are so widely used, though.


I don't see how it's an optimization to assume, at compile time, that a static pointer that is always null is actually aiming at the function NeverCalled. Why not pick some other function, like one which prints a diagnostic and calls abort?


It's not aiming at the function NeverCalled; it's aiming at EraseAll.

As the article states, if a file-local static function pointer has exactly one assignment to it, it can be an important optimization to assume that it always holds that value. Imagine that it points to some kind of "DebugPrintf(...)" function that, in release builds, is always set to a no-op before being called. You would definitely want that indirect function call to be inlined to nothing in release.


For debug functions that completely disappear in release builds, we have inline functions with conditionally empty bodies or old-school macros.

It is a (decades ago) Solved Problem.


But sometimes these checks just seem to end up entirely removed, and that is just not OK. I have been a developer working on performance-constrained systems software in low-level programming languages (including heavily optimized games written in C++), and this undefined-behavior idea has gone way, way too far. I can always make code faster by manually removing checks I don't need; trying to compare the small gains here with "let's just use node.js lol" is dishonest.

C++ states that references must not be NULL and that "this" must not be NULL, but in the real world a NULL pointer can be dereferenced into a reference, a method can be called on that reference, and the method can complete execution without the app crashing. Yet some C++ compilers now insist that "this == NULL" checks (the most hilarious case, though a simple "&ref == NULL" is the same) and all the dependent code be entirely removed, hamstringing runtime safety and sanity checks.

What works for me is when the compiler says "for this to happen, the code would have had to crash"; what does not work for me is "for this to happen, the code would have to be violating the specification". The entire point of NULL checks in a program was always to detect invalid execution and mitigate its effects :/ and yet, since this code has never crashed on any reasonable C++ compiler, the only way to check for it is to add comparisons that are now being removed under the misguided assumption that the code would have failed at runtime.


These optimization opportunities aren't small gains. They have big consequences, for example when they cause code that would otherwise not be vectorized to be vectorized. Again, compiler authors don't add UB optimization for the fun of it. Patches to add theoretical optimizations that don't actually move the needle are routinely rejected from LLVM and GCC (as they should be, because optimizations slow down compilation, so they need to pull their weight). Rather, they add UB optimization when code is shown to benefit, often the code that people come to their bug trackers with, complaining that it doesn't optimize to what they expect.


If there were a simple and reliable way to say "make the program crash if we hit any of this UB, rather than optimising it completely away" I think that would make a lot of people happy.


LLVM already does this as much as possible. Look at how Clang inserts "ud2" instructions in dead code paths.


Is there a way to use that to address the "NeverCalled" example?

I feel like there's a huge disconnect here. Even after the strange behavior is explained, some people say "wow, I never ever want that behavior, how do I reliably avoid it?" but others respond "there's no problem, you're just using it wrong".

Is there really no way to satisfy both sides?


The proper place to put those checks is before the undefined behavior would be invoked, not after.
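The point can be sketched with a hypothetical helper: when the null test precedes any dereference, no UB-based inference applies, so the compiler must keep the check and its branch.

```c
#include <stddef.h>

/* The check precedes any dereference, so the compiler cannot infer
 * p != NULL and must preserve both the test and the early return. */
int first_byte_checked(const char *p) {
    if (p == NULL)
        return -1;
    return *p;
}
```

Calling `first_byte_checked(NULL)` reliably returns -1 at every optimization level; a check written after the dereference carries no such guarantee.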


What language do you think would have replaced it?

It seems to me that C had already won as the de facto systems language, long before any of these "modern optimisation techniques" cropped up.

Optimisations that make it harder to use the language safely are downright dangerous in my book.


I would have been happy with Modula-2, Ada or Object Pascal as basis.

But better yet would have been Modula-3 or Active Oberon.


I hear very good things about Ada, aside from its unfriendly old-fashioned syntax.


Some of us do enjoy verbose explicit syntax instead of hieroglyphs. :)

Quite helpful when maintaining unknown code in big corp projects.


It would have been much more prudent if the committee defined the behavior in a way amenable to optimization, rather than asking for a blank check.


They did. "Undefined" doesn't mean "unconsidered by the committee", it means "do what you must for optimization".


And would that be such a bad thing?

How come we have languages like Rust that achieve almost the same speed while maintaining dramatically better safety, anyway?


Because type safety makes a lot of optimizations sound. C and C++ have to use undefined behavior rules to achieve a lot of optimizations that type safe languages can more easily perform. In fact, the optimization that the article is complaining about is really only a problem because C is not type safe.


So that's my point. If C had been those few percentage points slower, and died, then we'd have had better languages sooner.


Sounds like win-win to me!

C would be dead.

We'd have a sound low-level language suitable for optimization.

Now, if we throw in C++ eradication...



