
The C standard is clear on the issue

The problem is succinctly summarised by the saying "In theory, there is no difference between theory and practice. In practice, there is." Unfortunately what most programmers think of as "C" is subtly different from how the standard defines it. In other words, perhaps we should fix compilers (and eventually the standard) to match reality instead of the other way around.

The standard even states in the section on undefined behaviour that "behaving during translation or program execution in a documented manner characteristic of the environment" --- exactly what C programmers are usually expecting from UB --- is a possible choice.




Yes. People are quick to point to the standard when it comes to compilers doing something surprising, and quick to use this as evidence that programmers are silly to expect the expected. But if compilers did exactly what these programmers expect, they could point to the standard just the same...

(At least this article provides a good explanation of the advantages to be had from being aggressively pedantic about UB. Many, by way of evidence, just wave their hands and claim people are stupid.)


> (At least this article provides a good explanation of the advantages to be had from being aggressively pedantic about UB. Many, by way of evidence, just wave their hands and claim people are stupid.)

This. While it doesn't show how much real programs benefit from these UB assumptions (or how many new security bugs are introduced...), at least it has some explanations. Still, you have to ask yourself whether the compiler is a tool to help you do your work, or a perverse, pedantic, and frustrating adversary. The ultimate "UB optimizer" would do its best to find tiny corners of undefined behavior in your program, then replace the whole thing with "exit(0)".


That would be an interesting exercise. Of course, the whole thing is only interesting if it comes with a proof justifying the transformation. Otherwise it's too easy for real-world C programs: they all contain undefined behaviour.


If you don't go by the spec, what do you go by? Intuition? What if yours is different from someone else's?

In this case as well, acknowledging that signed integer overflow is undefined behavior allows for more consistent behavior in certain circumstances. For instance, the compiler knows that `x < x + c` whenever `c > 0`, and can optimize the comparison to `true`. Even if you don't care about the optimization, is it not equally surprising that such a statement could ever be `false`?

No matter which way you go, some behavior is going to be surprising. And in these scenarios, the only plausible thing to do is to follow the standard, so at least there's a consistent ideal that everyone's intuitions can converge to, which is crucially important. Each compiler implementing a different C-like standard just leads to more confusion.
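For concreteness, a minimal sketch (my own illustration, not from the thread) of the kind of comparison that can be folded when signed overflow is treated as undefined:

    int cmp(int x) {
        /* With signed overflow undefined, the compiler may assume x + 5
           never wraps, so this comparison is always true and the function
           can be reduced to "return 1". With wrapping semantics it would
           be false for x > INT_MAX - 5, so the fold would be invalid. */
        return x < x + 5;
    }

GCC and Clang at -O2 typically compile this to a constant; with -fwrapv they have to keep the comparison.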


> If you don't go by the spec, what do you go by?

To quote the standard, "a documented manner characteristic of the environment."

If the machine is two's complement (which is the vast majority of them), then you'd expect that behaviour. If it's sign-magnitude or one's complement or saturating, then the behaviour would be different but still consistent.

> Even if you don't care about the optimization, is it not equally surprising that such a statement could ever be `false`?

Overflow wrapping around is not surprising; it's what decades of computing hardware (or centuries, if you include mechanical calculators like https://en.wikipedia.org/wiki/Pascal%27s_Calculator ) have always done.
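For what it's worth, the major compilers already offer this as documented behaviour on request: GCC and Clang accept -fwrapv, which defines signed overflow as two's-complement wrapping. A tiny sketch of what that guarantees:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        /* Compiled with -fwrapv this is defined and prints INT_MIN;
           without the flag, the addition is undefined behaviour. */
        printf("%d\n", x + 1);
        return 0;
    }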


> If the machine is two's complement (which is the vast majority of them), then you'd expect that behaviour. If it's sign-magnitude or one's complement or saturating, then the behaviour would be different but still consistent.

How is this different in practice from undefined behavior, other than the compiler can't take advantage of it for optimizations?


Maybe a better way to illustrate the surprising behavior is that

    if (x < x + c) { // block }

and

    if (0 < c) { // block }

are not equivalent statements if you add overflow wrapping into the spec. What if the processor your application is running on doesn't do two's complement arithmetic? Does gcc need to emulate this behavior to comply with the spec?
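To make the non-equivalence concrete, a small sketch assuming wrapping semantics (e.g. compiled with -fwrapv on a two's complement machine):

    #include <assert.h>
    #include <limits.h>

    int main(void) {
        int x = INT_MAX, c = 1;
        /* With wrapping, x + c is INT_MIN, so the first condition is
           false while the second is true; the two branches diverge. */
        assert(!(x < x + c));
        assert(0 < c);
        return 0;
    }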


Except that each programmer's subtly different flavour of C is different from everyone else's. For a handful of features it may work, but in general I doubt it.


> In other words, perhaps we should fix compilers (and eventually the standard) to match reality instead of the other way around.

Without an analysis of the performance loss from removing this optimization, that would be irresponsible.

The authors of GCC were not stupid. They understood that these optimizations can be important for performance, and that's why they implemented them. When this came up on the mailing list several years ago, Ian Lance Taylor found several places this optimization was kicking in in GCC itself.

It is not OK to just randomly avoid doing optimizations allowed by the C standard because they might trigger bugs. C depends on undefined behavior for performance. You depend on these compiler optimizations to get good performance out of the programs you run every day.
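A commonly cited example (my own sketch, not one of the places Taylor pointed to) is loop trip-count analysis. With a signed induction variable and overflow undefined, the compiler may assume the loop below terminates and runs exactly n + 1 times (for n >= 0), which feeds strength reduction and vectorization; with wrapping semantics it would also have to handle the case where n == INT_MAX and the loop never ends.

    /* 'n' is a hypothetical caller-supplied bound. */
    void scale(float *a, int n) {
        for (int i = 0; i <= n; ++i)  /* with wrapping and n == INT_MAX,
                                         this loop would be infinite */
            a[i] *= 2.0f;
    }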


> It is not OK to just randomly avoid doing optimizations allowed by the C standard because they might trigger bugs. C depends on undefined behavior for performance. You depend on these compiler optimizations to get good performance out of the programs you run every day.

Geoff offers a simple succinct example of optimizing based on undefined behavior in another comment in this thread: https://news.ycombinator.com/item?id=11147068

I tested with GCC, Clang, MSVC, and ICC, and found that only GCC removed the checks and optimized the function down to a constant. Clang, MSVC, and ICC instead generated code that matched the programmer's clear intent.

Are you sure they would be better compilers if they matched GCC's behavior here? I agree the GCC authors are far from stupid, but I think there might be a difference in philosophy here rather than just a missed opportunity for optimization.


> Are you sure they would be better compilers if they matched GCC's behavior here?

I don't know. Measure it on SPECINT or something.


My point is that we are talking about basic types in C. Something you should learn in your first few hours with the language. It's not some arcane thing like what happens in the pre-processor if you nest things or what happens in some rarely used library function in some never occurring edge case.


In my experience most people don't get exposed to underflows and overflows when learning programming. It's not until they hit a bug that they actually find out overflows and underflows exist. Then, in my experience, they don't go read the spec. They ask a co-worker or google it. They may or may not end up with the correct answer versus an answer that merely works at the moment.

I've used C since the mid 80s and C++ since the late 90s. Overflow for me always worked as normal two's complement overflow on every project from 1985 to 2011, so I had no reason to question that it might be undefined... until it was. Even more interesting is that when asking a mailing list of 200+ C++ experts, the answers for how to actually detect overflow were all over the place. Meaning no one really knew for sure the spec-correct way to check, even though they were all C++ guys with 15+ years of experience.
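For reference, two approaches that do seem to be spec-correct are checking before performing the operation, or using the checked-arithmetic builtins that recent GCC and Clang provide. A sketch of both, assuming we want to add two ints:

    #include <limits.h>
    #include <stdbool.h>

    /* Pre-check: never performs the potentially overflowing addition. */
    bool add_would_overflow(int a, int b) {
        return (b > 0 && a > INT_MAX - b) ||
               (b < 0 && a < INT_MIN - b);
    }

    /* Builtin: returns true on overflow; *out receives the wrapped
       result either way (__builtin_add_overflow, GCC 5+ and Clang). */
    bool add_checked(int a, int b, int *out) {
        return __builtin_add_overflow(a, b, out);
    }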

There are all kinds of details to languages that are buried in a spec but are not something that comes up "in your first few hours with the language".


The problem with the C standard is that most programmers treat C as portable assembler whereas from the beginning the group that wrote the standard tried to make it a proper, abstract high-level language.

So we got an enormous disconnect between what programmers expect and what the language really offers.

For a long time, compilers would side with the programmers making sure that optimisations would not break common idiom.

A number of years ago gcc left that path. So now, for system code, certainly if the code has to be secure, it is better to avoid gcc.

The C standard is in some areas extremely complex, and it doesn't make sense to expect all programmers to completely understand what is essentially a broken standard.


Perhaps we should even stop using C as a language for system code. (System code has to be correct and secure first. Optimization for speed comes second, and is only needed for a few small, very hot code paths.)


Relevant article expressing the same sentiment: http://blog.metaobject.com/2014/04/cc-osmartass.html



