I'm not asking for the compiler to consistently identify undefined behavior, onl...

mjn · on June 29, 2015

I agree in cases where the compiler knows that undefined behavior is taking place. A lot of the silent optimizations LLVM and GCC make are in cases where the compiler isn't really sure it has identified undefined behavior, though.

To put it in classical logic terminology, one case is modus ponens reasoning. Undefined behavior implies the compiler can do whatever it wants. The compiler finds undefined behavior. Therefore it does whatever it wants. This is the case where it'd be better for the compiler to error out than do something nutty.

But many of the optimizations are doing modus tollens reasoning. If X were true, then the program would perform undefined behavior. Conforming programs do not perform undefined behavior. Therefore NOT-X must hold in conforming programs, and this fact can be used in optimizations.
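To make the modus tollens case concrete, here is a minimal sketch using signed overflow. The function names are made up for illustration, and whether a given compiler actually performs the fold depends on its version and optimization level.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical overflow check that relies on wraparound.
 * If x == INT_MAX, then x + 1 is undefined behavior. A conforming
 * program never does that, so the optimizer may conclude that
 * x != INT_MAX, hence x + 1 > x, and fold the whole body to "false". */
bool next_overflows_naive(int x) {
    return x + 1 < x;
}

/* Well-defined alternative: compare against the limit before adding,
 * so no overflowing addition is ever evaluated. */
bool next_overflows_checked(int x) {
    return x == INT_MAX;
}
```

Note that the compiler never "found" undefined behavior in the first function. It only used the assumption that none occurs, which is why there is nothing obvious for it to warn about.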

kazinator · on June 29, 2015

> If the compiler was able to determine that there is a code path whereby j might be undefined, should it be allowed to remove the body of the loop, even in those cases where the programmer knows by other means that "j >= 1"?

No; rather, the correct logic is that the compiler must preserve the body of the loop if there exists the possibility that it can be reached by a valid code path without any undefined behavior (j is defined, and so forth). Only if the compiler can prove that no well-defined execution path can reach the body can it remove it.

(It is a bad idea to do this without any warning, though. If undefined behavior is confirmed, it should be diagnosed.)

lambda · on June 29, 2015

> only to have a mode where it refuses to silently make 'optimizations' when it does identify UB.

That's not always possible. It's not that it makes the optimization when it identifies UB. It's that it makes an optimization that is valid to make if UB doesn't occur, but if UB were to occur then that optimization could cause all kinds of unexpected problems. But the compiler can't necessarily identify those cases.
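A small sketch of what that looks like in practice, along the lines of the null-check example in the LLVM articles. The function names here are hypothetical, and the comments describe what an optimizer is permitted to do, not what any particular compiler is guaranteed to do.

```c
#include <stddef.h>

/* Hypothetical helper that dereferences before it checks. */
int first_or_default_unsafe(const int *p) {
    int value = *p;      /* undefined behavior if p is NULL */
    if (p == NULL) {     /* the optimizer may assume p != NULL here,  */
        return -1;       /* because of the dereference above, and     */
    }                    /* delete this branch as dead code           */
    return value;
}

/* Same intent with the check placed before the dereference,
 * so every path through the function is well-defined. */
int first_or_default(const int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}
```

Removing the redundant check is a perfectly valid optimization whenever p is non-null, and at compile time the compiler usually has no way to know whether some caller will ever pass NULL.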

Please read the "What Every C Programmer Should Know About Undefined Behavior" series of articles on the LLVM blog; they describe why the compilers can't, in general, provide warnings or errors for these cases in which optimizations rely on the lack of undefined behavior:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

The third article describes why the compiler can't, in general, warn about those cases in which it's relying on lack of UB, but you should read the first two as well.

Note that for some of those cases, clang and GCC have recently added undefined behavior sanitizers, invoked via "-fsanitize=undefined", which can help even more than any warnings they could add. What they do is add extra instrumentation to the executable, which then either logs a warning or crashes when you hit undefined behavior. Doing the check at runtime sidesteps the "getting this right would involve solving the halting problem" reason they can't, in general, provide appropriate warnings. It does mean that this is generally only appropriate in test builds, and that you will only find the undefined behavior that you can trigger during testing; there may be more hiding that only shows up in obscure circumstances.
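As a rough illustration of that limitation, consider a hypothetical conversion function (the name and the file name below are assumptions for the example):

```c
/* Hypothetical helper: converts seconds to milliseconds.
 * Well-defined for small inputs, but the multiplication overflows
 * (undefined behavior) once seconds * 1000 no longer fits in an int. */
int to_millis(int seconds) {
    return seconds * 1000;
}
```

Built with something like "cc -fsanitize=undefined -g example.c", the instrumented executable should report the signed overflow at the moment it happens. But if the test suite only ever calls to_millis with small values, the sanitizer stays silent and the overflow remains in the code for a large input to hit later.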

If you really don't want undefined behavior, it's best to use a language, like Rust, which does not have any undefined behavior (outside of "unsafe" blocks). The problem with any kind of warnings that are tacked on after the design of the language is that you are either going to get lots of false positives, lots of false negatives, or both. With a language that is designed not to allow undefined behavior, you know that if the code compiles, it doesn't invoke UB.