> Under what set of logic does being able to de-reference a pointer confer that its value is not 0 (which is what the test equates to)?
Simple: undefined behavior makes all physically possible behaviors permissible.
In reality though, such an elimination would only be correct if the compiler were able to prove that the function is never called with NULL; and if the compiler is smart enough to do that kind of analysis, hopefully the compiler writers are not A-holes and will warn when it is called with NULL instead of playing silly buggers.
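The pattern at issue looks roughly like this minimal sketch (the names are made up):

```c
#include <stddef.h>

/* The pointer is dereferenced before it is tested. Because *p is undefined
 * when p is NULL, a compiler is allowed to conclude p != NULL and drop the
 * later test entirely. */
int read_then_check(int *p)
{
    int value = *p;     /* dereference happens first                     */
    if (p == NULL)      /* a compiler may treat this as dead code...     */
        return -1;      /* ...so this early return can silently vanish   */
    return value;
}
```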
This is a kind of Department of Motor Vehicles Bureaucrat thinking. For example, there are many thousands of lines of C code that reference *0, which is a perfectly good address in some environments. One should be able to depend on compilers following the expressed intentions of the programmer and not making silly deductions based on counter-factual assumptions.
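A hedged sketch of what such code looks like (the target and the use of address 0 are assumptions; this is not portable C):

```c
#include <stdint.h>

/* On some bare-metal targets address 0 is ordinary, readable memory (for
 * example the start of a vector table), and code like this is common. The
 * volatile qualifier is what keeps a compiler from "optimizing" the access
 * away; per the standard this is still undefined behavior, which is exactly
 * the tension being described. */
uint32_t read_word_at_address_zero(void)
{
    volatile uint32_t *base = (volatile uint32_t *)0;
    return *base;
}
```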
> This is a kind of Department of Motor Vehicles Bureaucrat thinking.
Sorry, but modern compilers are basically automatic theorem provers. They'll use whatever means necessary to get every last drop of performance. If you play cowboy with them you'll just get hurt.
> For example, there are many thousands of lines of C code that reference *0, which is a perfectly good address in some environments.
It's permissible for a particular platform to define behaviors that the standard has left undefined. If you try to take that code and run it elsewhere, that's your problem.
If you want to phrase it like that, a compiler tries to prove conjectures and performs actions (usually code and data elimination) based on whether it can prove them or their negations. Sometimes it can prove neither.
It's easy to see the problem in the null pointer case. The compiler deduction is that the null test is redundant, but it's not actually redundant. Therefore the compiler "proves" a false theorem. That the standard's rules permit the compiler to deduce false things would be, in normal engineering analysis, considered to show a failure in the rules, but the bureaucratic mindset holds that the rules are always, by definition, correct, so the failure is the fault of the miscreant who neglected to staple the cover sheet properly.
If the compiler is unable to prove a transformation preserves correctness, it should not do the transformation.
To your point below: The compiler is definitely not "forced" to assume that the pointer is not null - that is a choice made by the compiler writers. Even the ridiculous standard does not require the compiler writer to make that assumption. The compiler can simply compile the code as written - or, if it is smart enough to see the problem, it can produce a warning.
> Therefore the compiler "proves" a false theorem.
In the axiomatic system implied by the standard, the hypothetical compiler being discussed can prove that the null check can be eliminated. The fact that you believe this axiomatic system is inconvenient does not constitute a refutation of the truth of the theorem.
> If the compiler is unable to prove a transformation preserves correctness, it should not do the transformation.
Actually, the compiler is able to prove that the transformation preserves correctness. Eliminating a null check after a pointer has been dereferenced does in fact preserve correctness. Either the function is never called with NULL, and the program is correct with or without the check, or the function is sometimes called with NULL, and the program is incorrect with or without the check.
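A sketch of that case analysis in terms of call sites (hypothetical code, not from the case under discussion):

```c
#include <stddef.h>

/* Whether the late null check in f() is kept or dropped changes nothing for
 * caller_ok, and caller_bad is already broken with or without the check. */
int f(int *p)
{
    int v = *p;
    if (p == NULL)        /* the check whose elimination is being defended */
        return 0;
    return v;
}

int caller_ok(void)
{
    int x = 7;
    return f(&x);         /* never passes NULL: the check was redundant    */
}

int caller_bad(void)
{
    return f(NULL);       /* undefined behavior before the check is reached */
}
```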
> the bureaucratic mindset holds that the rules are always, by definition, correct
Since you think that the compiler following the standard rigorously is "bureaucratic" and, I imagine, bad, it follows that you would prefer the compiler to sometimes ignore what the standard says and do something different. I suggest that you try compiling your code with a compiler for a different language. Compilers for languages other than C are guaranteed to not follow the C standard.
EDIT: I think I see where you're going. Your argument is that, since in some platforms NULL can be correctly dereferenced, if the compiler was to eliminate the null check, that would change the behavior of the program. If a compiler for that platform did that, I would agree that it would not be preserving correctness. A compiler for a given platform is only required to generate correct code for that particular platform. Compilers for platforms where dereferencing a null pointer is always invalid can correctly eliminate that late null check.
The compiler didn't create a security bug by removing the null check. The bug was created by the programmer when he didn't check for null before dereferencing the pointer. Even with the check, the program contained a bug.
The compiler converted a buggy program that was prevented from opening a security hole by defense in depth into a program with a security hole. It transformed a careless error into a systemic error, all in the cause of a micro-optimization that didn't actually optimize anything.
In the referenced case the introduced error involved a dereference of a null pointer, but that by itself was not an exploitable security hole. The exploit was enabled when the compiler removed an explicit check. The null dereference was an error, but it was not a security issue on its own.
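A hedged sketch of the shape of that kind of bug (names and structure are invented for illustration, not the actual code from the referenced case):

```c
#include <stddef.h>

struct device {
    int type;
    int (*op)(struct device *);
};

/* The early dereference is the programmer's error; the explicit check below
 * it is the defense in depth that actually blocks misuse of a NULL pointer.
 * If the compiler deletes that check because of the dereference above it,
 * and an attacker can map page zero, the later indirect call goes through
 * attacker-controlled memory. */
int do_request(struct device *dev)
{
    int type = dev->type;   /* careless early dereference (the original bug) */

    if (dev == NULL)        /* the check the optimizer may remove            */
        return -1;

    (void)type;
    return dev->op(dev);    /* exploitable once the check is gone            */
}
```

The point is that the check, however redundant on paper, was the only thing standing between a crash-grade bug and an exploitable one.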
> The compiler deduction is that the null test is redundant, but it's not actually redundant.
No. The compiler is forced to assume x isn't null, because int y = *x; has no meaning when x IS null, so the compiler can't possibly generate any code to cover that case. There's no definition of that construct for the compiler to work off of that could possibly allow x to be null.
Blame the standard if you want, but you can't blame the compiler for not generating code to cover behaviour that you've made up in your head.
I didn't make it up in my head - I observed it in working code. Widely used working code. And the compiler is not at all forced to assume x is not null. The standard leaves it to the compiler writer to handle the case. Could the compiler perform sophisticated static analysis and reject the code under the standard? Yes. Could the compiler simply compile the code as written? Yes. Could the compiler abuse the vagueness of the standard to produce an "optimization" that implements something the programmer specifically did not implement? I suppose. But that's poor engineering.
> the bureaucratic mindset holds that the rules are always, by definition, correct
It has long been known that you can get better optimization if you disregard correctness ;)
C compiler writers know that the assumption "pointer is not null because it was dereferenced" doesn't hold in general. C compiler writers know that they're performing incorrect optimizations.
The bureaucracy now doesn't tell them that the transformation is correct. It tells them it is fine to be incorrect because UB.
The bureaucracy gives them a "license to kill" for the greater good.
(What is the greater good, you ask? Can't answer that; ask a bureaucrat.)
If you don't like the "ridiculous" standard, maybe you shouldn't be writing in the language that it defines. There are plenty of discussions online about what parts of the standard should be changed to get a "friendly C" [1]; unfortunately, there is no consensus that could be implemented.
In order to do that, the standard would have to define that dereferencing a null pointer must produce a deterministic behavior. There are only two possible behaviors:
1. The program successfully reads/writes that memory location and retrieves/overwrites whatever is there without crashing. Then the program can continue on and execute the if even if the pointer was null.
2. The program crashes immediately whenever a null pointer is read/written.
#1 is problematic, because NULL is a single pointer value that can be applied to a variety of pointer types. What happens if you first write to (long *)NULL and then read from (FILE *)NULL? (See the sketch below.)
#2 is very useful, and most platforms already crash any program that tries to read or write NULL. But if the standard required this behavior, it would give an even stronger guarantee that a dereferenced pointer is not null, so there would still be no reason for the compiler to give up that optimization.
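A sketch of the puzzle behind #1 (double is substituted for FILE here only to keep the sketch self-contained; the point is the same):

```c
#include <stddef.h>

/* If the store through one pointer type to address zero must succeed, the
 * standard would also have to say what a later load through an unrelated
 * pointer type at the same address is required to observe. */
void option_one_puzzle(void)
{
    *(long *)NULL = 42L;           /* under #1, this must not crash        */
    double d = *(double *)NULL;    /* ...so what value must this read?     */
    (void)d;
}
```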
C is not Haskell or Java. The C programmer may intend to interact with actual hardware and is not required to interact with some abstract machine. The standard can reflect this or it can attempt to convert C into a poorly designed high level language. Dereferencing the null pointer should be implementation dependent, but the compiler should be required to either detect and flag this as an error or compile it into the machine operations indicated. The actual execution in the second case may depend on the environment.
Sorry, but you are just wrong. The C standard does define an abstract machine.
> but the compiler should be required to either detect and flag [dereferencing the null pointer] as an error
How could the compiler detect at compile time the value of a run-time variable? Sure, some instances might be detectable, but those are the extreme minority. Static analysis tools such as the Clang Static Analyzer are already capable of finding those compile-time NULLs.
> or compile it into the machine operations indicated. The actual execution in the second case may depend on the environment.
Which is exactly what's done now. On most platforms accessing NULL causes a crash, so either the pointer is not null and the program doesn't crash, so the check is redundant; or the pointer is null and the program does crash, so the check is never executed.
This battle has been fought and lost. If you require sensible behaviour, just move on and use a language that offers it. C compilers will do what makes them look good on benchmarks, and various "friendly C" efforts have been tried and failed.
You can if, for example, the function is static. Also there could be a link-time optimization pass. The linker can see all calls to that function, unless the function is exported.
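A sketch of the static-function case (hypothetical code):

```c
#include <stdio.h>

/* helper() is static, so every call site is visible in this translation
 * unit. A compiler can prove that both calls pass the address of a local
 * variable, conclude the NULL check is dead, and drop it without changing
 * the program's behavior. */
static int helper(const int *p)
{
    if (p == NULL)          /* provably never taken given the callers below */
        return -1;
    return *p + 1;
}

int main(void)
{
    int a = 1, b = 2;
    printf("%d %d\n", helper(&a), helper(&b));
    return 0;
}
```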