> Under what set of logic does being able to de-reference a pointer confer that its value is not 0 (which is what the test equates to)?
Simple: undefined behavior makes all physically possible behaviors permissible.
In reality though, such an elimination would only be correct if the compiler were able to prove that the function is never called with NULL; and if the compiler is smart enough to do that kind of analysis, hopefully the compiler writers are not A-holes and will warn when it is called with NULL instead of playing silly buggers.
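The pattern at issue looks roughly like this minimal sketch (the names are made up):

```c
#include <stddef.h>

/* The pointer is dereferenced before it is tested. Because *p is undefined
 * when p is NULL, a compiler is allowed to conclude p != NULL and drop the
 * later test entirely. */
int read_then_check(int *p)
{
    int value = *p;     /* dereference happens first                     */
    if (p == NULL)      /* a compiler may treat this as dead code...     */
        return -1;      /* ...so this early return can silently vanish   */
    return value;
}
```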
This is a kind of Department of Motor Vehicles Bureaucrat thinking. For example, there are many thousands of lines of C code that reference *0, which is a perfectly good address in some environments. One should be able to depend on compilers following the expressed intentions of the programmer and not making silly deductions based on counter-factual assumptions.
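A hedged sketch of what such code looks like (the target and the use of address 0 are assumptions; this is not portable C):

```c
#include <stdint.h>

/* On some bare-metal targets address 0 is ordinary, readable memory (for
 * example the start of a vector table), and code like this is common. The
 * volatile qualifier is what keeps a compiler from "optimizing" the access
 * away; per the standard this is still undefined behavior, which is exactly
 * the tension being described. */
uint32_t read_word_at_address_zero(void)
{
    volatile uint32_t *base = (volatile uint32_t *)0;
    return *base;
}
```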
> This is a kind of Department of Motor Vehicles Bureaucrat thinking.
Sorry, but modern compilers are basically automatic theorem provers. They'll use whatever means necessary to get every last drop of performance. If you play cowboy with them you'll just get hurt.
> For example, there are many thousands of lines of C code that reference *0, which is a perfectly good address in some environments.
It's permissible for a particular platform to define behaviors that the standard has left undefined. If you try to take that code and run it elsewhere, that's your problem.
If you want to phrase it like that, a compiler tries to prove conjectures and performs actions (usually code and data elimination) based on whether it can prove them or their negations. Sometimes it can prove neither.
It's easy to see the problem in the null pointer case. The compiler deduction is that the null test is redundant, but it's not actually redundant. Therefore the compiler "proves" a false theorem. That the standard's rules permit the compiler to deduce false things would be, in normal engineering analysis, considered to show a failure in the rules, but the bureaucratic mindset holds that the rules are always, by definition, correct, so the failure is the fault of the miscreant who neglected to staple the cover sheet properly.
If the compiler is unable to prove a transformation preserves correctness, it should not do the transformation.
To your point below: The compiler is definitely not "forced" to assume that the pointer is not null - that is a choice made by the compiler writers. Even the ridiculous standard does not require the compiler writer to make that assumption. The compiler can simply compile the code as written - or, if it is smart enough to see the problem, it can produce a warning.
> Therefore the compiler "proves" a false theorem.
In the axiomatic system implied by the standard, the hypothetical compiler being discussed can prove that the null check can be eliminated. The fact that you believe this axiomatic system is inconvenient does not constitute a refutation of the truth of the theorem.
> If the compiler is unable to prove a transformation preserves correctness, it should not do the transformation.
Actually, the compiler is able to prove that the transformation preserves correctness. Eliminating a null check after a pointer has been dereferenced does in fact preserve correctness. Either the function is never called with NULL, and the program is correct with or without the check, or the function is sometimes called with NULL, and the program is incorrect with or without the check.
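A sketch of that case analysis in terms of call sites (hypothetical code, not from the case under discussion):

```c
#include <stddef.h>

/* Whether the late null check in f() is kept or dropped changes nothing for
 * caller_ok, and caller_bad is already broken with or without the check. */
int f(int *p)
{
    int v = *p;
    if (p == NULL)        /* the check whose elimination is being defended */
        return 0;
    return v;
}

int caller_ok(void)
{
    int x = 7;
    return f(&x);         /* never passes NULL: the check was redundant    */
}

int caller_bad(void)
{
    return f(NULL);       /* undefined behavior before the check is reached */
}
```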
> the bureaucratic mindset holds that the rules are always, by definition, correct
Since you think that the compiler following the standard rigorously is "bureaucratic" and, I imagine, bad, it follows that you would prefer the compiler to sometimes ignore what the standard says and do something different. I suggest that you try compiling your code with a compiler for a different language. Compilers for languages other than C are guaranteed to not follow the C standard.
EDIT: I think I see where you're going. Your argument is that, since in some platforms NULL can be correctly dereferenced, if the compiler was to eliminate the null check, that would change the behavior of the program. If a compiler for that platform did that, I would agree that it would not be preserving correctness. A compiler for a given platform is only required to generate correct code for that particular platform. Compilers for platforms where dereferencing a null pointer is always invalid can correctly eliminate that late null check.
The compiler didn't create a security bug by removing the null check. The bug was created by the programmer when he didn't check for null before dereferencing the pointer. Even with the check, the program contained a bug.
The compiler converted a buggy program that was prevented from opening a security hole by defense in depth into a program with a security hole. It transformed a careless error into a systemic error, all in the cause of a micro-optimization that didn't actually optimize anything.
In the referenced case the introduced error involved a dereference of a null pointer, but that by itself was not an exploitable security hole. The exploit was enabled when the compiler removed an explicit check. The null dereference was an error, but it was not a security issue on its own.
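A hedged sketch of the shape of that kind of bug (names and structure are invented for illustration, not the actual code from the referenced case):

```c
#include <stddef.h>

struct device {
    int type;
    int (*op)(struct device *);
};

/* The early dereference is the programmer's error; the explicit check below
 * it is the defense in depth that actually blocks misuse of a NULL pointer.
 * If the compiler deletes that check because of the dereference above it,
 * and an attacker can map page zero, the later indirect call goes through
 * attacker-controlled memory. */
int do_request(struct device *dev)
{
    int type = dev->type;   /* careless early dereference (the original bug) */

    if (dev == NULL)        /* the check the optimizer may remove            */
        return -1;

    (void)type;
    return dev->op(dev);    /* exploitable once the check is gone            */
}
```

The point is that the check, however redundant on paper, was the only thing standing between a crash-grade bug and an exploitable one.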
> The compiler deduction is that the null test is redundant, but it's not actually redundant.
No. The compiler is forced to assume x isn't null, because int y = *x; has no meaning when x IS null, so the compiler can't possibly generate any code to cover that case. There's no definition of that construct for the compiler to work off of that could possibly allow x to be null.
Blame the standard if you want, but you can't blame the compiler for not generating code to cover behaviour that you've made up in your head.
I didn't make it up in my head - I observed it in working code. Widely used working code. And the compiler is not at all forced to assume x is not null. The standard leaves it to the compiler writer to handle the case. Could the compiler perform sophisticated static analysis and reject the code under the standard? Yes. Could the compiler simply compile the code as written? Yes. Could the compiler abuse the vagueness of the standard to produce an "optimization" that implements something the programmer specifically did not implement? I suppose. But that's poor engineering.
> the bureaucratic mindset holds that the rules are always, by definition, correct
It has long been known that you can get better optimization if you disregard correctness ;)
C compiler writers know that the assumption "pointer is not null because it was dereferenced" doesn't hold in general. C compiler writers know that they're performing incorrect optimizations.
The bureaucracy now doesn't tell them that the transformation is correct. It tells them it is fine to be incorrect because UB.
The bureaucracy gives them a "license to kill" for the greater good.
(What is the greater good, you ask? Can't answer that; ask a bureaucrat.)
If you don't like the "ridiculous" standard, maybe you shouldn't be writing in the language that it defines. There are plenty of discussions online about what parts of the standard should be changed to get a "friendly C" [1]; unfortunately, there is no consensus that could be implemented.
In order to do that, the standard would have to define that dereferencing a null pointer must produce a deterministic behavior. There are only two possible behaviors:
1. The program successfully reads/writes that memory location and retrieves/overwrites whatever is there without crashing. Then the program can continue on and execute the if even if the pointer was null.
2. The program crashes immediately whenever a null pointer is read/written.
#1 is problematic, because NULL is a single pointer value that can be applied to a variety of pointer types. What happens if you first write to (long *)NULL and then read from (FILE *)NULL? (See the sketch below.)
#2 is very useful, and most platforms already crash any program that tries to read or write NULL. But if the standard required this behavior, it would give an even stronger guarantee that a dereferenced pointer is not null, so there would still be no reason for the compiler to give up that optimization.
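A sketch of the puzzle behind #1 (double is substituted for FILE here only to keep the sketch self-contained; the point is the same):

```c
#include <stddef.h>

/* If the store through one pointer type to address zero must succeed, the
 * standard would also have to say what a later load through an unrelated
 * pointer type at the same address is required to observe. */
void option_one_puzzle(void)
{
    *(long *)NULL = 42L;           /* under #1, this must not crash        */
    double d = *(double *)NULL;    /* ...so what value must this read?     */
    (void)d;
}
```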
C is not Haskell or Java. The C programmer may intend to interact with actual hardware and is not required to interact with some abstract machine. The standard can reflect this or it can attempt to convert C into a poorly designed high level language. Dereferencing the null pointer should be implementation dependent, but the compiler should be required to either detect and flag this as an error or compile it into the machine operations indicated. The actual execution in the second case may depend on the environment.
Sorry, but you are just wrong. The C standard does define an abstract machine.
> but the compiler should be required to either detect and flag [dereferencing the null pointer] as an error
How could the compiler detect at compile time the value of a run-time variable? Sure, some instances might be detectable, but those are the extreme minority. Static analysis tools such as the Clang Static Analyzer are already capable of finding those compile-time NULLs.
> or compile it into the machine operations indicated. The actual execution in the second case may depend on the environment.
Which is exactly what's done now. On most platforms accessing NULL causes a crash, so either the pointer is not null and the program doesn't crash, so the check is redundant; or the pointer is null and the program does crash, so the check is never executed.
This battle has been fought and lost. If you require sensible behaviour, just move on and use a language that offers it. C compilers will do what makes them look good on benchmarks, and various "friendly C" efforts have been tried and failed.
You can if, for example, the function is static. Also there could be a link-time optimization pass. The linker can see all calls to that function, unless the function is exported.
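A sketch of the static-function case (hypothetical code):

```c
#include <stdio.h>

/* helper() is static, so every call site is visible in this translation
 * unit. A compiler can prove that both calls pass the address of a local
 * variable, conclude the NULL check is dead, and drop it without changing
 * the program's behavior. */
static int helper(const int *p)
{
    if (p == NULL)          /* provably never taken given the callers below */
        return -1;
    return *p + 1;
}

int main(void)
{
    int a = 1, b = 2;
    printf("%d %d\n", helper(&a), helper(&b));
    return 0;
}
```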