`restrict` pointers have nothing to do with the underlying "object" they point into (an array in this case). `restrict` lets the compiler assume that reads through a restricted pointer or any pointer expressions derived from it are only affected by writes through that pointer or expressions derived from it. There are only two writes in this example: `*x = 0`, which is trivially correct; and `*ptr = 1`, where `ptr` is derived from `x` (via `ptr = (int *)(uintptr_t)x`) so this is also correct. However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`. The post addresses this in "The blame game" and mentions that integers could carry provenance but that it's infeasible to actually do this.
The weak provenance solution is to ban optimizing pointer-to-integer casts, since they have a (conceptual) side effect. The strict provenance proposal points out that this side effect is only observable if the integer is cast back to a pointer, so we can keep optimizing pointer-to-integer casts and instead ban integer-to-pointer casts. For example, an operation like `(int *)xaddr` is banned under strict provenance. Instead, we provide a casting operation that names the pointer whose provenance the result should take; something like `(int * with x)xaddr`. With this new provenance-preserving cast, we can see that the final optimization of replacing `*x` with its previously assigned value of `0` is no longer possible, because the code in between involves `x`.
> However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`.
Yeah, this article is great and the framing is pretty perfect. It really shows that optimization passes can't remove information, or else they risk tricking later passes. I definitely agree with OP that "the incorrect optimization is the one that removed xaddr"; that optimization seems wild to me. You only know y is x + 1 because of the way it's constructed in the calling function (main). So the compiler... inlines an optimized version of the function that removes most uses of x? Isn't that optimizer fundamentally broken? Especially in a language with `volatile` and `restrict`?
Sure, but that requires compilation-unit-level analysis or inlining (once the function is inlined, the compiler can see the pointer provenance from main); otherwise you can't guarantee the relationship between x and y.
I guess what bugs me about optimizations is that it feels like something _I_ should be doing. Like if GCC told me this code optimizes down to printf 1 and why, I'd question what I was doing (and rightly so). Doing it automatically feels like too much spooky action at a distance.
In the case of the code we're talking about here, gcc/clang do rely on inlining to optimize down to the single printf. I don't think there's any actual compiler that does the dangerous and invalid optimization in the article.
OH! I've clearly misunderstood then. Rereading, it does look like this is just a hypothetical to illustrate the tension between allowing pointer-int-pointer round-trips and foiling analysis based on pointer provenance. OK I'm caught up, thank you haha.