Pointers Are Complicated III, or: Pointer-integer casts exposed (ralfj.de)
129 points by lukastyrychtr on April 15, 2022 | 103 comments


I am misunderstanding something about `restrict`.

> the original program already had Undefined Behavior, or (at least) one of the optimizations is wrong.

As far as I know, since the uwu function was declared with x and y as restrict, the way uwu is called in main is undefined behavior. Because they are both pointing into the same array, and are both 'derived' from the same array.

I guess if I am wrong it's because `restrict` does not care if both are derived from the same pointer. Instead it might only care if x was derived directly from y or y was derived directly from x. Is there a good reason to only care about this 'directly derived from' rather than 'shared derivation'? It would sorta suck if you can't use a function with two restrict arguments pointing into the same array, but this example seems to suggest that might be a reasonable requirement.


`restrict` pointers have nothing to do with the underlying "object" they point into (an array in this case). `restrict` lets the compiler assume that reads through a restricted pointer or any pointer expressions derived from it are only affected by writes through that pointer or expressions derived from it. There are only two writes in this example: `*x = 0`, which is trivially correct; and `*ptr = 1`, where `ptr` is derived from `x` (via `ptr = (int *)(uintptr_t)x`) so this is also correct. However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`. The post addresses this in "The blame game" and mentions that integers could carry provenance but that it's infeasible to actually do this.
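
For reference, the function under discussion looks roughly like this (reconstructed from the operations quoted in this thread; the article's exact code may differ in minor details):

    #include <stdio.h>
    #include <stdint.h>

    static int uwu(int *restrict x, int *restrict y) {
      *x = 0;
      int *y2 = y - 1;
      uintptr_t y2addr = (uintptr_t)y2;
      uintptr_t xaddr = (uintptr_t)x;
      if (xaddr == y2addr) {
        int *ptr = (int *)xaddr;   /* the pointer-integer-pointer round-trip */
        *ptr = 1;
      }
      return *x;
    }

    int main(void) {
      int i[2] = {0, 0};
      printf("%d\n", uwu(&i[0], &i[1]));   /* adjacent elements, so the branch is taken */
    }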

The weak provenance solution is to ban optimizing away pointer-to-integer casts, since they have a (conceptual) side-effect. The strict provenance proposal points out that the side effect is only observable if the integer is cast back to a pointer, so we can keep optimizing away pointer-to-integer casts and instead ban bare integer-to-pointer casts. For example, an operation like `(int *)xaddr` is banned under strict provenance. Instead, we provide a casting operation that includes the pointer whose provenance the result should assume; something like `(int * with x)xaddr`. With this new provenance-preserving cast, we can see that the final optimization of replacing `*x` with its previously assigned value of `0` is no longer possible because the code in between involves `x`.


> However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`.

Yeah this article is great and the framing is pretty perfect. It really shows that optimization passes can't remove information, else they run the risk of tricking later passes. I definitely agree with OP that "the incorrect optimization is the one that removed xaddr"; that optimization seems wild to me. You only know y is x + 1 because of the way it's constructed in the calling function (main). So the compiler... inlines an optimized version of the function that removes most use of x? Isn't that optimizer fundamentally broken? Especially in a language with `volatile` and `restrict`?


It's a static function, so the compiler knows main is the only caller. gcc -O optimizes the whole program down to printf("%d\n", 1).


Sure, but that requires compilation unit level analysis or inlining (when inlined you can include pointer provenance from main), otherwise you can't guarantee the relationship between x and y.

I guess what bugs me about optimizations is that it feels like something _I_ should be doing. Like if GCC told me this code optimizes down to printf 1 and why, I'd question what I was doing (and rightly so). Doing it automatically feels like too much spooky action at a distance.


In the case of the code we're talking about here, gcc/clang do rely on inlining to optimize down to the single printf. I don't think there's any actual compiler that does the dangerous and invalid optimization in the article.


OH! I've clearly misunderstood then. Rereading, it does look like this is just a hypothetical to illustrate the tension between allowing pointer-int-pointer round-trips and foiling analysis based on pointer provenance. OK I'm caught up, thank you haha.


Indeed, I don't think C cares about how the pointers you mark as "restrict" were constructed; it just cares about how you actually use them, and in the example code they are never used in an "overlapping" way.

I just checked cppreference and it says: "if some object that is accessible through P [marked restrict] (directly or indirectly) is modified, by any means, then all accesses to that object (both reads and writes) in that block must occur through P (directly or indirectly), otherwise the behavior is undefined"

The only accesses I can see are "*x = 0;", "*ptr = 1;" and "return *x;". "*ptr" is an indirect access through "x" by way of an integer round trip. If the original program is indeed UB, that would mean that pointers built from an integer round trip are not considered (indirect) accesses through the original pointer.

The author assumes that it is an indirect access: "However, the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip – and if casting integers to pointers is allowed, surely that must work. I will, for the rest of this post, assume that replacing x by (int*)(uintptr_t)x is always allowed."


> As far as I know, since the uwu function was declared with x and y as restrict, the way uwu is called in main is undefined behavior.

In the first example the integer-cast derivative of the second pointer does alias the first but isn't used to access either object; therefore there is (or should be) no UB.

> I guess if I am wrong its because `restrict` ...

The `restrict` keyword was -IIUC- specifically intended to make it possible to write functions like `memcpy()` w/ optimizations that are made possible by knowing that the pointed-to objects do not overlap. The semantics of that are crystal clear in a hand-wavy way, but... very difficult to nail down exactly.

With a hand-wavy definition of `restrict` it's pretty clear that the first `uwu()` is perfectly fine and has no UB.
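
For reference, the `memcpy`-style use case mentioned above looks something like this (a sketch with a hypothetical function name; the real memcpy prototype declares both pointer parameters restrict):

    #include <stddef.h>

    void copy_ints(int *restrict dst, const int *restrict src, size_t n) {
      /* Because dst and src promise not to alias, the compiler may reorder
         and vectorize these loads and stores without re-checking memory. */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    }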


IMHO, casts from integer to pointer should have been deprecated in C (and derivative languages) a long time ago.

Not only do they encourage unsafe practices (they are forbidden in most C/C++ style guides).

They can also wreak havoc with any hardware or software-hardening that puts non-address bits in pointers. For instance, ARM PAC has been in Apple's CPUs for years now and support is mandatory in ARM v8.3 and up. Intel and AMD have also had experimental extensions that use those bits, and new ones are on the way.

When an integer is cast to pointer in those schemes, the pointer can become invalid. Therefore, an integer cast to a pointer type should always result in an invalid pointer, IMHO.

Integers are sometimes used to set alignment. I'd suggest that the macros in CHERI C be added to <stdalign.h>: align_up, align_down and is_aligned. Those can be expressed in normal C without having to cast an integer to a pointer, or can use faster expressions depending on the platform.
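
One plain-C way to express such a helper, as a sketch (hypothetical name, not necessarily how CHERI defines it), uses only a pointer-to-integer cast plus ordinary pointer arithmetic, never an integer-to-pointer cast:

    #include <stdint.h>

    /* Round p down to a power-of-two alignment boundary. */
    static inline void *align_down(void *p, uintptr_t align) {
      uintptr_t misalign = (uintptr_t)p & (align - 1);  /* assumes align is a power of two */
      return (char *)p - misalign;                      /* provenance stays with p */
    }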


How do you do embedded then? I mean, you need to specify the address of, say, the SPI interface. Usually addresses are written as integers. So we would need a syntax for pointer literals?


Or language interop? At least in JNI it's normal to store a C/C++ pointer in Java as a long, and then send it back to C where it has to be converted to a pointer.


I don't particularly agree with the grandparent that getting rid of integer-to-pointer casts makes sense, but at least the specific question you're asking doesn't seem like a non-starter. Two options that come to mind immediately would be making a union { intptr_t; foo * } well-defined; or adding a 'p' or 'ptr' literal suffix so 0x80004000p has type void *.


Yes, type punning is still possible, but doesn't this violate the 'spirit' of the original idea (to get rid of conversions between int and *)?


Type punning in general is undefined (unless one of the types is char * or equivalent); but defining it between intptr_t and pointer types in a few restricted contexts (e.g. for const variables of static storage duration only) would be sufficient to address memory mapped registers and restrictive enough that it should be possible to define clearly.
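
For concreteness, a sketch of the two styles (the register address is made up, and the union form illustrates the hypothetical restricted-context punning described above, not current standard C):

    #include <stdint.h>

    /* Today: memory-mapped register access via an integer-to-pointer cast. */
    #define UART0_DR (*(volatile uint32_t *)0x4000C000u)

    /* Hypothetical alternative: punning between uintptr_t and a pointer type,
       blessed only for const objects of static storage duration. */
    static const union {
      uintptr_t          addr;
      volatile uint32_t *reg;
    } uart0 = { .addr = 0x4000C000u };
    /* usage under the proposed semantics: *uart0.reg = 0x55; */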


TBH I feel the opposite. "Pointers" should be simple memory addresses again without any compiler magic attached, and memory load/stores should be explicit, with the same level of control as assembly. At the least this would make manual performance optimization more predictable. There are plenty of languages "above" C, no need to move C into that direction even more, but I think the unexplored area "below" C (and above assembly) is much more interesting.


The problem is that C developers want their code to work as they expect, which is reasonable, and at the same time they want their code to be as fast as possible, which is also reasonable. But in practice these two desires are often at odds. So when C compilers start exploiting more UB in order to make code faster programmers turn around and complain, and then when C turns out to be slower than Fortran due to a lack of UB-exploiting optimizations a different segment of programmers will complain. It's a difficult balancing act for compiler and language developers.


I'm starting to think we need a way to allow UB-based optimisations on a per-function or per-block basis, with the rest of the code being compiled with the simplest mental model. It's getting a bit hard to reason about whether your code is actually safe or not, it would be better to compile most code as if all pointers can alias, integers wrap around, accessing nullptr is allowed, etc. (subject to per-platform quirks) Then we can mark the hot functions as candidates for better optimisation. Maybe other than an attribute we could add assertions to the code, e.g. assert_no_overlap(ptr1, size1, ptr2, size2) -- assert that the range [ptr1, ptr1+size1) does not overlap [ptr2, ptr2+size2). That could then optionally be compiled into an actual runtime assertion to help catch UB at runtime.
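
A minimal sketch of what that assertion could look like (hypothetical macro; assumes half-open ranges and that comparing the addresses as integers is acceptable):

    #include <assert.h>
    #include <stdint.h>

    /* Assert that [p1, p1+s1) and [p2, p2+s2) do not overlap. Here it is a
       runtime check; the idea above is that an optimizing build could treat
       it as an aliasing hint instead. */
    #define assert_no_overlap(p1, s1, p2, s2)                    \
      assert((uintptr_t)(p1) + (s1) <= (uintptr_t)(p2) ||        \
             (uintptr_t)(p2) + (s2) <= (uintptr_t)(p1))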

Ultimately the C (and C++) memory model is meant to help us write fast, safe software. Developers must be able to reason about the semantics of their code. Optimising short parts of the program that can be reasonably checked by a person for not invoking UB is probably the right way and will result in fast software with fewer bugs due to programmer error.

Edit: Not sure why people are downvoting this, it's like there's some UB religion out there. Do you people want your programs to crash due to the compiler doing surprising things?


> Do you people want your programs to crash due to the compiler doing surprising things?

I obviously can't speak for everyone who defends this sort of compiler behavior, but my preferred solution is for the compiler to continue performing these kinds of optimizations while also providing better static analysis and sanitizer tools to catch problems earlier, even if that involves extending or replacing the language to make that possible.

Expecting everyone who writes C or C++ to do this kind of reasoning in their head and perform these optimizations manually (or drop their performance back toward -O0) just sounds like it would make things worse.


This already exists. gcc has optimization pragmas and function attributes that allow optimization to be set on a per-function level. However, they don't really work as expected, mainly because inlining exists. Example:

  int foo(void) { return 42; }
  int bar(void) { return foo() + foo(); }
If the whole file is compiled as -O0, foo and bar must be compiled as-is. If the whole file is compiled as -O2, bar will be optimized to "return 84;". But what if foo is compiled as -O0 and bar is compiled as -O2? Can bar still be optimized to "return 84;"? What about "return 42+42;"? Can it do "int x = foo(); return x+x;"? What about "int x = foo(); if (x == 42) return 84; else return x+x;"? All of these are permitted in -O2 if gcc thinks it's a good idea, but in our "mixed mode" the semantics are unclear. It might be theoretically possible to come up with a consistent definition; it might start with "functions specified as -O0 cannot be inlined or have other functions inlined into them", but there are still more things to consider, like the behavior of global memory, thread-local storage, floating-point environment, and so on (what if one or more functions is static inline or __attribute__((always_inline))?).
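
For reference, the per-function controls in question are `#pragma GCC optimize` and the `optimize` function attribute, so the "mixed mode" above looks like this:

    /* foo pinned to -O0, bar to -O2: may gcc still fold bar to "return 84;"? */
    __attribute__((optimize("O0")))
    int foo(void) { return 42; }

    __attribute__((optimize("O2")))
    int bar(void) { return foo() + foo(); }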

The real solution with current gcc is to simply compile different files ("translation units") with different optimization flags if you want. This has always been supported. Unfortunately, it comes with a potentially significant performance penalty; the whole point of LTO is to avoid this performance penalty, but this once again returns to the issue of the semantics of mixing functions with different optimization levels.


> Ultimately the C (and C++) memory model is meant to help us write fast, safe software

Fast, maybe. Safe, not really.


I think "restrict" is a really nice compromise in C: Your code declares that you're willing to follow additional rules, and the compiler can make additional optimizations.

The article shows a piece of code that makes that promise and then doesn't hold up its part of the agreement. I can't even follow the rest of the argument from there because it's all based on a faulty foundation.


> The article shows a piece of code that makes that promise and then doesn't hold up its part of the agreement.

Edit: disregard; I now see your other comments.

Incorrect. The article shows a piece of code that makes and fulfills that promise, and then the optimization passes break the code. If you disagree, I would be interested in hearing how/where you think the code itself is faulty.


No, I was sloppy when I read it the first time. The second snippet of code is broken, and I (incorrectly) assumed it was a valid translation from the first snippet. My bad.

The first snippet is subtle, and I'm not a language lawyer, but I can't see anything that screams "contract violation".


On the contrary, code that needs to do integer-to-pointer casts is almost exclusively written in C and C++. Hardware features such as PAC and CHERI were designed to work with those languages assuming that integer-to-pointer casts are possible. Any extension that prevents pointer tagging, extracting signatures, and the like is dead on arrival. Most code doesn't need to touch this kind of stuff, but the whole point of the languages is that it lets you do this when you need to.


The Rust APIs mentioned in the OP support things like pointer tagging without exposing a raw integer address to the user. AIUI CHERI had to bend over backwards to support these operations in C, not because it wanted to but because it had to out of pragmatism. I wager that the CHERI authors would be thrilled if the grandparent's proposal to disallow int2ptr casts were possible to implement without ruling out every significant C codebase in history.


I understand what you’re saying but I think we’re talking about two separate contexts, I’ll call them “CHERI unaware” and “CHERI aware”. There’s plenty of low-level, pointer casting CHERI unaware code, that should continue to work on the hardware. But there’s also people who write secure systems on top of it, and would like to do so in C and C++, so it’s important that they are able to access tags and such. Whether that is through wrappers over assembly instructions, or compiler intrinsics, or whatever, but the easier it is to use from high-level code the more likely it will see adoption there.


Well one reason I think is the following pattern in some library.

  struct SomeStruct {
    /* ... other fields ... */
    void (*callback)(A);
    A user_data;
  };
Here A will be either int or void*. But regardless of the stated type, the expectation is that the user puts whatever they find the most convenient there, whether that's a pointer or an integer (an fd, maybe?).
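
A sketch of why that leads to casts (hypothetical names): a caller who only has an integer, say a file descriptor, ends up smuggling it through the pointer-typed slot and back.

    #include <stdint.h>

    struct SomeStruct {
      void (*callback)(void *);
      void *user_data;
    };

    static void on_ready(void *ud) {
      int fd = (int)(intptr_t)ud;          /* integer recovered from the pointer slot */
      (void)fd;
    }

    static void setup(struct SomeStruct *s, int fd) {
      s->callback  = on_ready;
      s->user_data = (void *)(intptr_t)fd; /* integer stored as a "pointer" */
    }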


> When an integer is cast to pointer in those schemes, the pointer can become invalid

Doesn't that mean the compilers are non-compliant? uintptr_t is defined in terms of supporting round-trip conversions, no?


I was wrong elsewhere in this discussion. I now think the problem isn't that the original code breaks any rules/promises, but that the transformation to the second version of the code is not valid.

> So, which of the optimizations is the wrong one?

In the original function, the code compares addresses but only modifies through `x`. But since it's a static function, the compiler can see all the call sites, so it eliminates a test it can prove is always true. So far so good.

However, it's not valid to keep the `restrict` declarations after that rewrite. It changed a modification through `x` to a modification through `y`, so that clearly violates the promise.


This analysis is correct but the solution is not feasible. Changing a modification through `x` to a modification through `y` does indeed violate the semantics of `restrict`. The problem is that in order to detect this situation, we'd have to track the provenance of integers. In this specific example, we'd have to know that replacing `xaddr` with `y2addr` affects `x` and `y`. There is general consensus that tracking provenance for integers causes many more problems than it solves, so although this would solve the problem it is not feasible. This is why weak and strict provenance are being pursued instead.


With regards to optimization passes, does it matter if the analysis is infeasible? We agree the optimization/transformation from the first version to the second isn't valid, so I think it shouldn't have been done.

The original version with two stores and one load doesn't seem to have a problem. Having the optimizer punt when it gets confused by integer to pointer casts seems acceptable.


It's not sufficient to say "there's an int2ptr cast, so stop optimization." You can break the code up across several function calls, which means the code doing the optimization can't tell that there's an int2ptr cast around to shut down the optimization. (Most optimizations are intraprocedural, and even if they could look interprocedurally, there's no guarantee that the function body is even available for inspection--they could be in different C source files).

Instead, you'd have to instead prove that there's no possible int2ptr cast around everywhere, which is generally an impossible analysis.


> It's not sufficient to say "there's an int2ptr cast, so stop optimization."

Complicated or not, it's necessary that optimizations do not break correct code.

There doesn't seem to be a problem (UB or otherwise) in the first function at the top of the article, but the second one has a clear aliasing problem that violates the promise that `restrict` makes. That translation was invalid.


If you can't track provenance to un-restrict the pointers because it's infeasible, then you have to give up on at least one of the optimization passes. In this case, the optimizations used are very fundamental and giving up on any one of them unilaterally would be catastrophic for performance. The provenance models being suggested add more nuance to the model (pointer provenance) so that we can keep all of the optimization passes while preventing these cases from being optimized incorrectly. Weak provenance says we can't optimize away the pointer-to-integer cast; strict provenance says we must provide provenance for integer-to-pointer casts. Weak provenance is broadly compatible with existing code (compiler semantics change) whereas strict provenance is not (language semantics change). The tradeoff is that strict provenance leads to better optimization in general.


Catastrophic sounds strong. As far as `restrict` goes, C was never that far behind Fortran in performance.

And if maintaining `restrict` for other passes is really important, maybe the order of the passes should be changed. I'm not pretending compiler optimization is simple, but I can't see any situation where having an incorrect optimization pass run first is the right thing to do. The broken pass needs to be fixed, and it shouldn't have emitted code with incorrect `restrict` on it.


It is much simpler in Pascal

When you write through a pointer, it writes into memory, and the compiler does not optimize it away. As if all pointers were volatile.


But then you have to constantly work around that in your code, making it read more like assembly in a way.


PureBasic's new C backend had the same problem; declaring them volatile solved it.

https://www.purebasic.fr/english/viewtopic.php?t=78859


Which Pascal?


FreePascal:

    procedure x;
    var a: integer;
       b: pinteger;
    begin
      b := @a;
      b^ := 1;
      b^ := 2;
    end;
    
becomes

    project1.lpr:15                           begin
    0000000000401090 50                       push   %rax
    project1.lpr:17                           b^ := 1;
    0000000000401091 c7042401000000           movl   $0x1,(%rsp)
    project1.lpr:18                           b^ := 2;
    0000000000401098 c7042402000000           movl   $0x2,(%rsp)
    project1.lpr:19                           end;
    000000000040109F 59                       pop    %rcx
    00000000004010A0 c3                       ret    

Perhaps Delphi, too

Although I do not know how much is by design and how much comes from bugs in the optimizer. But they said this is one reason why they maintain their own optimizer rather than switching to LLVM: LLVM would "break" this function.


Are there any experimental systems programming languages exploring the opposite of "pointer provenance" (and the resulting compiler complexity)?

E.g. treat memory strictly as some sort of separate storage device, and do all computations in local register-like variables (which could be backed by a special invisible scratch space to allow variables to spill out of registers).

Basically a language which would require explicit load/stores from and to regular memory, instead of trying to provide the illusion of an infinitely large register bank.


I think LLVM IR (and other IRs) works a bit like this. There you have an infinite set of registers while memory is separate and you need to explicitly load and store between them. Unfortunately it is not very suitable for humans to write.


Compiler IRs have the same semantics as the language they came from, otherwise it wouldn't be possible to optimize away any memory writes. I forget how it works in LLVM, but in general it's a combination of just saying which language it originally was and lots of alias set metadata.


Correction: they have the same provenance issues as the languages they come from; LLVM IR allows optimizing out memory operations based on aliasing and doesn't explicitly track provenance or aliasing in all cases, therefore it inherits these problems from the C it was designed to compile.


Assembly languages sound like what you describe, and they do not make the burden of memory management go away, just shift it.

You still have to manage a separate storage device somehow (most often via file systems drivers). That's why GCs are so popular: you have a smart "driver" between your code and a RAM storage that makes life so much easier.


It's not so much about removing the burden or making life easier, but instead making manual optimization more straightforward and predictable. Currently, manual optimization often means appeasing or even fighting the compiler's optimizer passes, with small innocent changes completely changing the output of the compiler.


Now that I think about exposing cache hierarchy in a programming language, that makes much more sense. One can imagine a programming language with explicit control over CPU caches and their hierarchy. Also, this could make GPU/CUDA programming more explicit, safe and efficient.

Still, this requires the hardware to cooperate with software instead of pretending that random access memory is truly random access. This functionality just isn't there at this point.

Edit: this would require programmable caching hardware to make caching analysis and algorithms introspectable and changeable. For now, fixed caching algorithms implemented in hardware do provide lots of benefits.


DSPs and other application-specific processors expose the cache hierarchy as a set of scratchpads. This works very well for them, but not for any CPU that is shared between applications, like a server.


What you're describing is reminding me of how Itanium tried to bring VLIW to non-embedded spaces and found "make the compiler smart enough to use this well" to be much easier said than done.


I do wonder if the long-term future of programming looks like https://www.microsoft.com/en-us/research/publication/coq-wor... , and providing libraries to help the programmer optimize themselves rather than black boxes that do it for them.


Are you looking for a language that does function local, bounded optimization, and dumps state directly to memory at various well-defined boundaries?


I think more like a medium level language somewhere between assembly and C, with a clear distinction between a "virtual register bank" and "mass storage memory". Moving data between the two would be an explicit operation, not something the compiler would ever do automatically.


So, one way you can do this is to mark all your "memory" pointers as volatile, then load them into register variables and store them back manually. This would actually allow for very aggressive optimizations in the region that you've fenced off with "register", since the compiler can assume there is no aliasing, while letting you define the boundary where you'd like writes to "go to the hardware". In C this might be a bit of boilerplate, but in C++ one could assume you could RAII the boilerplate away… might be worth exploring.
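
A rough sketch of that pattern in plain C (hypothetical function; `volatile` forces the explicit loads and stores while the arithmetic happens on a local, register-like copy):

    #include <stddef.h>
    #include <stdint.h>

    void scale(volatile uint32_t *mem, size_t n, uint32_t k) {
      for (size_t i = 0; i < n; i++) {
        uint32_t v = mem[i];   /* explicit load from "storage" */
        v *= k;                /* work on the local copy only */
        mem[i] = v;            /* explicit store back */
      }
    }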


There are a few parts of this I still struggle to understand. I don’t get why ptr2int casts are a problem if you never try and cast the integer back to a ptr. It seems like int2ptr is the real issue.

Also it's said that casts to int are better than transmutes because casts have side effects. But ptr2int casts don't actually have side effects; they are a no-op on the hardware.


They are a problem in the sense that the address of the pointer gets exposed. After you lose track of who has it and who might do what with it, you can't track the aliasing information of the pointer, so you have to suppress some optimizations. But you are correct in the sense that if int2ptr never happens, it's all good.

About side effects: we are not talking about having side effects on hardware level, we are talking about side effects from the compiler's viewpoint. Again, the compiler might track aliasing information for optimizations, and casting has the side effect of "exposing" the pointer.


> I don’t get why ptr2int casts are a problem if you never try and cast the integer back to a ptr.

AFAIK, you do understand. ptr2int casts are totally fine and defined behavior, as long as the program contains no int2ptr casts. Is there a passage from the OP that contradicts this?


From the section "Casts have a side-effect":

> But in this case, the operation in question is (uintptr_t)x, which has no side-effect – right? Wrong. This is exactly the key lesson that this example teaches us: casting a pointer to an integer has a side-effect, and that side-effect has to be preserved even if we don’t care about the result of the cast. ... We have to lose some optimization, as the example shows. However, the crucial difference to the previous section is that only code which casts pointers to integers is affected.

So even if we never even use the result, casting a pointer to an integer is problematic.

But in the explanation he only talks about the problems of the int2ptr cast, which I do understand.


The problem is that, if we assume that integers don’t have provenance, some far distant part of the code could guess the integer and do an int2ptr. If you can prove that nothing in the entire program could possibly do this for the entire lifetime of the original object, then sure, you could remove the ptr2int. But compiler optimizations usually work one function at a time. In some cases it might be feasible to prove this anyway, like if (a) you have a function that doesn’t call any other functions and (b) the object in question is a local variable that will go out of scope at the end of the function, making any further accesses UB regardless. But in most cases it’s not feasible.


Indeed int2ptr is the "evil" operation. If we banned it, we could get rid of all this "exposed" stuff and ptr2int would be fine. However, in order to make int2ptr work, we have to also make ptr2int a bit more complicated. That's what the example shows: removing a ptr2int introduced UB into the program.

Rust now (experimentally) has a `ptr.addr()` operation that is like ptr2int without the "expose" part, i.e., the resulting integer cannot be cast back but can still be used for other purposes.


I'd assume it has something to do with the idea that an optimizing compiler can't completely delete the address anymore if it's being used for "something", for example optimizing it away to a register-only variable.


> they are a No-op on the hardware.

That is not guaranteed. The only guarantee is that you can round trip conversions via (u)intptr_t. The integer representation of a converted pointer can be completely different to accommodate hardware like the Symbolics lisp machine.


Are we losing optimizations on x86/ARM due to the mere existence of other hardware (like Symbolics or CHERI) that handles things differently?


You don't lose the optimizations, because the UB and aliasing rules let them stay in; but the people who want to make C safer by simply defining all UB would lose you all these optimizations.

ARM already includes a small part of CHERI (pointer signing) and the rest is coming.


I mean you are correct, but why else would a pointer be converted to an integer if not to cast it back at some point? I guess you can print it for debugging, but most other uses mean it will be used as a pointer at some point.


Another reason for converting pointers to integers without ever doing the reverse operation is to hash them.


Some hand-vectorized code will do this to compute the number of non-vectorized elements that may exist before and after a suitably aligned region.
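
Something along these lines, as a sketch (hypothetical helper; assumes 4-byte elements and a pointer that is at least element-aligned): the cast result is used only as an integer and never converted back.

    #include <stddef.h>
    #include <stdint.h>

    /* Number of scalar floats to process before reaching a 16-byte boundary. */
    static size_t misaligned_head(const float *p) {
      return ((16 - ((uintptr_t)p & 15)) & 15) / sizeof(float);
    }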


> [...] restrict, a C keyword to promise that a given pointer x does not alias any other pointer not derived from x

The author clearly states that restrict is a promise, and then immediately violates that promise. If you lie to the compiler, what do you expect to happen?

Declaring your pointers "restrict" is a promise from your code to the compiler, not from the compiler to your code.

It seems like the entire rest of the article is based on this misunderstanding, and there are a whole bunch of strong statements and questionable conclusions drawn from it:

> This is bad news. Our optimizations changed program behavior. That must not happen!

> the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip

> while the details of defining “derived from” are nasty, it should be clear that doing a bunch of operations that literally don’t involve x at all cannot by any stretch of the imagination produce a result that is “derived from” x

It's hard for me to make sense of the rest of the business about "provenance" when the problem it's supposed to solve are based on faulty reasoning. The part about casting to integer having side effects doesn't follow either.

I've always liked C, and I'm coming to really like Rust. I hope mistaken arguments like the ones in this article aren't persuasive to anyone in the compiler camps.


The one who is mistaken here is you, although I shouldn't be surprised that someone can dismiss what is relatively deep semantics work in compilers on the basis of misunderstanding what's actually going on.

The first fundamental issue is that pointer provenance is an area where people have (historically) dealt with the issue as essentially a requirement of data dependency: for a dereference of `p` to access an object x, p must be at the end of a chain of arithmetic computations where &x is one of the inputs. It turns out, however, that compilers do not preserve data dependency, and there is pretty unanimous agreement among compiler writers and specification writers that they should *not* do so in general.

What this means is that the challenge we have is how to specify something that generally follows the data-dependent rules while still allowing compilers to do their non-data-dependency-preserving optimizations. There is a working group in the C committee that is doing this. The current leading candidate for a specification is based on the idea that you track provenance (i.e., data dependency) via pointers, but not via integers.

This means that now we need to pin down what the actual semantics are where you convert a pointer to an integer or vice versa, and this blog post is part of the effort of trying to pin down sane semantics here.

So, yes, restrict is a promise from the programmer to the compiler... but what is the programmer actually promising? That's what this blog post is exploring.


> The one who is mistaken here is you, although I shouldn't be surprised that someone can dismiss [...]

Yes, I was mistaken in my first read through. I'm taking your indignation with some amusement, but I hope you can be civil going forward.

If you look at the second snippet of code, you can clearly see how it declares the arguments `restrict` and then uses them to access the same piece of memory. That's *wrong* - it violated the promise made by `restrict`.

What I didn't see at first was that the first snippet of code appears to be fine. As far as I can tell, it doesn't break any rules. *So the optimization pass from the first version to the second version is incorrect.* After that, everything that follows is still questionable.

The following passes use the `restrict` promise from the second version to eliminate the load from `x` and so on...


You just fully agreed with what I said in the post. :) Explaining that one of the optimizations is wrong is my entire point. Then I go on saying which optimization is wrong (in my view) and propose a structural explanation for why it is wrong (casts have side-effects). Basically: if casts did not have side-effects, the optimization would be correct (it is always okay to remove a side-effect-free operation whose result is unused), so via proof by contradiction casts must have side-effects.


No, I don't agree with what you said in your post, and you make a lot of invalid proofs by contradiction.

The first pass of your optimizer took a valid program and made an invalid one. That should be the end of the article, but then you ran further (presumably valid) optimization passes which led to erroneous results.

Your whole argument seems to look like:

    (pointer_integer_pointer & invalidly_keeping_restrict) => erroneous_optimization
Since we can't have erroneous optimizations, you draw the faulty conclusion that the problem is pointer-to-integer round trips. You even say it here:

> However, the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip

But it's not the only suspicious part, so all of your following statements are questionable. The contradiction should've led you to:

    !erroneous_optimization => (!pointer_integer_pointer | !invalidly_keeping_restrict)
Using Modus Tollens and De Morgan's. Instead you declare that casts have side-effects, dive into "integer provenance", make some distinction between "as" casts and transmute, and so on. Meanwhile, I conclude that broken optimizer passes should be fixed or removed.

> I would argue that the alternative is to treat the original program (after translation to Rust) as having Undefined Behavior.

I'm sure you'll get your way, and you probably won't stop until Rust has as many tricky "undefined behaviors" for the compiler to exploit as they do in C. Programmers will need a masters degree in "provenance" to avoid the traps you set for them. Then when they stumble and fall in the pit you dug, they can go to the forums and be chastised by the experts. You'll get some .1% speedup on a non-realistic benchmark and declare it a job well done.

> There are, to my knowledge, generally two reasons why people might want to transmute a pointer to an integer

This "argument by lack of imagination" strategy isn't sound either. What if you just haven't thought of a third reason yet?

> I think we should move towards discouraging, deprecating, or even entirely disallowing pointer-integer transmutation in Rust. That means a cast is the only legal way to turn a pointer into an integer

It's really unclear to me why the compiler can't recognize that transmuting a pointer to an integer is the same as a cast. And if it can recognize that, why make it UB?!? Maybe this is all about those 129 bit CHERI pointers. If you want to break the compiler for just that platform, more power to you.

Really, I don't care if you break transmute, so long you leave a way to cast function pointers:

https://rust-lang.github.io/unsafe-code-guidelines/layout/fu...


The original pointers `x` and `y` are not aliased. Only `x` and `y2` are aliases. Does the standard not allow you to pass any pointers which could be aliased based on arbitrary pointer arithmetic in the function body? That seems basically impossible to fulfill ...

Note that the use case here is for the LLVM IR that the Rust compiler generates. It seems very hard in general to know if two mutable references point to items in the same allocation.


The code intentionally uses `y` to access stuff that it also accesses through `x`. Declaring `x` and `y` to be `restrict` was a promise that it wouldn't do that.

> Does the standard not allow you to pass any pointers which could be aliased based on arbitrary pointer arithmetic

With the right hackish arithmetic, all pointers could be aliased with pointer arithmetic in C. Using `restrict` says your code won't do that so the compiler can optimize more.

> It seems very hard in general to know if two mutable references point to items in the same allocation.

Yeah, and in the limit it's impossible for C. It's not the compiler's job to figure it out, it's the code's job to uphold the promise.

It's different with Rust because you've got mutable/exclusive or immutable/shared promises from the borrow checker. And Rust doesn't let you dereference raw pointers outside of an unsafe block.

It seems to me that mislabeling your pointers `restrict` in C is akin to doing reckless operations inside of `unsafe` blocks in Rust. In both cases, you told the compiler "trust me, I know what I'm doing".


> With the right hackish arithmetic, all pointers could be aliased with pointer arithmetic in C.

This is not true. The behaviour is undefined in C if pointer arithmetic results in crossing the beginning or end of an object. If we have int a, b;, then the behaviour is undefined if we do &b - &a even if we never use the result to try to reconstruct &b from &a.

> Using `restrict` says your code won't do that so the compiler can optimize more.

`restrict` covers objects modified through one pointer and accessed through another. It does not cover pointers that point to the same object in general; if only one is used to access the object, or if both are used only for reading, all is fine. (It's actually a little bit more complicated than I'm presenting here, but not in a way that's relevant right now.)


Seems like we need a C compiler flag like "--acknowledge-reality", because the compilers for nearly 100% of the server, desktop, and mobile computers in the real world have flat address spaces and do allow arbitrary pointer arithmetic.

Yes, it's technically UB. But which operation could any C compiler disallow in practice: converting from pointer to integer, arithmetic on integers, or converting from integer to pointer?


Sorry to disappoint, but the compilers for nearly 100% of the server, desktop, and mobile computers in the real world do not allow arbitrary pointer arithmetic. They cannot issue an error message for it because that is provably impossible to detect in the general case, but they do optimise on the basis that it does not happen, even if it breaks programs that try to make use of it. Consider this example for GCC:

    int a[2], b[2];
    int offset(void) { return b - a; }
    int check(void) { return &a[1] + offset() == &b[1]; }
The function check() is optimised by GCC to return zero at -O1 optimisation level or higher, because it reasons that no matter how offset() is implemented, either the addition is undefined, or the comparison results in false.

(Note that GCC does this even in a few cases where it is unclear whether the optimisation is valid. The example I provided is slightly more complicated than I would have liked to avoid that issue; the optimisation is definitely valid in this example.)


I originally responded to your code, and you're right about it. That's not what I expected or would want to happen.

However, your code doesn't actually address my comment about conversions and arithmetic. Where does this code fail?

    int a[2], b[2];
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    uintptr_t offset = y - x;
    int* p = (int*)(x + offset);
    printf("b == p: %d\n", b==p);
I didn't try _every_ compiler on Godbolt, but I didn't see it misbehave anywhere I did try.

edit: changed to uintptr_t


That is integer arithmetic, not pointer arithmetic. My understanding of the C memory model that is mentioned in the article (look for PNVI-ae-udi) is that what you are doing is or will be well-defined. I suspect there may still be a few implementations around where the offset you get from integer subtractions has no obvious relation to the offset you would get from pointer subtractions (for two pointers where subtraction is well-defined), but for your example that makes no difference.


> That is integer arithmetic, not pointer arithmetic.

Yeah, but I did say: "which operation could any C compiler disallow in practice: converting from pointer to integer, arithmetic on integers, or converting from integer to pointer?"

If C/C++ compilers keep breaking pointer arithmetic in the game of exploiting undefined behavior for optimizations, people are going to start doing pointer arithmetic with integers when they need it. And they do need it sometimes, for debuggers, profilers, memory checkers, JITs, garbage collectors, shared memory, and so on.


That may be good. If the pointer arithmetic with integers, or other constructs that have the effect of disabling optimisations, is kept to the code that has additional requirements beyond what the standard guarantees, that means the code out there that does not have those additional requirements, which I suspect is the majority of code, can continue to be aggressively optimised.


There's nothing in the C standard that I could find that dictates that b[] has to follow directly after a[] in memory. That it works is just happenstance (or an implementation detail). The only place where order is maintained is fields in a structure definition.


There's nothing in that example that assumes they are adjacent. It only assumes they are in the same (flat) address space. Put a gigabyte between them if you want.

If I had used `uintptr_t` instead of `ssize_t`, I think it's even compliant as far as wraparound goes.

edit: Note I changed the code to use uintptr_t


Same compiler and optimization level provides different behavior when you ask it to compile as C vs C++. Gross.


I used to program on a platform where you could have two pointers (say, integer pointers) that were not equal to each other, yet a store to one would change the location pointed to by the other pointer. Hint: it was a very popular architecture in the 80s and 90s. Spoiler: The 8088. I'm not saying I want to go back to that era (yes, I do like flat address spaces) but even on modern systems today it's possible to map memory such that two different pointers point to the same physical memory (but yes, you have to go out of your way to do that these days).

If I could, I would disallow integer to pointer---there are (in my opinion) better ways to address a hardware register than to do

    unsigned char *uart = (unsigned char *)0xdeadbeef; /* [1] */
But I don't think it will go away any time soon in C (over 30+ years later, and we're still a few years out from 1's complement and signed magnitude integers being deprecated).

[1] Here's one way using NASM:

         global   uart
         absolute 0xdeadbeef
    uart resb 1
Then in C code:

    extern unsigned char uart;


> I used to program on a platform [...]

Yeah, and I know that's why the C standard has so much slop in it. They want to keep the window of conformance open for past or future architectures.

> If I could, I would disallow integer to pointer [...]

It's not just about 0xDeadBeef hardware addresses, and I don't think anyone is talking about breaking assembly language.

However, C has unions and Rust has enums. You want those to share the same piece of memory in the fewest bytes you can get away with. Some interpreters, written in C, even use nan-tagging to store the important 48 bits of a pointer in the 52 bits of payload with double precision floats.
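
A rough sketch of the nan-tagging idea (assumptions: IEEE-754 doubles and pointers that fit in 48 bits; note that the unboxing step is exactly the contested int2ptr cast):

    #include <stdint.h>
    #include <string.h>

    #define QNAN 0x7ff8000000000000ULL  /* quiet-NaN exponent/mantissa bits */
    #define PTAG 0x8000000000000000ULL  /* sign bit marks "this NaN holds a pointer" */

    static double box_ptr(void *p) {
      uint64_t bits = PTAG | QNAN | (uint64_t)(uintptr_t)p;  /* ptr2int */
      double d;
      memcpy(&d, &bits, sizeof d);
      return d;
    }

    static void *unbox_ptr(double d) {
      uint64_t bits;
      memcpy(&bits, &d, sizeof bits);
      return (void *)(uintptr_t)(bits & ((1ULL << 48) - 1));  /* int2ptr */
    }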

You don't have to like it, and you can discourage your children or coworkers from doing it, but there are legitimate reasons for all of this. The examples look ugly because they're short and stupid.

And nobody has even mentioned that `A[i] == (A + i) == (i + A) == i[A]` is going to continue to be valid in C. https://stackoverflow.com/a/381549


> That seems basically impossible to fulfill ...

Any sort of pointer arithmetic in the presence of restrict pointers seems like a giant footgun.

As far as I can read the standard, it does not forbid aliasing `restrict` pointers as long as these pointers are not used to access anything. Here y2 aliases x, but at no point is anything accessed through y2 (whether reading or writing). So I feel like it respects the letter of the law.


> The author clearly states that restrict is a promise, and then immediately violates that promise

Which part of the program do you believe violates the restrict promise?

I think it's not obvious that it is violated. The question is whether a pointer obtained through an integer round trip is considered to be derived from the original pointer. The author assumes that it is.


> Which part of the program do you believe violates the restrict promise?

The part where they modify a piece of memory using `x` and then modify the same piece using `y`.

That's a promise the programmer made inside that function and for anyone else who calls the function. This is C, so it's not the compiler's job to prove the implementer or caller got it right.


But it's never modified using y?


I see your point, but this is really niddly. It explicitly asks if the address in `x` is equal to the address in `y-1`. And if so, it modifies the contents at that address which is equal to both.

If the conclusion was that the C standard could/should tighten up its verbiage for cases like this, I'd agree. But the conclusion is something about adding provenance to integers...


I don't believe the argument is mistaken at all. Consider the interface of `uwu`. It takes two pointers with the extra promise that they do not alias each other.

When you call that function with two pointers that happen to be adjacent but non-overlapping, that promise is absolutely satisfied.

The problems start with the notion of 'derived from' and pointer arithmetic. Obviously you can derive something from `y` that aliases `x` by subtracting the offset from `y`. But that's always true, and that – I believe – is the author's point. If you view this as 'derived from', you can never satisfy any restrict guarantee on a function whose implementation you have not carefully reviewed.

As for having influence: Ralf Jung has written a PhD thesis on the correctness of Rust (including formal proof that the model is sound), so I'm inclined to believe people are willing to listen to him, and rightly so.


> It takes two pointers with the extra promise that they do not alias each other.

... and then promptly uses pointer arithmetic to create an alias. It violated the promise.

> If you view this as 'derived from', you can never satisfy any restrict guarantee on a function whose implementation you have not carefully reviewed.

This is C, not Rust. You aren't getting proofs without careful review. You're using "restrict" in order to ask the compiler to create a faster function on the promise that it will never be used in a way that the pointers alias. It's a promise that needs to be upheld within the function and for everyone who calls the function.

Think of something like `memcpy` (overlaps not allowed) vs `memmove` (overlaps allowed). The compiler can make the first one faster, and it is your job to uphold the contract.

> As for having influence: Ralf Jung has written a PhD thesis on the correctness of Rust

Appeals to authority aren't motivating to me. I've known and worked with a whole lot of PhDs, and they aren't always right. In fact, frequently the ones I've known are overly confident when they step outside of their area of expertise.


> ... and then promptly uses pointer arithmetic to create an alias. It violated the promise.

I think this is where a lot can go wrong in terms of understanding.

It was my understanding that the restrict keyword requires a promise of the caller. Specifically, the caller promises that the arguments it passes do not alias. It was my point that the `uwu` function could not violate that promise, because it wasn't the one who made it.

After review of the standard, I'm no longer certain this is the case.

Disregarding this line of reasoning entirely, I still believe the author is correct.

The following is an excerpt from the Standard (or rather the latest available draft):

> An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer

The `uwu` function complies with that requirement. While it does create an aliasing pointer derived from `y`, it never uses it to actually access memory. Hence, it does not violate the requirement imposed by restrict.

The aliasing access derived from `y` (which may well be undefined behavior after all) is only introduced after one of the optimization passes. Thus, this optimization is incorrect and may not be applied.

Edit: of course, there is an aliasing pointer in the original version of `uwu`. This pointer, however, is derived from `x`, thus not violating any guarantees implied by restrict.


> Thus, this optimization is incorrect and may not be applied.

You're right, and I was wrong in my initial take on things.

I currently believe the first version of the code is fine, but that the second version is clearly incorrect. There, you'll see it's using an address derived from `y` to modify memory modified through `x`. The translation to the second version should either not be allowed, or it should not continue to declare the arguments as `strict`.

As for the promise that `strict` makes, I interpret it to be a promise both about the function and from all callers of the function. In other words, it's a promise from the programmer(s) to the compiler, and nothing more.

For instance, the arguments to `memcpy` were declared as `restrict`. The documentation doesn't allow overlapping ranges, so the implementation would be fine. However, `uwu` could call it with overlapping ranges. It doesn't matter if `uwu` didn't make the promise, it has to uphold it for correct behaviour.


I believe we're on the same page now. (I assume in the following that where you wrote `strict`, you meant `restrict`.)

I now understand the standard such that it implies that promises made by `restrict` must be upheld in the whole program and distinctions between caller and callee are irrelevant. Both must cooperate such that the promises remain true.

As for the given code for `uwu`, I also believe the first version is allowed and the second is not. Since the second is the result of (hypothetical proposed) optimizations, the latter may not be applied even though they seem fine on the surface.

So we must have a way to determine when optimizations may be applied and imbuing pointer-to-integer casts with side effects certainly fixes this situation. It seems like a sound approach, but I do hope for a formal analysis, of course.


> (I assume in the following that where you wrote `strict`, you meant `restrict`.)

Yup, silly typos. Sorry about that.


Looks like you ended up agreeing with my post then. :)

(FWIW I fully agree re: appealing to authority. I'd rather people engage with my arguments than take them on face value.)


I'm not sure where you get that idea from. I think you draw wild conclusions from faulty logic.


> and then promptly uses pointer arithmetic to create an alias. It violated the promise.

afaik not creating aliased pointers is not a promise you make when marking a pointer as restrict. Only when using that aliased pointer for accesses (in the presence of modifications) can you actually break the promise. To quote cppreference:

"During each execution of a block in which a restricted pointer P is declared (typically each execution of a function body in which P is a function parameter), if some object that is accessible through P (directly or indirectly) is modified, by any means, then all accesses to that object (both reads and writes) in that block must occur through P (directly or indirectly), otherwise the behavior is undefined"


> Only in the presence of accesses and modifications can you actually break the promise.

Ok, creating aliases you don't use is probably fine. I should've worded that more carefully.

However, the example code certainly modifies the same piece of memory from two different pointers, and so it broke the promise.


Yes, I think it boils down to whether creating a derived pointer from a restrict pointer and then using both is allowed. I just had a look at the C standard and I think it is allowed if the derived pointer is (what the standard calls) "based" on the restrict pointer (page 89, 6.7.3.1 paragraph 3: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf). The standard language is very dense but it makes it sound like the program presented in the article would not uphold this "based" property and so might be UB as you suspected.


The standard defines "based on" by talking about hypothetical alternative executions where the original value has a different value. In those executions, the access in question does not even happen in my program. What this goes to show is that the definition in the standard is ill-formed -- this can happen because the standard is written in English rather than a formal, mathematical language.

However, I think it is quite clear that the intention of the standard is to allow my program. It would certainly allow the alternative where the `if (xaddr == y2addr) {` is removed and its body (then-branch) put directly into the main function. It makes no sense to disallow this program just because it introduces an extra condition, ergo I claim this program has defined behavior.



