
Yes, most type punning via pointer casts is formally UB in both C and C++, and you can induce weird compiler behavior if you pass cast pointers across a function boundary (see [0] for a standard example: notice how the write to *pf vanishes into thin air). Since people like doing it in practice, GCC and Clang have an -fno-strict-aliasing flag to disable optimizations based on the strict aliasing rule, and the MSVC compiler doesn't perform them in the first place (except for restrict pointers in C). They don't go too far with it regardless, since lots of code using the POSIX socket API casts struct pointers around as a matter of course.

Apart from memcpy(), the 'allowed' methods include unions in C (writing to one member and reading from another), and bit_cast<T>() and std::start_lifetime_as<T>() in C++.

[0] https://godbolt.org/z/dxMMfazoq
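For reference, a minimal sketch of the memcpy() route in C. The helper names float_bits and bits_to_float are made up for illustration, and the code assumes float is 32 bits wide (checked at compile time, which needs C11 for static_assert):

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide on this target. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: returns the object representation of f as an integer. */
uint32_t float_bits(float f)
{
    uint32_t u;
    /* Copies the bytes; the float is never read through a uint32_t lvalue. */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: the inverse of float_bits. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With optimization enabled, GCC and Clang typically compile a fixed-size memcpy like this down to a plain register move, so there is usually no runtime cost compared to the pointer cast.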



There are two additional ways of making this work.

The first is to allocate the memory using char a[sizeof(float)]. In C, char pointers may alias anything, so then you can do pointer conversions that would normally be undefined behavior and it should work. The other option is to use the non-standard __attribute__((__may_alias__)) on the pointer.

By the way, using union types for this is technically undefined behavior in the C and C++ standards, but GCC and Clang decided to make it defined as an implementation choice. Other compilers might not.
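A rough sketch of the __may_alias__ option, for GCC and Clang only. In GCC's documentation the attribute is applied to the pointed-to type (usually through a typedef) rather than to the pointer itself. The names u32_alias and float_bits_may_alias are invented for this example, and it assumes float and uint32_t have the same size and alignment:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumption: float is 32 bits wide on this target. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* GCC/Clang extension: accesses through this type are treated like
   accesses through char for type-based alias analysis. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Hypothetical helper: reads the bits of *pf through the may_alias type. */
uint32_t float_bits_may_alias(const float *pf)
{
    const u32_alias *pu = (const u32_alias *)pf;
    return *pu;
}
```

This is non-standard, so it will not build on compilers that lack the attribute.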


The union trick is actually defined in C.

And note that while char can alias anything, the reverse is not true: i.e. you can't generally cast a char array to anything else and expect sensible behaviour. There are ways to make this work (placement new in C++ for example), but it is not a way to escape TBAA: if you store a float in a char array you can't then cast it to int with impunity.


To be more precise, it is defined since C99[0]. In C89 it was undefined, but type punning is the most used/sensible behaviour, so they changed it in C99.

[0]: https://en.cppreference.com/w/c/language/union


That is a common misconception. DR 283 is a suggestion for an amendment that was filed 3 years after C99 was published:

https://open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

It is not part of C99. It also is not part of the C standard since no subsequent C standard adopted it according to the GCC developers:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13

A read of the C11 standard draft, which would have this amendment if it were accepted by the C standards committee, shows that this has not been added:

https://open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Type punning via union types is therefore undefined behavior unless your compiler implements an extension to define it like GCC and Clang do.


Hum, from your wg14 link: 6.5.2.3 paragraph 3 and note 95. I thought that was the note that was added in TC3.

Also the note is non-normative, so it is only clarifying preexisting behaviour.

But I'm far from an expert on the C standard. Also that was the C11 draft, maybe the note was removed before the final standard.

Edit: I believe the alias rules are in 6.5 paragraph 7; specifically:

> An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

[...]

>an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),

Edit2: neither the paragraphs nor the note have changed in the 202y draft.


You need more language in order to say that type punning is allowed. Implicitly, only reads through the type of the last write are permitted, and anything else is undefined behavior. At least, this is my understanding based on my own read and the guidance from the GCC developers.


From a cursory search I can't find any language in the C standard that disallows reading from a member other than the last written one.

I'm familiar with such language in the C++ standard.

Edit: On the contrary, this note 92:

> 92)If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This can possibly be a non-value representation.

Edit2: and that's specifically the text that was added by DR 283. I think you might be confusing it with a different DR (don't remember the number) that specifically asked if generalized type punning was possible as long as a union containing the aliased types was visible in the translation unit. I think that's still open, although GCC definitely forbids it.
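To make the case in that note concrete, here is a small sketch of punning through an actual union object, which is the form the note describes. The union float_u32 and the function float_bits_union are names invented for this example, and it assumes a 32-bit float:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumption: float is 32 bits wide on this target. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical union used only for this illustration. */
union float_u32 {
    float    f;
    uint32_t u;
};

/* Stores through one member, then reads through the other. Both accesses
   go through the union object itself, not through a cast pointer. */
uint32_t float_bits_union(float f)
{
    union float_u32 pun;
    pun.f = f;
    return pun.u;
}
```

Note that this only covers access through the union object. Casting a float* to a union float_u32* and reading through that is the generalized form, which is a separate question.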


Where did you find that note? I do not see it in the C standard draft I linked.


It is in n3301, the last 202y draft. In the 201x draft it was note 95.


I see it now. I guess the GCC developers were wrong then.


I think it is a bit more complicated. The rule, together with the aliasing rule, if taken at face value, means you could do unrestricted aliasing as long as you cast to a union type on access. I believe that's the interpretation the GCC devs reject, as it makes TBAA ineffective.

Instead they interpret it narrowly to only allow punning through objects that are actual unions (as described in the GCC docs).

Unsurprisingly the standard is kind of a mess.


Oh that's interesting. I guess I should actually look at the standard instead of taking cppreference's word for it next time


Yes, it's 2025, I thought that we could at least imply C99 when talking about plain C :).

I'm probably an optimist.


It does not matter. The C99 standard does not define this behavior:

https://news.ycombinator.com/item?id=42568271


For C++, `bit_cast<uint32_t>(0.f)` should be Well Defined, right? I'm curious, in C, is union-casting float->uint32_t also Perfectly Legal And Well Defined?

(I am not a C or C++ expert.)


It should be well-defined with reinterpret_cast<uint32_t&> though.


No, reinterpret_cast doesn't change the type of the underlying object. The rule in [basic.lval]/11 (against accessing casted values) applies to all glvalues, whether they come from pointers or references.



