
Not just a union: the union definition needs to be in scope _and_ used in such a way that the compiler can see the possible relationship between the two objects.

But a union doesn't magically make type-punning correct. This code is not correct:

  union {
    int d;
    long long lld;
  } u;

  u.d = 1;
  printf("%lld\n", u.lld);
  u.lld = 0;
  printf("%lld\n", u.lld);

The union ensures that the compiler doesn't move "u.lld = 0" above the first print statement, but writing as one type and reading as another is usually undefined behavior no matter how you accomplish it. That's because the representations can differ, and either type might have invalid representations. The biggest exception is reading through a char pointer: reading representation bits through a char pointer is guaranteed to always be okay.
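
To illustrate the char-pointer exception, here is a minimal sketch. The helper names copy_representation and roundtrip_int are made up for this example and come from no library. It reads an object's bytes through an unsigned char pointer, then rebuilds the value with memcpy so that no int is accessed through an incompatible lvalue.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of any object into
   a byte buffer by reading it through an unsigned char pointer. */
void copy_representation(unsigned char *out, const void *obj, size_t n) {
    const unsigned char *p = obj;  /* character-type reads are permitted */
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
}

/* Hypothetical helper: round-trip an int through its own bytes.
   memcpy also works bytewise, so the int is rebuilt without any cast
   to an unrelated pointer type. */
int roundtrip_int(int value) {
    unsigned char bytes[sizeof value];
    int result;
    copy_representation(bytes, &value, sizeof value);
    memcpy(&result, bytes, sizeof result);
    return result;
}
```

The bytes themselves are still platform-specific (endianness, padding), so this only shows that the access is allowed, not that the contents are portable.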

Aliasing and type punning are two different issues that are only tangentially related in terms of language semantics. But the issues do often coincide, especially in poorly written code.

You can also put the compiler on notice not to apply the strict aliasing rule by using simple type coercion (implicit or explicit) in the relevant statements. What matters is that we put the compiler on notice that two objects of [seemingly] different types are related and thus have an ordering relationship, and the standard provides a few ways to do that.

For example, this code is wrong:

  struct foo {
    int i;
  };

  struct bar {
    int i;
  };

  void baz(struct foo *foo, struct bar *bar) {
    foo->i = 0;
    bar->i++;
  }

  struct foo foo;
  baz(&foo, (struct bar *)&foo);
whereas all of

  void baz(struct foo *foo, struct bar *bar) {
    foo->i = 0;
    (((struct foo *)bar)->i)++;
  }
and

  void baz(struct foo *foo, struct bar *bar) {
    union {
      struct foo foo;
      struct bar bar;
    } *foo_u = (void *)foo, *bar_u = (void *)bar;    
    foo_u->foo.i = 0;
    bar_u->bar.i++;
  }
and

  void baz(struct foo *foo, struct bar *bar) {
      *(int *)foo = 0;
      (*(int *)bar)++;
  }
are correct. This should be correct, too, I think:

  void baz(struct foo *foo, struct bar *bar) {
      *(int *)&foo->i = 0;
      (*(int *)&bar->i)++;
  }
and is also a weird case where the superfluous cast is necessary.

The purpose in all four cases is to make it evident vis-à-vis C's typing system that two objects might alias each other, and they do that by using constructs that put those objects into the same universe of alias-able types.

The conspicuous description of the union method in the C standard is more directed, I think, at compiler writers. It's not the only way to alias correctly (explicit casting to the basic type is enough), but it's often the most natural when dealing with polymorphic compound objects.

Compiler writers historically didn't always implement enough smarts in their compilers to detect possible aliasing through unions, and that needed to be addressed by a more thorough specification of union behavior. That is, the standard needed to make it clear that a compiler was required to grok the relationship of two sub-objects (of the same basic type) that were derived from the same root union type.

Explicitly type-casting through a union just for aliasing is a little stilted, though, when you can achieve the same thing using a cast through a basic type. The union method is preferable, but only insofar as it's used to _avoid_ or to _minimize_ type coercion. And it'll never solve type punning issues.




> The union ensures that the compiler doesn't move "u.lld = 0" above the first print statement, but usually writing from one type and reading from another is undefined behavior no matter how you accomplish it.

I know, but the only reason aliasing becomes an issue is that someone is trying to cast between unrelated pointer types to perform cheap type conversions. Yes, even with the union the behavior is undefined, but if you know the platform you're targeting, the program may be well-behaved.
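
One way to make the "know your platform" part explicit is to write the assumptions into the code. A rough sketch, assuming a C11 compiler (for _Static_assert), a 32-bit float, and IEEE 754 binary32 encoding; the name float_bits is invented for illustration:

```c
#include <stdint.h>

/* Assumption: float and uint32_t have the same size on the target.
   The build fails here if the platform does not match. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float_bits assumes a 32-bit float");

/* Hypothetical helper: return the bits of a float by writing one union
   member and reading the other. The result depends on the platform's
   float encoding. */
uint32_t float_bits(float f) {
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;
    return pun.u;
}
```

On an IEEE 754 target, float_bits(1.0f) should come out as 0x3F800000. A memcpy between the two objects would be the more conservative way to get the same bits.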

As for your snippets, yes, casting pointers across function boundaries will work. The problem is when you don't want to introduce a call, which is where unions come in.



