> Note also that C doesn't have return-value-optimization, hence all your struct-returning functions will cause a call to memcpy (won't happen when compiled in C++ mode of course).
What?
RVO is needed precisely because a copy in C++ can run arbitrary code, and so is not as easy to elide as a memcpy. RVO is basically a promise you make that your copy constructor and destructor are semantically harmless compared to a memcpy.
That was exactly my thought as well, but the examples seem to show otherwise (at least on gcc and clang[1]).
The compilers use basically the same underlying optimizer and back-end with different front-ends, and since in C there are no "user-defined constructors" and no destructors, one would expect that you don't need any special RVO rule in C: the compiler can simply observe that a local object is returned and construct it in-place as necessary.
Thinking about this example, this may not be the case: distinct objects have to have distinct addresses, right? So in C you might not be able to make this optimization since the do_stuff_to_foo method (a black box to the compiler) could save its argument, and the caller of blah() could see that the argument it passed has the same address as the local f object in blah, a violation of "distinct objects, distinct addresses".
C++ has the RVO escape hatch for this: it is expected that some objects that appear distinct in the source may not actually be distinct if they fit the RVO (or NRVO) pattern - but C does not. So perhaps gcc and clang are doing the right thing here.
---
[1] All numbered versions of clang up to 6.0 seem to behave the way indicated in the GP post, but trunk in godbolt, which shows version as 7.0.0 (trunk 333657) compiles C efficiently like C++.
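To make the "distinct objects, distinct addresses" concern concrete, here is a minimal sketch. The original blah() is not shown in this thread, so the foo layout, the body of blah, and the helpers caller_obj, coincided and observe are all hypothetical reconstructions. The comparison is done inside do_stuff_to_foo, while both the caller's object and the callee's local are alive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: large enough to be returned in memory on common ABIs. */
typedef struct { int x; int pad[15]; } foo;

static const foo *caller_obj; /* address of the caller's object, published before the call */
static int coincided;         /* set if the callee's local is seen at that same address */

/* Stand-in for the "black box": it compares the pointer it receives. */
void do_stuff_to_foo(foo *a) {
    if (a == caller_obj)
        coincided = 1;        /* both objects are alive at this point */
    a->x++;
}

/* Assumed shape of blah(): fill in a local and return it by value. */
foo blah(void) {
    foo f = {0};
    do_stuff_to_foo(&f);
    return f;
}

/* Returns 1 if f inside blah() shared an address with the caller's result. */
int observe(void) {
    foo result = {0};
    caller_obj = &result;     /* the address escapes before the call */
    coincided = 0;
    result = blah();
    caller_obj = NULL;        /* do not keep a pointer to a dead object around */
    assert(result.x == 1);
    return coincided;
}
```

Since result and f are both alive during the call, a conforming implementation has to give them different addresses, so observe() should return 0. Constructing f directly in the caller's result would make it return 1, which is the behaviour the compiler has to rule out.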
Very good point on the "addresses compare == iff same object" rule.
In that case though, I think clang is right to optimize the callee (but it does introduce a problem in the caller):
the only place you could do the equality check and observe the rule being broken is before the callee returns, since the lifetime of its variable is bound to the call.
It seems that clang will not let the return pointer alias a local in the caller except when the call is the initialization of said local.
So if the caller goes :
foo x; leak(&x); x = returns_foo();
the return slot will be a temporary on the stack (followed by a memcpy into x), thus upholding the rule. (And it seems to me that this inefficiency is really required to respect the standard if we actually leak the pointer.)
in the case :
foo x = returns_foo();
clang will pass the actual address of x down but that's before the object exists (and its address cannot be known yet) so the rule is still fine.
I stand corrected though, this does mean that RVO would be useful for C as well, as a way to relax the aliasing rule.
Edit: never mind that. In the first case it's perfectly legal to read/write foo through the pointer downstream, so you cannot make the optimization anyway.
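A rough sketch of why the first form cannot reuse x as the return slot once its address has leaked. The names leak and returns_foo come from the snippets above, but their bodies, the foo layout (with an extra y field) and the helper old_value_seen are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: large enough to be returned in memory on common ABIs. */
typedef struct { int x; int y; int pad[14]; } foo;

static foo *leaked;           /* pointer that escaped from the caller */

void leak(foo *p) { leaked = p; }

/* Assumed body: writes its result first, then reads the caller's old object. */
foo returns_foo(void) {
    foo r = {0};
    r.x = 42;                 /* write into the result object */
    if (leaked)
        r.y = leaked->x;      /* legal read of the caller's x, still holding its old value */
    return r;
}

/* First caller form from the thread: foo x; leak(&x); x = returns_foo(); */
int old_value_seen(void) {
    foo x = { .x = 7 };
    leak(&x);
    x = returns_foo();
    leaked = NULL;            /* x is about to die, drop the pointer */
    return x.y;
}
```

old_value_seen() has to return 7. If r were built directly in x, the read through leaked would see the 42 that was just written, so the temporary plus memcpy is what keeps the program's behaviour correct here.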
Yes, the same thought occurred to me (that perhaps clang is careful in the caller in the case the address escapes), but I seemed to find cases where clang optimizes the caller also, so that two distinct objects receive the same pointer and both pointers escape.
the code changes and distinct objects are passed. I'm not sure if the first form (all in the definition) has a relevant difference per the standard that lets clang do this.
Thanks. Unless something is escaping me, that's an optimizer bug. I'm pretty sure the ABI allows you to do whatever you want with the sret pointer, including passing it to another function to chain return for free.
I guess you could say that RVO is more robust since it's implemented in the frontend and does not rely on finding the optimization after a fair amount of lowering.
I'd wager the optimisations you're talking about happen just fine if blah is a static function, where the compiler can assume nothing from the outside will call this function, so calling conventions can be broken at will.
Seeing as blah isn't a static function, I think the calling convention for C that g++ uses somehow dictates that a memcpy is to be used in this case.
Note that I haven't actually tried this, so no guarantees.
I don't claim to have any answers, but I found all of this interesting and surprising. I wondered about a couple of things. What happens if the do_stuff_to_foo is actually defined (and what happens in that actual function)? And is there a difference between value semantics and pointer/reference semantics?
These questions were my take-away from one of Chandler Carruth's C++ compiler optimization talks, I think it was this talk.
https://youtu.be/eR34r7HOU14
My takeaways were that the optimizer gets a huge chunk of its performance from inlining, and that with value semantics the optimizer can "cheat like crazy".
So I defined two different variants for do_stuff_to_foo
Original Pointer/Reference Semantics:
void do_stuff_to_foo(foo* a)
{
    a->x++;
}
Value Semantics:
foo do_stuff_to_foo(foo a)
{
    a.x++;
    return a;
}
In both cases, the compiler emits effectively the same output for C and C++ (I only tested clang.) (The main difference was name mangling. I omit stuff for brevity.)
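For anyone who wants to try this, a small sketch of how the two variants could sit side by side in one file. The _ptr/_val suffixes, the foo layout, and the blah_ptr/blah_val/variants_agree helpers are my own names for illustration, not from the original example.

```c
#include <assert.h>

/* Hypothetical layout: large enough to be returned in memory on common ABIs. */
typedef struct { int x; int pad[15]; } foo;

/* Pointer semantics: mutate the caller's object in place. */
void do_stuff_to_foo_ptr(foo *a) { a->x++; }

/* Value semantics: take a copy and return the modified copy. */
foo do_stuff_to_foo_val(foo a) { a.x++; return a; }

/* Assumed shape of blah() for each variant. */
foo blah_ptr(void) { foo f = {0}; do_stuff_to_foo_ptr(&f); return f; }
foo blah_val(void) { foo f = {0}; return do_stuff_to_foo_val(f); }

/* Both variants should produce the same observable value. */
int variants_agree(void) { return blah_ptr().x == blah_val().x; }
```

With the definitions visible in the same translation unit, the optimizer is free to inline either variant into its blah, which is the situation described above.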
Having a module boundary in a place where function call overhead is going to be significant is a code smell. C++ and Rust programmers just don't notice because the entire standard library reeks of it.
What?
RVO is needed precisely because a copy in C++ can run arbitrary code, and so is not as easy to elide as a memcpy. RVO is basically a promise you make that your copy constructor and destructor are semantically harmless compared to a memcpy.
There is no need for RVO in C.