
Thanks. Unless something is escaping me, that's an optimizer bug. I'm pretty sure the ABI allows you to do whatever you want with the sret pointer, including passing it to another function to chain return for free.

I guess you could say that RVO is more robust since it's implemented in the frontend and does not rely on finding the optimization after a fair amount of lowering.

Edit: I don't have time to debug this but at least for llvm, I think this optimization should trigger in the optimizer's memcpy elimination pass ( https://github.com/llvm-mirror/llvm/blob/0818e789cb58fbf6b5e... ).

However I don't see why clang could not simply apply the RVO logic to C code as well.
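
To make the chaining case concrete, here is a minimal sketch of the source pattern I have in mind. The names (`big`, `make_big`, `forward_big`, `forwarded_element`) are made up for illustration and are not from the article. On the x86-64 System V ABI a struct larger than 16 bytes is returned through a hidden pointer, so in principle `forward_big` could hand its own return slot straight to `make_big`. Whether a given compiler actually does that is exactly what's in question here; the code only shows the shape.

```c
#include <assert.h>

/* Hypothetical type, well over 16 bytes, so x86-64 System V returns it
 * through a hidden (sret) pointer rather than in registers. */
struct big {
    int data[256];
};

/* Builds the result; the caller supplies the storage via the hidden pointer. */
struct big make_big(int seed)
{
    struct big b;
    for (int i = 0; i < 256; i++)
        b.data[i] = seed + i;
    return b;
}

/* A pure forwarder: nothing here needs its own copy of the struct, so the
 * incoming return slot could simply be passed along to make_big. */
struct big forward_big(int seed)
{
    return make_big(seed);
}

/* Small helper (also hypothetical) to observe the forwarded result. */
int forwarded_element(int seed, int i)
{
    assert(i >= 0 && i < 256);
    struct big b = forward_big(seed);
    return b.data[i];
}
```

Compiling this with optimizations and looking for a `memcpy` (or an equivalent copy loop) in `forward_big` would be one way to check.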




I'd wager the optimisations you're talking about happen just fine if blah is a static function, where the compiler can assume nothing from the outside will call this function, so calling conventions can be broken at will.

Seeing as blah isn't a static function, I think the calling convention for C that g++ uses somehow dictates that a memcpy is to be used in this case.

Note that I haven't actually tried this, so no guarantees.


I don't claim to have any answers, but I found all of this interesting and surprising. I wondered about a couple of things. What happens if do_stuff_to_foo is actually defined (and what happens inside that function)? And is there a difference between value semantics and pointer/reference semantics?

These questions were my takeaway from one of Chandler Carruth's C++ compiler optimization talks. I think it was this one: https://youtu.be/eR34r7HOU14

My takeaways were that the optimizer gets a huge chunk of its performance from inlining, and that with value semantics the optimizer can "cheat like crazy".

So I defined two different variants of do_stuff_to_foo:

Original Pointer/Reference Semantics:

    void do_stuff_to_foo(foo* a)
    {
        a->x++;
    }

Value Semantics:

    foo do_stuff_to_foo(foo a)
    {
        a.x++;
        return a;
    }
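
For anyone who wants to reproduce this, here is a self-contained sketch of the two variants. The layout of `foo` is my assumption, not the article's definition: an `int x` at offset 0 plus padding to 1036 bytes, chosen only to match the `incl (%rdi)` and the `$1036` memcpy size in the listings. C has no overloading, so the `_ptr` / `_val` suffixes and the `bump_via_*` helpers are hypothetical names added for illustration.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout: 4-byte int plus 1032 bytes of padding = 1036 bytes
 * on a typical platform with 4-byte int. */
struct foo {
    int x;
    char pad[1032];
};

/* Pointer/reference semantics: mutate the caller's object in place. */
void do_stuff_to_foo_ptr(struct foo *a)
{
    a->x++;
}

/* Value semantics: take a copy, modify it, return it by value. */
struct foo do_stuff_to_foo_val(struct foo a)
{
    a.x++;
    return a;
}

/* Hypothetical helper: run the pointer variant and report the result. */
int bump_via_pointer(int start)
{
    struct foo f;
    memset(&f, 0, sizeof f);
    f.x = start;
    do_stuff_to_foo_ptr(&f);
    return f.x;
}

/* Hypothetical helper: run the value variant and report the result. */
int bump_via_value(int start)
{
    struct foo f;
    memset(&f, 0, sizeof f);
    f.x = start;
    struct foo g = do_stuff_to_foo_val(f);
    assert(f.x == start); /* the caller's original is left untouched */
    return g.x;
}
```

Both variants compute the same thing; the only difference is who owns the storage, which is what decides whether a copy has to show up in the generated code.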

In both cases, the compiler emits effectively the same output for C and C++. (I only tested clang. The main difference was name mangling, and I've omitted parts of the listings for brevity.)

Pointer/Reference Semantics:

    Lcfi2:
        .cfi_def_cfa_register %rbp
        incl	(%rdi)
        popq	%rbp
        retq
        .cfi_endproc
                                        ## -- End function
        .globl	_blah                   ## -- Begin function blah
        .p2align	4, 0x90
    _blah:                                  ## @blah
        .cfi_startproc
    ## BB#0:
        pushq	%rbp
    Lcfi3:
        .cfi_def_cfa_offset 16
    Lcfi4:
        .cfi_offset %rbp, -16
        movq	%rsp, %rbp
    Lcfi5:
        .cfi_def_cfa_register %rbp
        movq	%rdi, %rax
        popq	%rbp
        retq
        .cfi_endproc


Value Semantics:

    Lcfi3:
        .cfi_offset %rbx, -24
        movq	%rdi, %rbx
        incl	16(%rbp)
        leaq	16(%rbp), %rsi
        movl	$1036, %edx             ## imm = 0x40C
        callq	_memcpy
        movq	%rbx, %rax
        addq	$8, %rsp
        popq	%rbx
        popq	%rbp
        retq
        .cfi_endproc

What I find interesting here is that in both C and C++, the memcpy now appears. And in the C case, there is still only one memcpy, not two.

So as I said, I don't have any answers and really don't know what the takeaway is. But RVO no longer seems to be a factor in these variants.


Having a module boundary in a place where function call overhead is going to be significant is a code smell. C++ and Rust programmers just don't notice because the entire standard library reeks of it.



