
> It's also super disappointing that in 2022 we're still manually marking 10-line functions used in one place and called from one site as inline.

The `inline` keyword usually isn't necessary anymore to get the compiler to inline stuff, the compiler has some pretty good heuristics for determining whether to inline a function call or not. The `inline` keyword is an extra hint to the compiler, and it also affects linkage (i.e. multiple object files defining the same `inline` function does not result in a linker error).
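To make the linkage point concrete, here is a minimal sketch of the usual C header idiom. The header name and function names are made up for illustration. In C the common way to get the "no duplicate-symbol error" behaviour is `static inline`, which gives every translation unit its own private copy; in C++ a plain `inline` function may be defined in several translation units without a linker error.

```c
#include <assert.h>

/* Imagine this part lives in a hypothetical header, clamp.h,
 * that several .c files include. Because the function is
 * `static inline`, each .c file gets its own internal-linkage copy,
 * so the linker never sees two competing external definitions. */
static inline int clamp_int(int x, int lo, int hi)
{
    assert(lo <= hi);
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* A caller in one of those .c files. Whether the call is actually
 * inlined is still up to the compiler. */
int clamp_percent(int x)
{
    return clamp_int(x, 0, 100);
}
```

Note that the keyword only settles the linkage question here; it does not force the call in `clamp_percent` to be inlined.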

I don't understand what you're disappointed at, this all seems very reasonable.



I never said they used the inline keyword; you inferred that. If you look at the diff, they mark the functions as `pg_attribute_always_inline`, which expands to `__forceinline` or `__attribute__((always_inline)) inline` depending on your compiler. These specifiers override the compiler's heuristics [0].
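For readers who haven't seen this kind of macro, a rough sketch of how a portable "always inline" macro is typically put together. `my_always_inline`, `compare_int` and `index_of_max` are hypothetical names, and this is not PostgreSQL's actual definition, which may use different conditions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro for illustration only. */
#if defined(_MSC_VER)
#define my_always_inline __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define my_always_inline __attribute__((always_inline)) inline
#else
#define my_always_inline inline /* plain hint on other compilers */
#endif

/* Three-way integer comparison; the attribute asks the compiler to
 * inline it at every call site instead of applying its own heuristics. */
static my_always_inline int compare_int(int a, int b)
{
    return (a > b) - (a < b);
}

/* Caller: index of the largest element, using the comparator above. */
size_t index_of_max(const int *v, size_t n)
{
    assert(v != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (compare_int(v[i], v[best]) > 0)
            best = i;
    return best;
}
```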

> I don't understand what you're disappointed at, this all seems very reasonable.

I'm disappointed by the fact that we're still manually overriding compiler heuristics and manually writing faux copy-pasted generics for performance, when this is a solved problem in so many other languages.
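For anyone unfamiliar with the pattern, this is roughly what "faux generics" tend to look like in C: a macro that stamps out one sort routine per element type and comparison. The macro and function names below are made up for illustration and are not taken from the PostgreSQL patch; insertion sort is used only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Expands to a complete sort function specialised for TYPE and LESS.
 * Each expansion is a separate copy of the algorithm, which is what
 * lets the comparison be compiled directly into the loop. */
#define DEFINE_INSERTION_SORT(NAME, TYPE, LESS)                 \
    static void NAME(TYPE *a, size_t n)                         \
    {                                                           \
        assert(a != NULL || n == 0);                            \
        for (size_t i = 1; i < n; i++) {                        \
            TYPE key = a[i];                                    \
            size_t j = i;                                       \
            while (j > 0 && LESS(key, a[j - 1])) {              \
                a[j] = a[j - 1];                                \
                j--;                                            \
            }                                                   \
            a[j] = key;                                         \
        }                                                       \
    }

/* Comparison helpers, one per specialisation. */
static inline int int_less(int x, int y) { return x < y; }
static inline int double_greater(double x, double y) { return x > y; }

/* Two "instantiations": ascending ints and descending doubles. */
DEFINE_INSERTION_SORT(sort_int_asc, int, int_less)
DEFINE_INSERTION_SORT(sort_double_desc, double, double_greater)
```

In a language with real generics the two `DEFINE_INSERTION_SORT` lines would be a single generic function, with the compiler producing the specialisations on demand.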

EDIT: I would _love_ to see some benchmarks without the pg_attribute_always_inline to see whether forcibly inlining the code is even necessary. In my (extended) experience with optimising C++ code, leaning on __forceinline is unnecessary in the vast vast majority of cases and should be used very sparingly.

[0] https://docs.microsoft.com/en-us/cpp/cpp/inline-functions-cp...


If the profiler tells you to use force_inline, you use force_inline.


Given this is a blog post explaining their optimisations, you'd expect them to mention profiling with/without this attribute if they did it. Looking at the email thread, they share profiling results of before/after but there's nothing justifying the use of __forceinline in the thread or the blog post as far as I can see.


At least they measured, so that's immediately at least a B+. It is of course often hard to know whether anybody actually measured before doing their work or only afterwards to justify it, but we should give them the benefit of the doubt.

Not everything you tried is going to be worth explaining, and I get that their whole plan here is "forcibly inline this" so in C, where the compiler won't necessarily get your memo, it seems reasonable to explicitly say so.


FWIW I just removed pg_attribute_always_inline from those three comparator functions in tuplesort.c and it made no difference to the generated code under GCC 10 at -O2. If we can show that's true for Clang and MSVC too, maybe we should just take them out. Edit: But I don't really mind them myself, they do document the intention of the programmer in instantiating all these specialisations...


> FWIW I just removed pg_attribute_always_inline from those three comparator functions in tuplesort.c and it made no difference to the generated code under GCC 10 at -O2.

I experimented with this on various projects and found the same even under lower optimisation levels. I'm definitely in the camp of "profile profile profile", and the guideline that I use is "if it has internal linkage, the compiler will figure it out. If it has external linkage, then I'm profiling".

> they do document the intention of the programmer in instantiating all these specialisations...

Is the intent of the programmer to write them inline, or for them to be inlined for performance? Marking something as __forceinline does the former but implies the latter, when it's not necessarily true that inline == faster in every case!


> The `inline` keyword usually isn't necessary anymore to get the compiler to inline stuff, the compiler has some pretty good heuristics for determining whether to inline a function call or not.

Not really. The heuristics are pretty much based on a rough idea of function size (e.g. if it believes the call overhead is larger than just inlining the function, it will generally always be done if optimization is on), and/or functions it can prove are used only once (e.g. because they are static). That's it. There's no good idea of e.g. “this looks like a tight loop, and if I inline, I can maybe special-case away this cruft here”. The programmer's hints are very much relevant even in 2022.
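A small sketch of the kind of "special-case away this cruft" situation meant here. The function names and flags are invented for illustration. The helper checks two flags on every call, but each caller passes constants, so once the helper is inlined into the loop the flag tests can be folded away. Whether a particular compiler does that without a hint depends on its heuristics and flags, so comparing the generated code is the only way to know.

```c
#include <assert.h>
#include <stddef.h>

/* General-purpose helper: both flags are tested on every call
 * unless the call is inlined and the flags are known constants. */
static inline long add_element(long acc, int x, int use_abs, int skip_zero)
{
    if (skip_zero && x == 0)
        return acc;
    if (use_abs && x < 0)
        return acc - (long) x; /* adds the absolute value of x */
    return acc + x;
}

/* Tight loop with constant flags: after inlining, only the
 * use_abs path needs to remain in the loop body. */
long sum_abs(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = add_element(acc, v[i], 1, 0);
    return acc;
}

/* Same helper, different constants: after inlining this can
 * reduce to a plain summation loop. */
long sum_plain(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = add_element(acc, v[i], 0, 0);
    return acc;
}
```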


They’re not. You are either using PGO and getting effective inlining, or you are just guessing.


Or, just maybe, I am right.

https://gcc.godbolt.org/z/qhPa1xxdq



