Does it inline? (bolinlang.com)
90 points by luu on Dec 18, 2022 | hide | past | favorite | 33 comments



I wrote my own firmware for some IoT devices as a personal project. At first I tried to be careful about optimizing all the things, but when I changed course and rewrote parts to be more readable instead it compiled to exactly the same binary. So now I just leave optimizations to the compiler. (Nothing I’m writing is so performance critical that I need to manually optimize further, like I’m sure is necessary for other situations.)


I would say this is good advice. First just worry about readability and data flow/storage.

Once everything is where it should be, think about how to process it faster (and you can do so by swapping out the contents of functional blocks).


Yes. Also, measure.

If your thing is too big/ slow/ ugly/ stupid, you need to be able to quantify that, so that you can tell whether you really improved it.

If you aren't measuring, any "optimisation" is just wanking, whether you're wanking before the product works or after. If it's many orders of magnitude different the "measurement" may be pretty casual. I had a script which used to take two weeks, now it runs overnight, I can't give you exact timings, however "overnight" versus "two weeks" is a measurement. But often improvements are smaller, which means your measurements need to be more careful.

Measuring also helps focus on actual goals. I'm pretty sure that script could be improved to take under an hour. But, I'm also certain nobody cares whether it takes one hour or ten, so, not a priority. Whereas when it took weeks that was causing problems.
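To make the point concrete, here's a minimal timing-harness sketch (names like `work` and `time_it` are just placeholders, not anything from the thread): measure before and after a change so the comparison is a number, not a feeling.

```cpp
#include <chrono>

// Stand-in for whatever is being optimized.
long long work(long long n) {
    long long sum = 0;
    for (long long i = 0; i < n; ++i) sum += i;
    return sum;
}

// Returns elapsed wall time in milliseconds for one run of work().
double time_it(long long n) {
    auto start = std::chrono::steady_clock::now();
    volatile long long result = work(n);  // volatile: keep the call from being optimized out
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For order-of-magnitude differences this casual approach is plenty; for small improvements you'd want repeated runs and a proper benchmark library.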


I completely agree with you. For work I'm writing a performance critical application, and using gprof/valgrind for profiling has been very useful. Not only for determining how time is spent, but also where it is _not_ spent. Some pieces of code can be implemented in a fast and stupid way, without impacting overall performance.

That said, inlining is precisely an area where I find profiling to be difficult. It's difficult to see where in a function time is spent if a lot of the function calls inside it are inlined. However, I'm not certain that every part of my code is equally slower/faster depending on which optimizations I turn on/off in the compiler. Thus, reducing optimizations might give me the wrong impression about which functions take up the most runtime.


Many of these cases are things the compiler doesn’t consider inlining even if the author does.

Deleting a call that always returns 0 is a different optimization called “interprocedural constant propagation”.
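A sketch of the distinction (hypothetical names; at -O2, GCC and Clang typically perform this without inlining the body):

```cpp
// A function that always returns 0, regardless of input.
// Interprocedural constant propagation lets the compiler replace
// every call site with the constant 0 without ever substituting
// the function body -- i.e., without inlining.
static int always_zero(int x) {
    if (x > 0) return 0;
    return 0;
}

int caller(int v) {
    // Typically reduced to `return v;` because the call is
    // known to contribute 0.
    return v + always_zero(v);
}
```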


It doesn't really matter what the compiler people classify as different kinds of optimizations. If the function call is gone, it's inlined. From a user standpoint I don't really care if it's not technically considered an "inline" by the gcc team.


There are other differences - inlining often has a tradeoff where it increases program size or compile time but may make things faster. The other optimizations remove the call without doing that, so they might be considered safer.


IPA is quite fiddly; an older or more eager compiler could remove the call by inlining first and then doing intraprocedural optimization.


The inline keyword doesn't do what most people think it does, which becomes especially apparent with every compiler upgrade or platform change. One wise engineer once summarized its use cases as twofold: first, to control symbol visibility when linking; second, to gauge the engineering org's maturity.


In C++ inline is sometimes needed to prevent ODR violations.
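A minimal sketch of that use (hypothetical header and names): without `inline`, every translation unit including this header would emit its own strong definition and the linker would report a duplicate symbol / ODR violation.

```cpp
// widget.h (hypothetical header, included by several .cpp files)
// `inline` here weakens the definition so the linker merges the
// copies from different translation units; it is not primarily a
// request to inline the call.
inline int area(int w, int h) { return w * h; }
```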


That's exactly why I included it. People automatically think inlining is my intention. In reality I avoid templates and not all of my code is in a class. It's short enough to be a candidate for inlining so I put it in a header.

I was hoping more comments would discuss this


Most modern compilers quietly inline and unroll small functions already.

Probably a wise choice, as people are usually taught to hit the Allocator like a Piñata full of candy. =)


So, I guess the lesson here is don't trust the compiler to inline something.. verify?


I'm the author. While that's 100% true, I was going for "our intuition of optimization is bad". Before writing a word I knew that compilers don't inline code that's not in the compilation unit (unless `-flto` is specified, and even then it's not guaranteed to work). Imagine my surprise when clang optimized away the call to itoa/strtol. I believe it's a heuristic in clang because clang doesn't optimize away 6A


To be honest, I feel like your blog post was more entertainment (and good entertainment at that) than actual support for your thesis. While not every example was bad, you picked some incredible edge cases for many of them that are at best rare in reality (fibonacci, dynamic_cast immediately following construction), and at worst nonsensical (always_inline with noinline?!) and then deduced people have bad intuition if they couldn't guess what optimizations occurred. That's not really an accurate assessment of people's intuitions - for example, I'm around ~100% sure that hardly anyone who claims "I have good intuition for inlining" has the answer to "what happens if I combine always_inline with noinline" in their head when they say that - nor should they! Heck, I've already forgotten what that does on each compiler even after reading your post. If anything, someone with a good intuition would know that they should check the compiler output for something like that.

What you'd need to do for assessing intuition is finding some actual common/realistic cases, then seeing how accurately people perform on those - things that are useful, and which people might actually claim to have intuition about! Moreover, also note that someone with "good intuition" for inlining wouldn't necessarily know (or need to care) what happens at every call site - sometimes it doesn't really matter what specific level the inlining stops at, as long as most of the call chain is inlined, and that's what intuition often gets you.


> at worst nonsensical (always_inline with noinline?!)

I think this is actually great support for their thesis. If you came across actual code that looked like that (and god knows it probably exists if it's legal), who fucking knows what the compiler would do?

My question is: why is it even legal to mark functions with both of these, in addition to the `inline` keyword? My guess is the C++ spec says something along the lines of "you can mark functions with as many attributes as you want .. something .. something .." ie compiler vendors get no choice but to (at best) emit a warning.


> god knows it probably exists if it's legal

It most certainly exists, even if it's actually illegal. I mean, come on, how many C codebases are there with more than 10k lines that have absolutely zero UB? I've seen several C programmers who, when "wrestling the compiler" to emit some certain kind of pattern in the resulting assembly, would do bloody anything without much regard for whether what they wrote was legal, or implementation-specific, or UB. "It compiles to what I want with the compiler we use today, that's all that matters. We'll probably change it in the future if it breaks".


The noinline attribute is non-standard, so the C++ standard has nothing to say about it.

In C++, marking an inline function noinline is actually meaningful: the inline specifier has little to do with actually performing the inlining optimization, and for the most part it means that the ODR rules are weakened (in practice enabling vague linkage). The noinline attribute instead prevents the optimization from occurring. So an inline noinline function is a function with vague linkage that should not be inlined.

Because inline allows the function to be defined in a header file, and a function body usually needs to be available in the translation unit for inlining to happen, functions that should be inlined are often marked inline (which also acts as a weak hint to the optimizer) - but it is neither necessary nor sufficient. And the naming is of course for the most part historical.
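A sketch of the combination (GCC/Clang attribute syntax; `next_id` and `use` are made-up names - MSVC would spell the attribute `__declspec(noinline)` instead):

```cpp
// `inline` gives the definition vague linkage so it can live in a
// header shared by many translation units; the non-standard
// noinline attribute tells the optimizer not to substitute the
// body at call sites.
inline __attribute__((noinline)) int next_id(int current) {
    return current + 1;
}

int use(int id) {
    // Even at -O2 this stays a real call under the noinline attribute.
    return next_id(id);
}
```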


This is great context, thanks for the reply.


Yes, it's mostly for entertainment. I didn't want to stress anyone out with this. Round 2 was specifically to 1) show people not to take this seriously and 2) serve as an example that some modifications don't mean earlier examples do/don't inline, so they wouldn't overthink.

Outside of the first two rounds, most of these were inspired by real code but simplified for reading. I think it'd be less entertaining if it was more serious and had more realistic code; a lot more people would tune out. If it was any longer, I think it would have made more sense to explain why something was optimized or not. I don't work on clang or gcc, so I may not be a good person to write that kind of article.

> someone with "good intuition" for inlining wouldn't necessarily know (or need to care) what happens at every call site

Round 5 (the virtual function/dynamic cast round) was inspired by a person who claimed to have 20 years of experience. He suggested a way I could implement a feature in my compiler. I eventually wrote a test case to see if compilers would 'devirtualize' function calls as he claimed. They didn't. From memory he works on making servers perform, so he wasn't a stranger to performance. I think "good intuition" is more about knowing what won't inline and having some tactics you can use in the first 5 minutes after looking at a flamegraph.


I'm not a compiler expert, but just a note regarding your last point, devirtualization of a function call and dynamic_cast are different beasts; in fact I've never heard of an optimization of the latter as being referred to as devirtualization. (Though perhaps this is just me?)

From what I've seen, dynamic_cast is implemented as an opaque external function call (and quite a complicated one), so the compiler would need to explicitly make assumptions about its behavior before it can optimize it away. (Which it certainly could, but that's extra work for the compiler writer that would need to be worth the payoff, which in this case I imagine is probably debatable.) Virtual functions, on the other hand, don't involve an opaque call for mere target resolution, and their vtables have definitions available at compile time, so they're much more tame. So expecting devirtualization to come with dynamic_cast getting optimized away seems a bit of a non-sequitur IMO.
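A sketch of the two cases being conflated (hypothetical types; whether a given compiler actually devirtualizes the first case depends on version and flags):

```cpp
struct Base {
    virtual int id() const { return 1; }
    virtual ~Base() = default;
};

struct Derived : Base {
    int id() const override { return 2; }
};

int known_type() {
    // The compiler can see the dynamic type is exactly Derived, so
    // it may devirtualize: call Derived::id directly (and perhaps
    // then inline it), with no vtable lookup.
    Derived d;
    Base& b = d;
    return b.id();
}

int with_cast(Base& b) {
    // dynamic_cast on an arbitrary reference compiles down to a
    // runtime call (__dynamic_cast in the Itanium ABI), which the
    // optimizer treats as opaque unless it can prove the type.
    Derived* d = dynamic_cast<Derived*>(&b);
    return d ? d->id() : 0;
}
```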


Yes and no. While verifying, you should benchmark too - maybe the inlined version is slower! The lesson is: don't try to out-clever the compiler or force it to do weird things, and be knowledgeable about the limitations imposed by the compilation model and the ABI.

Most of the time, using constexpr and defining inlineable functions in headers will get you the best result. In most of the remaining cases the compiler is again right not to inline things, because it probably knows more about the instruction caches etc. than you. The remaining tiny part is basically PhD-level or very deep expert territory, where it's your job to know more than the compiler because your job is to write the compiler or discover its shortcomings.


That's a pretty reasonable takeaway. Measure your hot loops and ideally incorporate a benchmark into your CI (to avoid regression).


I was reading this on my iPad, and it was a pain not to be able to see the output. It would be great if the author just created a link from each section to godbolt… https://godbolt.org/z/5cosP46TT >:D I know how lazy we all are… my sample was just the first section; I modded the code so I can see it all at once…


This is why I like zig's comptime concept. It's essentially the same idea, except it's explicit and you are in control.


The two terms refer to very different phenomena. Consider this incrementing function:

  int f(int x) {return x + 1;}

  int main() {
    int y = f(3);  // this will change
    return 0;
  }
inline means to substitute the body of the function into the call site:

  int y = 3 + 1;
comptime means to substitute the results of the function call into the call site:

  int y = 4;
An inline function only requires that the compiler decide the function body is "small enough" and not recursive. The compiler simply rewrites the AST before codegen.

But comptime evaluation means that the compiler has actually run the function ahead of time. (This might be done via a builtin virtual machine, for example.) The requirements are that the function is "pure" (no side effects) and that the inputs can also be resolved ahead of time.

A recursive function could be comptime but not inline, whereas a function with IO could be inline but not comptime. The above incrementing function just happens to be both inline and comptime.


Thanks for the examples.

I gather that zig "comptime" is somewhat like C++11 "constexpr".

https://www.hackingnote.com/en/cpp/const-vs-constexpr/index....
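Reusing the incrementing function from the comment above, the C++ analogue looks roughly like this (a sketch; constexpr guarantees compile-time evaluation only in contexts that require it):

```cpp
// A constexpr function can be evaluated at compile time when its
// arguments are compile-time constants - the closest C++ analogue
// to Zig's comptime.
constexpr int f(int x) { return x + 1; }

// Forced compile-time evaluation: y is baked in as 4, like the
// `int y = 4;` substitution described above.
constexpr int y = f(3);

// The same function still works with runtime arguments, in which
// case it behaves like an ordinary (possibly inlined) call.
int runtime(int x) { return f(x); }
```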


Minor point, but inlining recursive functions is fine and is done. It's analogous to loop peeling. Inline it once, see if things got better, then decide whether to go again.


'comptime' only works if all arguments are known at compile time and the function can be evaluated at compile time. This is not the same as inlining, which puts the body of the function in the caller and removes the call (and then does further optimisations such as const propagation).


Hadn't heard of Bolin. It sounds more approachable than C


Not sure how it's advanced in the past several months, but this thread [0] from their original announcement does not exactly inspire confidence.

[0] https://news.ycombinator.com/item?id=32460070


We'd implemented a few things that aren't useful on their own. For example, we haven't implemented opaque types. However, now that inlining C code works, opaque types are at the top of our list. With opaque types and inlining we'll be able to specify simd types and use simd instructions without leaving the function.

The next update should be large


> Is this memory safe? How do you handle lifetimes?

> Not to be confused with automatic memory which completely works, memory safety isn't fully implemented. We use invalidation to say all references that came from that object are no longer valid. This is checked at compile time. Not everything has been implemented so there are holes and we haven't chosen alias rules which might be a simple not allowed.

They've certainly made some interesting decisions.

https://bolinlang.com/faq



