> Consequently, passing a function pointer (or struct of function pointers), whe...

thrtythreeforty · on June 1, 2022

I still don't quite understand how the technique saves code size versus C++ templates, although I am quite eager to understand it since template instantiation is a thorn in my side when writing code for a microcontroller with 32KB of code space.

Is the idea that the struct contains pointers to always_inline functions, and that the compiler has visibility into this at the top of the always_inline call stack, so the always_inline gets monomorphized even through multiple layers of pointer indirection? If I'm understanding it right, that's a pretty nice technique. Would this allow you to mark a "class" (or individual members of that class) as "hot, please monomorphize" by making it always_inline, or "cold, please use regular pointers" by omitting always_inline?

C++ templates would force you to monomorphize every variant, or use vtables (function pointers) for every variant, without the ability to choose the approach for each subclass. The compiler is also spectacularly bad at recognizing deduplication opportunities between template specializations.

pizlonator · on June 1, 2022

Say you have:

    template<typename T>
    void foo(T& value) { value += 42; }

There's no easy way to tell the C++ compiler, "please make only one version of foo and use callbacks to figure out what it means to += on T".

In libpas, this would be done by first creating:

    always_inline void foo(void* value, void (*add_int)(void* value, int addend))
    {
        add_int(value, 42);
    }

And then you could also create:

    never_inline void foo_outline(void* value, void (*add_int)(void* value, int addend)) { foo(value, add_int); }

Now, if you want monomorphization like C++, call foo() and pass a literal for add_int. But if you don't, then call foo_outline.

Say you have 1000 callsites to foo(), they all use different types, and 5 of those callsites are hot while the other ones are hardly ever called (maybe they only run in unusual-but-necessary situations). So, the 5 hot callsites can call foo(), and the remaining 995 cold ones cal call foo_outline().

thrtythreeforty · on June 1, 2022

> There's no easy way to tell the C++ compiler, "please make only one version of foo and use callbacks to figure out what it means to += on T".

Partial counterpoint: you could use class polymorphism, if T is always a type you control. But you're right in general; C++ doesn't have typeclasses or some other way to create an ad-hoc vtable for, say, int.

> Now, if you want monomorphization like C++, call foo() and pass a literal for add_int. But if you don't, then call foo_outline.

Does this work through structs of function pointers? Is that the reason it's so powerful?

For example, a my_class constructor create_my_class makes the class point to foo_outline, unless it's known to be a hot type, then it points to foo. When you call create_my_class, would the compiler see the values of the pointers, and start inlining foo calls, including if you pass the my_class struct into foo as "this", continuing the inlining?

pizlonator · on June 1, 2022

Yes, it works with structs of function pointers. And it works recursively.

This works (this is slightly shorthand C, fill in the blanks yourself):

    struct config {
        void (*foo)(things bar);
        stuff (*bar)(int baz);
    };
    always_inline stuff doit(config c)
    {
        c.foo(whatever);
        return c.bar(42);
    }
    always_inline void my_foo(...) { ... }
    always_inline stuff my_bar(...) { ... }
    stuff dostuff(void)
    {
        config c;
        c.foo = my_foo; 
        c.bar = my_bar;
        return doit(c);
    }

This'll all get inlined and specialized to death.

thrtythreeforty · on June 1, 2022

Got it. That is definitely a cool insight. Thanks for sharing!

verdagon · on June 1, 2022

I think that's right.

C++ binaries tend to be pretty large because they use templates/monomorphization a lot, even when there aren't really meaningful performance gains. For example, it often doesn't make sense to have 20 instantiations of std::vector, when only two of them are used in the critical path. Rust can be even further in this unfortunate direction because of the borrow checker's restrictions around polymorphism and abstraction.

In comparison, C, Java, and Javascript tend to rely on dynamic dispatch more often (not that C has any built-in dynamic dispatch features, but in practice, C programs tend to use function pointers manually).

A few languages like C# have a really nice balance, and monomorphize in the JIT at run-time.

With this technique they mention, C is the only mainstream language I know of that lets you control it in a decoupled way, which is really cool.

RcouF1uZ4gsC · on June 1, 2022

> Would this allow you to mark a "class" (or individual members of that class) as "hot, please monomorphize" by making it always_inline, or "cold, please use regular pointers" by omitting always_inline?

From my understanding of the technique, this is correct.

verdagon · on June 1, 2022

IIRC, the branch predictor can work with function pointers as well. It can sometimes mitigate problems like this, though not as completely as just monomorphizing away the run-time branching/calling like they did.