Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Consequently, passing a function pointer (or struct of function pointers), where the pointer points to an always_inline function and the callee is always_inline results in specialization akin to template monomorphization. This works to any depth; the compiler won't be satisfied until there are no more always_inline function calls. This fortuitous development in compilers allowed me to write very nice template code in C. Libpas achieves templates in C using config structs that contain function pointers -- sometimes to always_inline functions (when we want specialization and inlining) and sometimes to out-of-line functions (when we want specialization but not inlining). Additionally, the C template style allows us to have true polymorphic functions. Lots of libpas slow paths are huge and not at all hot. We don't want that code specialized for every config. Luckily, this works just fine in C templates -- those polymorphic functions just pass around a pointer to the config they are using, and dynamically load and call things in that config, almost exactly the same way that the specialized code would do. This saves a lot of code size versus C++ templates.

I thought this was super interesting. Seems like a very nice technique for when you use function pointers in C for stuff like sort comparison. There was an earlier thread about PostgreSQL that discussed how they created multiple sort functions to avoid the overhead of a indirect function pointer call when sorting things like integers where the function call dwarfs the comparison cost. This seems like a technique that could have been used to avoid duplicating the sorting functions while still getting the benefits.



I still don't quite understand how the technique saves code size versus C++ templates, although I am quite eager to understand it since template instantiation is a thorn in my side when writing code for a microcontroller with 32KB of code space.

Is the idea that the struct contains pointers to always_inline functions, and that the compiler has visibility into this at the top of the always_inline call stack, so the always_inline gets monomorphized even through multiple layers of pointer indirection? If I'm understanding it right, that's a pretty nice technique. Would this allow you to mark a "class" (or individual members of that class) as "hot, please monomorphize" by making it always_inline, or "cold, please use regular pointers" by omitting always_inline?

C++ templates would force you to monomorphize every variant, or use vtables (function pointers) for every variant, without the ability to choose the approach for each subclass. The compiler is also spectacularly bad at recognizing deduplication opportunities between template specializations.


Say you have:

    template<typename T>
    void foo(T& value) { value += 42; }
There's no easy way to tell the C++ compiler, "please make only one version of foo and use callbacks to figure out what it means to += on T".

In libpas, this would be done by first creating:

    always_inline void foo(void* value, void (*add_int)(void* value, int addend))
    {
        add_int(value, 42);
    }
And then you could also create:

    never_inline void foo_outline(void* value, void (*add_int)(void* value, int addend)) { foo(value, add_int); }
Now, if you want monomorphization like C++, call foo() and pass a literal for add_int. But if you don't, then call foo_outline.

Say you have 1000 callsites to foo(), they all use different types, and 5 of those callsites are hot while the other ones are hardly ever called (maybe they only run in unusual-but-necessary situations). So, the 5 hot callsites can call foo(), and the remaining 995 cold ones cal call foo_outline().


> There's no easy way to tell the C++ compiler, "please make only one version of foo and use callbacks to figure out what it means to += on T".

Partial counterpoint: you could use class polymorphism, if T is always a type you control. But you're right in general; C++ doesn't have typeclasses or some other way to create an ad-hoc vtable for, say, int.

> Now, if you want monomorphization like C++, call foo() and pass a literal for add_int. But if you don't, then call foo_outline.

Does this work through structs of function pointers? Is that the reason it's so powerful?

For example, a my_class constructor create_my_class makes the class point to foo_outline, unless it's known to be a hot type, then it points to foo. When you call create_my_class, would the compiler see the values of the pointers, and start inlining foo calls, including if you pass the my_class struct into foo as "this", continuing the inlining?


Yes, it works with structs of function pointers. And it works recursively.

This works (this is slightly shorthand C, fill in the blanks yourself):

    struct config {
        void (*foo)(things bar);
        stuff (*bar)(int baz);
    };
    always_inline stuff doit(config c)
    {
        c.foo(whatever);
        return c.bar(42);
    }
    always_inline void my_foo(...) { ... }
    always_inline stuff my_bar(...) { ... }
    stuff dostuff(void)
    {
        config c;
        c.foo = my_foo; 
        c.bar = my_bar;
        return doit(c);
    }
This'll all get inlined and specialized to death.


Got it. That is definitely a cool insight. Thanks for sharing!


I think that's right.

C++ binaries tend to be pretty large because they use templates/monomorphization a lot, even when there aren't really meaningful performance gains. For example, it often doesn't make sense to have 20 instantiations of std::vector, when only two of them are used in the critical path. Rust can be even further in this unfortunate direction because of the borrow checker's restrictions around polymorphism and abstraction.

In comparison, C, Java, and Javascript tend to rely on dynamic dispatch more often (not that C has any built-in dynamic dispatch features, but in practice, C programs tend to use function pointers manually).

A few languages like C# have a really nice balance, and monomorphize in the JIT at run-time.

With this technique they mention, C is the only mainstream language I know of that lets you control it in a decoupled way, which is really cool.


> Would this allow you to mark a "class" (or individual members of that class) as "hot, please monomorphize" by making it always_inline, or "cold, please use regular pointers" by omitting always_inline?

From my understanding of the technique, this is correct.


IIRC, the branch predictor can work with function pointers as well. It can sometimes mitigate problems like this, though not as completely as just monomorphizing away the run-time branching/calling like they did.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: