Higher-order programming in C

caladri · on June 30, 2012

For the set of features you have, with most calling conventions on most platforms you don't need to synthesize all that. You could simply have your apply function be written in assembly and handle setting up the parameters fetching them from memory.

That also isn't higher-order programming as such. Can you create new functions from doing that? There are ways to compose functions in C by hand, but the distinction is that it cannot be done elegantly. (One could always just do higher-order programming in C by constructing an abstract syntax tree, after all.)

A good first effort in that you did correctly handle calling functions without requiring them to fetch their arguments in an unusual way, which is what most people would try at first. With a varargs apply function, you can handle the case of passing values larger than a register and passing pointers without having to cast them to a list, too. Consider the case of an apply function which takes an unsigned integer as the size of the following argument. If it's zero, then the argument list is finished. Otherwise, it then fetches an argument of that size from the va_list and goes on to the next one. So then your apply function is slightly uglier (int j = 42; apply(printf, sizeof (const char ), "Hello, %s! %u\n", sizeof (const char ), "world"), sizeof j, j, 0) but perhaps more flexible, and at least avoids relying on a few nasty implicit casts.

Also, function pointers should not necessarily be cast implicitly to void *, but on only a few architectures are function pointers so strange that you'll have any trouble with your approach.

As it is, you could even avoid writing or generating any assembly by just having the apply routine be a static inline function that does a switch on the argument count and then casts the function pointer to a function pointer that takes the right number of arguments. The compiler could do constant propagation for the number of arguments and you'd have little overhead in your function calls. You couldn't handle arbitrarily-large argument lists, but it would work pretty well.

Keep at it!

davvid · on June 30, 2012

For the set of features you have, with most calling conventions on most platforms you don't need to synthesize all that. You could simply have your apply function be written in assembly and handle setting up the parameters fetching them from memory.

Like this? https://github.com/davvid/libapply

unittest.c shows how it can be used: https://github.com/davvid/libapply/blob/master/unittest.c

huhtenberg · on June 30, 2012

  +-----------------------------------------------------------+
  |                                                           |
  |  C has support for dynamic typing (in the form of void*)  |
  |                                                           |
  +-----------------------------------------------------------+

A quotable thinker.

chrisohara · on June 30, 2012

Why are people so scared of void*? Are there any VM's that don't resort to some form of pointer tagging when implementing dynamic typing?

pmelendez · on June 30, 2012

I don't think people is scared of void*. However, it is pretty easy to introduce memory corruption bugs when it is used in big projects.

I find very wise to avoid it when it is not necessary or when there is an equivalent safer alternative (e.g C++ templates if available and applicable).

In this case however, is not only completely fine but I don't think there is a better way to do it.

BTW, my favourite phrase of the article:

"“… Why do this?” Because I can, of course."

The truly spirit of a hacker :)

huhtenberg · on June 30, 2012

Who are these "people" exactly?

Xcelerate · on June 30, 2012

Haha I can't stop cracking up! Although it reminds me of when I wrote a ray-tracer in high school. Every reflection distribution function took some kind of void* structure and did something unique with it.

bengl3rt · on June 30, 2012

Funny, mine was the exact same.

praptak · on June 30, 2012

Ok, so instead of "method not implemented" you get memory corruption. Big deal.

ogoffart · on June 30, 2012

I made it simpler (and slightly more portable ;-) ) :

    typedef void* (*f)();
    extern void* apply(void* func, void** args, size_t argc) {
        switch (argc) {
            case 0: return  ((f)(func))();
            case 1: return  ((f)(func))(args[0]);
            case 2: return  ((f)(func))(args[0], args[1]);
            case 3: return  ((f)(func))(args[0], args[1], args[2]);
            case 4: return  ((f)(func))(args[0], args[1], args[2], args[3]);
            case 5: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4]);
            case 6: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5]);
            case 7: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5], args[6]);
            case 8: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5], args[6], args[7]);
            case 9: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5], args[6], args[7], args[8]);
            /*...*/
        }
    }

ori_b · on June 30, 2012

If you ever want to do this in the real world, look at libffi: http://sourceware.org/libffi/

tspiteri · on June 30, 2012

I don't get the point. Why not use the generic function type?

    void *(*routine)(void *)

If you have a function already, just create a simple wrapper once. In the code shown, the wrapping has to be done by the caller of apply().

And this apply() doesn't work on all functions anyway. Casting an array of long into an array of void* makes quite a few assumptions and is very limited. What would you do if your original function takes a struct as an argument?

mappu · on June 30, 2012

I'd quite like to use bind() in C / nasm for my current project. I think you can use a similar approach - create an executable area on the heap that pushes an extra parameter and jmps to the target function, then just return a function pointer to it.

EDIT: there might be a problem with arranging the stack parameters - if you use call on the target, then you're pushing a return address onto the stack, which messes up the arguments. But if you simply jmp, then neither the caller or callee know exactly how to clean up the stack when the function returns... hmm. I guess you could permanently reserve a register for this shimming but that's hardly an elegant solution.

vedantk · on June 30, 2012

I think the callee expects the return address in %rbp, and by System V convention will pull certain arguments from above %ebp on the stack.

This might not help, but it's an interesting article on stack frames for all interested: http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-o...

mappu · on June 30, 2012

Looks like fastcall, or any convention on x86_64.

I think you can do bound functions if you do callee-cleanup, then your intermediate bound trampolines can just jmp and you don't end up with (too many) weird problems. But it means that the final return address ends up at the bottom of the stack, rather than at the top, like it would with a conventional push/call system.

In any case it looks like cdecl is out of the question.

Some further discussion in a slightly different context, if you're interested: http://stackoverflow.com/questions/11271848/implementing-bou...

forgotusername · on June 30, 2012

The ability to cast a function pointer to a regular pointer is an implementation behavior (supported on every machine you'd want to use, but nonetheless it's not standard C)

zvrba · on July 1, 2012

That's undefined behavior meaning that anything might happen, including appearing to work [most of the time]. C allows casts between two data pointers and between two function pointers, but not cast that cross the data/function boundary.

Any cast that crosses function/data boundary must go through suitably large integer type; UB results if you cast a pointer to a too-small integer type. See my other comment in this thread for a reference to the standard.

MichaelGG · on June 30, 2012

That's interesting to know. Is there no general data type that's guaranteed to hold a function pointer?

caladri · on June 30, 2012

Only function pointer types. An indirect function call on some architecture, for example, could require that when one makes an indirect call one sets up several registers in addition to jumping to a particular location in memory, and none of the registers that need to be set up can be derived from the jump target. So in practice your function pointers would need to actually be structures containing those register values and the jump target. Or you might need to pass around a tuple of target and calling convention, although the C standard limits the ways you could implement that (namely by requiring that function pointers cast to and from other function pointer types will compare as equal, and will still work when they are cast to the correct type.)

Daniel_Newby · on June 30, 2012

There are quite a few Harvard architecture microcontrollers and DSPs for which this is not true. (At least for some values of "want to use" ;-).

forgotusername · on June 30, 2012

I've never written code for these architectures, but my assumption was that a modern GCC will happily bodge this for you in software, so its probably only super ancient/exotic stuff for which the lack of ability to cast applies

Daniel_Newby · on June 30, 2012

Yeah, the compiler will probably force it to work.

On the other hand, some of the the popular PIC microcontrollers have an instruction pointer that is wider than the data registers. Good luck doing functional programming on that!

crb3 · on June 30, 2012

The blog's github interface is annoying: in order to save-to-disk more than just the opening paragraphs in Firefox, I had to bring up Web Developer's View Generated Source function and save the result of that. Not cool for anybody who prefers to study offline.

drucken · on June 30, 2012

Strange. I had zero trouble saving the blog to disk with Firefox on a Windows PC.

- it works with both an old generation of Firefox (3.x) and a post 3.x version.

- it correctly opens with the code, in both FF versions as well as IE.

- NoScript also correctly blocks and unblocks the code depending on what access you give to github.com and tumblr.com.

So, not sure what you are seeing, but I am unable to replicate it. As you can probably guess, I am just as sensitive to these saving glitches when they occur as you are too... :)

crb3 · on June 30, 2012

Firefox 3.5.6 on Mepis Linux 8.0 here; NoScript permissions open to github and tumbler otherwise I wouldn't see code onscreen fwiw, tnx for feedback

chc · on June 30, 2012

Firefox 3.5.6? Holy cow. It probably doesn't run well in NCSA Mosaic either. Is there no newer version you can run? I find at least 4 is necessary to avoid huge with the modern Web most of the time.

dbaupp · on June 30, 2012

Does something like instapaper help?

http://www.instapaper.com/text?u=http://vedantk.tumblr.com/p...

crb3 · on June 30, 2012

Apparently not; sorry.

zvrba · on June 30, 2012

Sigh, if the program is relying on undefined behavior [like this one], it's not C. Plus, gcc has __builtin_apply as extension.

daeken · on June 30, 2012

This is not undefined behavior. It's implementation specific behavior -- there's a huge difference. If you specify the calling convention for your functions, this is 100% deterministic and predictable. Nothing at all wrong with this approach.

zvrba · on June 30, 2012

Not undefined behavior, really? Run it on a 64-bit SPARC system and see if it works. Not to mention that this implementation is dog-slow (two rather slow syscalls [mmap/munmap] for each apply-invocation).

daeken · on June 30, 2012

Again, you're conflating undefined behavior with implementation-specific behavior. By your definition, anything that does implementation-specific typedefs, uses inline assembly, or even makes syscalls is taking advantage of undefined behavior. That's silly and ignores the actual definition.

zvrba · on June 30, 2012

> Again, you're conflating undefined behavior with implementation-specific behavior.

No, I'm not. Casting between data and function pointers (or even a function pointer to a function pointer of different signature!) is undefined behavior. Look up the C spec.

to3m · on June 30, 2012

6.3.2.3.8 - "A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer."

6.5.9.6 confirms that two function pointers that compare equal point to the same function.

From my careful study of the code, I have a nagging sensation that this might not quite be the end of the story as far as portability goes...

zvrba · on July 1, 2012

6.3.2.3.7 allows for conversion between pointers to data ("objects") and 6.3.2.3.8 allows for conversion between pointers to functions; there is no provision for casting between data and function pointers.

6.3.2.3.5 and .6 allow for abritrary (implementation-defined) conversions between integer types and pointer types, so any conversion between function and data pointers must go through a suitably large integer type if such exists (converting a function pointer to a too small integer type is UB).

to3m · on July 1, 2012

Yes - my response is to the suggestion that casting between pointers to functions of different signature is undefined. I took this to be a suggestion that the part of the approach that involves having a general-purpose function that can accept a pointer to any kind of function, and reliably use that pointer (after it's been casted, if necessary) to get a pointer to the function in question, is fundamentally non-portable.

I remain unconvinced that this is a serious issue, all things considered. The odd feeling that there could be other roadblocks to a fully portable implementation still remains. Nevertheless, it appears that this particular part could actually be done portably.

tedunangst · on June 30, 2012

The words inline assembly don't even exist in the C standard. Of course it's undefined. The standard specifically mentions a select finite list of behaviors which may be implementation defined (ones complement, twos complement, ...).

comex · on June 30, 2012

Incidentally, if you want "real" higher-order programming in C, take a look at blocks (Apple).

dchest · on June 30, 2012

...if you want "real" higher-order programming in "C"...

SoftwareMaven · on June 30, 2012

You're just not looking at the problem objectively...

Seriously, though, is there a reason the Objective C blocks wouldn't work in C, just substituting functions for methods?

chc · on June 30, 2012

The blocks are already supported in plain C mode as long as you enable the Clang extension. You just use, for example, block_Copy(aBlock) instead of [aBlock copy]. Apple framed it as a C feature in its proposal for standardization (because Objective-C doesn't have a real standard AFAIK).

maybird · on June 30, 2012

This makes me wish I could use "PROT_WRITE | PROT_EXEC" with mmap() on iOS.

buzzert · on June 30, 2012

You actually can by exploiting a bug in the code signing check in the kernel! http://books.google.com/books?id=1kDcjKcz9GwC&lpg=PT92&#...

aklofas · on June 30, 2012

There is a macro way to do this. It involves lots of magic and beating the preprocessor with a two-by-four: https://groups.google.com/forum/?fromgroups#!topic/comp.lang...

Go down this hole with caution.

bowyakka · on June 30, 2012

I often want to beat the pre-processor with a two-by-four