Using LuaJIT, with a minimal amount of C code called through the FFI for optimization, seems to be the best way forward for maximizing both performance and maintainability.
What would be really interesting is to see someone highlight specific cases where this approach ultimately fails to match the performance of pure C.
I would think that the LuaJIT approach would be tens of times more maintainable for a sufficiently large application, so it's really imperative here that we ask 'Why not?'
You can always find cases where it would fail, and that's OK. For what it does, and what it allows, it's simply unbelievable (at least to me) how well it succeeds (I understand very little about compilers, and only a little about interpreters).
A few things are not easily translatable: OpenMP (www.openmp.org), inline assembly, and SSE packed floats. But that's okay, and even then there is probably a better alternative than "C", a language more suited to such tasks: OpenCL (www.khronos.org) or DirectCompute.
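
To make the OpenMP point concrete, here is a minimal sketch of the kind of loop I mean. The function name `dot_product` is just something I made up for illustration. The pragma is handled by the C compiler, so as far as I can tell a loop like this has to stay in a compiled C library that LuaJIT calls through the FFI, rather than being written in Lua itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product of two float arrays of length n.
 * When built with OpenMP enabled (e.g. gcc -fopenmp), the iterations are
 * split across threads and the per-thread partial sums are combined by the
 * reduction clause. Without OpenMP the pragma is ignored and the loop
 * simply runs serially, giving the same result. */
double dot_product(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    double sum = 0.0;
    long i; /* signed loop index, accepted by older OpenMP versions too */

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; ++i) {
        sum += (double)a[i] * (double)b[i];
    }
    return sum;
}
```

From the Lua side you would only see the one-line declaration of the function, so the parallelism stays a C-side detail.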