This is really great work, and it's awesome that they're developing this as open source. Combined with the results from the HyPer folks, it sure is starting to look like using LLVM to specialize code on the fly is a good idea for any data processing engine.
Looking more closely at the benchmarking results has me scratching my head, though: their reported 16x performance benefit from codegen on TPC-H Q1 seemingly shrinks to about 2x once you compare against the [REDACTED] database. What's happening?
My guess is that Impala is still somewhat inefficient in a few places that need work (which is OK; this isn't a criticism). I bet that [REDACTED] is quite efficient, having been in development for at least 2x longer than Impala, maybe closer to 10x. In which case, getting within 2x is fantastic!
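To see why the two figures aren't contradictory, here's a back-of-envelope sketch with entirely made-up runtimes (none of these numbers come from the published benchmarks):

    # Hypothetical Q1 runtimes, in seconds, purely for illustration.
    mature_engine = 1.0                       # assume the mature engine runs Q1 in 1s
    impala_codegen = 2.0                      # Impala with codegen lands within 2x of it
    impala_no_codegen = impala_codegen * 16   # apply the reported 16x codegen speedup backwards

    # The interpreted baseline would then be ~32x behind the mature engine.
    print(impala_no_codegen / mature_engine)  # 32.0

In other words, most of the 16x speedup goes toward closing a gap the mature engine never had in the first place, which is exactly what you'd expect if that engine's non-codegen paths are already well tuned.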