
Interesting. MapD and this project both use Thrust under the hood and, from what I gather, are both attempting to address the same issue. Can anyone speak to the differences?

While I originally didn’t get the case for GPU-accelerated databases, it makes more and more sense: the bandwidth between GPU and CPU is steadily increasing and the latency of GPU<->CPU syncs is shrinking, making the GPU an increasingly attractive option.



The MapD (now OmniSci) execution engine is actually built around a JIT compiler that translates SQL to LLVM IR for most operations, but it does use Thrust for a few things like sort. One performance advantage of using a JIT is that the system can fuse operators (like filter, group by and aggregate, and project) into one kernel, avoiding the overhead of calling multiple kernels (and particularly the overhead of materializing intermediate results, versus keeping them in registers). Outputting LLVM IR also makes it easy for us to run on CPU as well, since it mostly bypasses the need to generate GPU-specific code (for example, CUDA).

It does make coding on the system a bit more difficult, but we found that the performance gains were worth it.
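To make the fusion point concrete, here's a rough hand-written CUDA sketch of a filter and an aggregate fused into a single kernel. This is not MapD's generated code (that is emitted as LLVM IR at query time), and the column names and query are made up; the point is just that rows passing the filter never get materialized to global memory:

    // Rough illustration of operator fusion: the filter ("WHERE city_id = ?")
    // and the aggregate ("SUM(fare)") run in one kernel, so filtered rows stay
    // in registers instead of being written out as an intermediate result.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fused_filter_sum(const float* fare, const int* city_id,
                                     int target_city, int n, float* out_sum) {
        float local = 0.0f;
        // Grid-stride loop: each thread scans a slice of the column.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x) {
            if (city_id[i] == target_city)   // filter, evaluated in registers
                local += fare[i];            // aggregate, accumulated in registers
        }
        atomicAdd(out_sum, local);           // one global write per thread
    }

    int main() {
        const int n = 1 << 20;
        float* fare; int* city; float* sum;
        cudaMallocManaged(&fare, n * sizeof(float));
        cudaMallocManaged(&city, n * sizeof(int));
        cudaMallocManaged(&sum, sizeof(float));
        for (int i = 0; i < n; ++i) { fare[i] = 10.0f; city[i] = i % 4; }
        *sum = 0.0f;
        fused_filter_sum<<<256, 256>>>(fare, city, /*target_city=*/1, n, sum);
        cudaDeviceSynchronize();
        printf("SELECT SUM(fare) WHERE city_id = 1 -> %.1f\n", *sum);
        return 0;
    }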

Also, I would add that this system seems, at least for now, geared toward solving a specific set of problems Uber is grappling with around time-series data, whereas OmniSci is trying to tackle general-purpose OLAP/analytics problems. Not that solving specific use cases isn't great/valid, or that they don't plan to expand the functionality of AresDB over time to encompass broader use cases.


Uber's article mentions: "...but as we evaluated the product, we realized that it did not have critical features for Uber’s use case, such as deduplication." However, I also see they used Go quite intensively. Maybe they also did it for the productivity gains while implementing their additional features.


Does the LLVM IR output from the JIT compiler have to then be compiled to CUDA PTX with nvcc? I looked a while back for a JIT compiler for CUDA, but didn't find much.


LLVM actually has a native PTX backend, NVPTX, but since the Nvidia ISA is proprietary we use the CUDA driver API to generate the binary for the target GPU arch. (See here in MapD where it generates the PTX from the LLVM IR: https://github.com/omnisci/mapd-core/blob/ae88c486d1d790db54..., and here where it calls the driver API to generate the native code: https://github.com/omnisci/mapd-core/blob/568e77d9c9706049ee...)
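For anyone curious what that driver-API path looks like, a bare-bones sketch (not the linked MapD code; the kernel name and the PTX placeholder below are stand-ins) is roughly:

    // Minimal sketch: hand a PTX string (normally produced by LLVM's NVPTX
    // backend) to the CUDA driver API, which JITs it to native code (SASS)
    // for whatever GPU is present, then launch the resulting kernel.
    #include <cuda.h>
    #include <cstdio>

    int main() {
        cuInit(0);
        CUdevice dev;  cuDeviceGet(&dev, 0);
        CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

        const char* ptx = "...";  // placeholder for the NVPTX backend's output

        CUmodule mod;
        if (cuModuleLoadData(&mod, ptx) != CUDA_SUCCESS) {  // driver JITs PTX -> SASS here
            fprintf(stderr, "PTX JIT failed\n");
            return 1;
        }

        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "my_query_kernel");  // hypothetical kernel name

        // No kernel parameters in this toy example, hence the null kernelParams.
        cuLaunchKernel(fn, /*grid*/ 64, 1, 1, /*block*/ 256, 1, 1,
                       /*sharedMemBytes*/ 0, /*stream*/ 0,
                       /*kernelParams*/ nullptr, /*extra*/ nullptr);
        cuCtxSynchronize();

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }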


Why do you need to use Thrust to do sorting?


Why not? The Thrust folks have done a lot of good work on implementing highly optimized radix sort on GPU.
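For anyone who hasn't used it, the interface is about as simple as it gets. A quick sketch (the key and payload columns here are made up):

    // Sort a key column on the GPU and carry the row ids along with it.
    // For primitive key types Thrust dispatches to a tuned radix sort.
    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/sort.h>
    #include <cstdio>

    int main() {
        const int n = 1 << 24;                           // 16M rows
        thrust::device_vector<int> keys(n);              // e.g. a timestamp column
        thrust::device_vector<int> row_ids(n);           // payload permuted with the keys
        thrust::sequence(keys.rbegin(), keys.rend());    // fill keys in descending order
        thrust::sequence(row_ids.begin(), row_ids.end());

        // ORDER BY keys: radix sort the keys and move the row ids with them.
        thrust::sort_by_key(keys.begin(), keys.end(), row_ids.begin());

        printf("smallest key %d came from row %d\n", (int)keys[0], (int)row_ids[0]);
        return 0;
    }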

That said, there is interesting academic work around GPU sort that achieves even higher performance than Thrust in many scenarios, and we are looking at the feasibility of incorporating a framework we have found particularly promising.


Can thrust sort operate on datasets larger than GPU and CPU memory or does it require manually combining smaller sort operations into larger sorted sequences akin to merge sort?


Can you share the academic work you're referring to? Would be interested to read more.


Elias Stehle's work out of the Technical University of Munich is pretty awesome, see here: https://dl.acm.org/citation.cfm?id=3064043.


MapD is now called OmniSci and is mentioned in the Uber article. I bet Uber had a good look at MapD internals.



