
Interesting. MapD and this project both use Thrust under the hood and, from what I gather, are both attempting to address the same issue. Can anyone speak to the differences?

While I originally didn’t get the case for GPU-accelerated databases, it makes more and more sense: the bandwidth between GPU and CPU is steadily increasing and the latency of GPU<->CPU syncs is shrinking, making the GPU an increasingly attractive option.



The MapD (now OmniSci) execution engine is actually built around a JIT compiler that translates SQL to LLVM IR for most operations, but it does use Thrust for a few things like sort. One performance advantage of using a JIT is that the system can fuse operators (like filter, group by and aggregate, and project) into one kernel, avoiding the overhead of calling multiple kernels (and particularly the overhead of materializing intermediate results, versus keeping them in registers). Outputting LLVM IR also makes it easy for us to run on CPU as well, since it mostly bypasses the need to generate GPU-specific code (for example, CUDA).

It does make coding on the system a bit more difficult, but we found that the performance gains were worth it.
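To make the fusion point concrete, here's a rough hand-written CUDA sketch of a filter and an aggregate fused into a single kernel. This is not MapD's generated code (that is emitted as LLVM IR at query time), and the column names and query are made up; the point is just that rows passing the filter never get materialized to global memory:

    // Rough illustration of operator fusion: the filter ("WHERE city_id = ?")
    // and the aggregate ("SUM(fare)") run in one kernel, so filtered rows stay
    // in registers instead of being written out as an intermediate result.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fused_filter_sum(const float* fare, const int* city_id,
                                     int target_city, int n, float* out_sum) {
        float local = 0.0f;
        // Grid-stride loop: each thread scans a slice of the column.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x) {
            if (city_id[i] == target_city)   // filter, evaluated in registers
                local += fare[i];            // aggregate, accumulated in registers
        }
        atomicAdd(out_sum, local);           // one global write per thread
    }

    int main() {
        const int n = 1 << 20;
        float* fare; int* city; float* sum;
        cudaMallocManaged(&fare, n * sizeof(float));
        cudaMallocManaged(&city, n * sizeof(int));
        cudaMallocManaged(&sum, sizeof(float));
        for (int i = 0; i < n; ++i) { fare[i] = 10.0f; city[i] = i % 4; }
        *sum = 0.0f;
        fused_filter_sum<<<256, 256>>>(fare, city, /*target_city=*/1, n, sum);
        cudaDeviceSynchronize();
        printf("SELECT SUM(fare) WHERE city_id = 1 -> %.1f\n", *sum);
        return 0;
    }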

Also, I would add that this system seems, at least for now, geared toward solving a specific set of problems Uber is grappling with around time-series data, whereas OmniSci is trying to tackle general-purpose OLAP/analytics problems. Not that solving specific use cases isn't great/valid, or that they don't plan to expand the functionality of AresDB over time to encompass broader use cases.


Uber's article mentions: "...but as we evaluated the product, we realized that it did not have critical features for Uber’s use case, such as deduplication." However, I also see they used Go quite intensively. Maybe they also did it for the productivity gains while implementing their additional features.


Does the LLVM IR output from the JIT compiler have to then be compiled to CUDA PTX with nvcc? I looked a while back for a JIT compiler for CUDA, but didn't find much.


LLVM actually has a native PTX backend, NVPTX, but since the Nvidia ISA is proprietary we use the CUDA driver API to generate the binary for the target GPU arch. (See here in MapD where it generates the PTX from the LLVM IR: https://github.com/omnisci/mapd-core/blob/ae88c486d1d790db54..., and here where it calls the driver API to generate the native code: https://github.com/omnisci/mapd-core/blob/568e77d9c9706049ee...)
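For anyone curious what that driver-API path looks like, a bare-bones sketch (not the linked MapD code; the kernel name and the PTX placeholder below are stand-ins) is roughly:

    // Minimal sketch: hand a PTX string (normally produced by LLVM's NVPTX
    // backend) to the CUDA driver API, which JITs it to native code (SASS)
    // for whatever GPU is present, then launch the resulting kernel.
    #include <cuda.h>
    #include <cstdio>

    int main() {
        cuInit(0);
        CUdevice dev;  cuDeviceGet(&dev, 0);
        CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

        const char* ptx = "...";  // placeholder for the NVPTX backend's output

        CUmodule mod;
        if (cuModuleLoadData(&mod, ptx) != CUDA_SUCCESS) {  // driver JITs PTX -> SASS here
            fprintf(stderr, "PTX JIT failed\n");
            return 1;
        }

        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "my_query_kernel");  // hypothetical kernel name

        // No kernel parameters in this toy example, hence the null kernelParams.
        cuLaunchKernel(fn, /*grid*/ 64, 1, 1, /*block*/ 256, 1, 1,
                       /*sharedMemBytes*/ 0, /*stream*/ 0,
                       /*kernelParams*/ nullptr, /*extra*/ nullptr);
        cuCtxSynchronize();

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }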


Why do you need to use Thrust to do sorting?


Why not? The Thrust folks have done a lot of good work on implementing highly optimized radix sort on GPU.
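For anyone who hasn't used it, the interface is about as simple as it gets. A quick sketch (the key and payload columns here are made up):

    // Sort a key column on the GPU and carry the row ids along with it.
    // For primitive key types Thrust dispatches to a tuned radix sort.
    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/sort.h>
    #include <cstdio>

    int main() {
        const int n = 1 << 24;                           // 16M rows
        thrust::device_vector<int> keys(n);              // e.g. a timestamp column
        thrust::device_vector<int> row_ids(n);           // payload permuted with the keys
        thrust::sequence(keys.rbegin(), keys.rend());    // fill keys in descending order
        thrust::sequence(row_ids.begin(), row_ids.end());

        // ORDER BY keys: radix sort the keys and move the row ids with them.
        thrust::sort_by_key(keys.begin(), keys.end(), row_ids.begin());

        printf("smallest key %d came from row %d\n", (int)keys[0], (int)row_ids[0]);
        return 0;
    }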

That said, there is interesting academic work around GPU sort that achieves even higher performance than Thrust in many scenarios, and we are looking at the feasibility of incorporating a framework we have found particularly promising.


Can thrust sort operate on datasets larger than GPU and CPU memory or does it require manually combining smaller sort operations into larger sorted sequences akin to merge sort?


Can you share the academic work you're referring to? Would be interested to read more.


Elias Stehle's work out of the Technical University of Munich is pretty awesome, see here: https://dl.acm.org/citation.cfm?id=3064043.


MapD is now called OmniSci and is mentioned in the Uber article. I bet Uber had a good look at MapD internals.



