Anyone used this GPU database that ranks at the top of that list, above the open...

datumdadum · on Feb 1, 2019

Haven't, but it's worth noting that their edging out MapD is probably attributable to hardware: they're on a 5-node Minsky cluster featuring NVLink and hence, as arnon said, are benefiting from 9.5x faster transfer from disk than PCIe 3.0. That blog has not yet tested MapD's IBM Power version; it would be interesting to see how it compares on that cluster.

CMU has an interesting lecture if you want to learn more about Brytlyt https://www.youtube.com/watch?v=oL0IIMQjFrs

Couple interesting things to note:

- They attain some of their speed by requiring that data be pre-sorted https://youtu.be/oL0IIMQjFrs?t=1092 https://youtu.be/oL0IIMQjFrs?t=1127

- They've built their database on Postgres for query planning, but for any query that doesn't match what they've accelerated on the GPU, they can't fall back to running it in Postgres on the CPU. https://youtu.be/oL0IIMQjFrs?t=3260

- Data is brought into GPU memory at table CREATE time, so the cost of transferring data from disk->host RAM->GPU RAM is not reflected in the benchmark timings. This probably wouldn't work if you want to shuffle data in/out of GPU RAM across changing query workloads. https://youtu.be/oL0IIMQjFrs?t=1310

- The blog had to use data type DATE instead of DATETIME for Brytlyt since it doesn't support the latter. The other DBs used DATETIME, which is heavier to compute on. https://tech.marksblogg.com/billion-nyc-taxi-rides-p2-16xlar...

So all in all, it seems a more carefully constructed, hardware-balanced comparison would be needed to see which is actually the quickest.
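
For a rough sense of why leaving the load step out of the timings matters, here's a back-of-the-envelope sketch. The assumptions are mine, not the benchmark's: a single PCIe 3.0 x16 link at its theoretical ~15.75 GB/s, the 9.5x speedup taken at face value from the figure quoted above, and the ~600GB dataset size mentioned elsewhere in the thread. It only bounds the host->GPU hop and ignores disk speed, so treat it as a lower bound rather than a measurement.

```python
# Lower-bound estimate of host->GPU transfer time. All constants are assumptions.
PCIE3_X16_GB_PER_S = 15.75  # theoretical PCIe 3.0 x16 bandwidth, one direction
QUOTED_SPEEDUP = 9.5        # speedup figure quoted in the thread, not measured here
DATASET_GB = 600.0          # approximate dataset size mentioned in the thread


def transfer_seconds(dataset_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal transfer time; ignores disk throughput and protocol overhead."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return dataset_gb / bandwidth_gb_per_s


pcie_s = transfer_seconds(DATASET_GB, PCIE3_X16_GB_PER_S)
faster_s = transfer_seconds(DATASET_GB, PCIE3_X16_GB_PER_S * QUOTED_SPEEDUP)

print(f"PCIe 3.0 x16 (ideal):    {pcie_s:.1f} s")
print(f"With the quoted speedup: {faster_s:.1f} s")
```

Even the ideal numbers are large next to sub-second query times, which is why a benchmark that preloads at CREATE time isn't directly comparable to one that pays the transfer cost per query.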

bufferoverflow · on Jan 30, 2019

Note that it's at the top of the list probably because it's running on a cluster. It would be awesome to see such a comparison on some standard hardware, like a large AWS GPU instance (eg1.2xlarge).

Also note that the dataset is 600GB, so it won't fit on a single GPU, not even close.
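
To put a number on "not even close", here's a quick sketch of how many GPUs you'd need just to hold 600GB uncompressed. The per-GPU memory sizes are hypothetical examples (16GB and 32GB cards), and it ignores compression, columnar encoding, and working memory, so real requirements could land on either side of this.

```python
import math

DATASET_GB = 600  # dataset size from the comment above


def gpus_needed(dataset_gb: float, gpu_mem_gb: float) -> int:
    """Minimum GPU count to hold the raw data, ignoring compression and scratch space."""
    if gpu_mem_gb <= 0:
        raise ValueError("GPU memory must be positive")
    return math.ceil(dataset_gb / gpu_mem_gb)


# Hypothetical per-GPU memory sizes, for illustration only.
for mem_gb in (16, 32):
    print(f"{mem_gb}GB per GPU: at least {gpus_needed(DATASET_GB, mem_gb)} GPUs")
```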