Very cool! I love geeking out on analytics tech and look forward to studying its design further. My take so far (please correct me if I'm wrong):
* As a datapoint, Pinot/Druid/ClickHouse can do 1B timeseries on one server. AresDB sounds like it's in the same ballpark here.
* Pinot/Druid don't do cross-table joins, whereas AresDB can. My understanding is these joins are at (or near?) sub-second, which would be a very distinguishing feature. I'm not sure how this will translate once distributed mode is built out, as shuffling would become the bottleneck. Maybe there would be some partitioning strategy that allows arbitrary joining within a partition (toy sketch after this list)?
* ClickHouse can do cross-table joins, but they aren't going to be sub-second.
* AresDB supports event deduping. I think this can easily be handled by the upstream systems (Samza, Spark, Flink, ...) in a lambda architecture (minimal sketch below).
* Reliance on fact/dimension tables.
- This design/encoding is probably there to help overcome transfer from host memory to the GPU, which in my limited experience with Thrust was always the bottleneck (see the encoding example below).
- High-cardinality columns would make dimension tables grow very large and could become unmanageable (unless they are somehow trimmable?).
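
To make the partitioning hand-wave in my joins bullet concrete, here's a toy Python sketch of co-partitioning (the `trips`/`cities` tables and column names are made up, nothing from AresDB): hash both tables on the join key so every join stays local to a partition, and no cross-node shuffle is needed at query time.

```python
# Toy sketch (not AresDB code) of the co-partitioning idea: if both tables
# are hash-partitioned on the join key, every join can be answered locally
# within a partition, so no cross-node shuffle is needed at query time.

NUM_PARTITIONS = 4

def partition_of(key: str) -> int:
    """Route a row to a partition by hashing its join key."""
    return hash(key) % NUM_PARTITIONS

def co_partition(rows, key_field):
    """Bucket rows into partitions keyed on the join column."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[partition_of(row[key_field])].append(row)
    return parts

def local_join(fact_part, dim_part, key_field):
    """Hash join executed entirely within one partition."""
    dim_by_key = {d[key_field]: d for d in dim_part}
    return [{**f, **dim_by_key[f[key_field]]}
            for f in fact_part if f[key_field] in dim_by_key]

trips = [{"city_id": "sf", "fare": 12.5}, {"city_id": "nyc", "fare": 30.0}]
cities = [{"city_id": "sf", "name": "San Francisco"},
          {"city_id": "nyc", "name": "New York"}]

fact_parts = co_partition(trips, "city_id")
dim_parts = co_partition(cities, "city_id")

# Each partition joins independently; results are just concatenated.
joined = [row for i in range(NUM_PARTITIONS)
          for row in local_join(fact_parts[i], dim_parts[i], "city_id")]
print(joined)
```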
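
And a minimal sketch of what I mean by upstream deduping: roughly what a Samza/Flink operator would do before rows ever reach the store, keyed on a (hypothetical) `event_id` field.

```python
# Hedged sketch of upstream event dedup, the kind of thing a Samza/Flink
# operator would do before rows ever reach the store. The event_id field is
# hypothetical; a real pipeline would also bound the seen-set with a TTL
# or watermark so it doesn't grow forever.

seen: set[str] = set()

def dedupe(events):
    """Yield each event at most once, keyed on its 'event_id'."""
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            yield event

stream = [{"event_id": "a1", "v": 1},
          {"event_id": "a1", "v": 1},   # duplicate, dropped
          {"event_id": "b2", "v": 2}]
print(list(dedupe(stream)))  # -> two unique events
```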
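
Finally, a back-of-envelope illustration of why I suspect the fact/dimension encoding exists (this is just my reading, not AresDB's actual layout): dictionary-encoding a string column into small ints shrinks what has to cross the PCIe bus, while the dimension table itself is exactly what blows up at high cardinality, per my last sub-bullet.

```python
# Rough illustration of why fact/dimension encoding helps the PCIe transfer:
# shipping a column of small integer IDs to the GPU moves far fewer bytes
# than shipping the raw strings, and the dimension table stays host-side.
# (My reading of the design, not AresDB's actual layout.)

import array

raw_column = ["san_francisco", "new_york", "san_francisco", "new_york"]

# Dimension table: one entry per distinct value, assigned a dense int ID.
dim_table: dict[str, int] = {}
encoded = array.array("i")
for value in raw_column:
    encoded.append(dim_table.setdefault(value, len(dim_table)))

# Rough byte counts, assuming ASCII strings.
raw_bytes = sum(len(v) for v in raw_column)
print(f"raw: {raw_bytes} bytes, encoded: {encoded.itemsize * len(encoded)} bytes")
# Only `encoded` needs to cross to the GPU; the catch is that `dim_table`
# itself grows with the column's cardinality.
```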
Regarding your second point: your intuition seems good, as Alipay apparently extended Druid to perform joins this way, with good performance [1]. Unfortunately it looks like they won't finish open-sourcing it, but it at least validates the idea.