I'd argue that ClickHouse isn't even that fast (compared to similar technology like Snowflake, Redshift or BigQuery), and that the ScyllaDB example is actually completely misleading. Scylla is probably one of the fastest OLTP datastores, yet they're benchmarking an analytics query, which is pretty easy for any columnar datastore to crack.
The actual point here is that you can execute millions of (different!) individual queries per second on ScyllaDB, which beats any columnar datastore hands down. ClickHouse "cheated" here by translating the (unfortunate) benchmark setup into a single query that's extremely heavily optimized under the hood.
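To make the distinction concrete, here is a toy sketch of the two query shapes. It uses plain Python structures as stand-ins and says nothing about how ScyllaDB or ClickHouse work internally; the names (`rows`, `value_column`, `point_lookup`) and the data are made up for illustration.

```python
# Toy illustration only: plain Python structures stand in for the two access
# patterns. All names and data here are hypothetical.
import random

random.seed(0)
N = 100_000

# Row-oriented store keyed by primary key (the OLTP access pattern).
rows = {key: {"sensor": key % 100, "value": random.random()} for key in range(N)}

# Column-oriented copy of the same values (the analytics access pattern).
value_column = [rows[key]["value"] for key in range(N)]


def point_lookup(key):
    """One independent query: fetch a single row by its key."""
    return rows[key]


def max_via_point_lookups(keys):
    """The benchmark shape: many separate queries, combined on the client."""
    return max(point_lookup(k)["value"] for k in keys)


def max_via_column_scan():
    """The rewritten shape: one aggregate over a contiguous column."""
    return max(value_column)


# Both shapes compute the same answer; they differ in how many queries
# the datastore has to serve to get there.
assert max_via_point_lookups(range(N)) == max_via_column_scan()
```

The first shape is what an OLTP store is built to serve at very high rates, with each lookup potentially different; the second is a single scan that a columnar engine can optimize as a whole.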
Actually, while ClickHouse does not have all the features of Redshift, BigQuery, etc., it is usually much faster than them. It can be slower than GPU-powered systems on some workloads when all the data fits in GPU memory, but that is not the use case it targets.
ScyllaDB is amazing when it comes to OLTP performance, but not for analytical workloads.
I think they took pretty mediocre analytical workload results and shared them as something outstanding.
Restricting the assessment to tiny workloads that fit in GPU memory is increasingly wrong.
GPU compute stacks are increasingly geared towards multi-GPU/multi-node and streaming setups, especially given the crazy bandwidth they're now built for (2TB/s for a DGX-2 node?). Likewise, per-GPU memory and per-node memory are going up nicely each year (16-24GB/GPU, and 100GB-512GB/node, with TBs connected on the same node). If you saturate that, the network is more likely to become the bottleneck than your DB :)
Though in practice I mostly do single-GPU streaming, b/c I like not having to think about multi-node, and they're pretty cheap now :)
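As a rough back-of-the-envelope check of the bandwidth argument, the sketch below compares how long it takes to move one node's worth of data at the in-node figure quoted above versus over a network link. The 2 TB/s and 512 GB numbers are the unverified ones from the comment; the 100 Gbit/s link is purely an assumption for illustration.

```python
# Back-of-the-envelope arithmetic only. Figures marked "from the comment" are
# the unverified ones quoted above; the network link speed is an assumption.

def scan_seconds(data_gb, bandwidth_gb_per_s):
    """Time to move data_gb gigabytes at a sustained bandwidth."""
    return data_gb / bandwidth_gb_per_s


node_memory_gb = 512          # upper per-node figure from the comment
in_node_gb_per_s = 2000.0     # "2TB/s?" figure from the comment
network_gb_per_s = 100 / 8    # assumption: a 100 Gbit/s link = 12.5 GB/s

in_node_time = scan_seconds(node_memory_gb, in_node_gb_per_s)
network_time = scan_seconds(node_memory_gb, network_gb_per_s)

print(f"in-node scan:     {in_node_time:.2f} s")
print(f"over the network: {network_time:.2f} s")
print(f"network is {network_time / in_node_time:.0f}x slower")
```

Under these assumptions the in-node scan takes about a quarter of a second while shipping the same data over the link takes around 41 seconds, which is the sense in which the network, not the DB, becomes the limit.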