The UDDSketch (default) implementation will allow rolling percentiles, though we still need a bit of work on our end to support it. There isn't a way to do this with TDigest, however.
Sure there is. You simply maintain N phases of digests, and every time interval T you evict a phase and recompute the summary (because TDigests are easily merged).
I think this would be a tumbling window rather than a true "rolling" TDigest. I suppose you could decrement the buckets, but that gets a little weird because splits can't really be unsplit. The tumbling-window version would probably work, though TDigest is a little odd on merge: it isn't completely deterministic with respect to ordering and merging (UDDSketch is), so you'd likely get something that's more than good enough, but not identical to what you'd get by calculating it directly, which makes it a little confusing and difficult.
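Concretely, the phased approach described above could be sketched in SQL along these lines. This is only a rough sketch: it assumes the timescaledb-toolkit tdigest, rollup, and approx_percentile aggregates (names and signatures taken on trust from the docs) and a hypothetical measurements(ts, val) table. Each time_bucket is one phase, and old phases are "evicted" simply by falling outside the WHERE clause rather than being unsplit.

    -- One TDigest per hourly phase; merge the last 24 phases on demand.
    -- measurements(ts, val) is a hypothetical table; function names/signatures
    -- assume the timescaledb-toolkit tdigest API.
    WITH phases AS (
        SELECT time_bucket('1 hour', ts) AS bucket,
               tdigest(100, val) AS digest
        FROM measurements
        WHERE ts > now() - interval '24 hours'   -- dropping old buckets = evicting old phases
        GROUP BY bucket
    )
    SELECT approx_percentile(0.99, rollup(digest)) AS p99_last_24h
    FROM phases;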
We actually haven't been running up against any limits here. One thing to keep in mind is that Postgres remote-fetch operations aren't tuple-at-a-time, so this shouldn't be a bottleneck for our multi-node operations.
Have you done any analysis of your per-core scan rates for simple aggregations like sum/count + group by with a reasonably large cardinality key? Or has anyone published a benchmark you trust on queries of that variety?
An example would be TPC-H Q1, which is a little weak on the group by cardinality, but is good for testing raw aggregation performance.
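For reference, Q1 is essentially a full scan of lineitem feeding a batch of sums and averages, grouped on (l_returnflag, l_linestatus), which has only a handful of distinct combinations; hence the weak group-by cardinality. Roughly, with the usual 90-day substitution parameter:

    SELECT l_returnflag,
           l_linestatus,
           sum(l_quantity)                                       AS sum_qty,
           sum(l_extendedprice)                                  AS sum_base_price,
           sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
           sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
           avg(l_quantity)                                       AS avg_qty,
           avg(l_extendedprice)                                  AS avg_price,
           avg(l_discount)                                       AS avg_disc,
           count(*)                                              AS count_order
    FROM lineitem
    WHERE l_shipdate <= date '1998-12-01' - interval '90' day
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus;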
    SELECT device_id,
           arrow_run_pipeline(
               timevector(ts, val),
               arrow_add_element(sort(),
                   arrow_add_element(delta(),
                       arrow_add_element(abs(), sum()))));
The notable difference here is that this presents a lot more optimization potential, as the entire pipeline can conceivably be applied in one pass through the table.
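For comparison, the same computation in plain SQL needs a window function feeding an outer aggregate, which is harder for the planner to collapse into a single pass. This assumes the pipeline's semantics are: sort by time, take successive deltas, apply abs, then sum, and it uses a hypothetical measurements(device_id, ts, val) table.

    SELECT device_id,
           sum(abs(delta)) AS total_abs_delta
    FROM (
        SELECT device_id,
               val - lag(val) OVER (PARTITION BY device_id ORDER BY ts) AS delta
        FROM measurements
    ) AS deltas
    GROUP BY device_id;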