Hacker News | WireBaron's comments

This is essentially correct. It's a little more complicated inside, and the expansion's really more like:

SELECT device_id, arrow_run_pipeline(timevector(ts, val), arrow_add_element(sort(), arrow_add_element(delta(), arrow_add_element(abs(), sum()))));

The notable difference here is that this presents a lot more optimization potential, as the entire pipeline can conceivably be applied in one pass through the table.
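As a toy illustration of that one-pass potential (this is not the actual Toolkit internals, just a sketch of the idea): once the elements are composed into one pipeline, sort → delta → abs → sum can be evaluated in a single traversal rather than materializing an intermediate timevector per step.

```python
# Sketch: fusing a sort -> delta -> abs -> sum pipeline into one pass.
# `timevector` here is just a list of (ts, val) pairs; names are illustrative.
def run_pipeline(timevector):
    ordered = sorted(timevector)        # sort() by timestamp
    total, prev = 0.0, None
    for _, val in ordered:
        if prev is not None:
            total += abs(val - prev)    # delta() and abs() fused into the fold
        prev = val
    return total                        # sum()
```

Evaluated this way, the delta/abs/sum steps never allocate intermediate vectors; only the sort needs the full input.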


The UDDSketch (default) implementation will allow rolling percentiles, though we still need a bit of work on our end to support it. There isn't a way to do this with t-digest, however.


Sure there is. You simply maintain N phases of digests, and every time interval T you evict a phase and recompute the summary (because t-digests are easily merged).


I think this would be a tumbling window rather than a true "rolling" t-digest. I suppose you could decrement the buckets, but it gets a little weird, as splits can't really be unsplit. The tumbling-window approach would probably work, though t-digest is a little odd on merge: it's not completely deterministic with respect to ordering and merging (UDDSketch is), so you'd likely get something that is more than good enough, but it wouldn't be the same as if you just calculated it directly, which makes it a little confusing and difficult.
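The phased approach described above can be sketched roughly as follows. This is a hypothetical illustration: `PhaseDigest` is a stand-in that stores raw values, where a real t-digest would store centroids, but the merge/evict mechanics are the same.

```python
from collections import deque

class PhaseDigest:
    """Toy stand-in for a mergeable digest (a real t-digest stores centroids)."""
    def __init__(self):
        self.values = []

    def add(self, x):
        self.values.append(x)

    def merge(self, other):
        merged = PhaseDigest()
        merged.values = self.values + other.values
        return merged

    def percentile(self, q):
        s = sorted(self.values)
        if not s:
            return None
        return s[min(int(q * len(s)), len(s) - 1)]

class PhasedDigest:
    """Tumbling-window percentiles: N phases, evict the oldest every interval T."""
    def __init__(self, n_phases):
        self.phases = deque(PhaseDigest() for _ in range(n_phases))

    def add(self, x):
        self.phases[-1].add(x)             # new values land in the current phase

    def tick(self):
        self.phases.popleft()              # evict the oldest phase
        self.phases.append(PhaseDigest())  # start a fresh one

    def summary(self):
        out = PhaseDigest()
        for p in self.phases:
            out = out.merge(p)             # cheap because digests merge
        return out
```

The summary recomputed after each `tick()` covers the last N intervals, which is exactly why it's a tumbling rather than truly rolling window: eviction happens a phase at a time, never value by value.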

(NB: Post author here).


This is what I do, it's not a true rolling digest but it works well enough for my purposes.


We actually haven't been running against any limits here. One thing to keep in mind is that postgres remote-fetch operations aren't tuple-at-a-time, so this shouldn't be a bottleneck for our multi-node operations.


Have you done any analysis of your per-core scan rates for simple aggregations like sum/count + group by with a reasonably large cardinality key? Or has anyone published a benchmark you trust on queries of that variety?

An example would be TPC-H Q1, which is a little weak on the group by cardinality, but is good for testing raw aggregation performance.
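For reference, a minimal single-core version of the kind of measurement being asked about might look like this (a hypothetical micro-benchmark, not anyone's published methodology): a sum/count aggregation grouped by a high-cardinality key, timed to get rows scanned per second.

```python
import random
import time
from collections import defaultdict

def aggregate(rows):
    """sum/count + GROUP BY over (key, value) pairs."""
    groups = defaultdict(lambda: [0.0, 0])
    for key, val in rows:
        acc = groups[key]
        acc[0] += val
        acc[1] += 1
    return groups

def bench(n_rows=1_000_000, n_keys=100_000):
    rng = random.Random(42)
    rows = [(rng.randrange(n_keys), rng.random()) for _ in range(n_rows)]
    start = time.perf_counter()
    groups = aggregate(rows)
    elapsed = time.perf_counter() - start
    # distinct keys touched, and per-core scan rate in rows/sec
    return len(groups), n_rows / elapsed
```

A database executor with vectorized hashing will of course do far better than this interpreted loop; the point is only to pin down what "per-core scan rate" means for such a query.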


We have done fairly extensive benchmarking of high-cardinality data on our single-node product (we have a blog entry detailing at least our insert performance here: https://blog.timescale.com/blog/what-is-high-cardinality-how...).

We're currently focused on query optimization for our multi-node product, but we don't have any numbers we're ready to share yet.

