
Yes, this is the basic logic: for any incremental aggregation we need to detect the groups that can be influenced by the new or updated record. If we do row-based rolling aggregation, then indeed we need to update the records in (i-n, i+n). Yet the following difficulties may arise:

o Generally, we do not want to re-compute aggregates from scratch; the existing aggregates should be updated incrementally, particularly if n is very large.

o In real applications, rolling aggregation is performed using partitioning on some objects. For example, we append new events from many different devices to one table and want to compute rolling aggregates for each individual device. Hence the (i-n, i+n) range will not work anymore, because neighboring rows in the table may belong to different devices.

o Rolling aggregation using absolute time windows will also work differently. However, if records are ordered (as in stream processing) and there are no partitions, then it is easy.
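To make the first two points concrete, here is a minimal sketch of a per-partition rolling sum that is updated rather than recomputed. It assumes append-only events and a trailing window of the last n records per device, which is simpler than the centered (i-n, i+n) case; the class and method names are hypothetical.

```python
from collections import defaultdict, deque


class PartitionedRollingSum:
    """Trailing rolling sum over the last n records of each partition key."""

    def __init__(self, n):
        self.n = n
        # One window and one running sum per partition (e.g. per device).
        self.windows = defaultdict(deque)
        self.sums = defaultdict(float)

    def append(self, key, value):
        """Add a record for `key` and return that partition's updated sum."""
        window = self.windows[key]
        window.append(value)
        self.sums[key] += value
        if len(window) > self.n:
            # Subtract the evicted record instead of re-summing the window.
            self.sums[key] -= window.popleft()
        return self.sums[key]
```

Each append costs O(1) regardless of n, and records from other devices never touch a partition's window. This only works as written because a sum can be undone by subtraction; max or min would need a different structure.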



A few others and I have done a lot of research on updating sliding window aggregations without recomputing everything. Our code is on GitHub, and the README has links to the papers: https://github.com/IBM/sliding-window-aggregators
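For readers who want a feel for this kind of algorithm, below is a rough sketch of the technique usually called Two-Stacks, which handles aggregates that cannot be undone by subtraction (max, min, concatenation). It is an illustration written for this thread, not code from the repository above; the class name and methods are hypothetical, and it assumes only that `combine` is associative.

```python
class TwoStacksWindow:
    """FIFO window with amortized O(1) push, pop, and aggregate query."""

    def __init__(self, combine):
        self.combine = combine
        # Each entry is (value, aggregate).
        # front: top is the oldest element; its aggregate covers it and
        #        every newer element still held in front.
        # back:  top is the newest element; its aggregate covers every
        #        element in back, oldest to newest.
        self.front = []
        self.back = []

    def push(self, value):
        """Insert the newest element."""
        agg = value if not self.back else self.combine(self.back[-1][1], value)
        self.back.append((value, agg))

    def pop(self):
        """Evict the oldest element (raises IndexError if the window is empty)."""
        if not self.front:
            # Flip: move everything from back to front, rebuilding aggregates
            # so that older values stay on the left of combine.
            while self.back:
                value, _ = self.back.pop()
                agg = value if not self.front else self.combine(value, self.front[-1][1])
                self.front.append((value, agg))
        self.front.pop()

    def query(self):
        """Return the aggregate of the whole window, oldest to newest."""
        if self.front and self.back:
            return self.combine(self.front[-1][1], self.back[-1][1])
        if self.front:
            return self.front[-1][1]
        if self.back:
            return self.back[-1][1]
        raise ValueError("window is empty")
```

For example, `TwoStacksWindow(max)` gives a rolling maximum: call `push` for each arriving record, `pop` whenever the oldest record leaves the window, and `query` whenever the current aggregate is needed. The occasional flip costs O(n), but every element is moved at most once, so the cost averages out to a constant per operation.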



