
A fantastic thing about HyperLogLog is that sketches can be merged, so you can split your data between multiple servers, precompute an HLL of all IPs every minute, and then ask "how many unique IPs were there yesterday?".

I discovered HLL because it's used in ClickHouse, which employs a ton of cool but obscure data structures.
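
For anyone curious what "can be merged" means in practice, here is a minimal sketch of the idea in Python. It is a toy HyperLogLog written for illustration, not ClickHouse's implementation; the class name, the precision p=12, the SHA-1 hash and the made-up IP ranges are all my assumptions. Merging is just an element-wise max over the registers, which is why per-server or per-minute sketches combine without touching the raw data.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (illustrative only; names and parameters are assumptions)."""

    def __init__(self, p=12):
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches: element-wise max of the registers.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Two hypothetical servers see overlapping sets of IPs.
server_a, server_b = HyperLogLog(), HyperLogLog()
for i in range(20000):
    server_a.add(f"10.0.{i // 256}.{i % 256}")
for i in range(10000, 30000):
    server_b.add(f"10.0.{i // 256}.{i % 256}")

merged = server_a.merge(server_b)
print(round(merged.count()))  # estimate of the 30000 distinct IPs in the union
```

With 4096 registers the standard error is roughly 1.04 / sqrt(4096), so about 1.6%, and the merged sketch is register-for-register identical to one built from all the data in a single place.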




HLL sketches work well in analytics cubes since they can be combined.

You can retain them across time too, such that you can ask questions like "how many unique users were there over the last N days?" without needing the source data. Great for privacy-aware analytics solutions.


Love DataSketches, but I was wondering if there is a way to compute sketches across time, e.g. I want to compute the users who did X and then Y, in that order. Since intersection is commutative, it doesn't give an answer for time ordering.

Nonetheless, it's the best data structure I have read about in the last 10 years.
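
To make the ordering problem concrete, here is a small Python sketch. Exact sets stand in for the sketches (an assumption to keep it short; a real sketch would only approximate these sets), and the event log and user names are made up. The intersection gives the same result whichever way round you take it, so recovering "X then Y" needs per-user timestamps, which a set-style sketch does not keep.

```python
# Hypothetical event log: (user, action, timestamp).
events = [
    ("alice", "X", 1), ("alice", "Y", 2),   # did X, then Y
    ("bob",   "Y", 1), ("bob",   "X", 2),   # did Y, then X
    ("carol", "X", 1),                      # did X only
]

# Exact sets standing in for per-action sketches.
did_x = {user for user, action, _ in events if action == "X"}
did_y = {user for user, action, _ in events if action == "Y"}

# Intersection is symmetric, so it cannot tell alice from bob.
both = did_x & did_y
assert both == did_y & did_x

# Ordering needs the first timestamp of each (user, action) pair.
first_seen = {}
for user, action, ts in events:
    key = (user, action)
    first_seen[key] = min(ts, first_seen.get(key, ts))

x_then_y = {u for u in both if first_seen[(u, "X")] < first_seen[(u, "Y")]}
print(sorted(both), sorted(x_then_y))
```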



