Basically: hash all your data items and keep track of the maximum number of leading 0 bits seen in any hash so far.
This is a decent proxy (though not without error) for how many distinct items you've seen, since hashes with large numbers of leading zeros should be uncommon: seeing one means you probably went through a lot of other items to get there (intuitively, about N leading zeros are expected after seeing 2^N items).
This is actually the same thing a proof-of-work cryptocurrency does to control difficulty: change the target number of leading zeros so miners have to do more or less work to find a match.
Of course, you could get "lucky" with a single counter, and the resolution isn't great, so HLL first separates the values into buckets, which are estimated separately and then combined to give a more robust estimate.
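
If it helps to see the shape of that, here's a toy Python sketch (the specifics are my assumptions: the bucket split, SHA-1 as the hash, and a crude sum of per-bucket estimates in place of the bias-corrected harmonic mean and small-range correction the real algorithm uses):

    import hashlib

    def leading_zeros(value: int, width: int) -> int:
        # Number of leading zero bits in a `width`-bit integer.
        return width if value == 0 else width - value.bit_length()

    class TinyLogLog:
        # Toy bucketed estimator in the spirit described above. The top
        # `bucket_bits` of each hash pick a bucket; the remaining bits feed
        # the leading-zeros trick. Real HyperLogLog combines buckets with a
        # bias-corrected harmonic mean (plus a correction for mostly-empty
        # buckets); the plain sum below is only meant to show the structure.
        def __init__(self, bucket_bits: int = 10):
            self.bucket_bits = bucket_bits
            self.rest_bits = 64 - bucket_bits
            self.buckets = [0] * (1 << bucket_bits)

        def add(self, item: str) -> None:
            h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
            bucket = h >> self.rest_bits                # top bits choose the bucket
            rest = h & ((1 << self.rest_bits) - 1)      # bottom bits drive the zero count
            zeros = leading_zeros(rest, self.rest_bits)
            self.buckets[bucket] = max(self.buckets[bucket], zeros)

        def estimate(self) -> float:
            # Each bucket saw roughly 1/m of the items and estimates its
            # slice as 2^(max leading zeros), so summing gives a crude total.
            return float(sum(2 ** z for z in self.buckets))

The only state is one small counter per bucket, which is why the structure stays tiny no matter how many items pass through it.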
Hey, I wrote the article. The way I think of HyperLogLog and similar (but different) approaches like Bloom filters and count-min sketches is that the key insight is hashing.
The fact that we can reduce some piece of data to a hash, then work with the distribution/entropy of those hashes in number space in many different ways, is what makes these data structures work.
I think of it this way: if you have a hashing algorithm that turns “someone@example.com” into “7395718184927”, you can really easily count, with relative certainty, how often you see that email address by keeping a sparse numeric associative array, since getting the exact same value again produces the same hash. Doing that alone isn't SUPER useful, because you just get exact cardinality (same as checking equality against the other keys). But if you choose a weak enough, fast hashing function (or multiple, in the case of a count-min sketch), some values will collide in what they hash to while others won't. That means there will be some amount of error, but you can control it to suit your needs, on a scale from everything in one bucket to one bucket per distinct value (exact cardinality).
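
To make that knob concrete, here's a rough single-row Python sketch (names like coarse_hash and the bucket count are purely illustrative, and a real count-min sketch keeps several independent rows rather than one):

    import hashlib
    from collections import defaultdict

    def coarse_hash(value: str, buckets: int) -> int:
        # Map a value to one of `buckets` slots. Fewer buckets means more
        # collisions and a smaller table, i.e. more error for less space.
        digest = hashlib.md5(value.encode()).digest()
        return int.from_bytes(digest[:8], "big") % buckets

    BUCKETS = 1_000  # the knob: 1 = everything collides, very large = effectively exact
    counts = defaultdict(int)
    for email in ["someone@example.com", "other@example.com", "someone@example.com"]:
        counts[coarse_hash(email, BUCKETS)] += 1

    # Counts can only overestimate (colliding values add to the same slot);
    # a count-min sketch keeps several rows like this with different hashes
    # and takes the minimum across them to shrink that error.
    print(counts[coarse_hash("someone@example.com", BUCKETS)])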
Here’s an actual proper guide (CS lectures, as you might expect!) I found after a quick search, which you might find useful:
I don't have a good enough grasp of HLL to summarize it accurately, so for convenience here is the original paper[0]; hopefully HN comes through with a more digestible explanation.
The blog post uses Citus' Postgres extension[1], which implements an augmented variant of the original algorithm to keep the data structure at a fixed size.
Can't screenshot it, but the background goes gray-ish and you see a white word appear with a lot more brightness than I thought my display was capable of.
I use the excellent HashBackup [1], which unfortunately isn't available on Windows without trickery.
It backs up to a local hard disk and to the cloud, fully-encrypted.
For that first server we were using a 3rd-party Minecraft hosting provider, so there was no way of knowing if that was the case. I'll double-check my current server though, thanks.
Edit: dug it out of my old emails, and it was actually much longer than 6 years ago! The host was Phoenixerve; seems they don't exist anymore.