Basically: hash all your data items and keep track of the maximum number of leading 0 bits seen in any hash so far.
This is a decent proxy (though not without error) for how many distinct items you've seen, since hashes with large numbers of leading zeros should be uncommon: seeing one means you probably went through a lot of other items to get there (intuitively, about N leading zeros are expected after seeing 2^N items).
This is actually the same thing a proof-of-work cryptocurrency does to control difficulty: change the target number of leading zeros so miners have to do more or less work to find a match.
Of course, you could get "lucky" with a single counter, and the resolution isn't great, so HLL first separates the values into buckets, which are estimated separately and then combined to give a more robust estimate.
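
If it helps to see the shape of that, here's a toy Python sketch (the specifics are my assumptions: the bucket split, SHA-1 as the hash, and a crude sum of per-bucket estimates in place of the bias-corrected harmonic mean and small-range correction the real algorithm uses):

    import hashlib

    def leading_zeros(value: int, width: int) -> int:
        # Number of leading zero bits in a `width`-bit integer.
        return width if value == 0 else width - value.bit_length()

    class TinyLogLog:
        # Toy bucketed estimator in the spirit described above. The top
        # `bucket_bits` of each hash pick a bucket; the remaining bits feed
        # the leading-zeros trick. Real HyperLogLog combines buckets with a
        # bias-corrected harmonic mean (plus a correction for mostly-empty
        # buckets); the plain sum below is only meant to show the structure.
        def __init__(self, bucket_bits: int = 10):
            self.bucket_bits = bucket_bits
            self.rest_bits = 64 - bucket_bits
            self.buckets = [0] * (1 << bucket_bits)

        def add(self, item: str) -> None:
            h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
            bucket = h >> self.rest_bits                # top bits choose the bucket
            rest = h & ((1 << self.rest_bits) - 1)      # bottom bits drive the zero count
            zeros = leading_zeros(rest, self.rest_bits)
            self.buckets[bucket] = max(self.buckets[bucket], zeros)

        def estimate(self) -> float:
            # Each bucket saw roughly 1/m of the items and estimates its
            # slice as 2^(max leading zeros), so summing gives a crude total.
            return float(sum(2 ** z for z in self.buckets))

The only state is one small counter per bucket, which is why the structure stays tiny no matter how many items pass through it.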
Hey, I wrote the article. The way I think of HyperLogLog and similar (but different) approaches like Bloom filters and count-min sketches is that the key insight is hashing.
The fact that we can reduce some piece of data to a hash, then work with the distribution/entropy of those hashes in number space in many different ways, is what makes these data structures work.
I think of it this way: if you have a hashing algorithm that turns “someone@example.com” into “7395718184927”, you can really easily count, with relative certainty, how often you see that email address by keeping a sparse numeric associative array, since getting the exact same value again produces the same hash. Doing that alone isn't SUPER useful, because you just get exact cardinality (same as checking equality against the other keys). But if you choose a weak enough, fast hashing function (or multiple, in the case of a count-min sketch), some values will collide in what they hash to while others won't. That means there will be some amount of error, but you can control it to suit your needs, on a scale from everything in one bucket to one bucket per distinct value (exact cardinality).
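
To make that knob concrete, here's a rough single-row Python sketch (names like coarse_hash and the bucket count are purely illustrative, and a real count-min sketch keeps several independent rows rather than one):

    import hashlib
    from collections import defaultdict

    def coarse_hash(value: str, buckets: int) -> int:
        # Map a value to one of `buckets` slots. Fewer buckets means more
        # collisions and a smaller table, i.e. more error for less space.
        digest = hashlib.md5(value.encode()).digest()
        return int.from_bytes(digest[:8], "big") % buckets

    BUCKETS = 1_000  # the knob: 1 = everything collides, very large = effectively exact
    counts = defaultdict(int)
    for email in ["someone@example.com", "other@example.com", "someone@example.com"]:
        counts[coarse_hash(email, BUCKETS)] += 1

    # Counts can only overestimate (colliding values add to the same slot);
    # a count-min sketch keeps several rows like this with different hashes
    # and takes the minimum across them to shrink that error.
    print(counts[coarse_hash("someone@example.com", BUCKETS)])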
Here’s an actual proper guide (CS lectures, as you might expect!) I found after a quick search, which you might find useful:
I don't have a good enough grasp of HLL to summarize it accurately, so for convenience here is the original paper[0]; hopefully HN comes through with a more digestible explanation.
The blog post uses Citus' Postgres extension[1], which implements an augmented variant of the original algorithm to keep the data structure at a fixed size.
Can't screenshot it, but the background goes gray-ish and you see a white word appear with a lot more brightness than I thought my display was capable of.
I use the excellent HashBackup [1], which unfortunately isn't available on Windows without trickery.
It backs up to a local hard disk and to the cloud, fully-encrypted.
For that first server we were using a 3rd-party Minecraft hosting provider, so there was no way of knowing if that was the case. I'll double-check my current server though, thanks.
Edit: dug it out of my old emails, and it was actually much longer than 6 years ago! The host was Phoenixerve; seems they don't exist anymore.