I'm curious about the setting in which you saw these failures, could you elaborate?
Unlike a plain checksum, CRC-32C is hardened against bias, which means its distribution is not far from that of an ideal checksum. This means if your bitrot is random and you're using 16KB blocks, you will need to see on the order of (2^32 * 16KB) = 64TB of corrupted data to get a random failure. Modern hard drives corrupt data at a rate of once every few terabytes. TCP without SSL (because of its highly imperfect checksum) corrupts data at a rate of once every few gigabytes. Assuming an extremely bad scenario of a corrupt packet once every 1 GB, in theory you'd need to read more than a zettabyte of data to get a random CRC-32C failure. I'm not doubting that real world performance could be much worse, but I'd like to understand how.
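To make the back-of-the-envelope concrete, here is a rough sketch in Python; the 16KB block size and the 2^-32 per-block miss probability are just the assumptions behind the figures above, not properties of any particular implementation:

    # Rough back-of-the-envelope: corrupted data volume before CRC-32C is
    # expected to miss a corruption, assuming each corrupt 16 KiB block
    # slips past the check independently with probability 2**-32.
    BLOCK = 16 * 1024        # bytes per block (assumption from above)
    P_MISS = 2.0 ** -32      # assumed per-block probability of an undetected error

    expected_bad_blocks = 1 / P_MISS                # ~4.3 billion corrupt blocks
    expected_bad_bytes = expected_bad_blocks * BLOCK
    print(expected_bad_bytes / 2**40, "TiB")        # 64.0 TiB of *corrupted* data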
> This means if your bitrot is random and you're using 16KB blocks, you will need to see on the order of (2^32 * 16KB) = 64TB of corrupted data to get a random failure.
No, that's what you need to generate one guaranteed failure when enumerating all the different corruption possibilities, simply because a 32-bit number can represent at most 2^32 different states.
In practice, you'd have a 50% probability of a collision for every 32 TB... assuming a perfect distribution.
By the way, 32 TB takes just 4-15 hours to read from a modern SSD. A terabyte just isn't that much data nowadays.
Just a nit: you don't get a guaranteed failure at 64TB; you get a failure with approx 1 - 1/e ~= 63% probability. At 32TB you get a failure with approx 1 - 1/sqrt(e) ~= 39% probability.
I do agree that tens of TBs is not much data, but keep in mind what this probability means: you need to feed your checksum 64TB worth of 16KB blocks, every one of them corrupt, for at least one of them to go through unnoticed with 63% probability. So you can't just calculate with the raw throughput of your SSD; you have to multiply that throughput by the corruption rate.
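As a sanity check on those figures, here's a minimal Python sketch; the 2^-32 per-block miss probability is the same assumption as above, and the one-corrupt-block-per-terabyte rate at the end is an arbitrary number chosen purely to illustrate the throughput-times-corruption-rate point:

    import math

    BLOCK = 16 * 1024        # bytes per block (assumption from above)
    P_MISS = 2.0 ** -32      # assumed per-block probability of an undetected error

    def p_undetected(corrupt_bytes):
        """Probability that at least one corrupt block goes unnoticed."""
        n = corrupt_bytes / BLOCK
        # 1 - (1 - p)**n, computed in a numerically stable way
        return -math.expm1(n * math.log1p(-P_MISS))

    print(p_undetected(32 * 2**40))   # ~0.39 for 32 TiB of corrupt blocks
    print(p_undetected(64 * 2**40))   # ~0.63 for 64 TiB of corrupt blocks

    # Throughput x corruption rate: if only one 16 KiB block per TiB read is
    # corrupt (arbitrary illustrative rate), accumulating 64 TiB of corrupt
    # blocks means reading 2**32 TiB in total.
    corrupt_fraction = BLOCK / 2**40                       # assumed corruption rate
    print((64 * 2**40) / corrupt_fraction / 2**40, "TiB")  # ~4.3e9 TiB of total reads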
For disk storage, CRC-32C is still not broken. You can't say the same about on-board communication protocols or even some LANs.
When people started using CRC-32 it was because, with the technology of the time, it was virtually impossible to see collisions. Nowadays we are discussing whether it's reasonable to expect a data volume that gives you a 40% or 60% chance of a collision.
The end of CRC32 is way overdue. We should standardize on a CRC64 algorithm soon, or we'll have our hands forced and probably end up stuck with a bad choice.