Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The thing is, CRC is the "cheap" trick.

The "expensive" trick is to go full Reed Solomon error correction if you really care about data (aka: RAID5, RAID6, etc. etc.)... which is built up from the same theory as CRCs except more comprehensive at errors (fulfilling error correction as well).

> If you want to decide which is the acceptable error detection length, the best way would be to make a simulation, with random data of the expected length to which various random errors having the expected number should be applied. Then you can test the alternative error detection codes to see whether they detect the errors or not.

If that error is uniformly random, then a 128-bit sum is optimal and no different than 128-bit hash.

The reason we have this debate at all, is because different groups of people make different assumptions about the distribution of errors.

EDIT: I think it is reasonable to assume that burst-errors would occur on a storage medium: perhaps per-sector (Hard Drive) or per-block (SSD). Reed Solomon when applied to RAID assumes that an entire hard drive fails all at once and all together, for example.

As such: the key when discussing checksums / error detection / error correction is figuring out exactly how errors occur, and what to do about them (aka: detect vs correct).

The more modern schools of thought are to force errors to look as random as possible (hash and LDPC), while older schools of thought are to analyze the nature of errors and focus on them (CRC and Reed Solomon)



> If that error is uniformly random, then a 128-bit sum is optimal and no different than 128-bit hash.

If the data you read is uniformly random then what you say is correct. But if there is a single error (bit/byte/word whatever) which is placed uniformly randomly and takes a uniformly random value then other codes are vastly stronger than using a hash.

While what you say is technically correct-- that a 'uniformly random error' means replacing the input with an entirely independent input, that isn't what most people would call an 'error' and isn't the kind of error that most applications of a check value would detect.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: