And my point: assuming a random error, your SHA1 is no better than a simple checksum.
So what's your error model? You make a big post about the nature of errors but you fail to specify the distribution of errors. If your distribution of errors is uniformly random, then a 128-bit checksum is optimal.
The reason a plain checksum isn't used is that, in practice, real data contains long strings of "0"s, and so do the error patterns on communication channels. A simple checksum fails to distinguish between 20 0s in a row and 200 0s in a row. Is that a problem you expect to find in files?
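To make that concrete, here's a quick Python sketch (the checksum function and file contents are my own illustrative choices) of a plain additive checksum colliding on different zero-run lengths:

    def sum_checksum(data: bytes) -> int:
        # plain additive checksum: add every byte, mod 2**32
        return sum(data) % (1 << 32)

    short_run = b"header" + b"\x00" * 20 + b"trailer"
    long_run = b"header" + b"\x00" * 200 + b"trailer"

    # zero bytes contribute nothing to the sum, so both "verify" as identical:
    assert sum_checksum(short_run) == sum_checksum(long_run)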
If your distribution of errors is bursty, then 128-bit CRC is optimal.
What distribution of errors are you assuming for SHA1? What kind of file errors are you expecting that makes SHA1 better than other methodologies?
---------
If all errors are random and you also want to handle the "multiple 0s" problem, then an Adler32-like check is sufficient to fix both, and the Adler32 methodology obviously extends out to 128 bits (just keep two 64-bit sums).
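Roughly like this (the recurrence below is the real Adler-32 one; the 128-bit widening would just run the same two sums at 64 bits each, with some prime modulus below 2**64 of your choosing):

    MOD = 65521  # largest prime below 2**16, as in real Adler-32

    def adler32_like(data: bytes) -> int:
        a, b = 1, 0
        for byte in data:
            a = (a + byte) % MOD  # plain running sum of the bytes
            b = (b + a) % MOD     # second, position-sensitive sum
        return (b << 16) | a

    # The second sum grows with every byte, even zeros, so zero runs of
    # different lengths produce different checks:
    assert adler32_like(b"\x00" * 20) != adler32_like(b"\x00" * 200)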
CRC, specifically, is designed to optimally check for burst errors up to its full length (a 32-bit CRC finds all 32-bit burst errors; a 128-bit CRC would find all 128-bit burst errors). That's all CRC is, and that's all CRC is designed / math'd out to do.
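You can spot-check that property with Python's built-in CRC-32 (the real guarantee comes from the polynomial math; this is just an empirical sanity check, with sizes and iteration counts picked arbitrarily):

    import random
    import zlib

    data = bytes(random.randrange(256) for _ in range(4096))
    good = zlib.crc32(data)
    nbits = len(data) * 8

    for _ in range(10_000):
        width = random.randrange(1, 33)  # burst of 1..32 bits
        # force the end bits on, so the burst really spans 'width' bits
        mask = random.getrandbits(width) | 1 | (1 << (width - 1))
        start = random.randrange(nbits - width + 1)
        corrupted = int.from_bytes(data, "big") ^ (mask << start)
        # CRC-32 must flag every burst no wider than its 32 bits:
        assert zlib.crc32(corrupted.to_bytes(len(data), "big")) != good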
The key thing about cryptographic hashes (not mere checksums) like SHA1 is that the distribution of the errors doesn't matter. Effectively, they're all the same. That's the point. If mere "runs" of bits were sufficient to trigger a collision, then the hash wouldn't be strong enough for cryptography!
This means that you can simply throw out any such modelling, as it is no longer relevant. You "care" only about bit-error rates and hash-collision rates, but even mere SHA1 is so thoroughly past the BER that it is essentially perfect. That is, it is indistinguishable from perfect on all practical computers, for all intents and purposes.
CRC codes have their uses, but if you just need to detect any corruption of a large-ish file (over 10KB), then cryptographic hashes are both fast and "perfect" in this physical sense. You will never get a collision with SHA256 or SHA512, even including adversarial, crafted inputs. The same "strength" attribute does not hold for CRC codes; they're vulnerable to deliberate corruption by an attacker.
So in that sense, SHA hashes are stronger than CRC checksums.
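For the plain file-validation case, the usual pattern is just streaming the file through the hash, something like this sketch (the function name and 1 MB chunk size are my own choices):

    import hashlib

    def file_digest(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # stream in chunks so large files never need to fit in memory
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Validation is then a single comparison against the known-good digest.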
The birthday bound says that a perfect 160-bit cryptographic hash is expected to see a collision after roughly 2^80 samples, on average. This means that an 80-bit burst error would probabilistically contain a potential SHA1 collision. (An 80-bit burst error doesn't mean that all 80 bits are flipped, btw: it means that 80 bits have been randomized.)
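The arithmetic, if you want to check it (this is just the standard birthday-bound approximation, nothing specific to this thread):

    import math

    bits = 160                 # SHA1 output width
    space = 2.0 ** bits
    # samples needed for a ~50% chance of any collision:
    n_half = math.sqrt(2 * math.log(2) * space)
    print(math.log2(n_half))   # ~80.2, i.e. roughly 2**80 samples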
In contrast, CRC is designed specifically against burst errors. CRC is "regular" and "tweaked" in such a way that a 160-bit CRC would catch 160-bit burst errors of any and all kinds!
So if you care about burst errors, then CRC is, in fact, better than crypto-level hashes. And burst errors are the primary errors that occur in practice (scratches on a CD-ROM, bad sectors on a hard drive, a lightning storm cutting out a few microseconds of WiFi, etc.).
That is: noise isn't random in the real world. Noise is "clustered" around bursty events in practice.
--------
If burst errors are king, you can do far, far better than random methodologies. CRC is proof of that. That's why error distributions matter.
The birthday attack only applies when you're comparing a large set of samples against each other. An example would be a "content-based indexing" system where a database primary key is the hash. Every insert then compares the hash against every entry that already exists. If there are 1 billion stored items, each single insert has a potential collision against all 1 billion.
For validation, you have 1 input being compared against 1 valid value (or its hash/crc). There's no "billion inputs" in this scenario... just 1 potentially corrupt vs 1 known good.
Hence, no birthday attack.
It's the difference between two random people meeting and having the same birthday, versus any two people in a room full of people having the same birthday. Not the same scenario!
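You can put numbers on that difference with the standard approximations (160-bit hash; the billion-entry store is illustrative):

    import math

    bits = 160
    space = 2.0 ** bits

    # Validation: one possibly-corrupt file vs one known-good hash.
    p_single = 1 / space

    # "Room full of people": a content-addressed store with a billion entries.
    n = 1_000_000_000
    p_store = -math.expm1(-n * (n - 1) / (2 * space))  # expm1 keeps precision

    print(p_single)  # ~6.8e-49
    print(p_store)   # ~3.4e-31: still tiny, but about n**2/2 times more likely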
In practice, cryptographic hashes are always superior to checksums once both have more than 128 bits. They're both strong enough, but the cryptographic hash is resistant to deliberate attacks. The CRC won't be.