I wonder if it would make sense to use `concat(sha1, sha256)` hash algorithm. Th...

patrec · on Feb 4, 2020

I suppose you are advocating two distinct Merkle trees? Because otherwise the prefixes will change anyway.

But the only reason this would be attractive is because then people could keep using the existing prefixes to refer to the whole commit. But of course doing this would be insecure. So for this to make any sense at all, people would need to make good choices on when to use an insecure prefix and when to use the whole hash, because it's security relevant. This seems a bit doubtful to me.

GlitchMr · on Feb 4, 2020

To be fair, the prefix problem would exist no matter what hash function you pick. GitHub displays 7 characters of a hash, giving 28 bits. You could generate collisions with a birthday attack in pretty much no time. Prefixes are always going to be insecure because they are so short.

In fact, https://github.com/bradfitz/gitbrute exists.
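A minimal sketch of the 28-bit birthday estimate above, assuming plain SHA-1 over arbitrary byte strings rather than real git objects (git hashes a header plus the object contents, which this ignores). The names `short_hash` and `find_prefix_collision` are made up for illustration. A birthday search over 28 bits should need on the order of 2^14 attempts, so this finishes almost instantly.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, hex_digits: int = 7) -> str:
    """First `hex_digits` hex characters of the SHA-1 of `data`."""
    return hashlib.sha1(data).hexdigest()[:hex_digits]

def find_prefix_collision(hex_digits: int = 7):
    """Birthday search: remember every prefix seen until one repeats."""
    seen = {}
    for i in count():
        msg = b"message-%d" % i          # distinct input on every iteration
        prefix = short_hash(msg, hex_digits)
        if prefix in seen:
            # Two different messages share the same short prefix.
            return seen[prefix], msg, prefix, i + 1
        seen[prefix] = msg

first, second, prefix, tries = find_prefix_collision()
print(f"{first!r} and {second!r} share prefix {prefix} after {tries} tries")
```

The two messages only agree on the first 7 hex digits; their full SHA-1 digests still differ.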

patrec · on Feb 4, 2020

Correct, but backwards compatibility does make a difference here, as in: there are surely quite a few cases where it would not be attractive to use a shortened hash if git hashes are changed incompatibly anyway, but where it will be attractive to use the shortened hash, because that keeps an existing setup working as before.

Also: the prefixing increases the length of the hash (and hence the desire to shorten it) without adding any security.

GlitchMr · on Feb 4, 2020

Yeah, kinda agreeing here. The hash length will need to be increased anyway, but concatenation of SHA1 and SHA256 will be 104 bytes in total when displayed (40 + 64), which is a lot.

It may be better to display SHA-256 commit hashes, but accept SHA-1 hash prefixes for old commits. It may be confusing for git to accept hashes that aren't visible in `git log`, but it's probably for the better.
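The 40 + 64 figure can be checked with a quick sketch. `concat_hash` is a hypothetical helper for the concatenation discussed in this thread, not anything git provides.

```python
import hashlib

def concat_hash(data: bytes) -> str:
    """Hypothetical CONCAT(SHA1(x), SHA256(x)), hex-encoded."""
    return hashlib.sha1(data).hexdigest() + hashlib.sha256(data).hexdigest()

digest = concat_hash(b"example commit contents")
print(len(digest))       # 104 hex characters when displayed (40 + 64)
print(len(digest) // 2)  # 52 raw bytes (20 + 32)
```

So the displayed form is 104 characters, while the underlying value is 52 bytes.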

dchest · on Feb 4, 2020

Something to remember about the security of concatenated hashes: https://crypto.stackexchange.com/a/63543/291

GlitchMr · on Feb 4, 2020

I'm well aware concatenation wouldn't necessarily improve the strength. However, the idea is that, even if SHA-1 were hopelessly broken, CONCAT(SHA1(x), SHA256(x)) would be at least as strong as SHA-256 (where "at least" means it may have the same strength).

Double_a_92 · on Feb 4, 2020

If you know that it's a concatenation, couldn't you only look at the SHA1 part and completely bypass any other strong hash? On second thought probably not, because any collision you find on the SHA1 part is unlikely to also be a collision on all the other hash algorithms. If you bruteforce through a password list it would still apply though.

GlitchMr · on Feb 4, 2020

This doesn't work for collision resistance attacks. git commits aren't password hashes. Specifically, the attacker's goal in this case is to find different values a and b for which hash(a) = hash(b), rather than finding a value of m in h = hash(m) for known h.
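A toy illustration of why colliding the weak half is not enough. It assumes a deliberately crippled `weak` hash (SHA-1 truncated to 16 bits) as a stand-in for a broken algorithm; all function names are hypothetical.

```python
import hashlib
from itertools import count

def weak(data: bytes) -> str:
    # Stand-in for a broken hash: SHA-1 truncated to 16 bits (demo assumption).
    return hashlib.sha1(data).hexdigest()[:4]

def strong(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def concat(data: bytes) -> str:
    # The concatenated digest only matches if BOTH halves match.
    return weak(data) + strong(data)

def find_weak_collision():
    """Find two distinct messages a, b with weak(a) == weak(b)."""
    seen = {}
    for i in count():
        msg = b"m%d" % i
        h = weak(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg

a, b = find_weak_collision()
print(weak(a) == weak(b))      # True: the weak half collides
print(concat(a) == concat(b))  # False: the SHA-256 half still differs
```

To collide the concatenation, an attacker needs a single pair (a, b) that collides under both functions at once, which is at least as hard as colliding SHA-256 alone.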

simias · on Feb 4, 2020

Things like signed commits would still use the full hash, so that would make tampering with that impossible.

This solution would basically just make the UI backward-compatible while still requiring a complete modification of the internals to change the hash function.

You'd still risk a collision if you refer to commits using a shortened hash outside of git, but something tells me that you don't even need a vulnerability to take advantage of that if you have an attack vector. For instance, GitHub seems to use 7 hex digits in short hashes, which could probably be bruteforced relatively easily (be it for SHA-1 or SHA-256). To give you an idea, I looked at the current bitcoin difficulty (which AFAIK uses two rounds of SHA-256 internally and works by bruteforcing hashes with a certain number of leading zeroes) and the hashes look like this: 000000000000000000028048b31e42bd53d3b36da90d1a840ae695ec1a5ee738
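As a rough comparison, here is a small sketch that counts the leading zero bits in the hash quoted above and sets it against the 28 bits of a 7-digit prefix. `leading_zero_bits` is a hypothetical helper, and treating mining as "number of leading zero bits" is a simplification of the actual target rule.

```python
def leading_zero_bits(hex_digest: str) -> int:
    """Number of leading zero bits in a hex-encoded digest."""
    total_bits = len(hex_digest) * 4
    return total_bits - int(hex_digest, 16).bit_length()

# Hash copied from the comment above.
block_hash = "000000000000000000028048b31e42bd53d3b36da90d1a840ae695ec1a5ee738"

prefix_bits = 7 * 4  # a 7 hex digit short hash carries 28 bits
print(leading_zero_bits(block_hash), "leading zero bits vs", prefix_bits, "prefix bits")
```

Forcing n chosen bits by brute force takes about 2^n attempts on average, so the gap between the two numbers gives a feel for how small a 28-bit search is next to what miners already do.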

pwagland · on Feb 4, 2020

This would help if you _only_ shared the prefix, however git would still use the full hash.

The proposed method would have the advantage of keeping existing known abbreviations, which are _already_ less secure than SHA-1, while keeping the security of the second hash.

It also has the disadvantage that the full hash would become excessively large and unwieldy, so pros and cons.

bangboombang · on Feb 4, 2020

This is pretty interesting and shows you shouldn't try to pull any sort of stunts if you're not a crypto expert. I've actually wondered before whether md5 + sha1 would result in something stronger than those two used individually. Now I know.

GlitchMr · on Feb 4, 2020

By the way, this may be rather obvious, but concatenating hash algorithms is a terrible idea for passwords. A password cracker could easily pick the less secure algorithm to crack, and ignore the other hash.

Note that git doesn't concern itself with reversing a hash function. The commit contents are part of a repository, so there is no value in guessing the commit contents based on its hash. Here, the hash function choice is purely about collision resistance.

But yeah, don't do weird things with hashes. Cryptography is hard. Don't invent memecrypto: https://twitter.com/sciresm/status/912082817412063233, it's not going to increase the security. Use a single algorithm if you can. Don't transform the output of a hash function in any way.

bawolff · on Feb 4, 2020

The linked article doesn't contradict the original post. The linked article says the strength of two hash algorithms (of this type) combined is only that of the strongest one, not the sum of their strengths. But the original poster only needed the combined hash to be as strong as SHA-256 for his/her purpose.

Notwithstanding, I still don't like it as an idea.

rocqua · on Feb 4, 2020

There is a downside that this would mean commit-prefixes remain sensitive to collisions. Hence anyone checking out a commit by a hash-prefix would still be vulnerable.

Not a dealbreaker by far, but still a slight mark against this solution.

IshKebab · on Feb 4, 2020

I don't see how it would change anything. A collision of a short prefix is trivial to generate with any hash.

u801e · on Feb 4, 2020

Does git have code to detect whether a hash prefix is ambiguous? I know that if you use a short prefix (which is more likely to be shared by multiple objects), git will output an error message stating that the object reference is ambiguous IIRC.
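A toy model of the behaviour described here, not git's actual code: look up a prefix among known object IDs and refuse to guess when more than one matches. `resolve_prefix`, `AmbiguousPrefix`, and the object IDs are all made up for illustration.

```python
class AmbiguousPrefix(Exception):
    """Raised when a short prefix matches more than one object ID."""

def resolve_prefix(prefix: str, object_ids: set[str]) -> str:
    """Return the unique object ID starting with `prefix`."""
    matches = sorted(oid for oid in object_ids if oid.startswith(prefix))
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise AmbiguousPrefix(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]

# Made-up 40-character IDs; two of them share the prefix "1234".
ids = {
    "1234abcd" + "0" * 32,
    "1234ffff" + "0" * 32,
    "9876" + "0" * 36,
}

print(resolve_prefix("9876", ids))   # unique, resolves
print(resolve_prefix("1234a", ids))  # a longer prefix disambiguates
try:
    resolve_prefix("1234", ids)
except AmbiguousPrefix as err:
    print("ambiguous:", err)
```

Note that this only catches ambiguity among objects already in the repository; it cannot tell whether the object a prefix resolves to is the one the person who wrote down the prefix had in mind.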

loeg · on Feb 4, 2020

ascar · on Feb 4, 2020

I'm probably missing something, but isn't it simpler to just make both available separately and allow users to still reference by sha1, if they want to, while sha256 can be used for collision detection by git operations internally?

patrec · on Feb 4, 2020

Correct, and I think this is what they are doing -- you can optionally keep the sha1s around.

timvisee · on Feb 4, 2020

Very interesting idea. But wouldn't existing hashes be kept intact anyway?