
I wonder if it would make sense to use a `concat(sha1, sha256)` hash algorithm. This wouldn't change the prefixes, while improving the strength of the algorithm (by including SHA256 in the hash).
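A minimal sketch of that idea, assuming the combined ID is just the SHA-1 hex digest followed by the SHA-256 hex digest of the same bytes. `concat_id` is a hypothetical helper, and real git hashes an object header plus contents rather than raw bytes:

```python
import hashlib

def concat_id(data: bytes) -> str:
    """Hypothetical combined ID: SHA-1 hex digest followed by SHA-256 hex digest."""
    return hashlib.sha1(data).hexdigest() + hashlib.sha256(data).hexdigest()

data = b"example object contents"  # stand-in for a git object, not a real one
combined = concat_id(data)
sha1_only = hashlib.sha1(data).hexdigest()

# Any prefix of the old SHA-1 ID is still a prefix of the combined ID.
print(combined.startswith(sha1_only[:7]))
# The combined ID is 40 + 64 hex characters long.
print(len(combined))
```

Because the SHA-1 part comes first, existing abbreviations keep resolving, but the full ID grows to 104 hex characters.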



I suppose you are advocating two distinct Merkle trees? Because otherwise the prefixes will change anyway.

But the only reason this would be attractive is because then people could keep using the existing prefixes to refer to the whole commit. But of course doing this would be insecure. So for this to make any sense at all, people would need to make good choices on when to use an insecure prefix and when to use the whole hash, because it's security relevant. This seems a bit doubtful to me.


To be fair, the prefix problem would exist no matter what hash function you pick. GitHub displays 7 characters of a hash, giving 28 bits. You could generate collisions with a birthday attack in pretty much no time. Prefixes are always going to be insecure because they are so short.

In fact, https://github.com/bradfitz/gitbrute exists.
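To put a number on "pretty much no time": with 28 bits, a birthday search is expected to need on the order of 2^14 (tens of thousands of) hashes. The sketch below is only an illustration of that estimate, not a description of the linked tool; the `msg-N` inputs and the `find_prefix_collision` name are made up for the example:

```python
import hashlib
from itertools import count

def find_prefix_collision(hex_len: int = 7):
    """Return two distinct inputs whose SHA-256 hex digests share the first hex_len characters."""
    seen = {}  # prefix -> input that produced it
    for i in count():
        msg = f"msg-{i}".encode()
        prefix = hashlib.sha256(msg).hexdigest()[:hex_len]
        if prefix in seen:
            return seen[prefix], msg, prefix
        seen[prefix] = msg

a, b, prefix = find_prefix_collision()
print(a, b, prefix)
```

The same loop works with `hashlib.sha1`, which is the point: at this length the choice of hash function doesn't matter.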


Correct, but backwards compatibility does make a difference here, as in: there are surely quite a few cases where it would not be attractive to use a shortened hash if git hashes are changed incompatibly anyway, but where it will be attractive to use the shortened hash, because that keeps an existing setup working as before.

Also: the prefixing increases the length of the hash (and hence the desire to shorten it) without adding any security.


Yeah, kinda agreeing here. The hash length will need to be increased anyway, but concatenation of SHA1 and SHA256 will be 104 bytes in total when displayed (40 + 64), which is a lot.

It may be better to display SHA-256 commit hashes, but accept SHA-1 hash prefixes for old commits. It may be confusing for git to accept hashes that aren't visible in `git log`, but it's probably for the better.


Something to remember about the security of concatenated hashes: https://crypto.stackexchange.com/a/63543/291


I'm well aware concatenation wouldn't necessarily improve the strength. However, the idea is that even if SHA-1 was hopelessly broken, CONCAT(SHA1(x), SHA256(x)) would be at least as strong as SHA-256 (where "at least" means it may have the same strength).


If you know that it's a concatenation, couldn't you only look at the SHA1 part and completely bypass any other strong hash? On second thought probably not, because a collision you find for SHA1 would most likely not be a collision for the other hash algorithms. If you brute-force through a password list it would still apply though.


This doesn't work for collision resistance attacks. git commits aren't password hashes. Specifically, the attacker's goal in this case is to find different values a and b for which hash(a) = hash(b), rather than finding a value of m in h = hash(m) for known h.
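To make the two goals concrete, here is a toy comparison on a hash truncated to 16 bits (an assumption purely so both searches finish quickly; `toy_hash` and the input strings are made up). A collision search gets to pick both messages, while a preimage search has to hit one fixed target:

```python
import hashlib
from itertools import count

BITS = 16  # toy size, chosen only so the demo runs fast

def toy_hash(data: bytes) -> int:
    """Toy hash: the first BITS bits of SHA-256 (illustration only)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:BITS // 8], "big")

def collision_search():
    """Find any two distinct messages a, b with toy_hash(a) == toy_hash(b)."""
    seen = {}  # hash value -> message that produced it
    for i in count():
        m = f"c-{i}".encode()
        h = toy_hash(m)
        if h in seen:
            return seen[h], m, i + 1
        seen[h] = m

def preimage_search(target: int):
    """Find a message m with toy_hash(m) == target for a fixed, known target."""
    for i in count():
        m = f"p-{i}".encode()
        if toy_hash(m) == target:
            return m, i + 1

a, b, collision_tries = collision_search()
target = toy_hash(b"known message")
m, preimage_tries = preimage_search(target)
print(collision_tries, preimage_tries)
```

For an n-bit hash the collision search is expected to take roughly 2^(n/2) tries and the preimage search roughly 2^n, which is why the two attacks are treated separately.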


Things like signed commits would still use the full hash, so that would make tampering with that impossible.

This solution would basically just make the UI backward-compatible while still requiring the complete modification of the internal to change the hash function.

You'd still risk a collision if you refer to commits using a shortened hash outside of git but something tells me that you don't even need a vulnerability to take advantage of that if you have an attack vector. For instance github seems to use 7 hex digits in short hashes, this could probably be bruteforced relatively easily (be it for SHA-1 or SHA-256). To give you an idea I looked at the current bitcoin difficulty (which AFAIK uses two rounds of SHA-256 internally and works by bruteforcing hashes with a certain number of leading zeroes) and the hashes look like this: 000000000000000000028048b31e42bd53d3b36da90d1a840ae695ec1a5ee738


This would help if you _only_ shared the prefix, however git would still use the full hash.

The proposed method would have the advantage of keeping existing known abbreviations, which are _already_ less secure than SHA-1, while keeping the security of the second hash.

It also has the disadvantage that the full hash would become excessively large and unwieldy, so pros and cons.


This is pretty interesting and shows you shouldn't try to pull any sort of stunts if you're not a crypto expert. I've actually wondered before whether md5 + sha1 would result in something stronger than those two used individually. Now I know.


By the way, this may be rather obvious, but concatenating hash algorithms is a terrible idea for passwords. A password cracker could easily pick the less secure algorithm to crack, and ignore the other hash.
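A small sketch of why, assuming a hypothetical unsalted scheme that stores the MD5 hex digest followed by the SHA-256 hex digest (`concat_password_hash`, `crack`, and the word list are invented for the example). The attacker never has to compute the stronger hash:

```python
import hashlib

def concat_password_hash(password: str) -> str:
    """Hypothetical (bad) scheme: MD5 hex digest followed by SHA-256 hex digest."""
    pw = password.encode()
    return hashlib.md5(pw).hexdigest() + hashlib.sha256(pw).hexdigest()

def crack(stored: str, wordlist):
    """Dictionary attack that only looks at the MD5 part of the stored value."""
    md5_part = stored[:32]  # an MD5 hex digest is 32 characters
    for guess in wordlist:
        if hashlib.md5(guess.encode()).hexdigest() == md5_part:
            return guess
    return None

stored = concat_password_hash("hunter2")
print(crack(stored, ["password", "letmein", "hunter2"]))
```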

Note that git doesn't concern itself with reversing a hash function. The commit contents are part of a repository, so there is no value in guessing the commit contents based on its hash. Here, the hash function choice is purely about collision resistance.

But yeah, don't do weird things with hashes. Cryptography is hard. Don't invent memecrypto: https://twitter.com/sciresm/status/912082817412063233, it's not going to increase the security. Use a single algorithm if you can. Don't transform the output of a hash function in any way.


The linked article doesn't contradict the original post. The linked article says the combination of 2 hash algos (of this type) is only as strong as the strongest and not the sum of their strengths. But the original poster only needed the combined hash to be as strong as SHA-256 for his/her purpose.

Notwithstanding, I still don't like it as an idea.


There is a downside that this would mean commit-prefixes remain sensitive to collisions. Hence anyone checking out a commit by a hash-prefix would still be vulnerable.

Not a dealbreaker by far, but still a slight mark against this solution.


I don't see how it would change anything. A collision of a short prefix is trivial to generate with any hash.


Does git have code to detect whether a hash prefix is ambiguous? IIRC, if you use a short prefix (which is more likely to be shared by multiple objects), git will output an error message stating that the object reference is ambiguous.


Yes.
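Roughly, the check amounts to something like the sketch below. This is not git's actual implementation, just the idea; `resolve_prefix`, the error messages, and the object IDs are made up for illustration:

```python
def resolve_prefix(prefix: str, object_ids):
    """Return the single full ID that starts with prefix, or raise if none or several match."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object matches {prefix}")
    if len(matches) > 1:
        raise ValueError(f"short object ID {prefix} is ambiguous")
    return matches[0]

# Made-up 40-character IDs standing in for a repository's objects.
ids = [
    "abc123" + "0" * 34,
    "abc456" + "0" * 34,
    "def789" + "0" * 34,
]

print(resolve_prefix("def", ids))   # unique, resolves to the third ID
try:
    resolve_prefix("abc", ids)      # shared by two IDs
except ValueError as err:
    print(err)
```

Note this only catches ambiguity among objects already in the repository; it says nothing about a colliding object that shows up later.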


I'm probably missing something, but isn't it simpler to just make both available separately and allow users to still reference by sha1, if they want to, while sha256 can be used for collision detection by git operations internally?


Correct, and I think this is what they are doing -- you can optionally keep the sha1s around.


Very interesting idea. But wouldn't existing hashes be kept intact anyway?



