The hash is based on the content of the repo (and its history), is it not? Or am I misunderstanding? It's not that they can make a repo with arbitrary code of their choice cause a collision, correct? The chance of the collision being something meaningful in the context of the code at hand is vanishingly small, isn't it? Is the concern that this can happen at all?
> The hash is based on the content of the repo (and its history), is it not? Or am I misunderstanding?
That is correct. The hash of a commit is based on the parent commit's hash, the hash of the tree (file state), and the commit message (plus the author and committer details with their timestamps).
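As a rough sketch of how those inputs feed the hash: Git hashes a short header (`<type> <size>` plus a NUL byte) followed by the object body, and a commit body lists the tree, the parents, the author, the committer, and the message. The helper names and the author/committer values below are made up for illustration; only the object layout follows Git's format.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over Git's '<type> <size>\\0' header followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str,
                committer: str, message: str) -> bytes:
    """Assemble the text of a commit object (hypothetical helper)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode()


# Illustrative values only: the identity and timestamp are invented.
EMPTY_TREE = git_object_id("tree", b"")
WHO = "Example Person <person@example.com> 1500000000 +0000"

first = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO, "initial\n"))
second = git_object_id("commit", commit_body(EMPTY_TREE, [first], WHO, WHO, "next\n"))
reworded = git_object_id("commit", commit_body(EMPTY_TREE, [first], WHO, WHO, "next!\n"))
```

Because `second` includes `first` as its parent line, changing anything in an earlier commit changes every later commit ID, and changing only the message (`reworded`) also yields a different ID.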
> It's not that they can make a repo with arbitrary code of their choice cause a collision, correct?
They can make a repo with arbitrary code, but they need to make specific (potentially weird) commits to get a specific hash.
> The chance of the collision being something meaningful in the context of the code at hand is vanishingly small, isn't it? Is the concern that this can happen at all?
The chance of this happening by accident among normal users is really small. But someone malicious could intentionally try to manipulate it, for example by varying commit messages until the hash comes out the way they want.
So if you trust that no such person has control over the repo (Linus's position), it's fine. But if a hash isn't cryptographically strong, malicious actors can make repos whose commits point to arbitrary code while carrying a specified hash.
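To make the "varying commit messages" idea concrete, here is a toy sketch that appends a counter to the message until the commit ID starts with a chosen prefix. It only matches a couple of hex digits by brute force; matching a full 160-bit hash this way is infeasible, and real attacks rely on cryptanalytic weaknesses in the hash instead. The function names, identity string, and message text are all invented for illustration.

```python
import hashlib


def commit_id(body: bytes) -> str:
    """Git-style commit ID: SHA-1 over 'commit <size>\\0' plus the body."""
    return hashlib.sha1(f"commit {len(body)}".encode() + b"\x00" + body).hexdigest()


def find_message_for_prefix(headers: str, message: str, prefix: str,
                            max_tries: int = 1_000_000) -> tuple[str, str]:
    """Vary a trailing counter in the message until the ID starts with prefix."""
    for nonce in range(max_tries):
        candidate = f"{message}\n\nnonce: {nonce}\n"
        cid = commit_id((headers + "\n" + candidate).encode())
        if cid.startswith(prefix):
            return candidate, cid
    raise RuntimeError("no match within max_tries")


# Illustrative headers: the tree is Git's empty tree, the identity is made up.
WHO = "Example Person <person@example.com> 1500000000 +0000"
HEADERS = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    f"author {WHO}\n"
    f"committer {WHO}\n"
)

chosen_message, chosen_id = find_message_for_prefix(HEADERS, "Fix typo", "00")
```

Each extra hex digit in the prefix multiplies the expected work by 16, which is why an attacker needs a weakness in the hash function itself rather than plain guessing to hit a full, specific hash.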
Thanks for confirming. Getting back to the issue of using GitHub as a source of truth for deployment: that has a host of issues besides reliance on SHA-1, but I appreciate the reasoned response on the issues surrounding the hash itself as well.