
Fossil's main author is chiming in on the discussion of this on Fossil's forums:

(https://fossil-scm.org/forum/forumpost/50a5bea5fb)

> That's appalling. Fossil's implementation doesn't require a conversion.

“This is a key point, that I want to highlight. I'm sorry that it wasn't made more clear in the LWN posting nor in the HN discussion.

“With Fossil, to begin using the new SHA3 hash algorithm, you just upgrade your fossil binary. No further actions, workflow changes, disruptions, or thought are required on the part of the user.

* “Old check-ins with SHA1 hashes continue to use their SHA1 hash names.”

* “New check-ins automatically get more secure SHA3 hash names.”

* “No repository conversions need to occur”

* “Given a hash prefix, Fossil automatically figures out whether it is dealing with a SHA1 or a SHA3 hash”

* “No human brain-cycles are wasted trying to navigate through a hash-algorithm cut-over.”

“Contrast this to Git, where a repository must be either all-SHA1 or all-SHA2. Hence, to cut-over a repository requires rebuilding the repository and in the process renaming all historical artifacts -- essentially rebasing the entire repository. The historical artifact renaming means that external links to historical check-ins (such as in tickets) are broken. And during the transition period, users have to be constantly aware of whether they are using SHA1 or SHA2 hash names. It is a big mess. It is no wonder, then, that few people have been eager to transition their repositories over to the newer SHA2 format.”
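
The prefix lookup described in the quoted list can be sketched in a few lines. This is only an illustration, not Fossil's actual code: the function names `hash_kind` and `resolve_prefix` and the sample artifacts are hypothetical, and it assumes the new names are SHA3-256 digests (64 hex characters) next to 40-character SHA1 names.

```python
import hashlib

SHA1_HEX_LEN = 40      # SHA-1 digests are 160 bits = 40 hex characters
SHA3_256_HEX_LEN = 64  # SHA3-256 digests are 256 bits = 64 hex characters


def hash_kind(name: str) -> str:
    """Classify a full artifact name by its length alone."""
    if len(name) == SHA1_HEX_LEN:
        return "sha1"
    if len(name) == SHA3_256_HEX_LEN:
        return "sha3-256"
    raise ValueError(f"unexpected hash length: {len(name)}")


def resolve_prefix(prefix: str, artifact_names: list[str]) -> str:
    """Return the single artifact name that starts with the given prefix."""
    wanted = prefix.lower()
    matches = [n for n in artifact_names if n.startswith(wanted)]
    if not matches:
        raise KeyError(f"no artifact matches {prefix!r}")
    if len(matches) > 1:
        raise KeyError(f"prefix {prefix!r} is ambiguous")
    return matches[0]


# Hypothetical repository holding one old and one new artifact.
old_name = hashlib.sha1(b"old check-in").hexdigest()
new_name = hashlib.sha3_256(b"new check-in").hexdigest()
names = [old_name, new_name]

# The user types only a short prefix; the algorithm is inferred afterwards.
found = resolve_prefix(old_name[:10], names)
print(found, hash_kind(found))
```

Because both kinds of name live in one namespace, the user never has to say which algorithm a prefix belongs to; the length of the matching full name settles it.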




The way I read the Fossil author's comments, old commits continue to use sha1 hashes. A repository will be vulnerable to sha1 collision attacks as long as there is an object in the repository that has not been hashed with the new algorithm.

For example, floppy.c could be replaced in a repo with a file with the same sha1 hash as long as the last commit that modifies floppy.c used a sha1 hash.

Right?


Just to be clear: Every time you modify a file, the new changes get put in using SHA3. In an older repository, any given commit might have some files identified using SHA1 (assuming they have not changed in 3 years) and others identified using SHA3.

For example, the manifest of the latest SQLite check-in can be seen at (https://www.sqlite.org/src/artifact/29a969d6b1709b80). You can see that most of the files have longer SHA3 hashes, but some of the files that have not been touched in three years still carry SHA1 hashes.
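
To make the mixed manifest concrete, here is a rough sketch that tallies hash kinds in a manifest-like text. It assumes each file line has the shape `F <filename> <hash>`; the sample manifest, file names, and the helper `count_hash_kinds` are made up for illustration and are not taken from the SQLite check-in linked above.

```python
import hashlib


def count_hash_kinds(manifest_text: str) -> dict[str, int]:
    """Count file entries by hash length (40 hex = SHA1, 64 hex = SHA3-256)."""
    counts = {"sha1": 0, "sha3-256": 0}
    for line in manifest_text.splitlines():
        fields = line.split()
        # Assumed file-card shape: "F <filename> <hash> ..."
        if len(fields) >= 3 and fields[0] == "F":
            digest = fields[2]
            if len(digest) == 40:
                counts["sha1"] += 1
            elif len(digest) == 64:
                counts["sha3-256"] += 1
    return counts


# Hypothetical manifest: one file untouched for years, two edited recently.
sample = "\n".join([
    "C example",
    f"F src/floppy.c {hashlib.sha1(b'untouched for years').hexdigest()}",
    f"F src/main.c {hashlib.sha3_256(b'recently edited').hexdigest()}",
    f"F src/util.c {hashlib.sha3_256(b'also recent').hexdigest()}",
])

counts = count_hash_kinds(sample)
print(counts)
```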

An attack like what you describe is possible if you could generate an evil.c file that has the exact same SHA1 hash as the older floppy.c file. Then you could substitute the evil.c artifact in place of the floppy.c artifact, get some unsuspecting victim to clone your modified repository, and cause mischief that way. Note, however, that this is a pre-image attack, which is rather more difficult to pull off than the collision attacks against SHA1, and (to my knowledge) has never been publicly demonstrated. Furthermore, the evil.c file with the same SHA1 hash would need to be valid C code that does something evil while still yielding the same hash (good luck with that!) and Fossil (like Git) has also switched over to Hardened SHA1, making the attack even harder still.

As still more defense, Fossil also maintains an MD5 hash against the entire content of the commit. So, in addition to finding evil.c that compiles, does your evil bidding, has the same hardened-SHA1 hash as floppy.c, you also have to make sure that the entire commit has the same MD5 hash after substituting the text of evil.c in place of floppy.c.
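
The double check described here can be sketched as follows. This is a simplified model, not Fossil's real checksum format: `commit_checksum` simply feeds sorted file names and contents into MD5, plain SHA1 stands in for hardened SHA1, and all names are hypothetical.

```python
import hashlib


def file_name_hash(content: bytes) -> str:
    """Per-file artifact name (plain SHA1 standing in for hardened SHA1)."""
    return hashlib.sha1(content).hexdigest()


def commit_checksum(files: dict[str, bytes]) -> str:
    """Simplified whole-commit MD5 over every file name and its content."""
    md5 = hashlib.md5()
    for name in sorted(files):
        md5.update(name.encode())
        md5.update(files[name])
    return md5.hexdigest()


def substitution_accepted(files: dict[str, bytes], target: str,
                          replacement: bytes, recorded_sha1: str,
                          recorded_md5: str) -> bool:
    """A swapped-in file must satisfy BOTH recorded hashes to go unnoticed."""
    if file_name_hash(replacement) != recorded_sha1:
        return False  # fails the per-file check
    swapped = {**files, target: replacement}
    return commit_checksum(swapped) == recorded_md5  # whole-commit check


files = {"floppy.c": b"int main(void) { return 0; }\n", "README": b"docs\n"}
recorded_sha1 = file_name_hash(files["floppy.c"])
recorded_md5 = commit_checksum(files)

evil = b"int main(void) { return 1; }\n"
print(substitution_accepted(files, "floppy.c", evil, recorded_sha1, recorded_md5))
```

An attacker would need a replacement that passes the first comparison and, at the same time, leaves the commit-wide MD5 unchanged.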

So, no, it is not really practical to hack a Fossil repository as you describe.


Isn't this the same attack given as an example why git is migrating hash functions in the subject article?

The attack may be difficult and unlikely; I'm not questioning that. But if I understand correctly, Fossil's migration is straightforward because they did not address the same issues Git chose to.


> if I understand correctly then Fossil's migration is straightforward because they did not address the same issues Git chose to.

I think more is at play here.

(1) You can set Fossil to ignore all SHA1 artifacts using the "shun-sha1" hash policy.

(2) The excess complication in the Git migration strategy is likely due to the inability of the underlying Git file formats to handle two different hash algorithms in the same repository at the same time.

But, I could be wrong. Post a rebuttal if you have evidence to the contrary.


> (2) The excess complication in the Git migration strategy is likely due to the inability of the underlying Git file formats to handle two different hash algorithms in the same repository at the same time.

> But, I could be wrong. Post a rebuttal if you have evidence to the contrary.

It seems unfair to demand a rebuttal when you are the one who made the claim.

According to the article at least, the difficulty stems mainly from their migration strategy, for converting all existing SHA1 hashes.


> the difficulty stems mainly from their migration strategy, for converting all existing SHA1 hashes.

That's essentially the same difficulty, since the only strategy for doing this that has been historically proven to work seamlessly and painlessly involves being able to handle both hash algorithms in the same repository at the same time.


> Furthermore, the evil.c file with the same SHA1 hash would need to be valid C code that does something evil while still yielding the same hash

...and also produce an innocent-looking diff!

I mean, you could stuff a bunch of random bytes into a C comment to force the desired hash in the output using these documented attack techniques, but anyone inspecting the diffs between versions is likely to see such an explosion of noise and call foul.

If you want an analogy, it's like someone saying they've learned to forge federal agent identification cards, only it requires the person carrying the fake ID to have a thousand rainbow-dyed ducks on a leash in tow behind him.

Such attacks are fine when it's dumb software systems doing the checks, but for a source code repository where people do in fact visually check the diffs occasionally?

Well, let's just say that when someone manages to use SHAttered and/or SHAmbles type attacks on Git (or even Fossil) I expect that it won't take a genius detective to see that the repo's been attacked.


Many diff tools don't highlight whitespace-only changes. Or at least not in a clear manner.

Also, if something is replaced in the history how often do people go back and view diffs in old code? Hardly often enough to rely on it being spotted.


It only takes one person to raise the flag.

Sure, many thousands of people doing blind "git clone && configure && sudo make install" could be burned by a problem like this, but someone would eventually do a diff and see the problem on any project big enough to have those thousands of trusting users in the first place.

I'm not excusing these SHA-1 weaknesses, only pointing out that it won't be trivial to apply them to program source code repos no matter how cheap the attacks get.

For instance, the demonstration case for SHAttered was a pair of PDFs: humans can't reasonably inspect those to find whatever noise had to be stuffed into them to achieve the result.

I also understand that these SHA-1 weaknesses have been used to attack X.509 certificates, but there again you have a case very unlike a software code repo, where the one doing the checking isn't another programmer but a program.


The problem is that we are considering an issue where different people can get different objects for the same hash. If the people checking all see the valid files, they cannot raise any alarms to save the poor victims who got poisoned with the wrong objects. They'll clone from the wrong fork, and no amount of checking hashes or signed tags will prevent them from running compromised code.


> If the people checking all see the valid files

...which will likely contain thousands of bytes of pseudorandom data in order to force the hash collision...

> they cannot raise any alarms

You think a human won't be able to notice that the diff from the last version they tested looks awfully funny? Code that can fool the compiler into producing an evil binary is one thing, but code that can pass a human code review is quite another.

You might be surprised how often that occurs.

I don't do a diff before each third-party DVCS repo pull, but I do diff the code when integrating such third-party code into my projects, if only so I understand what they've done since the last time I updated. Commit messages, ChangeLogs, and release announcements only get you so far.

Back when I was producing binary packages for a popular software distribution, I'd often be forced to diff the code when producing new binaries, since several of the popular binary package distribution systems are based on patches atop pristine upstream source packages. (RPM, DEB, Cygwin packages...)

Each time a binary package creator updates, there's a good chance they've had to diff the versions to work out how to apply their old distro-specific patches atop the new codebase.

Someone's going to notice the first time this happens, and my guess is that it'll happen rather quickly.


If this is your threat model, you don't need hashes or signed tags at all. Good for you. Thankfully both Fossil and Git disagree with you and take the threat seriously :)


That's an argument for why you shouldn't worry about sha1 attacks in source control, but we should take the attack for granted when discussing how to mitigate the attack.

If we weren't worried about sha1 collisions in git then we wouldn't switch to a new hash function.


When is the right time to worry? Maybe wait until someone publishes a practical attack, then wait years for the new code to get sufficiently far out into the world that you can switch to it?

I mean, I see you're expressing concern, but the first major red flag on this went up three years ago, and another big one went up last month. (https://sha-mbles.github.io/)

When we dealt with this same problem over in Fossil land, we ended up needing to wait most of three years for Debian to finally ship a new enough binary that we could switch the default to SHA-3. Fortunately (?) RHEL doesn't ship Fossil, else we'd likely have had to wait even longer.

Atop that same problem, Git's also got tremendously more inertia. Git has to wait out not only the Debian and RHEL stable package policies but also all of that infrastructure tooling they brag on. Every random programmer's editor, merge tool, Git front end... all of that which a project depends on will have to convert over before that one project can move to a post-SHA-1 future.

This is going to be a colossal mess.


Doesn't all of this apply to git just as well, except for the last bit about the MD5 hash?

It just seems to me that the Fossil maintainers have decided that keeping all old SHA1 hashes is acceptable, while the git maintainers have decided that it is not.

Unless I've misunderstood, this is why it was "so easy" for Fossil to transition to a new hashing algorithm. Not some superiority in the design of Fossil, as implied on the Fossil forums.


And if you are that concerned about this type of attack, it may be worth your time to simply start a new Fossil repository using the sha3-only hash policy (writing a script to replay commits into the new repo, so you don't lose history).

It seems like a problem very few people need to worry about and Fossil has made the right trade-offs.


In addition to D. Richard Hipp's thoughts as HN user SQLite — author also of Fossil, so he oughtta know — I offer these:

1. Keep in mind that Fossil and Git are both applications of blockchain technology, which in this particular practical case means you must not only forge a single artifact's hash, you must also do it in a way that allows it to fit into the overall blockchain.

2. Fossil's sync protocol purposefully won't apply Dr. Hipp's hypothetical evil.c to an existing Fossil blockchain if presented with it. Fossil will say, "I've already got that one, thanks," and move on. Only new or outdated clones could be fooled this way.
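
Point 1 can be illustrated with a toy hash chain. This is a sketch under simplifying assumptions, not the on-disk format of either Fossil or Git: `commit_id` and `build_chain` are hypothetical helpers, and each commit ID is just SHA3-256 over the parent ID, the sorted file hashes, and the message.

```python
import hashlib


def commit_id(parent_id: str, file_hashes: dict[str, str], message: str) -> str:
    """Toy commit ID: hash of the parent ID, the file hashes, and the message."""
    h = hashlib.sha3_256()
    h.update(parent_id.encode())
    for name in sorted(file_hashes):
        h.update(f"{name} {file_hashes[name]}\n".encode())
    h.update(message.encode())
    return h.hexdigest()


def build_chain(snapshots: list[tuple[dict[str, bytes], str]]) -> list[str]:
    """Return the commit IDs for a linear history of (files, message) pairs."""
    ids = []
    parent = ""
    for files, message in snapshots:
        file_hashes = {n: hashlib.sha3_256(c).hexdigest() for n, c in files.items()}
        parent = commit_id(parent, file_hashes, message)
        ids.append(parent)
    return ids


good = b"int main(void) { return 0; }\n"
evil = b"int main(void) { return 1; }\n"

history = [
    ({"floppy.c": good}, "initial"),
    ({"floppy.c": good, "README": b"docs\n"}, "add readme"),
]
# Same history, but the first commit's floppy.c has been swapped out.
tampered = [({"floppy.c": evil}, "initial"), history[1]]

original_ids = build_chain(history)
tampered_ids = build_chain(tampered)
print(original_ids[1] != tampered_ids[1])
```

Unless the replacement file hashes to exactly the same name, every later commit ID changes too, which is why a forged artifact has to collide rather than merely look plausible.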


> applications of blockchain technology

Are we saying this now? More like blockchain is an application of git technology.


No. We are not.

If you're looking for prior art, ZFS's application of Merkle trees predates both. I think there was some other public use before that, but I can't recall right now.



"blockchain" is self-descriptive, easier to pronounce (only two syllables instead of three), and easier to spell correctly. :-)


"blockchain technology" is a lot more syllables. Plus there's the downside of sounding like a loon.


They are also using "Hardened SHA1", which detects collision attacks, and assigns a longer id to commits which seem malicious, while being backwards compatible.


So if a repo has anyone commit to it using a new binary, then anyone accessing the repo will need the new binary as well?



