Why it’s harder to forge a SHA-1 certificate than to find a SHA-1 collision (cloudflare.com)
180 points by thedg on Dec 22, 2015 | 43 comments



We're lucky that hash collision attacks have a relatively simple mitigation like this. (Although you have to trust CAs to follow the rules and implement it properly, and events of the last few years indicate that CAs need to have as few things to screw up as possible.)

However, we're not always going to be so lucky. The next major transition in digital certificates could very well be to post-quantum crypto due to advancements in quantum computing. Under that scenario, attackers will be able to simply compute a CA's private key and sign arbitrary certificates. There will be no mitigation short of clients ceasing to trust pre-quantum certs. But clients won't be able to do that unless servers are using post-quantum certs, and server operators won't want to do that if it would mean cutting off legacy clients that don't support post-quantum certs.

The solution to this first mover problem is to set a hard deadline after which legacy certs are retired. This forces clients and server operators to act. Pushing back the SHA-1 deadline at the 11th hour as CloudFlare proposes sends a dangerous message that such deadlines don't have to be taken seriously. This message will come back to haunt the Internet in the future.


Can someone explain to me in plain language / layman's terms how a quantum computer is supposed to reverse cryptographic hash functions? What would be the process EXACTLY?


It has nothing to do with "reversing" hashes. The attacker would use the quantum computer to determine the CA's private key (e.g. by factoring the RSA modulus using Shor's Algorithm), and would then be able to sign any hash they want. No need to attack the hash function; in fact hashes remain secure under quantum computing.


> in fact hashes remain secure under quantum computing.

Hashes will see their security cut in half, in terms of the effort needed to find a pre-image. (EDIT: security in bits = log of #evaluations needed)

E.g. finding a SHA256 pre-image, which amounts to a search over a space of 2^256 candidates, can be sped up using Grover's algorithm, to roughly 2^128 hash evaluations.
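
For reference, the textbook Grover bound for an unstructured search over N candidates (nothing specific to SHA256 here, just the standard query count):

    \text{evaluations} \approx \tfrac{\pi}{4}\sqrt{N}, \qquad N = 2^{256} \;\Rightarrow\; \tfrac{\pi}{4}\cdot 2^{128}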


That's not half; it's a square root. Cutting the effort of 2^n in half only gets you down to 2^(n-1).


The effective number of bits of entropy is cut in half.


Well, he said "in terms of the effort needed to find a pre-image". For that, the effort won't be half.


That's true, but it's easy to fix by using longer hashes (e.g. SHA-512). RSA and ECC are broken beyond hope.


Hashcash, the proof-of-work system employed by Bitcoin with the hash function SHA256, is also vulnerable. Where a current mining chip might search a space of 6e13 nonces in 10 minutes, a quantum computer employing Grover's algorithm might be able to search a space of 1e18 nonces, even with far fewer circuits and a slower cycle time (although in Bitcoin this is complicated by the limited nonce entropy in the header).

Note that the potential for speedup is due to the extremely small time needed for a single proof attempt. A PoW that only allowed a hundred proof attempts in the block interval time would hardly be vulnerable.
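
A rough back-of-the-envelope with the numbers above, assuming the textbook Grover iteration count (illustrative only):

    \tfrac{\pi}{4}\sqrt{10^{18}} \approx 7.9\times 10^{8} \text{ iterations}, \qquad 7.9\times 10^{8} / 600\,\mathrm{s} \approx 1.3\times 10^{6} \text{ iterations/s}

So even a machine limited to roughly a million serial hash evaluations per second could, in principle, cover that 10^18-nonce space within one block interval.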


Theoretically, a quantum computer could dramatically reduce brute-force time. But really there are no "general purpose" quantum computers at the moment (so far as I know).


When you say post-quantum certs, do you mean using a technique like key stretching (i.e. PBKDF2) to slow down the brute force time?


I mean certs using post-quantum crypto for signatures (e.g. NTRU, hash-based signatures) rather than RSA or ECC, which are utterly defeated by quantum algorithms.


Also, when we develop cold fusion, faster-than-light time travel, and exploit the many-worlds theory, crypto will be harder.


  $ curl -s https://blog.cloudflare.com/content/images/2015/08/white.jpg | md5
  ccf22bc377846166ed65cd3cd58d2e3d
  $ curl -s https://blog.cloudflare.com/content/images/2015/08/brown.jpg | md5
  810cac197d97da7b216c7883be523495
  $ curl -s https://blog.cloudflare.com/content/images/2015/08/black.jpg | md5
  6bede506abffe08d0c2406d92fbff393
Let me guess, the CloudFlare CDN is recompressing the images? :D


The original article has the correct images:

  $ curl -s http://www.fishtrap.co.uk/black.jpg.coll | md5
  b69dd1fd1254868b6e0bb8ed9fe7ecad
  $ curl -s http://www.fishtrap.co.uk/brown.jpg.coll | md5
  b69dd1fd1254868b6e0bb8ed9fe7ecad
  $ curl -s http://www.fishtrap.co.uk/white.jpg.coll | md5
  b69dd1fd1254868b6e0bb8ed9fe7ecad


Good catch. CloudFlare's optimizations are often too good for their own good. Fixing.


If you download the files to disk, you'll get the right values:

    $ curl -s https://blog.cloudflare.com/content/images/2015/08/white.jpg > file && md5 file                                                 
    MD5 (file) = b69dd1fd1254868b6e0bb8ed9fe7ecad
I've seen this same sort of thing happen with curl in other contexts, but I've never tracked down the details. I assume it has something to do with file stream handling in certain versions of curl. You'll see a discrepancy if you look at the output of piping to 'wc' vs. examining the downloaded file.


I ran the same curl command twice in a row and got files with different sizes! I think it actually does have to do with Cloudflare rather than with curl in this case.


Apparently the CDN is sometimes stripping the EXIF data.


It's worth noting that SHA-1 is still suitable for use in HMAC on older hardware; security there is not significantly compromised by SHA-1's known weaknesses.

You can move to more modern algorithms, but there isn't a pressing need to remove SHA1 implementations for that application.


I have a question I haven't been able to find an answer to, hopefully someone here can help.

Why is HMAC+(hash) considered secure while being considerably faster than, say, bcrypt with a cost of 12? For example, if a service used a user-provided password to validate a "secret" (what would normally be the signed message), is that less secure than bcrypt? If so, what makes guessing the secret used in HMAC difficult?


Presumably you will use a key-stretching algorithm before applying HMAC.

[edit]

So yes, directly using a password in HMAC is a bad idea, and less secure than using some function designed for deriving keys from passwords.

You are confusing two things about SHA-1. Assuming a 100% secure cryptographic hash function that is as fast as SHA-1, you should not use it directly for hashing passwords (though it can be part of a larger construction like PBKDF2).

This is because the number of passwords you can check per second in an offline attack is related to the speed of the hash function, and bcrypt (and pbkdf2, and scrypt, argon2) are all various ways of slowing down the hashing process.

Similarly, it is likely that MD5 would be roughly as secure for protecting passwords as SHA-256 when used in PBKDF2, because the known weaknesses of MD5 would not be of assistance in performing a dictionary attack against a hashed password.

If you have a high-quality key then HMAC is secure without needing the hash function to be slow, so if you wanted to use a password with HMAC, you would first use a KDF to generate a high-quality key from the password, and then use the key in hmac. This is similar for any cryptographic tool that wants a high-quality key as input (i.e. most of them).

[edit2]

A shorter answer is that HMAC is secure and fast because its input key already has sufficient entropy. bcrypt is slow because its whole point is to make it difficult to attack a key with low entropy.
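
A minimal sketch of that pattern in Python (the iteration count and message here are just placeholders, not recommendations):

    import hashlib, hmac, os

    password = b"correct horse battery staple"   # low-entropy user input
    salt = os.urandom(16)                         # random salt, stored alongside the derived key

    # Stretch the password into a high-quality 256-bit key first...
    key = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)

    # ...then use that key for HMAC as usual.
    tag = hmac.new(key, b"message to authenticate", hashlib.sha256).hexdigest()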


Thanks for the varying levels of explanation (thanks to viraptor too). I think part of the reason I was confused is because GitHub's web hook setup allows for a supplied shared secret which, based on what I understand from above, is not as secure as it could be unless the user ensures the shared secret has sufficient entropy. If I'm still not getting it please let me know. Thanks again.


A quick 30-second scan of the webhooks docs suggests that is correct; if you used e.g. 12345 as your secret, you would be susceptible to a dictionary attack from anybody who was able to record a message. On the order of tens of billions of keys per second can be tested with a multi-GPU setup.
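
A toy illustration in Python (the secret, message, and wordlist are all made up):

    import hmac, hashlib

    body = b'{"action": "opened"}'
    # What the attacker recorded: a request body and its HMAC tag, made with the weak secret "12345".
    observed = hmac.new(b"12345", body, hashlib.sha256).hexdigest()

    # Offline dictionary attack: try candidate secrets until one reproduces the tag.
    for candidate in ["password", "letmein", "12345", "hunter2"]:
        tag = hmac.new(candidate.encode(), body, hashlib.sha256).hexdigest()
        if hmac.compare_digest(tag, observed):
            print("secret recovered:", candidate)
            break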

I suspect that the web hooks typically run over TLS, so recording the plaintext of a request would be a challenge in and of itself.


If your shared secret is vulnerable to brute forcing, it's vulnerable to brute forcing. An easy fix for this: generate your shared secret by hashing or salthashing a low-entropy password.

As a general rule though, HMAC is used with randomly generated secrets. I don't know why GitHub doesn't just generate the secret and tell you what it is.

Amazon's implementation is much more correct.
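
For example, in Python (32 bytes is an arbitrary but comfortable size):

    import secrets
    shared_secret = secrets.token_hex(32)   # 256 bits of randomness, hex-encoded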


I'm not a crypto master, but as I understand it, while it's possible to find an MD5 collision, it's many times more difficult in HMAC, because it's:

    hash((secret ⊕ opad) || hash((secret ⊕ ipad) || message))
so you would have to find a collision under one padded key that is simultaneously a collision under the other padded key, which puts you back to relying on a generic birthday attack rather than any specific hash weakness.

If you're looking for an actual proof, it's at http://cseweb.ucsd.edu/~mihir/papers/hmac-new.html
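
For the curious, the construction is short enough to write out; here's a sketch in Python using SHA-256 (64-byte block size), which should agree with the stdlib hmac module:

    import hashlib

    def hmac_sha256(key: bytes, message: bytes) -> bytes:
        block_size = 64                                # SHA-256 processes 64-byte blocks
        if len(key) > block_size:
            key = hashlib.sha256(key).digest()         # over-long keys are hashed down first
        key = key.ljust(block_size, b"\x00")           # then zero-padded to the block size
        ipad = bytes(b ^ 0x36 for b in key)            # inner padded key
        opad = bytes(b ^ 0x5C for b in key)            # outer padded key
        inner = hashlib.sha256(ipad + message).digest()
        return hashlib.sha256(opad + inner).digest()   # hash((k^opad) || hash((k^ipad) || msg))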


bcrypt and HMAC fill different roles and have different security properties: bcrypt is a key-derivation function, and HMAC is a message authentication code. They're not comparable. In particular, HMAC should be used with a high-quality key; bcrypt is for deriving high-quality keys from lower-quality keys.


Aren't there more secure hashes than SHA1 that are also faster? Like BLAKE2, which can be configured for 128 or 160 bit output?


The security of hmac is not based strictly on the collision resistance of the hashing function.

Oh, and if your HMAC seals over an origin timestamp which your API respects, you've gone and made things even harder.
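
A rough sketch of that idea (the field names and freshness window are made up):

    import hmac, hashlib, json, time

    def sign(key: bytes, payload: dict):
        sealed = dict(payload, ts=int(time.time()))             # seal an origin timestamp under the MAC
        body = json.dumps(sealed, sort_keys=True).encode()
        return body, hmac.new(key, body, hashlib.sha256).hexdigest()

    def verify(key: bytes, body: bytes, tag: str, max_age: int = 300) -> bool:
        expected = hmac.new(key, body, hashlib.sha256).hexdigest()
        fresh = time.time() - json.loads(body)["ts"] <= max_age  # refuse stale (replayed) requests
        return hmac.compare_digest(expected, tag) and fresh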


Odd, I'm getting completely different MD5s from the three example images.

  ~ curl -s https://blog.cloudflare.com/content/images/2015/08/white.jpg | md5
  ccf22bc377846166ed65cd3cd58d2e3d
  ~ curl -s https://blog.cloudflare.com/content/images/2015/08/brown.jpg | md5
  810cac197d97da7b216c7883be523495
  ~ curl -s https://blog.cloudflare.com/content/images/2015/08/black.jpg | md5
  6bede506abffe08d0c2406d92fbff393


CloudFlare is mucking around with the payload to save some bytes, probably.


I don't get the issue with serial numbers. For the browser, it's a completely opaque random number - it doesn't matter what it is or how it changes.

Is the issue here that they're talking about CA collisions and need the "authority key identifier" extension to match? This shouldn't matter when colliding with service certificates.


Nice write-up, but it's slightly misleading or confusing to not explain that Nat McHugh's image collisions were chosen-prefix attacks. The post makes it sound like the images were the product of some unexplained collision, and then goes on to explain how chosen prefix can be used to forge certificates.


> to not explain that Nat McHugh's image collisions were chosen-prefix attacks.

In the paragraph right before those images, it says:

"This was cleverly demonstrated by Nat McHugh, who used a chosen-prefix hash collision"


Thanks for pointing that out. It was updated after initial publication.

Originally it said:

> This was cleverly demonstrated by Nat McHugh, who used the broken hash function MD5 to create two images with the same hash...


Let me guess: birthday paradox? It's harder to find someone in the room who has the same birthday as you, than to find a pair of people who have the same birthday.


Guh, until CF/FB can provide some data that shows users with no upgrade path are genuinely going to be affected by this, and not just connections MITM'd by some crappy AV or other random middlebox, the LV proposal seems like a pretty silly idea...

https://www.cabforum.org/pipermail/public/2015-December/0064...


Software "with no upgrade path" that hasn't been proven correct relative to a formal specification and deployed in an appropriate context (sufficient conditions to consider it a "not software" black box) should be considered a mistake and the responsible parties should be held accountable for reparations. Complacency with its continued existence is unsustainable.

Obviously this is an idealistic "should", but we need to take every available step to move towards this stance because a policy of limiting the networked universe based on the worst common denominator client results in an inescapable black hole of technical debt.

Open-source software plays into this model beautifully because it makes it much easier to propagate improvements across the entire space of systems. There remain some problems such as language interoperability ("best implementation of protocol X is in language Y, but our system is written in Z and the Y-Z interop story is bad") which I'd like to see people give more attention. We need to address the quadratic workload of porting/binding every library to every language.


There is a great CloudFlare article [1] that talked about this when they first suggested LV certs.

> The seemingly good news is that globally, SHA-2 is supported by at least 98.31% of browsers. Cutting 1.69% off the encrypted Internet may not seem like a lot, but it represents over 37 million people.

There is also an interesting discussion in Security Now #538 [2]; there is a transcript of the show as well [3]. Skip to page 2 of 39, just where Leo says "Yeah". Android 2.2 and Windows XP SP2 are on the list of things that don't support SHA-2. These devices exist particularly in the developing world. It sends the wrong message, to the developing world in particular, if we don't support HTTPS for them. It encourages websites in areas where it isn't 1.69% of their users but maybe 5% of their users to just not enforce TLS. TLS with a SHA-1 signed LV cert is better than no security at all.

Facebook also has a cool server add-on to dynamically serve LV certs to those who need them, which is very promising. If it is in production at Facebook, it is bound to be good.

[1] https://blog.cloudflare.com/sha-1-deprecation-no-browser-lef...

[2] https://www.grc.com/securitynow.htm

[3] https://www.grc.com/sn/sn-538.pdf


The Android one is an error from a GlobalSign page that was later corrected. Android uses OpenSSL, I think, and has supported SHA-2 certificates since 1.0.


Most releases of Symbian don't support SHA-2 certs with the system SSL library (which is what the built-in browser uses). Many Symbian devices are not upgradable to the release that does support it.


I was proposing restricting to manual issuance of OV certs given that automated issuance is the source of most attacks.


Can an attacker bypass this by using the serial number and validity period from an already issued certificate?



