Hacker News new | past | comments | ask | show | jobs | submit login
Lifetimes of cryptographic hash functions (valerieaurora.org)
153 points by tswicegood on July 29, 2013 | hide | past | favorite | 52 comments



I think SHA-2 should be "minor weakness discovered" (if not outright "unbroken"), not "weakened".

At the onset of the SHA-3 competition, everyone was nervous about SHA-2: it appeared as though a good attack was inevitable, what with the cryptanalytic attacks on SHA-1.

But as the competition went on, things got calmer. The attacks against SHA-2 that were so expected simply weren't coming[1]. And so now the status quo is that SHA-2 seems pretty darn safe, and the real focus of the SHA-3 competition shifted towards not necessarily having a direct replacement for SHA-2, in the sense of performance, but instead having a design that was sufficiently different to not allow SHA-2 attacks to apply to it. And Keccak is just that: quite different.

Anyway, my point is that SHA-2 is mislabeled. Honestly, I think cryptographers recommend it the most out of any of the hash functions currently; SHA-3's software performance is rather... lacking.

[1] Some may argue that this is because cryptographers were focused on the SHA-3 candidates, but I'm not so sure


BLAKE2 is a very good alternative if you want software performance: https://blake2.net/.

Just stop what you're doing and look at scrypt, bcrypt or even PBKDF2-HMAC-SHA512 if you're thinking something that involves the words "passwords" and "fast hash function." (http://throwingfire.com/storing-passwords-securely/#notpassw...)



Why is SHA-2 orange? As far as I know, besides length-extension, there's no known weakness on the full hash function.



The attacks are on reduced-round versions of SHA-256 and SHA-512. Those do not describe any attacks faster than (dumb) brute force against SHA-256 or SHA-512.

I think labeling SHA-256 as orange is highly misleading. The SHA-2 family of hashes is going nowhere unless the partial-round attacks get a lot closer to the full-round versions. They're still about 20+ rounds away.


If the SHA-2 family have weaknesses, and SHA-2 is used for generating Bitcoin blocks, whoever breaks this first will be an overnight millionaire, just make sure you break them slowly (about 20 a day max) to avoid suspicion that the hashing is compromised. Sell as much as possible and then release your paper.


Indeed, bitcoin is like a cryptography competition with ridiculously huge prize.

1. Break SHA2 -> control bitcoin generation ($2500 each generated block at current prices)

2. Break ECDSA -> unlock any addresses that have ever sent money on the blockchain

3. Break ECDSA+SHA2+RIPEMD160 -> break ALL addresses, even those that have never sent money.

Incidentally, the difference between 2 and 3 is why it is not recommended to reuse bitcoin addresses.


Breaking SHA2 (i.e. developing an economically-feasible preimage attack) would indeed crash bitcoin, but the converse is not true.

A near-collision attack on double SHA256 (if you treat it as a single hash not a pair of independent hashes) would also crash bitcoin, but would not necessarily be a threat to use of SHA256 for authentication purposes.

A bitcoin block solution just needs the hash to include enough leading zeros. Authentication (nearly always being automated) requires every bit to match - hitting 255 of 256 bits is no better than hitting 0 bits, as either way your message will be rejected.


Break any and the value of bitcoins will crash hard.

You can't take millions out of a system which doesn't actually have millions worth of liquidity backing it. The total worth of all bitcoins is actually far smaller than the market capitalization of all bitcoins because the trade of them is based off an assumption that many if not most coins are dead coins.


Well, MtGox monthly volume is over 1 million BTC [1] or $100 million at current prices. I think you should be able to take out at least a few million without raising suspicion

In case of SHA2 compromise in particular, generating, say 20% of daily blocks would hardly be noticed (and can be easily explained as a new shipment of ASICs). This is about 720 BTC or $72000 daily.

[1] http://bitcoincharts.com/markets/mtgoxUSD.html


The "slashdotter reaction" column is priceless!


Use of SHA-1 for digital signature generation has been deprecated by NIST since 2011. It's disallowed after 2013-- which is important for software aiming for government use.

http://csrc.nist.gov/publications/nistpubs/800-131A/sp800-13...


I guess 2004 was a crazy year for cryptography.


Indeed it was. Wang's breakthrough work [1,2] broke most of the common hash functions at the time, and later also SHA-1 [3]. The SHA-3 competition was motivated by this streak of new successful attacks.

[1] http://eprint.iacr.org/2004/199

[2] http://www.iacr.org/cryptodb/archive/2005/EUROCRYPT/2868/286...

[3] http://link.springer.com/chapter/10.1007%2F11535218_2


It's the year 64-bit CPUs became consumer commodoties.

http://en.wikipedia.org/wiki/64-bit_computing#64-bit_process...


>[1] Note that 128-bit hashes are at best 2^64 complexity to break; using a 128-bit hash is irresponsible based on sheer digest length.

Can a short hash which has not been weakened be lengthened by taking two hashes and concatenating?

    fixedSalt = "blah"
    longerHash = (salt, input) ->
        hash(salt + input) + hash(salt + fixedSalt + input)
Edit: Never mind. Obviously an attacker would only have to break the first half of the resulting hash.

But is there any valid way to lengthen a too-short hash? Not that it's of practical importance; I'm just curious academically.


This is actually a pretty interesting question. The answer, at least for Merkle-Damgard hash functions (MD5, SHA-1, SHA-2, etc) is that concatenating (or "cascading") hash functions doesn't really improve the strength of the resulting construction.

Merkle-Damgard hash functions look like this:

  function MD(M, H, C):
    for M[i] in pad(M):
      H := C(M[i], H)
    return H
For message M, initial state H, and compression function C. In other words, pad the message, break it into blocks, and use the compression function to "mix" each block into the hash function's state. The final state is the result.

Antoine Joux showed that for any MD hash function, generating many collisions is not much more difficult than generating one. Here's how:

  1. Take the initial state H[0].
  2. Find two single-block messages that collide under C with H[0] as the input state. Call the result H[1].
  3. Now find another pair of single-block messages that collide under C with H[1] as the input state. Call the result H[2].
  4. Iterate as needed.
Here's the trick: each single-block collision you find is actually doubling your set of colliding messages. This is because for each block of the message, you can select either of two candidate blocks. So if the effort required to find a collision in a b-bit hash function is 2^(b/2), the effort to find 2^n such collisions is only n * 2^(b/2).

What does this mean for cascaded hash functions? Well, consider a hypothetical construction that simply concatenates a 128-bit hash function with a 160-bit function. A plan of attack might look like this:

  1. Find 2^80 colliding messages under the 128-bit function. This should take roughly 80 * 2^64 ~= 2^70.3 units of "effort".
  2. Evaluate each message under the 160-bit function. There's probably a collision in there somewhere. This will take around 2^80 work, dwarfing what we did in the first step.
The effort to find a collision under both functions is thus only about what it takes to find a collision under the stronger of the two.

Shameless plug: we will have a few problems exploring the consequences of Joux collisions in set seven of the Matasano crypto challenges.


Ah, so in fact the naive concatenating solution I gave, in addition to being just as easy because the attacker only has to break half of it, is actually even easier because the attacker has two targets to collide with.

What about non-concatenative methods? For example, could you do something like "shift and xor". For example, say you have a 4-byte hash function, so that:

   hash(salt + input)             :  0x3AF9
   hash(salt + fixedSalt + input) :  0x8034
then you shift one by a byte and xor like so

       3AF90
   xor 08034
     = 32FA4
And then you could iterate that to extend to as many bytes as you want? And then maybe you'd want to xor the first and last byte together so that nothing from the first hash remains - although now I'm thinking intuitively which generally seems to be a bad idea with crytpography.


> Ah, so in fact the naive concatenating solution I gave, in addition to being just as easy because the attacker only has to break half of it, is actually even easier because the attacker has two targets to collide with.

I wouldn't say it's easier. Remember that we need to find a single message that generates a collision under both hash functions. So our strategy is to generate a massive number of collisions for the shorter function and hope that there's one pair of messages in there that collide under the longer function.

> What about non-concatenative methods?

I think this will again boil down to finding a single message that generates a collision under both hash functions. It won't matter too much whether you XOR the hashes or concatenate them.


Hmm. It still seems to me that the simple concatenative method could still be broken at least slightly more quickly, since each function can be collision tested separately, but my brain is vaguely gesturing at comprehension about why xoring is still weak.

I feel like we should be teaching the principles of crypto to young children so that we end up with some humans that can grok it as easily as the rest of us do algebra. But there are a great many things I would want young children to be taught if I were made the Benevolent Dictator of All School Boards.


As an aside, you can use PBKDF2 with an iteration count of 1 to extend the output of a hash function to arbitrary lengths:

    output = HMAC-SHA256("passphrase", "salt" + x"00000001")
           + HMAC-SHA256("passphrase", "salt" + x"00000002")
          ...
This is useful when you want to derive keys from a (strong) master secret, but if you don't trust the underlying algorithm, all bets are off.


Hmm. That seems like it's still susceptible to the weakness sdevlin mentioned. For c = 1, it is the same thing that I suggested initially, and for larger c, it merely makes the hash more expensive to compute by iterating it, but the concatenation weakness would still hold, wouldn't it? It seems to me that this is perhaps useful for generating keys from passphrases, but not for lengthening a hash used to store a password in a database.


Yes, I think so. IIUC, with this construction, the difficulty only increases linearly with the length of the output, rather than exponentially, as one might expect.

I don't think there's a generically secure way to extend short hash functions to get an exponential difficulty increase. Otherwise, we could just construct arbitrary-length hash functions using small (e.g. 32-bit) building blocks without needing to cryptanalyze the result.

But then again, I haven't been paying attention to the literature lately.


Sorry, why is RIPEMD-160 deprecated? I've been unable to find any supporting information as to why.


it appeared after sha1, receives less attention than sha1, and is slower than sha1... so why use it at all?


Many use it because it wasn't developed by the NSA. It is a default in TrueCrypt, for example.


It's used in Bitcoin as well (after SHA-2) to shorten up addresses. Come to think of it, I think Tor uses it for .onion addresses... which they shorten to 80 bits (of preimage resistance!)


Confession time: i still have some apps with salted md5 hashed passwords


The first step is accepting that you have a problem. Fortunately, it's fairly straightforward to fix the problem!

Previous discussion of how to handle this scenario: https://news.ycombinator.com/item?id=2689149


You shouldn't be using any of the functions on that page directly, anyway: http://throwingfire.com/storing-passwords-securely/#notpassw...


But it does matter, because if you can exploit a weakness in the hash function you can figure out the salt, strip it, and then use your precomputed dictionary.


That's not really how it works. Either way, you shouldn't be using just a regular "hash function" anyway. Even basic constructions like PBKDF2 use HMAC constructions where SHA1 and even MD5 are still pretty safe to use (although not very computationally expensive.)


So migrate? Hash the hashes with Bcrypt or PBKDF2.

Bcrypt(MD5(password)) is just as effective as Bcrypt(password) at knocking brute force attempts on the head.


Unless the Bad Guys already have the MD5 hashes


Then the problem is not "I support an app that uses md5 hashes" any more. Your problem in that case is "all my users accounts were broken."


Right, but you can still perform the migration I suggested and flag accounts for a password reset on next login.


I was supporting an installation with md5 unsalted passwords

let us hang our heads in shame.


For more information on the 'weakened' state of SHA-2, see http://eprint.iacr.org/2004/207

(Full text PDF: http://eprint.iacr.org/2004/207.pdf)


It'd be nice to have whirlpool in the list - I remember when it was seen as the great new hope for a good hash, but I haven't heard anything about it in recent years.


The Expert/Programmer/Non-expert reactions at the bottom are priceless.


The "reactions" table at the bottom made my morning.


2004 was a bad year for cryptographic hash functions...


It's missing the most important ones: Scrypt, PBKDF2, Bcrypt.

https://en.wikipedia.org/wiki/Scrypt

https://en.wikipedia.org/wiki/Bcrypt

https://en.wikipedia.org/wiki/PBKDF2

Scrypt being an absolute nightmare to bruteforce, even for short passwords.

http://i.stack.imgur.com/sOMvu.png


These aren't cryptographic hash functions exactly, though, at least not in the sense that a cryptographer would think. I mean, they will fit just about any definition of a cryptographic hash function you can think of, but really it's not that useful to label them as such. Instead, they're usually called key derivation functions.

On top of that, even if we were to include these in a hash function list, they're decidedly not the most important ones. Most important for password storage and other key derivation, perhaps, but the applications of hash functions are far more general. The preimage resistance of scrypt is reliant on SHA256, for instance.


Cryptographic hash functions must be efficient to compute. Those examples (scrypt, bcrypt, etc) were designed to be difficult to compute. Those are password hash functions, not cryptogrpahic hash functions. Two totally different things with different purposes.


Well, in complexity theory (which theoretical cryptography uses heavily), an efficiently computable function is one that has a polynomial-time algorithm to compute it.

I mean, wouldn't you say that scrypt is efficient to compute? For instance, is 5 seconds not a relatively quick function evaluation? Compare that to super-polynomial-time attacks, some of which wouldn't succeed before our Sun burned out and Earth died. And if you ramp up the security parameters to an insane degree, the user can no longer compute the function themselves. That's the reason for the "efficient to compute" clause in most definitions.

So, while you're right that the fact that KDFs are designed to be much slower than hashes is what really separates them, that doesn't disqualify KDFs from (technically) being cryptographic hash functions. At least, not if you view the definition in a theoretical sense, which is the appropriate way to do so. Still, I agree with your premise; in a practical sense, KDFs shouldn't feel like they are cryptographic hashes, since their purpose is markedly different.


Scrypt is being used by a few altcoins (litecoin for one), that's what makes it interesting for some people.


Can you explain why a 40 char text is 1000 times easier to crack than a 10 char password according to the graphic? Is it assumed you don't use any numbers/symbols and the attacker knows your dictionary?


The "40 char text" is based on NIST guidelines for estimating the entropy in English text -- i.e., dictionary words which make grammatical sense together. The "10 char password" is for 10 random printable ASCII characters.


I think "text" is english + whitespace + punctuation only, whereas "password" is any kind of character.

I didn't create that image, so I'm not 100% sure.


You're correct. See the original scrypt paper by Percival [1], halfway down page 13, for a description of the categories. The table itself is at the top of page 14.

[1] https://www.tarsnap.com/scrypt/scrypt.pdf




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: