Agree, but a collision even for md5 is a relatively rare event. When brute-forci...

jsjohnst · on Dec 15, 2016

I've done it before on a 1 billion word / password list and didn't get any collisions.

cm2187 · on Dec 15, 2016

That being said, md5 does generate collisions. I was playing with the IMDB movie database that you can download. They use a combination of the title and the year as a primary key. I tried using an md5 of that key instead to save space (while still giving a reproducible ID, instead of an identity column), and got many collisions. No collisions with SHA256.
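For anyone wanting to repeat this kind of check, here is a minimal sketch. It is not the code used above: the `|` separator, the function names, and the sample rows are all made up for illustration. It only reports a collision when two different (title, year) keys share a digest, so repeated rows in the input don't get miscounted.

```python
import hashlib
from collections import defaultdict

def md5_key(title: str, year: int) -> str:
    # Hypothetical key format: "title|year", UTF-8 encoded, hex MD5 digest.
    return hashlib.md5(f"{title}|{year}".encode("utf-8")).hexdigest()

def find_collisions(records):
    # Map each digest to the set of distinct (title, year) keys that produced it.
    by_digest = defaultdict(set)
    for title, year in records:
        by_digest[md5_key(title, year)].add((title, year))
    # A real collision means more than one distinct key under the same digest.
    return {d: keys for d, keys in by_digest.items() if len(keys) > 1}

# Made-up sample rows; the repeated ("Alien", 1979) row is a duplicate, not a collision.
sample = [("Alien", 1979), ("Aliens", 1986), ("Alien", 1979)]
print(find_collisions(sample))  # prints {}
```

If this reports anything on a real dataset, the first thing to look at is the key construction (encoding, truncation of the digest, column type) before suspecting MD5 itself.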

schoen · on Dec 15, 2016

Wait, what? No MD5 collisions at all were publicly known until Xiaoyun Wang disclosed one in 2004 using a new cryptographic technique she invented (explained in Wang and Yu's "How to Break MD5 and Other Hash Functions").

MD5 has a 128-bit output so collisions that occur by chance should require about 2⁶⁴ inputs (18 exa-inputs). Surely your database didn't contain over 2⁶⁴ different movie records.

Could you take a look at what you were doing again? Your description doesn't really make sense mathematically.
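As a rough check of that arithmetic, here is a small sketch using the standard birthday approximation p ≈ 1 − exp(−n(n−1)/(2·2^bits)). The function name and the input sizes are just for illustration, and it assumes the hash behaves like a uniformly random function.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one random collision among n_items hashes."""
    space = 2.0 ** hash_bits
    # Birthday approximation; expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

# Illustrative sizes: ten million records, a billion records, and 2**64 inputs.
for n in (10**7, 10**9, 2**64):
    print(f"{n:.3e} inputs -> p ~ {collision_probability(n, 128):.3e}")
```

At 2⁶⁴ inputs the approximation gives roughly a 39% chance of a collision, while at a billion inputs it is far below anything you would ever observe.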

cm2187 · on Dec 15, 2016

You must be right. I can't reproduce it. I must have fucked something up then.

danielweber · on Dec 15, 2016

You likely goofed something up. No one has demonstrated two strings that are conceivably used as passwords that users type in -- and that includes the tuple {movie title:year} -- that have MD5 collisions.

The security problem with MD5 isn't collisions.

cm2187 · on Dec 15, 2016

I think you are right, I can't reproduce it.

b2600 · on Dec 15, 2016

What you're describing is not possible given the database you tested. Are there more details that would clarify your post?

jsjohnst · on Dec 15, 2016

Oh, of course md5 has collisions. It's relatively easy (not computationally easy, but there are known methods) to find two random strings that hash to the same value, it's just very difficult to find a string that hashes to the value of a specific other string.

schoen · on Dec 15, 2016

Not "relatively easy" by chance: it should require 2⁶⁴ entries in your database to see a single collision happen at random! It's only "relatively easy" following cryptographic research in the early 2000s that exploits structure in MD5 to produce collisions deliberately.

Yes, collisions are easier than preimages, but they still shouldn't occur by chance in real applications!
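To see the scaling for yourself, here is a sketch that shrinks the problem: it keeps only the first few bytes of each MD5 digest, so chance collisions show up after roughly 2^(bits/2) inputs. The function name, the counter-string inputs, and the `limit` parameter are assumptions for illustration, not anything from the thread.

```python
import hashlib

def first_chance_collision(prefix_bytes: int, limit: int = 1_000_000):
    """Hash the strings "0", "1", ... and return the first pair of inputs whose
    MD5 digests agree on the first prefix_bytes bytes, or None if none is found."""
    seen = {}
    for i in range(limit):
        key = hashlib.md5(str(i).encode()).digest()[:prefix_bytes]
        if key in seen:
            return seen[key], i
        seen[key] = i
    return None

# A 32-bit prefix collides quickly; the full 128-bit digest does not.
print(first_chance_collision(4))
print(first_chance_collision(16, limit=100_000))
```

With a 4-byte prefix a pair turns up after on the order of 2¹⁶ inputs; with the full 16 bytes the same loop finds nothing, which is the expected outcome for any realistic database size.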

jsjohnst · on Dec 15, 2016

Realized my wording was way too ambiguous, clarified. Thanks!

noonespecial · on Dec 15, 2016

Very nice. Thanks for that. So yes, this is likely the thing to do in this situation.