collisionS for the full MD5 hash function was published by Xiaoyun Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu in 2004, in their paper "*Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD*" (the other hash functions were considerably weaker, and of less interest). The novelty in their technique, from what I've been able to glean, uses differential cryptanalysis, but simultaneously examining differentials of 32-bit subtraction and XORs ("1-bit subtractions"). The generated collisions have very few (6) different bits -- these are *not* random collisions, in any form.

Here's one such collision:

d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 **8**7 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 **7**1 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd **f**2 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 **b**4 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 **a**8 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 **2**b 6f f7 2a 70

and
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 **0**7 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 **f**1 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd **7**2 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 **3**4 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 **2**8 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 **a**b 6f f7 2a 70

Rather than bore you with number of just how quickly such collisions can be found today, I give an anecdote: The paper was given (in "extended abstract" form) at a conference. There was much buzz the day before -- an MD5 collision was going to be published! As soon as the abstract was seen in the conference proceedings, people rushed to check that the 2 texts had the same MD5 hash value. I did the same, actually.

Sure enough, *they didn't*. A series of quick talks with 3 of the authors revealed a problem: they had attacked the MD5 algorithm published in the Chinese language edition of Bruce Schneier's __Applied Cryptography__. And the translation had a minor endiannity bug!

Running the texts through the "bugged" algorithm yielded a collision. Very impressive, to be sure, but considerably less in terms of shock value.

Luckily, the 4th contributor to the paper had stayed home. A series of phone calls back to him explained the problem, and he fixed the attack programs. A collision for the real MD5 algorithm was ready the next morning, in time for the conference talk.

Finding MD5 collisions is *that* fast. Actually, it's been sped up since then.

A hash collision for the MD5 ("secure") hash function had been known since at least 1996, if we use different (nonstandard!) initial values. Hans Dobbertin of the German Information Security Agency published a collision: an initialization vector IV and two inputs x, x' for which MD5(IV;x)=MD5(IV;x'). Quoting (but reformatting from PostScript) from the paper "*Cryptanalysis of MD5 Compress*":

Collision for the compress function of MD5. Use the initial value

IV = 0x12AC2375 0x3B341042 0x5F62B97C 0x4BA763ED

and define the first input X = (X_{i})_{i<16} by setting:
X_{0} = 0xAA1DDA5E X_{4} = 0x1006363E X_{8} = 0x98A1FB19 X_{12} = 0x1326ED65
X_{1} = 0xD97ABFF5 X_{5} = 0x7218209D X_{9} = 0x1FAE44B0 X_{13} = 0xD93E0972
X_{2} = 0x55F0E1C1 X_{6} = 0xE01C135D X_{10} = 0x236BB992 X_{14} = 0xD458C868
X_{3} = 0x32774244 X_{7} = 0x9DA64D0E X_{11} = 0x6B7A669B X_{15} = 0x6B72746A

The second input X'= (X'_{i})_{i<16} is defined by setting X'_{i} = X_{i} (i < 16; i ≠ 14) and X'_{14} = X_{14} + 2^{9}.
Then we have a collision, i.e. MD5-compress(IV ; X) = MD5-compress(IV ; X');
and this common compress value is

0xBF90E670 0x752AF92B 0x9CE4E3E1 0xB12CF8DE.

The computation of such a collision takes about 10 hours on a Pentium PC.

The paper already recommended using

SHA-1 or

RIPEMD-160, instead.