collisionS for the full MD5 hash function was published by Xiaoyun Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu in 2004, in their paper "Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD" (the other hash functions were considerably weaker, and of less interest). The novelty in their technique, from what I've been able to glean, uses differential cryptanalysis, but simultaneously examining differentials of 32-bit subtraction and XORs ("1-bit subtractions"). The generated collisions have very few (6) different bits -- these are not random collisions, in any form.
Here's one such collision:
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 b4 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 a8 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 2b 6f f7 2a 70
and
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 07 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 f1 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd 72 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 34 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 28 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 ab 6f f7 2a 70
Rather than bore you with number of just how quickly such collisions can be found today, I give an anecdote: The paper was given (in "extended abstract" form) at a conference. There was much buzz the day before -- an MD5 collision was going to be published! As soon as the abstract was seen in the conference proceedings, people rushed to check that the 2 texts had the same MD5 hash value. I did the same, actually.
Sure enough, they didn't. A series of quick talks with 3 of the authors revealed a problem: they had attacked the MD5 algorithm published in the Chinese language edition of Bruce Schneier's Applied Cryptography. And the translation had a minor endiannity bug!
Running the texts through the "bugged" algorithm yielded a collision. Very impressive, to be sure, but considerably less in terms of shock value.
Luckily, the 4th contributor to the paper had stayed home. A series of phone calls back to him explained the problem, and he fixed the attack programs. A collision for the real MD5 algorithm was ready the next morning, in time for the conference talk.
Finding MD5 collisions is that fast. Actually, it's been sped up since then.
A hash collision for the MD5 ("secure") hash function had been known since at least 1996, if we use different (nonstandard!) initial values. Hans Dobbertin of the German Information Security Agency published a collision: an initialization vector IV and two inputs x, x' for which MD5(IV;x)=MD5(IV;x'). Quoting (but reformatting from PostScript) from the paper "Cryptanalysis of MD5 Compress":
Collision for the compress function of MD5. Use the initial value
IV = 0x12AC2375 0x3B341042 0x5F62B97C 0x4BA763ED
and define the first input X = (Xi)i<16 by setting:
X0 = 0xAA1DDA5E X4 = 0x1006363E X8 = 0x98A1FB19 X12 = 0x1326ED65
X1 = 0xD97ABFF5 X5 = 0x7218209D X9 = 0x1FAE44B0 X13 = 0xD93E0972
X2 = 0x55F0E1C1 X6 = 0xE01C135D X10 = 0x236BB992 X14 = 0xD458C868
X3 = 0x32774244 X7 = 0x9DA64D0E X11 = 0x6B7A669B X15 = 0x6B72746A
The second input X'= (X'i)i<16 is defined by setting X'i = Xi (i < 16; i ≠ 14) and X'14 = X14 + 29.
Then we have a collision, i.e. MD5-compress(IV ; X) = MD5-compress(IV ; X');
and this common compress value is
0xBF90E670 0x752AF92B 0x9CE4E3E1 0xB12CF8DE.
The computation of such a collision takes about 10 hours on a Pentium PC.
The paper already recommended using SHA-1 or RIPEMD-160, instead.