It seems to me that any supposedly inefficient "Hello World" program which actually contains the string "Hello world\n" in its source code (excluding comments) is bogus since such a program could just print out the string directly.

Here's a program which doesn't contain the string "Hello world" anywhere except in the comments. It generates a string of twelve characters randomly selected from the set of lower and upper case letters, the space character and the newline character. It then checks if the string's MD5 checksum is equal to the hard-coded MD5 checksum of "Hello world\n". If the checksums match then it prints the string and terminates. Otherwise, it begins again with another randomly generated twelve character string.

There are 5412 or 614,787,626,176,508,399,616 twelve character strings containing the characters used by the program. If the program generates and MD5 checksums 100,000 random strings per second (roughly what a 2GHz Athlon should be capable of) then the correct string should appear in about one hundred million years.


In fairness, it should be noted that this program could produce the wrong output as the MD5 checksum algorithm is known to not necessarily produce different hash values for different input values (even if the input values are as short as 16 bytes). See ariels' md5 hash function writeup for details.

The input value to the MD5 algorithm in this program is the 12 character string "Hello world\n". Since this string is four bytes shorter than the sixteen byte value computed by the MD5 function, it isn't clear if there is another different 12 character string (using the same character set as used by this program) that results in the same MD5 hash value as "Hello world\n" generates. This may not seem very important but a program which is going to run for one hundred million years really should use an algorithm which is known to be correct!

P.S. Don't try this at home . . .


Here's a few questions for the cryptology folks or those with a few hundred million years of CPU cycles to burn:

  • An MD5 checksum is 16 bytes long. Each of the 256 different byte sequences of length one generate different MD5 checksums. On the other hand, there must be many pairs of byte sequences of length seventeen which both generate the same MD5 checksum.

    Question: are there any pairs of byte sequences shorter than 16 bytes which both generate the same MD5 checksum?

    if the answer is no then the program below will, eventually, produce the correct answer. If the answer is yes then more work is required before we can tell if the program necessarily eventually produces the correct answer.

  • Is there a second twelve character string containing characters from the set a-z, A-Z, space and newline which generates an MD5 checksum equal to the one generated for the twelve character string "Hello world\n"?

    If the answer is yes then the program below might produce the wrong answer. If the answer is no then the program below will, eventually, produce the correct answer.

One last question:

Assume:
  • that computers will continue to get faster and faster over time and that humanity survives long enough for the algorithm illustrated by this program to run to completion.
  • that whatever computer the program is started on will continue to operate without interruption until the program terminates successfully.
  • that computers continue to increase in speed at a long term average rate of a doubling in performance every two years.
  • that if you start the program on a current vintage (i.e. 2003) computer then it will take exactly 100 million years to get a result.
In which year's January should the program be started (on a then current computer) in order to get the correct result as quickly as possible using a single computer?

Hint: if all of the assumptions are true then many of the folks who read this writeup in 2003 will almost certainly still be alive when it comes time to start this program. See my homenode for the answer.


/*
 * Print's "Hello world\n" to stdout.
 *
 * How it works:  it generates strings of randomly selected letters,
 * spaces and newline characters and computes the MD5 checksum of
 * each string.  Once it get's a string whose MD5 checksum matches
 * the MD5 checksum of "Hello world\n", it prints the string and
 * terminates.
 *
 * N.B. This program uses the standard srandom and random functions to
 * generate the random strings.  Make sure that your implementation of
 * these functions has sufficient entropy to give you a decent chance
 * of generating the "Hello world\n" string!
 */

/*
 * md5data is a routine which computes the MD5 checksum of
 * the specified data bytes and returns the 16-byte checksum
 * into the 4 element array csum.
 * Many MD5 implementations are available on the 'net.  Pick your
 * favourite one (or use the one in the md5 hash function node) and
 * write an interface routine for it called md5data which takes
 * the parameters described below.
 */

extern void md5data( void *data, int length, int csum[] );

/*
 * pre-computed MD5 checksum of "Hello world\n"
 */

int md5[4] = { 0xf0ef7081, 0xe1539ac0, 0x0ef5b761, 0xb4fb01b3 };

main()
{
    srandom(0);
    while (1) {
        char msg[13];
        int tmp[4];
        int i;
        for ( i = 0; i < 12; i += 1 ) {
            msg[i] =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ \n"[
            random() % (2 * 26 + 2)
            ];
        }
        msg[12] = '\0';
        md5data(msg,strlen(msg),tmp);
        if ( md5[0] == tmp[0]
        &&   md5[1] == tmp[1]
        &&   md5[2] == tmp[2]
        &&   md5[3] == tmp[3] ) {
            write(1,msg,strlen(msg));
            exit(0);
        }
    }
}