In standard information theory, the discipline launched by
Claude Shannon's famous paper, A Mathematical Theory of
Communication, the 'information content' of a sequence
of characters (a string) is the number of bits it takes to
represent that string as a binary sequence.
Suppose our string consists of the first 10,000,000 digits of pi. As each decimal digit is worth log2 10 bits (about 3.32), we can roughly calculate that a binary string about 33 million bits in length will be needed to represent it, so we would say the information content is about 33 million bits.
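To make the arithmetic concrete, here is the calculation in Python (just the figures used above, nothing more):

    import math

    n_digits = 10_000_000
    bits_per_digit = math.log2(10)           # about 3.3219
    print(round(n_digits * bits_per_digit))  # 33219281 - roughly 33 million bits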
But suppose we store the information in an ASCII
representation instead: our string would now be
80,000,000 bits in length, as each decimal digit takes
one byte (the size of an ASCII character: 8 bits) to store.
We would like to say that, in some sense, the ASCII
representation, at 80,000,000 bits, and the more efficient
binary representation, at about 33,000,000 bits, store the same
amount of information.
In terms of Shannon's paper, the difference is accounted
for by the fact that the ASCII encoding allows a larger set
of possible messages than the 'decimal digit' encoding
- in ASCII we can write novels and treatises, for example, whereas in decimal digits we can only write numbers (unless we invoke some extra decoding...). In other words, the amount of information is log2 of the total number of same-length messages in our chosen encoding - that is, the number of bits needed to single out one message among them.
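To see why: a message of n characters drawn from an alphabet of k symbols is one of k^n possible messages, and log2(k^n) = n * log2(k). A small sketch checking that identity:

    import math

    n, k_decimal, k_ascii = 5, 10, 256
    # log2 of the number of possible messages equals n * log2(alphabet size)
    assert math.isclose(math.log2(k_decimal ** n), n * math.log2(k_decimal))
    assert math.isclose(math.log2(k_ascii ** n), n * math.log2(k_ascii))
    print(n * math.log2(k_decimal))   # ~16.6 bits for 5 decimal digits
    print(n * math.log2(k_ascii))     # 40.0 bits for 5 ASCII characters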
Now suppose we take our 80,000,000-bit ASCII string and
apply gzip (or a cleverer compression program) to it. We
will end up with a shorter string - guaranteed, because
of the redundancy of using only 10 of the available 256
ASCII characters.
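A sketch of this step using Python's zlib (the same DEFLATE algorithm behind gzip); random digits stand in for the digits of pi here, since either way only 10 of the 256 byte values ever occur:

    import random, zlib

    digits = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    raw = digits.encode('ascii')             # 8 bits per digit
    packed = zlib.compress(raw, level=9)

    print(len(raw))                          # 1000000 bytes
    print(len(packed))                       # ~425,000 bytes: about 3.4 bits/digit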
Given that we can always decompress this shorter string and get the ASCII representation we started with, is there not some sense in which these two strings represent the
same amount of information?
In algorithmic information theory, a different approach is taken: the amount of information in a given string is defined as the length of the shortest program which will produce that string as output (relative to some fixed universal computer; changing the computer shifts the measure by at most an additive constant). One way to think about this is to regard the AIT information content as
the lower limit of the compressibility of the string.
This quantity is known variously as Kolmogorov complexity, algorithmic complexity, and program-length complexity.
In the case of pi, this happens to be pretty small, because
there are short programs which will go on producing digits of pi forever. We only need to add a counter that stops the output after the first 10,000,000 digits have been printed, and we will have obtained a huge amount of 'compression'. A little thought shows we can get the next 10,000,000 digits out of our program at the very small cost of increasing the value of the counter (the length of the program grows no faster than log2 of the counter's value).
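For example, here is a short program along those lines (a sketch; this particular rendering follows Gibbons' unbounded spigot algorithm, but any short pi-generating program makes the point). The program text stays the same size no matter how many digits we want; only the counter N changes, and writing N down costs about log2 N bits:

    def pi_digits():
        # Gibbons' unbounded spigot: yields decimal digits of pi forever
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
            else:
                q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2)

    N = 50              # raise to 10_000_000 to reproduce the example in the text
    gen = pi_digits()
    print(''.join(str(next(gen)) for _ in range(N)))   # 31415926535...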
Small further alterations of the program will allow the
output to be produced as either an ASCII or a binary
representation.
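For instance (a sketch, assuming the digits are in hand as a string such as the program above produces, and ignoring leading zeros):

    digits = '31415926535897932384'

    ascii_form = digits.encode('ascii')      # one byte (8 bits) per digit
    as_int = int(digits)
    binary_form = as_int.to_bytes((as_int.bit_length() + 7) // 8, 'big')

    print(len(ascii_form) * 8)               # 160 bits for 20 digits
    print(len(binary_form) * 8)              # 72 bits - nearer log2(10) per digit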
So AIT quantifies the intuition that each of these
representations contains the same amount of information
- that a binary, an ASCII, and a compressed
representation of pi all represent the same thing.
In AIT, the concept of program-length complexity is also used to provide a definition of randomness. A string is said to be random when its length is less than or equal to
its program-length complexity - i.e. when the string
is as short as or shorter than the shortest program that would produce it as output. This differs from the usual definition of randomness, which is a statistical one, specifying that all substrings (up to a certain length) must appear roughly equally often in the string.
In fact pi appears to satisfy the statistical definition of randomness (its digits pass statistical tests, though this is conjectured rather than proven), but, as shown above, not the AIT one.
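A sketch of the statistical side of that claim: digit frequencies in an initial segment of pi (the spigot generator is repeated from above so the snippet stands alone). Each digit turns up close to 10% of the time, as the statistical definition demands:

    from collections import Counter

    def pi_digits():
        # Gibbons' unbounded spigot, as above
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
            else:
                q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2)

    gen = pi_digits()
    counts = Counter(next(gen) for _ in range(10_000))
    for d in range(10):
        print(d, counts[d])                  # each count lands near 1,000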
One interesting feature of this definition of randomness
is that it is in general undecidable whether a string is
random - because the program-length complexity of
a given string is uncomputable, except for a
finite number of strings (with N bits of axioms, you
can determine program-length complexity only where it is
less than or equal to N).
This fact tends to limit the practical
applications of AIT.
The terms algorithmic information theory and
program-length complexity were coined by
Gregory Chaitin to describe features of his own work
on the foundations of mathematics.