In standard information theory, the discipline launched by
Claude Shannon's famous paper, A Mathematical Theory of
Communication, the 'information content' of a sequence
of characters (a string) is the number of bits it takes to
represent that string as a binary sequence.
Suppose our string consists of the first 10,000,000 digits of pi. As each decimal digit is worth log2 10 bits (about 3.32), we can roughly calculate that a binary string about 33 million bits in length will be needed to represent it, so we would say the information content is about 33 million bits.
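To make the arithmetic concrete, here is the calculation in Python (just the figures used above, nothing more):

    import math

    n_digits = 10_000_000
    bits_per_digit = math.log2(10)           # about 3.3219
    print(round(n_digits * bits_per_digit))  # 33219281 - roughly 33 million bits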
But suppose we store the information in an ASCII
representation instead: our string would now be
80,000,000 bits in length, as each decimal digit takes
one byte (the size of an ASCII character: 8 bits) to store.
We would like to say that, in some sense, the ASCII
representation, at 80,000,000 bits, and the more efficient
binary representation, at about 33,000,000 bits, store the same
amount of information.
In terms of Shannon's paper, the difference is accounted
for by the fact that the ASCII encoding allows a larger set
of possible messages than the 'decimal digit' encoding
- in ASCII we can write novels and treatises, for example, whereas in decimal digits we can only write numbers (unless we invoke some extra decoding...). In other words, the amount of information is log2 of the total number of same-length messages in our chosen encoding - that is, the number of bits needed to single out one message among them.
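To see why: a message of n characters drawn from an alphabet of k symbols is one of k^n possible messages, and log2(k^n) = n * log2(k). A small sketch checking that identity:

    import math

    n, k_decimal, k_ascii = 5, 10, 256
    # log2 of the number of possible messages equals n * log2(alphabet size)
    assert math.isclose(math.log2(k_decimal ** n), n * math.log2(k_decimal))
    assert math.isclose(math.log2(k_ascii ** n), n * math.log2(k_ascii))
    print(n * math.log2(k_decimal))   # ~16.6 bits for 5 decimal digits
    print(n * math.log2(k_ascii))     # 40.0 bits for 5 ASCII characters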
Now suppose we take our 80,000,000-bit ASCII string and
apply gzip (or a cleverer compression program) to it. We
will end up with a shorter string - guaranteed, because
of the redundancy of using only 10 of the available 256
ASCII characters.
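A sketch of this step using Python's zlib (the same DEFLATE algorithm behind gzip); random digits stand in for the digits of pi here, since either way only 10 of the 256 byte values ever occur:

    import random, zlib

    digits = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    raw = digits.encode('ascii')             # 8 bits per digit
    packed = zlib.compress(raw, level=9)

    print(len(raw))                          # 1000000 bytes
    print(len(packed))                       # ~425,000 bytes: about 3.4 bits/digit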
Given that we can always decompress this shorter string and get the ASCII representation we started with, is there not some sense in which these two strings represent the
same amount of information?
In algorithmic information theory, a different approach is taken: the amount of information in a given string is defined as the length of the shortest program which will produce that string as output (relative to some fixed universal computer; changing the computer shifts the measure by at most an additive constant). One way to think about this is to regard the AIT information content as
the lower limit of the compressibility of the string.
This quantity is known variously as Kolmogorov complexity, algorithmic complexity, and program-length complexity.
In the case of pi, this happens to be pretty small, because
there are short programs which will go on producing digits of pi forever. We only need to add a counter that stops the output after the first 10,000,000 digits have been printed, and we will have obtained a huge amount of 'compression'. A little thought shows we can get the next 10,000,000 digits out of our program at the very small cost of increasing the value of the counter (the length of the program grows no faster than log2 of the counter's value).
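For example, here is a short program along those lines (a sketch; this particular rendering follows Gibbons' unbounded spigot algorithm, but any short pi-generating program makes the point). The program text stays the same size no matter how many digits we want; only the counter N changes, and writing N down costs about log2 N bits:

    def pi_digits():
        # Gibbons' unbounded spigot: yields decimal digits of pi forever
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
            else:
                q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2)

    N = 50              # raise to 10_000_000 to reproduce the example in the text
    gen = pi_digits()
    print(''.join(str(next(gen)) for _ in range(N)))   # 31415926535...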
Small further alterations of the program will allow the
output to be produced as either an ASCII or a binary
representation.
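For instance (a sketch, assuming the digits are in hand as a string such as the program above produces, and ignoring leading zeros):

    digits = '31415926535897932384'

    ascii_form = digits.encode('ascii')      # one byte (8 bits) per digit
    as_int = int(digits)
    binary_form = as_int.to_bytes((as_int.bit_length() + 7) // 8, 'big')

    print(len(ascii_form) * 8)               # 160 bits for 20 digits
    print(len(binary_form) * 8)              # 72 bits - nearer log2(10) per digit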
So AIT quantifies the intuition that each of these
representations contains the same amount of information
- that a binary, an ASCII, and a compressed
representation of pi all represent the same thing.
In AIT, the concept of program-length complexity is also used to provide a definition of randomness. A string is said to be random when its length is less than or equal to
its program-length complexity - i.e. when the string
is as short as or shorter than the shortest program that would produce it as output. This differs from the usual definition of randomness, which is a statistical one, specifying that all substrings (up to a certain length) must appear roughly equally often in the string.
In fact pi appears to satisfy the statistical definition of randomness (its digits pass statistical tests, though this is conjectured rather than proven), but, as shown above, not the AIT one.
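A sketch of the statistical side of that claim: digit frequencies in an initial segment of pi (the spigot generator is repeated from above so the snippet stands alone). Each digit turns up close to 10% of the time, as the statistical definition demands:

    from collections import Counter

    def pi_digits():
        # Gibbons' unbounded spigot, as above
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
            else:
                q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2)

    gen = pi_digits()
    counts = Counter(next(gen) for _ in range(10_000))
    for d in range(10):
        print(d, counts[d])                  # each count lands near 1,000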
One interesting feature of this definition of randomness
is that it is in general undecidable whether a string is
random - because the program-length complexity of
a given string is uncomputable, except for a
finite number of strings (with N bits of axioms, you
can determine program-length complexity only where it is
less than or equal to N).
This fact tends to limit the practical
applications of AIT.
The terms algorithmic information theory and
program-length complexity were coined by
Gregory Chaitin to describe features of his own work
on the foundations of mathematics.