Information theory is the study of the transmission of messages, started by Claude Shannon at Bell Labs in 1948. Messages consist of a series of symbols drawn from a well-known set. Messages are transmitted from sender to receiver over a channel. Channels do not transmit signals reliably, and may contain noise (randomly changing one symbol into another). Information theory makes these definitions precise, and proves a number of useful facts about messages and channels. These definitions and results form the basis for most modern communications and compression, and have influenced most other areas of computer science.

The most fundamental step in information theory is the definition of the amount of information coming from a source. Information is measured in bits, with one bit able to distinguish which of two equally likely messages is being sent. Two bits can then distinguish between four equally likely possibilities, and so on. This allows us to identify each of a set of N equally likely symbols using log2(N) bits.
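As a quick sketch of the log2(N) rule (an illustration, not from the original text): for alphabet sizes that are not powers of two, a fixed-length code has to round up to the next whole bit.

```python
import math

# Bits needed to name one of N equally likely symbols.
for n in [2, 4, 8, 26]:
    # For N that is not a power of two, a fixed-length code
    # must round up to the next whole bit.
    print(n, math.log2(n), math.ceil(math.log2(n)))
```

So the 26-letter alphabet needs log2(26) ≈ 4.7 bits of information per letter, but 5 bits per letter in a fixed-length code.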

But information sources do not choose randomly among symbols: some letters are more common than others, some words more common, etc. Shannon goes on to define the key concept of entropy in terms of the probability of each symbol being emitted by a source. Entropy is also measured in bits, and is defined as:

    H = sum over all symbols of ( -P_sym * log2(P_sym) )

Where P_sym is the probability of the source emitting that symbol.
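The entropy formula translates directly into a few lines of Python (a sketch for illustration, not from the original text):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum over symbols of -p * log2(p)."""
    # Symbols with probability 0 contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A source that always emits the same symbol carries no information:
print(entropy([1.0]))       # 0.0
# Four equally likely symbols give log2(4) = 2 bits:
print(entropy([0.25] * 4))  # 2.0
```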

Entropy allows us to measure how much variation there is in a source's signal. If the source always transmits the same symbol over and over, the entropy is 0. For the case where all N symbols are equally likely, the formula reduces to N * (1/N) * -log2(1/N) = log2(N), as we would expect. This is the maximal-entropy case; any variation in probability reduces the entropy of the source.

Having defined the entropy of a source, we come to Shannon's first major result. A code is a mapping from each symbol to a string of bits. He shows that by choosing the string of bits for each symbol carefully based on the probability, you can reduce the average number of bits needed for each symbol. Specifically, it is possible to code symbols from a source with entropy E using, on average, no more than E+1 bits per symbol.

To design a code properly, you give lower-probability symbols longer bit strings, and higher-probability symbols shorter strings. As an example, suppose we have a source with four symbols {A, B, C, D} occurring with probabilities {0.5, 0.25, 0.125, 0.125}. A naive code would transmit using log2(4) = 2 bits per symbol. But we can beat that:

A -> 1

B -> 01

C -> 001

D -> 000

Now we use, on average, 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits per symbol (this is also the entropy of our source).
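Both numbers can be checked with a short calculation (a sketch using the code and probabilities above):

```python
import math

code = {"A": "1", "B": "01", "C": "001", "D": "000"}
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Average bits per symbol: each codeword length weighted by probability.
avg_bits = sum(probs[s] * len(code[s]) for s in code)
# Entropy of the source, from Shannon's formula.
entropy = sum(-p * math.log2(p) for p in probs.values())
print(avg_bits, entropy)  # 1.75 1.75
```

For this source the code hits the entropy exactly, because every probability is a power of 1/2.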

The concept of codes is the basis of all compression algorithms.
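The text does not name it, but the standard way to build this kind of probability-aware prefix code is Huffman's algorithm: repeatedly merge the two least likely symbols into one. A minimal sketch (exact bit patterns may differ from the table above, but the codeword lengths come out the same):

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a prefix code from {symbol: probability} by repeatedly
    merging the two least likely nodes (Huffman's algorithm)."""
    tiebreak = count()  # unique counter so ties never compare the dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prepend 0 to one subtree's codewords and 1 to the other's.
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

code = huffman({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})
print(code)  # codeword lengths: A=1, B=2, C=3, D=3
```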

Moving from messages to channels, Shannon defines the capacity C of a channel in terms of the number of bits that can be transmitted down the channel in a given amount of time. This is based on the frequency of the transmission, and the number of distinguishable states in each cycle (which is based on the power of the transmission).

Now, our nemesis noise enters the picture. Noise is added randomly to a channel, flipping some bits. The amount of noise in a channel is also defined in terms of entropy, and measured in bits/second. You can look at noise as a second source simultaneously transmitting information that the receiver is completely uninterested in.

Shannon's triumphant achievement is a proof that, in the limit, in a channel with capacity C and noise N, and a source with entropy E, you can transmit (C-N)/E symbols per second.

This is important because it gives a mathematically certain way of transmitting information over a noisy channel. It covers the standard ways of dealing with noise (increasing channel capacity, via either power or frequency) and introduces a new way: error-correcting codes, which allow us to transmit data in a noisy environment without having to retransmit.
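To give the flavor of error correction (a toy sketch, not Shannon's construction): the simplest error-correcting code repeats each bit three times and decodes by majority vote, so any single flipped bit per group is corrected.

```python
def encode(bits):
    # Triple repetition code: repeat each bit three times.
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(received):
    # Majority vote over each group of three received bits.
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

message = [1, 0, 1, 1]
sent = encode(message)
sent[4] ^= 1  # noise flips one bit in the channel
print(decode(sent) == message)  # True: the single flip is corrected
```

The cost is threefold redundancy; real error-correcting codes get the same protection far more efficiently.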

Modern information theory is often concerned with developing ever more efficient and reliable codes.