Everything2
Near Matches
Ignore Exact
Full Text
Everything2

information theory

created by raph

(thing) by lurkingowl (1.2 wk) (print)   ?   (I like it!) 1 C! Wed Jul 24 2002 at 19:59:41

Information theory is the study of the transmission of messages, started by Claude Shannon at Bell Labs in 1948. Messages consist of a series of symbols, drawn from a well known set. Messages are transmitted from sender to receiver over a channel. Channels do not transmit signals reliably, and may contain noise (randomly changing one symbol to another.) Information theory makes these definitions more precise, and proves a number of useful facts about messages and channels. These definitions and results form the basis for most modern communications and compression, and have influenced most other areas of Computer Science.

The most fundamental step in Information Theory is the definition of the amount of information coming from a source. Information is measured in bits, with one bit able to distinguish which of two equally likely messages is being sent. Two bits can then distinguish between four equally likely possibilities, etc. This allows us to define each of a set of N symbols using log2(N) bits.

But information sources do not choose randomly among symbols: some letters are more common than others, some words more common, etc. Shannon goes on to define the key concept of entropy in terms of the probability of each symbol being emitted by a source. Entropy is also measured in bits, and is defined as:

  --
  \	- Psym	log2(Psym)
  /
  --
Symbols
Where Psym is the probability of the source emitting that symbol.

That is, the sum over all symbols of (-Psym * log 2 (Psym))

Entropy allows us to measure how much variation there is in a source's signal. If the source always transmits the same symbol over and over, the entropy is 0. For the case where all symbols are equally likely, this reduces to: N * 1/N * -log 2(1/N) = log 2(N) as we would expect. This is the maximal entropy case, any variation in probability reduces the entropy of the source.

Having defined the entropy of a source, we come to Shannon's first major result. A code is a mapping from each symbol to a string of bits. He shows that by choosing the string of bits for each symbol carefully based on the probability, you can reduce the average number of bits needed for each symbol. Specifically, it is possible to code symbols from a source with entropy E using, on average, no more than E+1 bits per symbol.

To design a code properly, you give lower probability symbols longer bit strings, and higher probability symbols shorter strings. As an example, suppose we have a source with four symbols {A,B,C,D} occuring with probabilities {0.5,.25,.125,.125} A naive code would transmit using log2(4) = 2 bits. But we can beat that:
A -> 1
B -> 01
C -> 001
D -> 000
Now we use, on average, .5*1 + .25*2 + .125*3 +.125*3 = 1.75 bits per symbol (this is also the entropy of our source.)

The concept of codes is the basis of all compression algorithms.

Moving from messages to channels, Shannon defines the capacity C of a channel in terms of the number of bits that can be transmitted down the channel in a given amount of time. This is based on the frequency of the tranmission, and the number of distinguishable states in each cycle (which is based on the power of the transmission.)

Now, our nemesis noise enters the picture. Noise is added randomly to a channel, switching some bits. The amount of noise in a channel is also defined in terms of entropy, and measured in bits/second. You can look at noise as a second source transmitting information the receiver is completely uninterested in at the same time.

Shannon's triumphant achievement is a proof that, in the limit, in a channel with capacity C and noise N, and a source with entropy E, you can transmit (C-N)/E symbols per second.

This is important because it gives a mathematically certain way of transmitting information over a noisy channel. It covers the standard ways of dealing with noise (increase channel capacity, either power or frequency) as well as introducing a new way of dealing with noise through the use of error-correcting codes, allowing us to transmit data in a noisy environment without having to retransmit.

Modern information theory is often concerned with developing ever more efficient and reliable codes.


printable version
chaos

entropy Claude Shannon The annoying orange orb outside my window each morning Cryptonomicon
Thinking without language How much for the little girl? General Systems Theory compression
Blue Screen of Death systems theory Everything Theory Star Trek Economy
Cult of Information Penises have higher bandwidth than cable modems Bremermann Limit information
bandwidth trapdoor function algorithmic information theory TOTE Model
signal-to-noise ratio Gregory Bateson One Hundred Years of Solitude zero information
Y'know, if you log in, you can write something here, or contact authors directly on the site. Create a New User if you don't already have an account.
  Epicenter
Login
Password

password reminder
register

Everything2 Help

Cool Staff Picks
Little presents from the Node Fairy:
karmic evitability
Texas
Costa Rica
Decrying political correctness without an understanding of its causes and intended consequences is little more than racism muttered under one's breath
Live and Let Die
Cardinal
toast soldiers
Declawing your cat
runcible spoon
Node for the Ages
Anglo-Saxon Laws and Customs
Against you I will fling myself, unvanquished and unyielding, O Death!
Chinese has three words for "river"
New Writeups
Meezzio
Gotlandssnus(thing)
argv
Astral Plane(idea)
Madara
One Winged Angel(fiction)
Tom Rook
Talk is cheap(poetry)
shaogo
Adelle Davis(person)
Aerobe
race car g sfjsgsd(poetry)
Binah
Dream Log: July 5, 2008(dream)
StrawberryFrog
Forgotten things in space(idea)
antigravpussy
velvet revolution fairy tale(idea)
Heitah
Nerve agent VX(thing)
Pavlovna
shite(idea)
wonton
Days and nights come together in a slow falling down(fiction)
Pavlovna
wee(idea)
katherine
root log: July 2008(log)
Madara
There’s nothing like a trail of blood to find your way back home(fiction)
E2 is a by-product of the existence of The Everything Development Company