prefix-free encoding - Everything2.com

(Coding theory, what did you expect?)

Say one wishes to describe some set X using symbols Σ. A code is a one to one function c:X→Σ^* that matches every element of X to a word using the letters of Σ. Call the set C=c(X) of all words used to describe X the set of codewords. c is prefix-free if no codeword is a prefix of another codeword.

Examples and non-examples

ASCII, or any fixed-length encoding is prefix-free, since different words of the same length cannot be prefixes of one another.
Morse code is not prefix-free! For instance, "." (the code for 'e') is a prefix of "..." (the code for 's'). Indeed, when transmitting Morse code three symbols are used: dot, dash, and a short silence between letters.
Any code that has a distinct stop symbol is prefix-free. Morse code considered with the 3 symbols above is one example; null terminated strings are another.
Consider X={a,b,c,d,e}. The code
a ⇒ 01, b ⇒ 00, c ⇒ 100, d ⇒ 1011, e ⇒ 1111
is prefix-free, but if we reverse the encoding of each letter the code is no longer prefix-free.

Uses

A prefix-free encoding is useful because it is self-delimiting: if we have any string of symbols, there cannot be two ways to decode it into a sequence of elements of X. This has obvious advantages for transmission, but prefix-free encodings are also useful elsewhere:

Huffman codes are prefix-free (even though their reverses often aren't, as above); being prefix-free has obvious advantages in compression, and indeed...
Kolmogorov-Chaitin complexity is defined using a prefix-free encoding of Turing machines.
If we ignore comments, XML documents are prefix-free: There is one top-level tag, and upon closing it the document cannot be extended.

Yay! Another long short for bq2K6!

Kolmogorov complexity	BrevityQuest 2006	Chaitin's constant	You are complexity, studied and defined
Huffman code	Node .45	one to one	Gregory Chaitin
XML	test	Morse code	function
symbol	Silence	Turing Machine	Lojban
ASCII	compression