(Coding theory, what did you expect?)

Say one wishes to describe some set X using symbols Σ. A code is a one to one function c:X→Σ* that matches every element of X to a word using the letters of Σ. Call the set C=c(X) of all words used to describe X the set of codewords. c is prefix-free if no codeword is a prefix of another codeword.

Examples and non-examples

  • ASCII, or any fixed-length encoding is prefix-free, since different words of the same length cannot be prefixes of one another.
  • Morse code is not prefix-free! For instance, "." (the code for 'e') is a prefix of "..." (the code for 's'). Indeed, when transmitting Morse code three symbols are used: dot, dash, and a short silence between letters.
  • Any code that has a distinct stop symbol is prefix-free. Morse code considered with the 3 symbols above is one example; null terminated strings are another.
  • Consider X={a,b,c,d,e}. The code
    a ⇒ 01, b ⇒ 00, c ⇒ 100, d ⇒ 1011, e ⇒ 1111
    is prefix-free, but if we reverse the encoding of each letter the code is no longer prefix-free.

Uses

A prefix-free encoding is useful because it is self-delimiting: if we have any string of symbols, there cannot be two ways to decode it into a sequence of elements of X. This has obvious advantages for transmission, but prefix-free encodings are also useful elsewhere:

  • Huffman codes are prefix-free (even though their reverses often aren't, as above); being prefix-free has obvious advantages in compression, and indeed...
  • Kolmogorov-Chaitin complexity is defined using a prefix-free encoding of Turing machines.
  • If we ignore comments, XML documents are prefix-free: There is one top-level tag, and upon closing it the document cannot be extended.


Yay! Another long short for bq2K6!

Log in or register to write something here or to contact authors.