Benford's law is a remarkable observation about how the leading digits of numbers are distributed in a body of text or set of data.

The law states that if you take a corpus of text and count how often each digit from 1 to 9 appears as the first digit of the numbers in it, the counts are not evenly spread. They fall away roughly logarithmically, so small leading digits are far more common than large ones.

More precisely, for any substantial body of text, numbers beginning with the digit 1 occur noticeably more often than numbers beginning with the digit 2, which in turn occur more often than numbers starting with the digit 3. This continues all the way down to numbers beginning with the digit 9.

The drop from one digit to the next is not as steep as a halving: numbers starting with 1 make up about 30% of the total, those starting with 2 about 18%, and those starting with 3 about 12%. The decline is still significant enough that numbers starting with the digit 9 occur in only around 5% of all instances.
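The exact proportions come from the standard statement of the law: the expected share of numbers whose first digit is d is log10(1 + 1/d). A minimal Python sketch of that formula follows; the function name `benford_probability` is just an illustrative choice.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

Running it gives roughly 30.1% for the digit 1 and 4.6% for the digit 9, and the nine shares add up to 100%.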

At first this law seems very counter-intuitive. The natural assumption is that all digits should occur with around the same frequency and that their distribution should be even. The other remarkable thing about this law is that it fits not only an arbitrary corpus of text but also many types of more formal data, even data that is logical, controlled and sequenced. This includes data such as electricity bills, stock prices, lengths of rivers and even mathematical constants.

A large part of the reason for this law is the overwhelming occurrence of exponential and recursive systems in nature. One example is population growth, which can be modelled recursively, so that the total population fits an exponential curve. Consider bacteria: each bacterium divides into two, which in turn divide into four, and again into eight, so the population grows faster and faster. Benford's law is so prevalent because these kinds of systems pop up everywhere in nature, even when we don't expect them. The pattern was noted as far back as 1881, when the Canadian-American astronomer Simon Newcomb (sometimes credited with discovering the law before Benford) noticed that in books of logarithm tables, the earlier pages, which contained numbers starting with 1, were more worn than the other pages.

In any system where numbers grow exponentially, because of the way our number system works, the running total spends nearly half of its time at values beginning with the digit 1 or 2. This might be hard to envision, but it becomes clear if you look at a logarithmic scale. On a logarithmic scale (of base 10) the distance between 1 and 2 is equal to the distance between 10 and 20, and is clearly larger than the distance between 2 and 3 (or 20 and 30). An exponential curve becomes a straight line on a logarithmic scale, so an exponentially growing quantity moves along that scale at a steady pace and spends time on each leading digit in proportion to the width of that digit's stretch. Conversely, data that is spread fairly evenly across a logarithmic scale, as exponentially generated data tends to be, can be expected to follow Benford's law.
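The doubling bacteria from earlier make a convenient test case. The sketch below is only an illustration under assumed settings: the number of generations (1000) and the helper names are arbitrary choices. It counts the leading digit of the population at each generation and prints the observed share next to the log10(1 + 1/d) value.

```python
import math
from collections import Counter

GENERATIONS = 1000  # assumed; any run spanning many orders of magnitude works

def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def doubling_digit_shares(generations: int) -> dict:
    """Share of generations whose population starts with each digit 1 to 9."""
    # Population after g doublings of a single bacterium is 2 ** g.
    counts = Counter(leading_digit(2 ** g) for g in range(generations))
    return {d: counts[d] / generations for d in range(1, 10)}

shares = doubling_digit_shares(GENERATIONS)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(d, f"observed {shares[d]:.3f}", f"Benford {expected:.3f}")
```

Python's arbitrary-precision integers keep the leading digits exact here, which would not be the case if the population were stored as a floating-point number.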

What I like about Benford's law is that it shows how prevalent exponential and recursive processes are in all aspects of nature. It shows clearly that linear systems occur far less often than we naively expect. It also shows that the very intuitive notions we use to perceive the world (such as the idea that the mean is always the fairest average) can often be flawed or skewed, or may not really tell us what we think they do.