The central limit theorem is a fundamentally important theorem of statistics. It explains why the bell curve (the normal distribution) arises so often, and it gives the statistician the ability to form probable conclusions about entire populations based on incomplete data from samples of those populations.

If the sample size is sufficiently large, the distribution of the sample mean is approximately normal.
The Central Limit Theorem is an important theorem in mathematical statistics, used to make inferences about populations based on limited amounts of information.

The principle is that if you have n independent random variables Y1, Y2, …, Yn, each with the same mean (expected value) u and the same variance s^2, then

U = sqrt(n)*((Y - u)/s), where Y is the average of the realised values of these n random variables and s = sqrt(s^2) is their standard deviation,

will converge in distribution to the standard normal distribution as n approaches infinity. The standard normal distribution has mean 0 and variance 1, and a variable with this distribution is usually denoted Z. Note that the CLT can be applied to any random sample Y1, Y2, …, Yn so long as n is large (a common rule of thumb is n > 30) and as long as the mean and variance of the Yi are known and finite.

There are two other important ways to think about the CLT.

1. Y is approximately normally distributed with mean u and variance s^2/n. This makes sense because as n gets larger and larger, the variance gets smaller and smaller, making Y a better and better estimate of u. That is, approximately, Y ~ N(u, s^2/n).

2. Alternatively, the sum Y1 + Y2 + … + Yn is approximately normally distributed with mean nu and variance ns^2. That is, approximately, Y1 + Y2 + … + Yn ~ N(nu, ns^2).
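The convergence described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original text: the Yi are drawn from an Exponential(1) distribution (mean 1, standard deviation 1), with n = 50, 5000 repetitions, and a fixed seed. Each sample mean is standardised by subtracting u and dividing by s/sqrt(n).

```python
import math
import random
import statistics

def standardised_means(n, trials, rng):
    """Draw `trials` samples of size n from Exponential(1) and standardise each sample mean."""
    u, s = 1.0, 1.0  # mean and standard deviation of Exponential(1) (assumed population)
    out = []
    for _ in range(trials):
        y_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (y_bar - u) / s)
    return out

rng = random.Random(0)  # fixed seed so the run is repeatable
U = standardised_means(n=50, trials=5000, rng=rng)

# For a standard normal variable these should be near 0, 1 and 0.95 respectively.
print("mean of U:", round(statistics.fmean(U), 3))
print("variance of U:", round(statistics.pvariance(U), 3))
print("fraction with |U| < 1.96:", sum(abs(x) < 1.96 for x in U) / len(U))
```

Even though the exponential distribution is strongly skewed, the standardised means should already look close to standard normal at this sample size.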

An Example. Suppose the test scores of all high school students in a certain state have mean 60 and variance 64. A random sample of 100 students from a large high school had mean 58. Is there any evidence to suggest that the high school is inferior?

Let Y denote the mean of the random sample of n=100 scores from a population with mean u = 60 and variance s^2 = 64. We want to find the probability that this sample mean is less than or equal to 58. If this probability is small, then there is reason to suggest that the school is inferior. We know from the CLT that Y is approximately normally distributed with mean u and variance s^2/n from (1).

So, we want: P(Y <= 58) = P((Y - u)/sqrt(s^2/n) <= (58 - u)/sqrt(s^2/n)), where the left-hand side of the inequality is Y in standardised form.

This expression is now in the context of the CLT, and so we can replace the standardised Y by Z, where Z has the standard normal distribution, i.e. Z ~ N(0,1).

So, approximately, P(Y <= 58) = P(Z <= (58 - 60)/sqrt(64/100))
= P(Z <= -2.5)

Since Z is a continuous random variable, then we can ignore the ‘=’ sign and just consider values of Z less than –2.5. At this point we consult our standard normal distribution tables and look up a value of –2.5, to find that the probability of Z being less than –2.5 is just 0.0062.

That is, we can say that the probability that this school obtained an average score of 58 or less, given that its students have the same abilities as the rest of the state, is approximately 0.0062, or 0.62%. Because this probability is so small, there is reason to suggest that the school is inferior.
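The table lookup in this example can be reproduced in a few lines. The sketch below uses Python's standard-library NormalDist; the variable names are chosen here for illustration only.

```python
from math import sqrt
from statistics import NormalDist

u = 60.0      # state-wide mean score
var = 64.0    # state-wide variance
n = 100       # sample size
y_bar = 58.0  # observed sample mean

std_error = sqrt(var / n)    # standard deviation of the sample mean
z = (y_bar - u) / std_error  # standardised value of the observed mean
p = NormalDist().cdf(z)      # P(Z <= z) for a standard normal Z

print(f"z = {z:.2f}, P(Z <= z) = {p:.4f}")
```

This reports z = -2.50 and a probability of about 0.0062, matching the value read from the table.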

Technically, the central limit theorem refers to the characteristics of a sampling distribution.

Let's imagine we're looking at the heights of people in the population. There is a real value out there for the mean height, say, of British people. We could find it by measuring everyone, but to do so would be excessive. Instead, we take a random sample and measure the heights of those selected. The central limit theorem states that, for any collection of samples, the mean of the mean values for those samples will approach the mean of the population. (More samples will generally bring you closer to the true value.) Meanwhile, the distribution of those sample means will approximately follow a classic bell curve distribution, and the approximation improves as the size of each sample grows.

The fact that this is true for any population with finite variance, whether or not the underlying distribution follows a unimodal, symmetric bell curve, is one of the most surprising and useful facts of statistics.
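A short simulation can make this concrete. The sketch below is an illustration under invented assumptions: the "population" is a made-up, deliberately bimodal set of 100,000 values (two clusters around 150 and 185), and the sample size of 40 and the 4000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population: two separate clusters.
population = ([rng.gauss(150, 5) for _ in range(50_000)]
              + [rng.gauss(185, 5) for _ in range(50_000)])
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 40  # size of each random sample (drawn with replacement)
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(4000)]

# For a bell curve, roughly 68% of values lie within one standard deviation of
# the centre; for sample means that standard deviation is pop_sd / sqrt(n).
std_error = pop_sd / n ** 0.5
within_one = sum(abs(m - pop_mean) < std_error for m in sample_means) / len(sample_means)

print("population mean:", round(pop_mean, 2))
print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("share of sample means within one standard error:", round(within_one, 3))
```

The population itself has two humps, yet the sample means should cluster around the population mean with roughly the 68% share expected of a normal distribution.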

WARNING: Lots of math ahead!


There are several specific results which are known as central limit theorems, each sometimes referred to as "the" central limit theorem. Here we will focus on one particular version.

A word on notation: Here we will use the notation E(x) to denote the expectation value of a random variable x. There are other conventions in common use, including angle brackets ⟨x⟩. The symbol i will be used for the imaginary unit, while j and n will be used for counting indices.


Consider a set $\{x_j\}$, $j = 1, \dots, N$, of $N$ independent random variables with expectation $E(x_j) = \mu_j$ and variance $E(x_j^2) - E(x_j)^2 = \sigma_j^2$, where the $\sigma_j$ are real and finite. (A specific additional condition on the $\sigma_j$ will be discussed later.) Let $\sigma = (\sum_j \sigma_j^2)^{1/2}$ and define a new variable $z = \sum_j (x_j - \mu_j)/\sigma$ as the (scaled and shifted) sum of the $x_j$. Then as $N \to \infty$ the distribution of $z$ approaches normal, i.e. $p(z) = (2\pi)^{-1/2} \exp[-z^2/2]$, where $p(z)$ is the density function of $z$.

Preliminary Definitions

The characteristic function Φ(k) for a variable x is defined as

$$\Phi(k) = E(\exp[ikx]) = \int \exp[ikx] \, p(x) \, dx$$

This is a calculational device for finding the moments $E(x)$, $E(x^2)$, etc. as

$$\Phi^{(m)}(0) = i^m E(x^m)$$

where $\Phi^{(m)}(k)$ represents the $m$th derivative of $\Phi(k)$. If we can write these moments as derivatives of $\Phi(k)$, we can also do the reverse and write $\Phi(k)$ in a Taylor series:

$$\Phi(k) = \sum_n E(x^n) \, (ik)^n / n!$$

The logarithm of the characteristic function is known as the cumulant generating function, defined as

$$\Psi(k) = \ln[\Phi(k)] = \sum_n C_n \, (ik)^n / n!$$

where the $C_n$, known as cumulants, are polynomials in the moments $E(x)$, $E(x^2)$, etc. Of special note are $C_1 = E(x)$ and $C_2 = E(x^2) - E(x)^2 = \sigma^2$. Note that if we try to evaluate $C_0$ the result is always zero, so this term is generally ignored.
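To make "polynomials in the moments" concrete, the first three cumulants can be written out explicitly. These are standard identities, quoted here as an illustration rather than derived:

```latex
\begin{aligned}
C_1 &= E(x) \\
C_2 &= E(x^2) - E(x)^2 \\
C_3 &= E(x^3) - 3\,E(x)\,E(x^2) + 2\,E(x)^3
\end{aligned}
```

For a normal distribution every cumulant beyond $C_2$ is zero, which is the property the proof relies on.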


Let $\Phi_z(k)$ and $\Phi_j(k)$ denote the characteristic functions for $z$ and the $x_j$. Then

$$\Phi_z(k) = E(\exp[ikz]) = E\Big(\exp\Big[ik \sum_j (x_j - \mu_j)/\sigma\Big]\Big) = E\Big(\prod_j \exp[ik(x_j - \mu_j)/\sigma]\Big) = E\Big(\prod_j \exp[ikx_j/\sigma] \exp[-ik\mu_j/\sigma]\Big)$$

As the $x_j$ are independent the product can be moved outside the calculation of the expectation; so can the exponential in $\mu_j$, as it is a constant. This results in

$$\Phi_z(k) = \prod_j E(\exp[ikx_j/\sigma]) \exp[-ik\mu_j/\sigma] = \prod_j \Phi_j(k/\sigma) \exp[-ik\mu_j/\sigma]$$

Now we take the log, to change the characteristic functions into the cumulant-generating functions:

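$$\Psi_z(k) = \sum_j \big[ \Psi_j(k/\sigma) - ik\mu_j/\sigma \big]$$

Substituting the Taylor expansions,

$$\sum_n C_{z,n} \, (ik)^n / n! = \sum_j \Big[ \sum_n C_{j,n} \, (ik/\sigma)^n / n! - ik\mu_j/\sigma \Big]$$

Coefficients of like powers of $k$ must be equal on both sides, so we can solve for the $C_{z,n}$. As $C_{j,1} = \mu_j$ and $C_{j,2} = \sigma_j^2$ we find

$$C_{z,1} = \sum_j \big[ C_{j,1}/\sigma - \mu_j/\sigma \big] = \sum_j \big[ \mu_j/\sigma - \mu_j/\sigma \big] = 0$$
$$C_{z,2} = \sum_j C_{j,2}/\sigma^2 = \Big(\sum_j \sigma_j^2\Big) \Big/ \Big(\sum_j \sigma_j^2\Big) = 1$$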

Now, $C_{z,n} \propto 1/\sigma^n$, while $\sigma^2$ is a sum of $N$ finite $\sigma_j^2$, so as $N \to \infty$ it should not be surprising that $C_{z,3}$ and the higher-order $C_{z,n}$ approach zero. This is straightforward if we make the simplifying assumption that the $x_j$ have equal variance, i.e. that the $\sigma_j$ are all equal. However, there are several sufficient, weaker restrictions which we can impose on the distribution of the $x_j$, including the Lyapunov, Lindeberg, and Feller-Lévy conditions; the study and proof of these variants is left to the interested reader. In all cases, we find that $C_{z,n} \to 0$ for $n > 2$ in the limit, so

$$\Psi_z(k) = (ik)^2 / 2! = -k^2/2$$

and hence

$$\Phi_z(k) = \exp[-k^2/2]$$

This is the characteristic function of a standard normal distribution; we can verify this by performing an inverse Fourier transform to recover p(z):

$$p(z) = (2\pi)^{-1} \int \exp[-ikz] \, \Phi_z(k) \, dk = (2\pi)^{-1} \int \exp[-ikz - k^2/2] \, dk = (2\pi)^{-1/2} \exp[-z^2/2]$$
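The final integral can also be checked numerically. The sketch below approximates the inverse transform with a plain trapezoidal rule over a truncated range and compares it with the normal density; the function names, the cut-off k_max = 12 and the step count are illustrative choices, not part of the derivation.

```python
import cmath
import math

def inverse_transform(z, k_max=12.0, steps=4000):
    """Approximate (1/(2*pi)) * integral of exp(-i*k*z - k^2/2) dk over [-k_max, k_max]."""
    h = 2 * k_max / steps
    total = 0.0 + 0.0j
    for i in range(steps + 1):
        k = -k_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * cmath.exp(-1j * k * z - k * k / 2)
    # The imaginary part cancels by symmetry, so only the real part is kept.
    return (total * h / (2 * math.pi)).real

def normal_density(z):
    """Standard normal density (2*pi)^(-1/2) * exp(-z^2/2)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for z in (0.0, 1.0, 2.5):
    print(z, inverse_transform(z), normal_density(z))
```

The two columns printed for each z should agree to many decimal places, since the integrand is negligible beyond the cut-off.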

Thus z converges to the standard normal distribution, as desired.
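As a worked instance of the step in which the higher cumulants vanish, consider the simplest case. Assume (this is an added assumption, stronger than equal variances alone) that the $x_j$ are identically distributed, with common finite cumulants $C_n$ and common variance $\sigma_1^2$, so that $\sigma = N^{1/2} \sigma_1$. Then for $n \ge 2$:

```latex
C_{z,n} = \sum_j \frac{C_{j,n}}{\sigma^n}
        = \frac{N \, C_n}{N^{n/2} \, \sigma_1^n}
        = \frac{C_n}{\sigma_1^n} \, N^{1 - n/2}
```

For $n = 2$ this gives $C_{z,2} = 1$, and for every $n > 2$ the exponent $1 - n/2$ is negative, so $C_{z,n} \to 0$ as $N \to \infty$.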
