Central Limit Theorem - Everything2.com

by ModernAngel

Thu Feb 10 2000 at 4:00:26

A fundamentally important theorem of statistics, it describes the properties of the bell curve and gives the statistician the ability to form probable conclusions about entire populations based on incomplete data about sample populations.

If the sample size is sufficiently large, the distribution of the sample mean is approximately normal.

(thing)

by dowjones

Thu Jun 21 2001 at 2:39:12

The Central Limit Theorem is an important theorem in mathematical statistics, used to make inferences about populations based on limited amounts of information.

The principle is that if you have n independent random variables Y1, Y2, …, Yn, all with the same distribution, each with mean (expected value) u and each with some finite variance s^2, then

U = sqrt(n)*((Y - u)/s), where Y is the average of the realised values of these n random variables and s = sqrt(s^2) is the standard deviation

will converge in distribution to the standard normal distribution as n approaches infinity. The standard normal distribution has mean 0 and variance 1, and a variable with this distribution is usually denoted Z. Note that the CLT can be applied to any random sample Y1, Y2, …, Yn so long as n is large (a common rule of thumb is n > 30) and as long as the mean and variance of the Yi are known and finite.
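A quick simulation can make the convergence claim concrete. The sketch below is only an illustration under assumptions of my own choosing: the Yi are drawn from an Exponential(1) distribution (mean 1, standard deviation 1, and clearly not bell-shaped), with n = 50 and a fixed random seed. It builds many values of U and compares them with what the standard normal distribution predicts.

```python
import math
import random
import statistics

def standardised_means(n, trials, rng):
    """Return `trials` values of sqrt(n) * (Ybar - u) / s for Exponential(1) draws."""
    u, s = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    out = []
    for _ in range(trials):
        ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (ybar - u) / s)
    return out

rng = random.Random(0)  # fixed seed so the run is repeatable
values = standardised_means(n=50, trials=10000, rng=rng)

sim_mean = statistics.fmean(values)
sim_var = statistics.pvariance(values)
# Fraction of simulated U at or below 1, to compare with P(Z <= 1)
sim_cdf_at_1 = sum(v <= 1.0 for v in values) / len(values)
normal_cdf_at_1 = statistics.NormalDist().cdf(1.0)

print(f"mean {sim_mean:.3f}, variance {sim_var:.3f}")
print(f"P(U <= 1) simulated {sim_cdf_at_1:.3f} vs normal {normal_cdf_at_1:.3f}")
```

The simulated mean and variance should land near 0 and 1, and the two probabilities should be close. Other choices of distribution or n would work as well, so long as the variance is finite.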

There are two other important ways to think about the CLT.

1. Y is approximately normally distributed with mean u and variance s^2/n. This makes sense because as n gets larger and larger, the variance gets smaller and smaller, making Y a better and better estimate of u. That is, Y ~ N(u, s^2/n), approximately.

2. Alternatively, the sum Y1 + Y2 + … + Yn is approximately normally distributed with mean n*u and variance n*s^2. That is, Y1 + Y2 + … + Yn ~ N(n*u, n*s^2), approximately.
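Both readings can be checked numerically. As a rough sketch, assume each Yi is one roll of a fair die (mean 3.5, variance 35/12), with n = 40 rolls per sample; these choices and the seed are mine, not part of the theorem. The simulated mean and variance of the average and of the total should sit close to the values in (1) and (2).

```python
import random
import statistics

rng = random.Random(1)      # fixed seed so the run is repeatable
n, trials = 40, 10000
u = 3.5                     # mean of one fair die roll
s2 = 35 / 12                # variance of one fair die roll

# Each trial: roll n dice, record the total and the average.
sums = [sum(rng.randint(1, 6) for _ in range(n)) for _ in range(trials)]
means = [t / n for t in sums]

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)
mean_of_sums = statistics.fmean(sums)
var_of_sums = statistics.pvariance(sums)

print("mean of Y:  ", round(mean_of_means, 4), "expected", u)
print("var of Y:   ", round(var_of_means, 4), "expected", round(s2 / n, 4))
print("mean of sum:", round(mean_of_sums, 2), "expected", n * u)
print("var of sum: ", round(var_of_sums, 2), "expected", round(n * s2, 2))
```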

An Example. Suppose the test scores of all high school students in a certain state have mean 60 and variance 64. A random sample of 100 students from a large high school had mean 58. Is there any evidence to suggest that the high school is inferior?

Let Y denote the mean of the random sample of n=100 scores from a population with mean u = 60 and variance s^2 = 64. We want to find the probability that this sample mean is less than or equal to 58. If this probability is small, then there is reason to suggest that the school is inferior. We know from the CLT that Y is approximately normally distributed with mean u and variance s^2/n from (1).

So, we want: P(Y ≤ 58) = P((Y - u)/sqrt(s^2/n) ≤ (58 - u)/sqrt(s^2/n)), where the quantity on the left of the inequality is Y in standardised form.

This expression is now in the context of the CLT, and so we can replace the standardised Y by Z, where Z has the standard normal distribution, i.e. Z ~ N(0,1).

So, P(Y ≤ 58) ≈ P(Z ≤ (58 - 60)/sqrt(64/100))
= P(Z ≤ -2.5)

Since Z is a continuous random variable, we can ignore the '=' sign and just consider values of Z less than -2.5. At this point we consult our standard normal distribution tables and look up a value of -2.5, to find that the probability of Z being less than -2.5 is just 0.0062.

That is, we can say that the probability that this school obtained an average score of 58 or lower, given that it has the same abilities as the rest of the state, is approximately 0.0062, or 0.62%. Hence, there is reason to suggest that the school is inferior.
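The table lookup can be reproduced in a few lines. This is just a sketch of the arithmetic above, using `statistics.NormalDist` from the Python standard library in place of printed tables; the variable names are mine.

```python
import math
from statistics import NormalDist

u, s2, n = 60.0, 64.0, 100   # population mean, population variance, sample size
observed = 58.0              # sample mean from the school

standard_error = math.sqrt(s2 / n)        # sqrt(64/100) = 0.8
z = (observed - u) / standard_error       # (58 - 60) / 0.8 = -2.5
p = NormalDist().cdf(z)                   # P(Z <= -2.5) under N(0, 1)

print(f"z = {z:.2f}, P(Z <= z) = {p:.4f}")   # z = -2.50, P(Z <= z) = 0.0062
```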

(thing)

by Semisane

Tue Jun 21 2005 at 4:56:31

Technically, the central limit theorem refers to the characteristics of a sampling distribution.

Let's imagine we're looking at the heights of people in the population. There is a real value out there for the mean height, say, of British people. We could find it by measuring everyone, but to do so would be excessive. Instead, we take a random sample and measure the heights of those selected. The central limit theorem states that, for any collection of samples, the mean of the mean values for those samples will approach the mean of the population. (More samples will generally bring you closer to the true value.) Meanwhile, the distribution of those sample means will approximately follow a classic bell curve, and the approximation improves as each sample gets larger.

The fact that this is true for any population with a finite variance, whether the underlying distribution follows a unimodal, symmetric bell curve or not, is one of the most surprising and useful facts of statistics.
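To see this with a population that is plainly not bell-shaped, here is a small sketch. The population is invented for the purpose: an even mix of two humps centred at 0 and 6, each with standard deviation 1, which gives a population mean of 3 and variance of 10. The sample size of 30 and the seed are likewise arbitrary choices of mine. If the sample means follow a bell curve, roughly 68% of them should fall within one standard error of the population mean and roughly 95% within two.

```python
import math
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

def draw():
    """One value from a made-up two-humped population (not a bell curve)."""
    centre = 0.0 if rng.random() < 0.5 else 6.0
    return rng.gauss(centre, 1.0)

pop_mean = 3.0                 # 0.5 * 0 + 0.5 * 6
pop_sd = math.sqrt(10.0)       # within-hump variance 1 plus between-hump variance 9

n, trials = 30, 5000
se = pop_sd / math.sqrt(n)     # standard error of the sample mean
sample_means = [sum(draw() for _ in range(n)) / n for _ in range(trials)]

mean_of_means = sum(sample_means) / trials
within_1 = sum(abs(m - pop_mean) <= se for m in sample_means) / trials
within_2 = sum(abs(m - pop_mean) <= 2 * se for m in sample_means) / trials

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"within 1 standard error:  {within_1:.3f} (normal curve: about 0.683)")
print(f"within 2 standard errors: {within_2:.3f} (normal curve: about 0.954)")
```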

(idea)

by Grayscale

Fri Jan 27 2006 at 21:27:58

WARNING: Lots of HTML math ahead!

Foreword

There are several specific results which are known as central limit theorems, each sometimes referred to as "the" central limit theorem. Here we will focus on one particular version.

A word on notation: Here we will use the notation E(x) to denote the expectation value of a random variable x. There are other conventions in common use, including angle brackets ⟨x⟩. The symbol i will be used for the imaginary unit, while j and n will be used for counting indices.

Theorem

Consider a set {x_j}, j=1,...,N of N independent random variables with expectation E(x_j) = μ_j and variance E(x_j²)-E(x_j)² = σ_j², where the σ_j are real and finite. (A specific additional condition on the σ_j will be discussed later.) Let σ = (Σ_j σ_j²)^(1/2) and define a new variable z = Σ_j (x_j-μ_j)/σ as the (scaled and shifted) sum of the x_j. Then as N→∞ the distribution of z approaches normal, i.e. p(z) = (2π)^(-1/2) exp[-z²/2] where p(z) is the density function of z.

Preliminary Definitions

The characteristic function Φ(k) for a variable x is defined as

Φ(k) = E(exp[ikx]) = ∫exp[ikx]p(x)dx

This is a calculational device for finding the moments E(x), E(x²), etc. as

Φ^(m)(0) = i^mE(x^m)

where Φ^(m)(k) represents the m^th derivative of Φ(k). If we can write these moments as derivatives of Φ(k), we can also do the reverse and write Φ(k) in a Taylor series:

Φ(k) = Σ E(xⁿ)(ik)ⁿ/n!

The logarithm of the characteristic function is known as the cumulant generating function, defined as

Ψ(k) = ln[Φ(k)] = ΣC_n(ik)ⁿ/n!

where the C_n, known as cumulants, are polynomials in the moments E(x), E(x²), etc. Of special note are C₁ = E(x) and C₂ = E(x²)-E(x)² = σ². Note that if we try to evaluate C₀ the result is always zero, so this term is generally ignored.
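As a sanity check on these definitions, the first two cumulants can be recovered numerically from a known characteristic function. The sketch below assumes an Exponential(1) variable, whose characteristic function is 1/(1 - ik) and whose moments are E(xⁿ) = n!; it differentiates Ψ(k) at k = 0 by finite differences (the step size h is an arbitrary choice) and compares the results with C₁ = E(x) and C₂ = E(x²)-E(x)².

```python
import cmath

def phi(k):
    """Characteristic function of an Exponential(1) variable: 1 / (1 - ik)."""
    return 1 / (1 - 1j * k)

def psi(k):
    """Cumulant generating function, the log of the characteristic function."""
    return cmath.log(phi(k))

h = 1e-3  # step for central finite differences
d1 = (psi(h) - psi(-h)) / (2 * h)              # approximates Psi'(0)  = i * C1
d2 = (psi(h) - 2 * psi(0) + psi(-h)) / h**2    # approximates Psi''(0) = i^2 * C2

c1 = (d1 / 1j).real
c2 = (d2 / 1j**2).real

# Moments of Exponential(1) are E(x^n) = n!, so E(x) = 1 and E(x^2) = 2.
m1, m2 = 1.0, 2.0
print(c1, "vs E(x) =", m1)
print(c2, "vs E(x^2) - E(x)^2 =", m2 - m1**2)
```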

Proof

Let Φ_z(k) and Φ_j(k) denote the characteristic functions for z and the x_j. Then

Φ_z(k) = E(exp[ikz]) = E(exp[ikΣ_j(x_j-μ_j)/σ]) = E(Π_j exp[ik(x_j-μ_j)/σ]) = E(Π_j exp[ikx_j/σ] exp[-ikμ_j/σ])

As the x_j are independent the product can be moved outside the calculation of the expectation; so can the exponential in μ_j, as it is a constant. This results in

Φ_z(k) = Π_j E(exp[ikx_j/σ]) exp[-ikμ_j/σ] = Π_j Φ_j(k/σ) exp[-ikμ_j/σ]
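The factorisation step, that the characteristic function of a sum of independent variables is the product of the individual characteristic functions, can be checked exactly on a toy case. The sketch below uses two fair dice, which is my own choice of example: it computes the characteristic function of the total directly from the distribution of the total, then compares it with the square of the single-die function at a few arbitrary values of k.

```python
import cmath
from collections import Counter
from itertools import product

faces = range(1, 7)

def phi_die(k):
    """Characteristic function of one fair die: E(exp[ikx])."""
    return sum(cmath.exp(1j * k * x) for x in faces) / 6

# Distribution of the total of two independent dice, built by enumeration.
sum_counts = Counter(a + b for a, b in product(faces, faces))

def phi_sum(k):
    """Characteristic function of the two-dice total, from its own distribution."""
    return sum(c * cmath.exp(1j * k * t) for t, c in sum_counts.items()) / 36

# The differences should be zero up to floating-point rounding.
gaps = [abs(phi_sum(k) - phi_die(k) ** 2) for k in (0.3, 1.0, 2.7)]
print(gaps)
```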

Now we take the log, to change the characteristic functions into the cumulant-generating functions:

Ψ_z(k) = Σ_j [Ψ_j(k/σ) - ikμ_j/σ]

Substituting the Taylor expansions,

Σ_n C_zn (ik)ⁿ/n! = Σ_j [Σ_n C_jn (ik/σ)ⁿ/n! - ikμ_j/σ]

Coefficients of like powers of k must be equal on both sides, so we can solve for the C_zn. As C_j1 = μ_j and C_j2 = σ_j² we find

C_z1 = Σ_j (C_j1/σ - μ_j/σ) = Σ_j (μ_j/σ - μ_j/σ) = 0
C_z2 = Σ_j C_j2/σ² = (Σ_j σ_j²)/(Σ_j σ_j²) = 1

Now, C_zn = Σ_j C_jn/σⁿ ∝ 1/σⁿ, while σ² is a sum of N finite σ_j², so σ grows with N and it should not be surprising that C_z3 and higher-order C_zn approach zero as N→∞. This is straightforward if we make the simplifying assumption that the x_j are identically distributed, so that the σ_j are all equal and C_jn is the same for every j: then σ ∝ N^(1/2) and C_zn ∝ N^(1-n/2), which vanishes for n>2. However, there are several sufficient, weaker restrictions which we can impose on the distribution of the x_j including the Lyapunov, Lindeberg, and Feller-Lévy conditions; the study and proof of these variants is left to the interested reader. In all cases, we find that C_zn → 0 for n>2 in the limit, so

Ψ_z(k) = (ik)²/2! = -k²/2

and

Φ_z(k) = exp[-k²/2]

This is the characteristic function of a standard normal distribution; we can verify this by performing an inverse Fourier transform to recover p(z):

p(z) = (2π)^(-1) ∫exp[-ikz] Φ_z(k) dk = (2π)^(-1) ∫exp[-ikz-k²/2] dk = (2π)^(-1/2) exp[-z²/2]
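The integral can also be checked numerically rather than by hand. The sketch below approximates the inverse transform with a plain trapezoid rule; the cutoff k_max and the number of steps are arbitrary choices of mine, which work because exp[-k²/2] is negligible well before |k| = 12. The result is compared with the standard normal density at a few points.

```python
import cmath
import math

def density_from_characteristic(z, k_max=12.0, steps=4000):
    """Trapezoid-rule estimate of (1/2pi) * integral of exp(-ikz) * exp(-k^2/2) dk."""
    dk = 2 * k_max / steps
    total = 0j
    for j in range(steps + 1):
        k = -k_max + j * dk
        weight = 0.5 if j in (0, steps) else 1.0   # half weight at the two endpoints
        total += weight * cmath.exp(-1j * k * z - k * k / 2)
    return (total * dk / (2 * math.pi)).real

def normal_density(z):
    """Standard normal density (2pi)^(-1/2) * exp(-z^2/2)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for z in (0.0, 1.0, 2.5):
    print(z, density_from_characteristic(z), normal_density(z))
```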

Thus z converges to the standard normal distribution, as desired.
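As a footnote to the proof, the vanishing of the higher cumulants can be made concrete in the simplest case. The sketch below assumes N identically distributed Exponential(1) terms, for which each term has cumulants C_n = (n-1)! and σ = N^(1/2); matching powers of k in the expansion above then gives C_zn = N C_n/σⁿ. The function name is hypothetical and only for illustration.

```python
import math

def cumulant_of_scaled_sum(n, N, sigma_1=1.0):
    """C_zn for N identically distributed Exponential(1) terms.

    Each term has cumulants C_n = (n - 1)!, and sigma = sigma_1 * sqrt(N),
    so C_zn = N * C_n / sigma**n.
    """
    c_n = math.factorial(n - 1)
    sigma = sigma_1 * math.sqrt(N)
    return N * c_n / sigma**n

# C_z2 stays at 1 while C_z3 and C_z4 shrink as N grows.
for N in (1, 100, 10_000):
    print(N, [round(cumulant_of_scaled_sum(n, N), 6) for n in (2, 3, 4)])
```

The second cumulant stays fixed at 1 for every N, while the third falls off like N^(-1/2) and the fourth like N^(-1), which is the behaviour the proof relies on.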
