When we want to make statistical inference based on a sequence of experiments, it often turns out that we do not need to know the value of every single measurement: all the relevant information is contained in some statistic (i.e. a function of the sample data).

As an example, suppose that you are a highly merited statistical researcher, and that you are given the task of deciding whether a certain coin is unbiased. You model the problem by saying that the outcomes of different coin tosses are independent, and that the probability of getting 'tails' is p ∈ [0, 1]. You then proceed to perform a large number of experiments. More precisely, you give the coin to a graduate student and tell him to toss the coin 10 000 times and then give you the "result". Using that result you apply some super-advanced tests to solve the intricate problem of whether p = 1/2 or not.

The issue here is what we mean by the result. You could ask the grad student to list the outcome of every single experiment, but that is a bit impractical. Being a brilliant mathematician you realise that all you need to know to estimate the value of p is the total number T of tails tossed, because the order they occurred in is not relevant. T is therefore in some sense sufficient for p.
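The setup is easy to simulate. Here is a minimal sketch (the function name `toss_coin` and the fixed seed are our own choices, not part of the text):

```python
import random

def toss_coin(n, p, seed=0):
    """Simulate n independent coin tosses; 1 = tails with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

# The grad student's 10 000 tosses, reduced to the single number T.
outcomes = toss_coin(10_000, 0.5)
T = sum(outcomes)  # the statistic: total number of tails
```

The full list `outcomes` has 10 000 entries, but the claim of this section is that the single number `T` carries all the information about p.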

We now express this more formally.

Suppose that X_{1}, X_{2}, ..., X_{n} are independent identically distributed random variables, where the distribution is specified by some parameter p (p could be pretty much any kind of parameter, but typically it is a real number or a vector with real entries).

**Definition:**

A statistic T(**X**) is said to be sufficient for p if the conditional distribution of **X** given T is independent of p.

Note that T can take many forms, but like p it is usually a real number or a vector.

Why does this definition express our intuitive notion of sufficiency? The reason is that the way we make statistical inference about p from **X** is based on the probability of obtaining the recorded value **x** for **X** under different values of p. Values of p that give a high probability of seeing **x** are considered more likely. If the conditional distribution of **X** given T(**X**) is independent of p and we know the value of T(**x**), then additionally learning the value of **x** does not change the likelihood of p, and therefore gives us no extra information about p.
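This intuition can be checked directly for the coin example: two sequences with the same number of tails have exactly the same likelihood for every value of p, so learning which of them occurred tells us nothing about p. A small sketch (the helper name `likelihood` is ours):

```python
def likelihood(x, p):
    """P(X = x | p) for independent Bernoulli(p) tosses, coding tails as 1."""
    t = sum(x)
    return p ** t * (1 - p) ** (len(x) - t)

x1 = [1, 1, 0, 0, 1]   # T = 3
x2 = [0, 1, 1, 1, 0]   # T = 3, but in a different order
for p in (0.2, 0.5, 0.9):
    assert likelihood(x1, p) == likelihood(x2, p)
```

Since the likelihoods agree for every p, the two sequences are interchangeable for any inference about p.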

While the definition is nice and formal it is not very useful for actually finding sufficient statistics and demonstrating their sufficiency. Conveniently there is a necessary and sufficient condition that allows us to practically read off a sufficient statistic from the distribution of **X**.

**Factorisation criterion:**

A statistic T(**X**) is sufficient for p iff the distribution function of **X** can be written in the form

f(**x**|p) = g(p, T(**x**))h(**x**)

for some functions g, h.

**Proof:**

For simplicity we assume that **X** is discrete.

If the distribution function of **X** factorises then

P(**X** = **x**|T = t) = P(**X** = **x**, T = t)/P(T = t) = g(p, t)h(**x**)/Σ_{**x**′: T(**x**′) = t} g(p, t)h(**x**′) = h(**x**)/Σ_{**x**′: T(**x**′) = t} h(**x**′)

which is clearly independent of p. (If T(**x**) ≠ t the conditional probability is 0, which is also independent of p.)

Conversely, suppose that the distribution of **X** given T is independent of p. Using that P(**X** = **x**|T(**X**) = t) = 0 if t ≠ T(**x**), we find

P(**X** = **x**) = Σ_{t} P(**X** = **x**|T(**X**) = t)P(T(**X**) = t) = P(**X** = **x**|T(**X**) = T(**x**))P(T(**X**) = T(**x**))

If we take

g(p, T(**x**)) = P(T(**X**) = T(**x**))

h(**x**) = P(**X** = **x**|T(**X**) = T(**x**))

we obtain the desired factorisation, since h(**x**) is independent of p by assumption.
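The forward direction of the proof can be verified numerically for small cases: enumerate all sequences in {0, 1}^n, condition on T = t, and observe that the resulting distribution is the same for different values of p. A sketch (the helper name `cond_dist` is our own):

```python
from itertools import product

def cond_dist(n, p, t):
    """P(X = x | T = t) by direct enumeration over all sequences in {0,1}^n."""
    probs = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == t:
            probs[x] = p ** t * (1 - p) ** (n - t)
    total = sum(probs.values())  # = P(T = t)
    return {x: q / total for x, q in probs.items()}

d1 = cond_dist(4, 0.3, 2)
d2 = cond_dist(4, 0.8, 2)
# Same conditional distribution for both values of p:
# uniform over the C(4, 2) = 6 sequences with two tails.
assert all(abs(d1[x] - d2[x]) < 1e-12 for x in d1)
```

As the proof predicts, the factor g(p, t) cancels between numerator and denominator, leaving a distribution that does not involve p at all.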

If we apply this to the coin-tossing example (coding tails as 1), we have that for **x** ∈ {0, 1}^{n}

P(**X** = **x**) = ∏ p^{x_i}(1-p)^{1-x_i} = p^{Σx_i}(1-p)^{n - Σx_i}

This depends on **x** only through T(**x**) = Σx_{i} (take h ≡ 1 in the factorisation), so T is sufficient for p by the factorisation criterion. This justifies the earlier statement that only the number of tails tossed is relevant and not their order.
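The identity between the product form and the factored form can be checked mechanically (the helper names are our own, for illustration):

```python
import math

def product_form(x, p):
    """Likelihood as a product over individual tosses: prod p^x_i (1-p)^(1-x_i)."""
    return math.prod(p ** xi * (1 - p) ** (1 - xi) for xi in x)

def factored_form(x, p):
    """The same likelihood written as g(p, T) with T = sum(x) and h = 1."""
    T = sum(x)
    return p ** T * (1 - p) ** (len(x) - T)

x = [1, 0, 1, 1, 0, 1]
for p in (0.1, 0.5, 0.7):
    assert abs(product_form(x, p) - factored_form(x, p)) < 1e-12
```

The likelihood is the same however the 1s are arranged in `x`, which is the factorisation criterion in action.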

As a postscript, we would actually be interested in the order of the outcomes if it turned out that, e.g., they formed an alternating sequence of heads and tails. This would indicate that the coin possessed some sort of memory, which would be very curious indeed. The reason why this would be interesting is, however, not that it tells us anything about the value of p in our model, but that it indicates that our model (where we assumed that different coin tosses are independent) is incorrect. Deciding whether the model you use is reasonable is always a problem in statistics, but in order to draw any conclusions you must use one, and it is within the model that precise mathematical statements can be made and concepts such as sufficiency become meaningful.