A *frequentist* is someone who interprets probability via the
**Law of Large Numbers**. This interpretation contrasts
with the Bayesian interpretation.

# Frequentist Meaning of Probability

"The probability that this coin lands heads is 1/2." How to interpret
this? A sensible answer is in terms of frequencies: if we were to toss
the coin many, many times, on average about half of those tosses would
be heads. In this way, probability is something that can be measured
physically, at least hypothetically, if we are patient enough to repeat
an experiment many times. This is how frequentists earn their
nickname: *objectivists*. Probabilities are objectively measurable
things.
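The frequency reading can be made concrete with a short simulation. The Python sketch below assumes that a seeded pseudo-random number generator is an acceptable stand-in for a physical fair coin, and the function name `relative_frequency_of_heads` is purely illustrative. It prints the fraction of heads for longer and longer runs of tosses.

```python
import random


def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=0):
    """Toss a simulated coin n_tosses times and return the fraction of heads.

    The pseudo-random generator stands in for a physical coin (an assumption).
    """
    rng = random.Random(seed)  # seeded so that every run gives the same tosses
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses


for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Short runs can land well away from 1/2. It is the long-run fraction that the frequentist identifies with the probability.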

Or at least they are hypothetically measurable. A first objection to
the frequentist interpretation centres on experiments that cannot
be repeated. What is the probability that a frequentist's firstborn
will be male? Does it still make sense to say that with probability
1/2, her firstborn will be a boy? If so, a frequentist would have to
interpret this as a statement of what happens on average as she gives
birth to many firstborns: if it were possible to give birth to a
firstborn many times, half would be boys, she would say. Or, if
pressed, she may recast her interpretation as a statement about all
the women who ever gave birth to a firstborn and observe that about
half of them had boys. More elaborate examples may take more work to
satisfy critics.

But these are mere philosophical trifles. There is no philosophy
left in the mathematical theory of probability: A. N. Kolmogorov
axiomatised it in 1933. A probability is a real number between zero
and one that we assign to sets (events) according to three simple
rules, and probability theory is a branch of measure theory, which
rests on set theory and mathematical analysis. The (weak) Law of Large
Numbers is a theorem about convergence in probability, and real-world
interpretations of what all the abstract nonsense really means are not
the business of the pure mathematician.
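Kolmogorov's three rules are non-negativity, P(Ω) = 1 for the whole sample space Ω, and countable additivity over disjoint events. On a finite sample space the third rule reduces to ordinary finite additivity, which makes the rules easy to check mechanically. The sketch below does that for one roll of a fair die. The die, its weights, and the helper names `prob` and `all_events` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative finite sample space: one roll of a fair six-sided die.
WEIGHTS = {face: Fraction(1, 6) for face in range(1, 7)}
OMEGA = frozenset(WEIGHTS)


def prob(event):
    """P(A): the total weight of the outcomes in the event A."""
    return sum((WEIGHTS[outcome] for outcome in event), Fraction(0))


def all_events(omega):
    """The power set of omega: every event we may assign a probability to."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


events = all_events(OMEGA)
# Rule 1: probabilities are non-negative.
assert all(prob(a) >= 0 for a in events)
# Rule 2: the whole sample space has probability one.
assert prob(OMEGA) == 1
# Rule 3, finite form: probabilities of disjoint events add.
assert all(prob(a | b) == prob(a) + prob(b) for a in events for b in events if not a & b)

print(prob(frozenset({2, 4, 6})))  # an even roll: prints 1/2
```

Exact `Fraction` arithmetic is used so that the additivity check compares equal values rather than floating-point approximations.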

# Frequentist Statistics

Interpretations are, however, the business of the statistician, a
breed of applied mathematician. The real impact of the frequentist
interpretation of probability arises only when we begin to assign
probabilities to certain events in the world and make statistical
inferences.

Statistical inference arises in response to some very practical
problems in every science. Whoever said that mathematics is
everywhere would have been better off saying that *statistics*
is everywhere. For example, we may want to know the value of the speed
of light, the average rainfall in the Amazon, or what the public
thinks of our political leaders. The problem, of course, is that it is
impractical or impossible to know any of these things
precisely. Measurements of the speed of light bristle with
experimental error; we cannot collect in a gigantic graduated cylinder
all the rain that falls on the Amazon, and it would take too long to ask
everyone what they think of our political leaders. Like good
scientists, we must therefore make some idealisations and postulate a
model.

The frequentist model goes like this: there are some proportions,
certain numbers out there in the world that exist objectively as the
relative frequencies of some events in the long run. Even if we do
not know them, they are not random. Instead, they describe certain
probability distributions from which we sample. There is therefore a
true, certain, and constant speed of light out there, but if we
ever attempt to measure it, we will be faced with random error. The
error has a certain distribution determined by the true speed of
light, such as a normal distribution (also known as a Gaussian
distribution or a bell curve). Since we cannot know the true speed of
light exactly, we estimate it from a random sample of measurements. The model
for conducting a poll of public opinion runs along similar lines:
there is a true proportion of the population that supports the actions
of Julius Caesar, but we are forced to take a random sample of the
population in order to estimate the true proportion.

These non-random descriptors of a probability distribution are known
as *parameters*, such as the true speed of light or the true
mean rainfall in the Amazon. The name of the game is to estimate these
parameters.
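The frequentist model of measurement can be sketched in a few lines of Python. The numbers are assumptions made for illustration: the model's fixed "true" speed of light is set to the defined SI value of 299,792,458 metres per second, and the error standard deviation of 50,000 metres per second is invented. `TRUE_C`, `ERROR_SD` and `take_measurements` are hypothetical names. The parameter never changes between calls; only the sampled measurements do.

```python
import random
from typing import List

# Illustrative numbers, not from the text: the fixed "true" parameter and the
# standard deviation of the normally distributed measurement error.
TRUE_C = 299_792_458.0  # metres per second; fixed and non-random in the model
ERROR_SD = 50_000.0     # metres per second; an assumed error size


def take_measurements(n: int, rng: random.Random) -> List[float]:
    """Return n measurements, each equal to TRUE_C plus a normal random error."""
    return [rng.gauss(TRUE_C, ERROR_SD) for _ in range(n)]


rng = random.Random(1)
first_sample = take_measurements(5, rng)
second_sample = take_measurements(5, rng)
# Same parameter, different samples: the randomness lives in the data only.
print(first_sample)
print(second_sample)
```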

There are three common ways to perform statistical inference in a
frequentist fashion. A frequentist has at her disposal point
estimation, interval estimation, and hypothesis testing. The
common ANOVA procedure (analysis of variance), for example, is nothing
more than systematised hypothesis testing. Point estimation is about
coming up with recipes that give single numbers that on average come
close to the parameters we are estimating. For example, if we took ten
measurements of the speed of light, we might decide to take their
average and say that the speed of light is close to this average. We
have no idea how close this actually is to the true value of the speed
of light, but we know that our method works well on average given the
assumptions we have about the distribution of error. Interval
estimation does the same with intervals in lieu of single numbers,
in the hope of capturing the true parameter within an interval
computed from our sample. Hypothesis testing is about accepting or
rejecting a hypothesis on the basis of the plausibility of the
observed data under the assumption that the hypothesis is true. For
example, if we hypothesise that the true speed of light is 55 miles per
hour, but we instead measure 2.99 x 10^{8} metres per second,
then, if the hypothesis were true, we would have made a very improbable observation, and we have good reason to reject the hypothesis.
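Two of the three tools can be sketched in Python under assumptions of my own choosing: normally distributed measurement error with a *known* standard deviation (so that a simple z-test applies, which is one hypothesis test among many), and invented values for the true speed of light and the error size. The sketch repeats a ten-measurement experiment many times to show that the sample mean is right on average, then computes a two-sided p-value for the hypothesis that light travels at 55 miles per hour. All names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

# Illustrative assumptions, not from the text.
TRUE_C = 299_792_458.0  # the fixed "true" speed of light in the model, m/s
ERROR_SD = 50_000.0     # known standard deviation of the measurement error, m/s
SLOW_LIGHT = 55 * 1609.344 / 3600  # 55 miles per hour in metres per second


def point_estimate(sample):
    """Point estimation recipe: report the sample mean."""
    return fmean(sample)


def long_run_average_of_estimates(n_measurements, n_repeats, seed=0):
    """Repeat the whole experiment many times and average the point estimates."""
    rng = random.Random(seed)
    estimates = [
        point_estimate([rng.gauss(TRUE_C, ERROR_SD) for _ in range(n_measurements)])
        for _ in range(n_repeats)
    ]
    return fmean(estimates)


def two_sided_p_value(sample_mean, hypothesised, sd, n):
    """z-test with known SD: if the hypothesis holds, the chance of a sample
    mean at least this far from the hypothesised value."""
    z = (sample_mean - hypothesised) / (sd / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(long_run_average_of_estimates(10, 10_000))
print(two_sided_p_value(2.99e8, SLOW_LIGHT, ERROR_SD, 10))
```

The first number should land close to `TRUE_C`. The second is essentially zero: a sample mean of 2.99 x 10^{8} metres per second would be astronomically unlikely if light really travelled at 55 miles per hour.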

# Confidence Intervals and Misconceptions

I will speak more about interval estimation, because it provides a
good example of the frequentist framework and because there are some
common misconceptions of how it works. Consider, again, the speed of
light. Suppose our budget only allows us to take twenty measurements
of the speed of light. Before we do so, we stipulate that these
measurements will come from a normal distribution, estimate the
variance of the distribution with the sample variance, resort to
Student's t-distribution, and concoct a recipe that ensures that on
95% of the occasions when we take these twenty measurements, the true
speed of light will be between *X* - *s* and *X* + *s*, where *X* is
the sample mean of our twenty measurements and *s* is a margin of error
computed from their sample standard deviation and a quantile of the
t-distribution. This is called a 95%
*confidence interval*. After we're done with the mathematics,
we take our twenty measurements. A minute of arithmetic leads us to
the confidence interval [2.9987 x 10^{8}, 3.0001 x
10^{8}] metres per second. We write a report that reads "It was found that [2.9987 x
10^{8}, 3.0001 x 10^{8}] is a 95% confidence interval for
the true speed of light *c*." We send it for publication and
hope for the best.
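The recipe described above is the standard Student's t interval. A minimal Python sketch, assuming normally distributed measurements, follows. To avoid a third-party dependency it hard-codes the 97.5% quantile of the t-distribution with 19 degrees of freedom (about 2.093); with SciPy installed, `scipy.stats.t.ppf(0.975, 19)` gives the exact value. The function name and the simulated measurements are hypothetical, not the data behind the interval in the text.

```python
import random
from statistics import fmean, stdev

# 97.5% quantile of Student's t-distribution with 19 degrees of freedom,
# i.e. the two-sided 95% critical value for n = 20 measurements. Hard-coded
# (about 2.093) to keep the sketch free of third-party dependencies.
T_CRIT_95_DF19 = 2.093


def t_confidence_interval(sample):
    """Return the 95% interval (X - s, X + s) for exactly 20 measurements.

    X is the sample mean; s is the margin of error t * (sample SD) / sqrt(n).
    """
    n = len(sample)
    if n != 20:
        raise ValueError("the hard-coded critical value assumes exactly 20 measurements")
    x_bar = fmean(sample)
    margin = T_CRIT_95_DF19 * stdev(sample) / n ** 0.5  # stdev divides by n - 1
    return x_bar - margin, x_bar + margin


# Made-up measurements (metres per second) drawn from an assumed normal model.
rng = random.Random(3)
measurements = [rng.gauss(2.9995e8, 3.0e5) for _ in range(20)]
print(t_confidence_interval(measurements))
```

The check on `n` is a deliberate guard: the critical value depends on the sample size, so reusing 2.093 for another *n* would silently give the wrong confidence level.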

How are we to interpret [2.9987 x 10^{8}, 3.0001 x
10^{8}]? Often people will say "the probability that
*c* is in [2.9987 x 10^{8}, 3.0001 x 10^{8}] is
0.95". This is not correct. The statement is either true or false, so
its probability, if it has one at all, is either zero or one. Remember that our model and the rationale for our computation of
the confidence interval rely on the assumption that the
parameter *c* is not random. There is no probability associated
with the true value of the speed of light according to the frequentist
paradigm, and our interval isn't random either, because we already observed it and know what
it is. Although many people would *love* to make a
probability statement about something they do not know, this is one
occasion on which it is not fair to do so. Rather, if they wish to be
consistent with the frequentist methodology, they must report:

> This 95% confidence interval was produced by a procedure that, on
> repeated trials, captures the true speed of light approximately 95%
> of the time. We cannot be certain whether *c* lies between
> 2.9987 x 10^{8} and 3.0001 x 10^{8} or not. We cannot even assign a
> nontrivial probability to the event that *c* lies in our interval,
> because *c* is not random. We have **faith**, however, because our
> procedure is sound, that our interval is a valid estimate for the
> speed of light.

Usually this report gets abbreviated to "*c* is in [2.9987 x
10^{8}, 3.0001 x 10^{8}]
with 95% confidence." Beware of reading "confidence" as a synonym for
"probability". Nonsense may ensue!

To drive this point home, I will present one more (slightly contrived)
example and, at the same time, make the frequentists look a little
silly. All in good fun, of course.

Suppose that we are in a situation where a uniform distribution would
be a good model. Such a situation I leave to your imagination. For concreteness, suppose that we are sampling from
the interval [μ - 1/2, μ + 1/2], and that the values are equally
likely to come from anywhere in that interval. It looks like this:

    <---- 1/2 ----> μ <---- 1/2 ---->
    [_______________|_______________]
    <-------------- 1 -------------->

We don't know where the centre μ is, but from theoretical
considerations relevant to what we are modelling, we are certain that the length of the interval is
exactly one. Now we want to estimate where the centre μ is, but we
only have enough resources to make two independent observations. We do
the reasonable thing and decide to estimate μ by a confidence
interval whose endpoints are our two observations. Call these two
observations *X* and *Y*; they are two independent
identically distributed uniform random variables with mean μ. Then
we can compute that

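$$
\begin{aligned}
P(\mu \text{ lies between } X \text{ and } Y)
  &= P(X < \mu < Y) + P(Y < \mu < X) && \text{(two disjoint events)} \\
  &= P(X < \mu)\,P(Y > \mu) + P(Y < \mu)\,P(X > \mu) && \text{(by independence)} \\
  &= (0.5)(0.5) + (0.5)(0.5) \\
  &= 0.25 + 0.25 = 0.5
\end{aligned}
$$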

Thus, the interval determined by *X* and *Y* is a 50%
confidence interval.
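The 50% figure can be checked by simulation. In the sketch below the value of μ (20.3) is an arbitrary assumption, hidden from the "experimenter"; the code draws many pairs of independent uniform observations on [μ - 1/2, μ + 1/2] and counts how often μ falls strictly between them. The function names are hypothetical.

```python
import random

MU = 20.3  # hypothetical true centre, unknown to the experimenter


def two_observations(rng):
    """Two independent uniform observations on [MU - 1/2, MU + 1/2]."""
    return rng.uniform(MU - 0.5, MU + 0.5), rng.uniform(MU - 0.5, MU + 0.5)


def uniform_interval_coverage(n_trials, seed=0):
    """Fraction of trials in which MU lies strictly between the two observations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x, y = two_observations(rng)
        hits += min(x, y) < MU < max(x, y)
    return hits / n_trials


print(uniform_interval_coverage(100_000))
```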

Suppose now we carry out our experiment, and our two observations turn
out to be 20.0 and 20.6. We do not know the true value of μ, but we
are certain that each observation lies within a distance of 1/2 of μ.
Hence μ ≥ 20.6 - 1/2 = 20.1 and μ ≤ 20.0 + 1/2 = 20.5, so we are
**certain** that μ lies between 20.0 and 20.6. There is simply not
enough room for it to be anywhere else, because μ is the centre of our
uniform distribution, according to our model. This time, we do know that the probability
that μ is between 20.0 and 20.6 is 1. And yet, even though in this
case we are certain, even though the probability is 1, the frequentist
still has to report "[20.0,
20.6] is a 50% confidence interval for μ."

That seems a little silly. Fifty-percent confidence for
something we know for certain? Objections like this lead to the
Bayesian interpretation of probability.
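The certainty in the example is no accident of the numbers 20.0 and 20.6. Both observations lie within 1/2 of μ, so whenever they are more than 1/2 apart they must sit on opposite sides of μ, and the interval between them is certain to contain it. For two independent uniform observations on an interval of length 1, P(|X - Y| > d) = (1 - d)^2, so such wide intervals occur with probability 1/4. Since the overall coverage is 1/2, the remaining narrow intervals must cover μ with probability (1/2 - 1/4)/(3/4) = 1/3. The simulation below, again with an arbitrary assumed μ and hypothetical names, splits the trials by the gap between the observations and reports both coverages.

```python
import random

MU = 20.3  # hypothetical true centre, unknown to the experimenter


def coverage_by_gap(n_trials, seed=0):
    """Return (coverage when |X - Y| > 1/2, coverage otherwise, share of wide gaps)."""
    rng = random.Random(seed)
    wide_hits = wide_total = narrow_hits = narrow_total = 0
    for _ in range(n_trials):
        x = rng.uniform(MU - 0.5, MU + 0.5)
        y = rng.uniform(MU - 0.5, MU + 0.5)
        covered = min(x, y) < MU < max(x, y)
        if abs(x - y) > 0.5:
            # Observations more than 1/2 apart must straddle MU.
            wide_total += 1
            wide_hits += covered
        else:
            narrow_total += 1
            narrow_hits += covered
    return wide_hits / wide_total, narrow_hits / narrow_total, wide_total / n_trials


wide, narrow, wide_share = coverage_by_gap(100_000)
print(f"wide gaps:   coverage {wide:.3f} (share of trials {wide_share:.3f})")
print(f"narrow gaps: coverage {narrow:.3f}")
```

The frequentist's 50% is the average of a guaranteed 1 and a much weaker 1/3, and the observed data themselves tell us which case we are in.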