A frequentist is someone who interprets probability as long-run relative frequency, in the spirit of the Law of Large Numbers. This interpretation contrasts with the Bayesian interpretation.

Frequentist Meaning of Probability

"The probability that this coin lands heads is 1/2." How to interpret this? A sensible answer is in terms of frequencies: if we were to toss the coin many, many times, on average about half of those tosses would be heads. In this way, probability is something that can be measured physically, at least hypothetically, if we are patient enough to repeat an experiment many times. This is how frequentists receive their nickname: objectivists. Probabilities are objectively measurable things.

Or at least they are hypothetically measurable. A first objection to the frequentist interpretation centres around experiments that cannot be repeated. What is the probability that a frequentist's firstborn will be male? Does it still make sense to say that with probability 1/2, her firstborn will be a boy? If so, a frequentist would have to interpret this as a statement of what will happen on average as she gives birth to many firstborns. If we imagine that it were possible to give birth to a firstborn many times, half would be boys, she would say. Or, if pressed, she may recast her interpretation as a statement about all the women who ever gave birth to a firstborn and observe that about half of them had boys. Other more elaborate examples may require more work to dissuade critics.

But these are mere philosophical trifles. There is no philosophy anymore in the mathematical theory of probability. A. N. Kolmogorov already axiomatised probability. A probability is a real number between zero and one we assign to sets (events) according to three simple rules; probability theory is a branch of set theory, calculus, and mathematical analysis. The Law of Large Numbers is a theorem about convergence in probability, and real-world interpretations of what all the mathematical abstract nonsense really means are not the business of the pure mathematician.
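The long-run reading of "the probability of heads is 1/2" can be illustrated with a small simulation. This is only a sketch: the function name heads_frequency, the seed, and the toss counts are my own illustrative choices, and a pseudo-random generator stands in for a physical coin.

```python
import random


def heads_frequency(n_tosses, p_heads=0.5, seed=0):
    """Return the fraction of heads among n_tosses simulated tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses


# The relative frequency should drift towards 1/2 as the tosses pile up.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

With few tosses the frequency can stray far from 1/2; with many it should settle close to it, which is all the frequentist reading claims.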

Frequentist Statistics

Interpretations are, however, the business of the statistician, a breed of applied mathematician. The real impact of the frequentist interpretation of probability arises only when we begin to assign probabilities to certain events in the world and make statistical inferences.

Statistical inference arises in response to some very practical problems, in every science. Whoever said that mathematics is everywhere would have been better off saying that statistics is everywhere. For example, we may want to know the value of the speed of light, the average rainfall in the Amazon, or what the public thinks of our political leaders. The problem, of course, is that it is impractical or impossible to know any of these things precisely. Measurements of the speed of light bristle with experimental error; we cannot put in a gigantic graduated cylinder all the water that falls in Brazil; and it would take too long to ask everyone what they think of our political leaders. Like good scientists, we must thus make some idealisations and postulate a model.

The frequentist model goes like this: there are some proportions, certain numbers out there in the world that exist objectively as the relative frequencies of some events in the long run. Even if we do not know them, they are not random. Instead, they describe certain probability distributions from which we sample. There is therefore a true, certain, and constant speed of light out there, but if we ever attempt to measure it, we will be faced with random error. The error has a certain distribution determined by the true speed of light, such as a normal distribution (also known as a Gaussian distribution or a bell curve). In order to know the true speed of light, we can look at many random samples of measurements. The model for conducting a poll of public opinion runs along similar lines: there is a true proportion of the population that supports the actions of Julius Caesar, but we are forced to take a random sample of the population in order to estimate the true proportion.

These non-random descriptors of a probability distribution are known as parameters, such as the true speed of light or the true mean rainfall in the Amazon. The name of the game is to estimate these parameters.

There are three common ways to perform statistical inference in a frequentist fashion. A frequentist has at her disposal point estimation, interval estimation, and hypothesis testing. The common ANOVA procedure (analysis of variance), for example, is nothing more than systematised hypothesis testing. Point estimation is about coming up with recipes that give single numbers that on average come close to the parameters we are estimating. For example, if we took ten measurements of the speed of light, we might decide to take their average and say that the speed of light is close to this average. We have no idea how close this actually is to the true value of the speed of light, but we know that our method works well on average given the assumptions we have about the distribution of error. Interval estimation does the same, with intervals in lieu of single numbers, with the hope of capturing the true mean within some interval of numbers we computed from our sample. Hypothesis testing is about accepting or rejecting a hypothesis on the basis of the plausibility of the observed data under the assumption that the hypothesis is true. For example, if we hypothesise that the true speed of light is 55 miles per hour, but we instead measure 2.99 x 10^8 metres per second, then we have made a very improbable observation, and we can question the truthfulness of our hypothesis.
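To make "works well on average" concrete for the ten-measurement example, here is a minimal sketch of point estimation under the normal-error model. The values TRUE_C and ERROR_SD are hypothetical numbers I picked for illustration; in real life the parameter is exactly what we do not know.

```python
import random
import statistics

TRUE_C = 2.998e8   # hypothetical "true" parameter, in metres per second
ERROR_SD = 5.0e5   # hypothetical standard deviation of the measurement error


def sample_mean(n, rng):
    """Point estimate: the average of n simulated noisy measurements."""
    measurements = [rng.gauss(TRUE_C, ERROR_SD) for _ in range(n)]
    return statistics.fmean(measurements)


rng = random.Random(1)
# Repeat the ten-measurement experiment many times over.
estimates = [sample_mean(10, rng) for _ in range(5_000)]

print("one estimate:        ", estimates[0])
print("average of estimates:", statistics.fmean(estimates))
print("typical miss:        ", statistics.pstdev(estimates))
```

Any single estimate may miss, but the average over many repetitions should sit near TRUE_C. That is the sense in which the recipe, rather than any one answer, is trusted.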

Confidence Intervals and Misconceptions

I will speak more about interval estimation, because it provides a good example of the frequentist framework and because there are some common misconceptions of how it works. Consider, again, the speed of light. Suppose our budget only allows us to take twenty measurements of the speed of light. Before we do so, we stipulate that these measurements will come from a normal distribution, estimate the variance of the distribution with the sample variance, resort to Student's t-distribution, and concoct a recipe that ensures that on 95% of the occasions when we take these twenty measurements, the true speed of light will be between X - s and X + s, where X and s are some quantities we computed using our twenty measurements. This is called a 95% confidence interval. After we're done with the mathematics, we take our twenty measurements. A minute of arithmetic leads us to the confidence interval [2.9987 x 10^8, 3.0001 x 10^8]. We write a report that reads "It was found that [2.9987 x 10^8, 3.0001 x 10^8] is a 95% confidence interval for the true speed of light c." We send it for publication and hope for the best.

How are we to interpret [2.9987 x 10^8, 3.0001 x 10^8]? Often people will say "the probability that c is in [2.9987 x 10^8, 3.0001 x 10^8] is 0.95". This is not correct. The probability is either zero or one, true or false. Remember that our model and the rationale for our computation of the confidence interval rely on the assumption that the parameter c is not random. There is no probability associated with the true value of the speed of light according to the frequentist paradigm, and our interval isn't random either, because we already observed it and know what it is. Although many people would love to make a probability statement about something they do not know, this is one occasion on which it is not fair to do so. Rather, if they wish to be consistent with the frequentist methodology they must report:

This 95% confidence interval comes from a distribution of confidence intervals that on repeated trials will capture the true speed of light approximately 95% of the time. We cannot be certain whether c lies between 2.9987 x 10^8 and 3.0001 x 10^8 or not. We cannot even assign a nontrivial probability to the event that c lies in our interval, because c is not random. We have faith, however, because our procedure is sound, that our interval is a valid estimate for the speed of light.

Usually this report gets abbreviated to "c is in [2.9987 x 10^8, 3.0001 x 10^8] with 95% confidence." Beware of reading "confidence" as a synonym for "probability". Nonsense may ensue!
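The claim that the 95% belongs to the recipe, and not to any one interval, can be checked by simulation. The sketch below repeats the twenty-measurement experiment many times and counts how often the interval captures the parameter. TRUE_C and ERROR_SD are hypothetical values of my own choosing, and T_CRIT is the tabulated 97.5th percentile of Student's t-distribution with 19 degrees of freedom, hard-coded here to avoid depending on any library beyond the standard one.

```python
import math
import random
import statistics

TRUE_C = 2.998e8   # hypothetical true value, metres per second
ERROR_SD = 5.0e5   # hypothetical standard deviation of the measurement error
N = 20             # measurements per experiment
T_CRIT = 2.093     # t-table value: 97.5th percentile, 19 degrees of freedom


def confidence_interval(measurements):
    """Two-sided 95% t-interval for the mean of the measurements."""
    n = len(measurements)
    centre = statistics.fmean(measurements)
    half_width = T_CRIT * statistics.stdev(measurements) / math.sqrt(n)
    return centre - half_width, centre + half_width


def coverage(trials, seed=2):
    """Fraction of repeated experiments whose interval contains TRUE_C."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(TRUE_C, ERROR_SD) for _ in range(N)]
        low, high = confidence_interval(sample)
        hits += low < TRUE_C < high  # each single interval either hits or misses
    return hits / trials


print(coverage(10_000))
```

The printed fraction should come out near 0.95, even though every individual interval in the loop either contains TRUE_C or does not.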

In order to drive this point home, I will present one more (slightly contrived) example, and at the same time make the frequentists look a little silly. All in good fun, of course.

Suppose that we are in a situation where a uniform distribution would be a good model. Such a situation I leave to your imagination. For concreteness, suppose that we are sampling from the interval [μ - 1/2, μ + 1/2], and that the values are equally likely to come from anywhere in that interval. It looks like this:


                    <-- 1/2 -->   μ   <-- 1/2 -->
                  [_______________|_______________]

                    <------------ 1 ------------>

We don't know where the centre μ is, but from theoretical considerations relevant to what we are modelling, we are certain that the length of the interval is exactly one. Now we want to estimate where the centre μ is, but we only have enough resources to make two independent observations. We do the reasonable thing, and decide to estimate μ by a confidence interval whose endpoints are our two observations. Call these two observations X and Y; they are two independent identically distributed uniform random variables with mean μ. Then we can compute that


P(μ lies between X and Y) = P(X < μ < Y) + P(Y < μ < X)         (two disjoint events)

                           = P(X < μ)P(Y > μ) + P(Y < μ)P(X > μ) (by independence)

                           = (0.5)(0.5)  + (0.5)(0.5)

                           = 0.25 + 0.25 = 0.5

Thus, the interval determined by X and Y is a 50% confidence interval.
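For those who distrust the algebra, the 50% figure can also be checked empirically. This is a sketch only: the function name uniform_coverage, the seed, and the value of μ passed in are arbitrary choices of mine.

```python
import random


def uniform_coverage(mu, trials, seed=3):
    """Fraction of trials in which mu falls strictly between two uniform draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Two independent observations from the interval [mu - 1/2, mu + 1/2].
        x = rng.uniform(mu - 0.5, mu + 0.5)
        y = rng.uniform(mu - 0.5, mu + 0.5)
        hits += min(x, y) < mu < max(x, y)
    return hits / trials


print(uniform_coverage(mu=20.3, trials=100_000))
```

The result should land close to 0.5 whatever centre is used, since the calculation above never depended on the value of μ.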

Suppose now we carry out our experiment, and our two observations turn out to be 20.0 and 20.6. We do not know the true value of μ, but we are certain that neither observation can be more than a distance of 1/2 away from it. Hence μ can be no smaller than 20.6 - 1/2 = 20.1 and no larger than 20.0 + 1/2 = 20.5, and so we are certain that μ lies between 20.0 and 20.6. There is simply not enough room for it to be anywhere else, because μ is the centre of our uniform distribution, according to our model. This time, we do know that the probability that μ is between 20.0 and 20.6 is 1. And yet, even though in this case we are certain, even though the probability is 1, the frequentist still has to report "[20.0, 20.6] is a 50% confidence interval for μ."
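The certainty here comes from simple arithmetic: each observation lies within 1/2 of μ, so μ is at least the larger observation minus 1/2 and at most the smaller observation plus 1/2. For 20.0 and 20.6 that pins μ between 20.1 and 20.5. The sketch below computes that range and then checks by simulation that whenever the two observations are more than 1/2 apart, the interval between them always captures μ. The function names, the seed, and the centre used in the simulation are hypothetical choices of mine.

```python
import random


def feasible_centres(x, y, half_width=0.5):
    """Range of centres mu consistent with both observations."""
    low = max(x, y) - half_width   # mu cannot sit more than 1/2 below the larger one
    high = min(x, y) + half_width  # nor more than 1/2 above the smaller one
    return low, high


def conditional_coverage(mu, trials, min_gap=0.5, seed=4):
    """Coverage of [min, max] among trials whose observations are far apart."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(trials):
        x = rng.uniform(mu - 0.5, mu + 0.5)
        y = rng.uniform(mu - 0.5, mu + 0.5)
        if abs(x - y) > min_gap:  # keep only the widely separated pairs
            kept += 1
            hits += min(x, y) < mu < max(x, y)
    return hits / kept


low, high = feasible_centres(20.0, 20.6)
print(round(low, 3), round(high, 3))
print(conditional_coverage(mu=20.3, trials=100_000))
```

So the same 50% recipe covers μ every time on this subset of outcomes, and correspondingly less often on pairs that land close together. The single number 50% averages over both kinds of sample.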

That seems a little silly. Fifty-percent confidence for something we know for certain? Objections like this lead to the Bayesian interpretation of probability.