A statistical function to calculate the probability of getting n successes out of N attempts, where the attempts are independent and random, each has exactly two mutually exclusive outcomes (it is dichotomous), and each succeeds with probability p.

Well, that's a load of gibberish, isn't it? Allow me to explain:

This can be described mathematically by the following function:


$$P(n) = \frac{N!}{n!\,(N-n)!}\, p^{n} (1-p)^{N-n}$$

Side note: the fraction $\frac{N!}{n!\,(N-n)!}$ is the binomial coefficient of N and n, or "N choose n", often written $\binom{N}{n}$.

P(n) is the probability of achieving exactly n successes, p is the probability of success in a single trial, and N is the number of trials.
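The formula translates almost directly into code. Below is a minimal Python sketch, assuming Python 3.8 or later (for `math.comb`); the function name `binomial_pmf` is a hypothetical name chosen for illustration, not taken from any particular library.

```python
from math import comb

def binomial_pmf(n, N, p):
    """P(n): probability of exactly n successes in N independent trials,
    each of which succeeds with probability p."""
    if not 0 <= n <= N:
        return 0.0  # impossible outcomes have probability zero
    # comb(N, n) is the binomial coefficient "N choose n" = N! / (n!(N - n)!)
    return comb(N, n) * p**n * (1 - p)**(N - n)

# The probabilities of every possible n should add up to 1
print(sum(binomial_pmf(n, 10, 0.3) for n in range(11)))  # very close to 1.0
```

Using `math.comb` rather than dividing factorials keeps the coefficient an exact integer even for fairly large N.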

An example: If I flip a coin seven times, what is the probability that it will land heads up exactly twice? This gives us the following values:
N = 7
n = 2
p = 0.5 (it's a 50/50 chance of heads/tails)
Insert these values into the function:

$$P(2) = \frac{7!}{2!\,(7-2)!}\, 0.5^{2} (1-0.5)^{7-2} = 21 \times 0.5^{7} = 0.1640625$$

The probability is approximately 16%.
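The same arithmetic can be checked step by step with exact fractions. This is a small sketch using Python's standard `fractions` and `math` modules; the variable names are just illustrative.

```python
from fractions import Fraction
from math import factorial

N, n = 7, 2
p = Fraction(1, 2)  # fair coin

# Binomial coefficient written out as in the formula: N! / (n!(N-n)!)
coefficient = factorial(N) // (factorial(n) * factorial(N - n))  # 5040 // 240 = 21
# Probability of one particular sequence with n heads and N - n tails
one_sequence = p**n * (1 - p)**(N - n)  # (1/2)^7 = 1/128

probability = coefficient * one_sequence
print(coefficient, one_sequence, probability, float(probability))
# prints: 21 1/128 21/128 0.1640625
```

Working in fractions shows where the number comes from: 21 equally likely orderings, each with probability 1/128.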

This function is often used to calculate the probability of achieving at least m successes in N tries. This is done by adding up P(i) for every i from m up to N:

$$P(\text{at least } m \text{ successes}) = \sum_{i=m}^{N} P(i)$$

In the previous example, if I had wanted to calculate the probability of getting at least 5 heads, here's the formula:

$$\sum_{i=5}^{7} P(i) = P(5) + P(6) + P(7)$$
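A sketch of the "at least m" sum in Python, assuming exact `Fraction` inputs so the result can be read off as a fraction; `binomial_pmf` and `prob_at_least` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(i, N, p):
    """P(i): probability of exactly i successes in N trials."""
    return comb(N, i) * p**i * (1 - p)**(N - i)

def prob_at_least(m, N, p):
    """Sum P(i) for i = m, m + 1, ..., N."""
    return sum(binomial_pmf(i, N, p) for i in range(m, N + 1))

# At least 5 heads in 7 flips of a fair coin: P(5) + P(6) + P(7)
terms = [binomial_pmf(i, 7, Fraction(1, 2)) for i in (5, 6, 7)]
print(terms)  # [Fraction(21, 128), Fraction(7, 128), Fraction(1, 128)]
print(prob_at_least(5, 7, Fraction(1, 2)))  # 29/128, about 0.227
```

When m is small compared with N, it is cheaper to compute 1 minus the sum of P(0) through P(m - 1), since that needs fewer terms.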

Plotting P(n) for every n from 0 to N gives a distribution not entirely unlike the normal distribution (the Gaussian distribution), and the resemblance gets closer as N grows.
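To see the shape, here is a small Python sketch that prints a text histogram of P(n). The values N = 20 and p = 0.5 are hypothetical example inputs, not from the text.

```python
from math import comb

def binomial_pmf(n, N, p):
    """P(n): probability of exactly n successes in N trials."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 20, 0.5  # hypothetical example values
for n in range(N + 1):
    prob = binomial_pmf(n, N, p)
    # one '#' per 0.5 percentage points; the tallest bar (about 17.6%) is 35 characters
    print(f"{n:2d} {prob:.4f} " + "#" * round(prob * 200))
```

The bars rise to a single peak at n = 10 and fall away symmetrically, much like a bell curve. Changing p to something like 0.1 shows that for small N the shape is lopsided; the bell shape only emerges as N gets larger.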

The similarity of the binomial and normal distributions is very important. Calculating the probability that a binomial random variable with large N will be less than some value around, say, N/2 requires a lot of calculation, because every term P(0), P(1), and so on up to that value has to be computed and added.

The normal distribution can be used as a much easier-to-calculate approximation. Set up a normal variable with mean $\mu = Np$ and variance $\sigma^2 = Np(1-p)$, and you have an easier way to find your answer. This is generally considered a very good approximation as long as $\sigma^2$ is greater than 10.
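Here is a Python sketch comparing the exact sum with the normal approximation for P(X ≤ k). The values N = 100, p = 0.5 and k = 45 are hypothetical examples (giving $\sigma^2 = 25$). One assumption beyond the text: the approximation adds 0.5 to k, the usual "continuity correction", because a smooth curve is standing in for a distribution that only takes whole-number values. The function names are illustrative only.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, N, p):
    """Exact P(X <= k) by summing the binomial probabilities P(0) .. P(k)."""
    return sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_approx_cdf(k, N, p):
    """Approximate P(X <= k) with mu = Np and sigma^2 = Np(1 - p).
    Assumption: uses a continuity correction (k + 0.5), not mentioned in the text."""
    mu = N * p
    sigma = sqrt(N * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)

N, p, k = 100, 0.5, 45  # sigma^2 = 25, comfortably above 10
print(binomial_cdf(k, N, p), normal_approx_cdf(k, N, p))  # both roughly 0.184
```

The exact version needs 46 terms; the approximation needs one call to the error function, whatever the size of N.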

I feel the above formula needs a little more explanation. Say we have a random variable X, where X ~ Bin(N, p). This means we are counting the number of successes in N trials, where each trial has a probability p of success. We also define q as 1 - p, in other words the probability of failure. We are interested in the probability that X takes the particular value n. To calculate this, we use:

$$P(X = n) = {}^{N}C_{n} \cdot p^{n} \cdot q^{N-n}$$

You see we have three terms here. First, the term $p^n$ is the probability of n successes. Simple enough. Now if we have n successes, we must have N - n failures to make up the N trials. So we have the term $q^{N-n}$ for the probability of N - n failures. We multiply these two probabilities together because we need both events to occur, and the trials are independent. Together they give the probability of one particular sequence of n successes and N - n failures.

Now the term ${}^{N}C_{n}$. Perhaps you recognise the C as the Choose function, but perhaps not. Put simply, ${}^{a}C_{b}$ is the number of ways of choosing b objects from a, where the order in which you pick doesn't matter. Remember, when we have n successes and N - n failures, they can happen in any order; we don't mind. Therefore the Choose term represents the number of different orders in which the successes and failures can occur, and since every one of those orders has the same probability $p^n q^{N-n}$, we simply multiply.
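The counting argument can be checked by brute force: list every possible sequence of N trials, keep those with exactly n successes, and add up their probabilities. This is a Python sketch with a hypothetical helper name and example values (N = 6, n = 3, p = 0.3); it visits all 2^N sequences, so it only makes sense for small N.

```python
from itertools import product
from math import comb, isclose

def prob_by_enumeration(n, N, p):
    """Add up the probability of every success/failure sequence of length N
    that contains exactly n successes (True = success).
    Returns (number of such orders, total probability)."""
    q = 1 - p
    total = 0.0
    orders = 0
    for outcome in product([True, False], repeat=N):
        if sum(outcome) == n:  # True counts as 1
            orders += 1
            # independent trials: p for each success, q for each failure
            total += p**n * q**(N - n)
    return orders, total

orders, total = prob_by_enumeration(3, 6, 0.3)
print(orders, comb(6, 3))                             # 20 20
print(isclose(total, comb(6, 3) * 0.3**3 * 0.7**3))   # True
```

The number of qualifying sequences is exactly ${}^{6}C_{3} = 20$, which is why the Choose term appears in front of $p^n q^{N-n}$.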

Note that we could equally have chosen the number of failures, and put ${}^{N}C_{N-n}$ at the front. However, this is exactly the same! By choosing the N - n failures, you automatically choose the n successes. You can see this in the symmetry of Pascal's Triangle, but that's another story. Conventionally, we use the one that is shorter to write. Mathematicians like brevity.
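The symmetry is also handy when computing the coefficient by hand or in code: you can always work with whichever of n and N - n is smaller. Below is a Python sketch using the multiplicative formula; the name `choose` is hypothetical, and Python's built-in `math.comb` is used only to check it.

```python
from math import comb

def choose(N, n):
    """N choose n via the multiplicative formula.
    Uses the symmetry C(N, n) == C(N, N - n) to loop over the smaller of the two."""
    if n < 0 or n > N:
        return 0
    k = min(n, N - n)
    result = 1
    for i in range(1, k + 1):
        # after this step, result == C(N - k + i, i), always an integer
        result = result * (N - k + i) // i
    return result

N = 10
row = [choose(N, n) for n in range(N + 1)]  # row N of Pascal's Triangle
print(row)  # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
assert row == row[::-1]  # the row reads the same backwards
assert all(choose(N, n) == comb(N, n) for n in range(N + 1))  # agrees with math.comb
```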
