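Chi-square analysis is a statistical method for testing whether two dichotomous variables within a sample or population are related. The test statistic is compared against the chi-square distribution to obtain a probability p: the chance of seeing an association at least this strong if the two variables were in fact unrelated.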

For example, we may wish to determine whether there is a relationship between blond hair and gender. Each variable can take only two values (i.e. it is dichotomous): blond/not blond and male/female. In this example we observe 100 people: 52 females, of whom 13 are blond, and 48 males, of whom 10 are blond. A contingency table representing these values looks like this:

     Blond   Not blond
F    13      39
M    10      38

Each row and column is then summed; in this case R1=52, R2=48, C1=23, C2=77, and the overall total is n=100. The cell values are f11=13, f12=39, f21=10, f22=38. These values are plugged into Eq.1 to calculate a chi-square value:


Eq.1 chisq = n (f11 f22 - f12 f21)^2 / (R1 R2 C1 C2)

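Plugging in the numbers of our example gives 100(13*38 - 39*10)^2 / (52*48*23*77) = 0.2447, for which p=0.62 where degrees of freedom=1. The value 0.065967, with the associated p=0.80, results when Yates' continuity correction is applied, that is, when n/2 is subtracted from the absolute value of (f11 f22 - f12 f21) before squaring. The chi-square value is analogous in use to the Z-value of the normal distribution; with one degree of freedom it is the square of a standard normal Z.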

If p<=0.05 we could conclude that the presence of blond hair is related to gender. Since p>0.05 here, we cannot draw that conclusion: the data provide no evidence that blond hair is related to gender.
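
The worked example can be checked with a short Python sketch. It is only an illustration: the function names chi_square_2x2 and p_value_df1 are made up for this example, and the optional Yates continuity correction is an addition that is not part of Eq.1 as written. Because a chi-square variate with one degree of freedom is the square of a standard normal Z, the p-value can be obtained from the standard library alone. For the example table the uncorrected Eq.1 evaluates to about 0.2447 and the continuity-corrected form to about 0.065967; p is above 0.05 in both cases.

```python
import math

def chi_square_2x2(f11, f12, f21, f22, yates=False):
    """Chi-square statistic for a 2x2 contingency table (Eq.1).

    yates=True applies Yates' continuity correction, which is an
    assumption added for this sketch and not part of Eq.1 itself.
    """
    n = f11 + f12 + f21 + f22
    r1, r2 = f11 + f12, f21 + f22   # row totals R1, R2
    c1, c2 = f11 + f21, f12 + f22   # column totals C1, C2
    diff = abs(f11 * f22 - f12 * f21)
    if yates:
        # Subtract n/2 from the absolute difference, never going below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / (r1 * r2 * c1 * c2)

def p_value_df1(chisq):
    """Upper-tail probability for chi-square with 1 degree of freedom."""
    # P(X > chisq) = P(|Z| > sqrt(chisq)) = erfc(sqrt(chisq / 2))
    return math.erfc(math.sqrt(chisq / 2))

plain = chi_square_2x2(13, 39, 10, 38)
corrected = chi_square_2x2(13, 39, 10, 38, yates=True)

for label, value in (("Eq.1", plain), ("Eq.1 with Yates", corrected)):
    p = p_value_df1(value)
    verdict = "related" if p <= 0.05 else "no evidence of a relationship"
    print(f"{label}: chisq={value:.6f}, p={p:.2f}, {verdict}")
```

Squaring the absolute difference leaves the uncorrected result unchanged, so the same function serves both variants.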

In cases where the dichotomous variables have small frequencies we may use Haber's method to calculate chi-square, as it is more robust than Eq.1. Eq.2 gives the chi-square value with Haber's correction.


Eq.2 chisq = n^3 D^2 / (R1 R2 C1 C2)

The variable D replaces part of the numerator of the calculation. D is determined by first calculating f^ and d, where f^ = (Rminimum Cminimum)/n is the smallest expected frequency and d = abs(fminimum - f^). Then:

if fminimum <= 2f^, D = the largest multiple of 0.5 that is < d;
if fminimum > 2f^, D = d - 0.5.
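
The rules for D can be sketched in Python as follows. This is an illustrative reading of the description above, not a reference implementation: the function name haber_chi_square is hypothetical, and the sketch assumes that f is the observed count in the cell where the row with the smallest total meets the column with the smallest total (the cell whose expected frequency is f^). For the blond hair example, f^ = 48*23/100 = 11.04, f = 10, d = 1.04 and D = 1.0.

```python
import math

def haber_chi_square(table):
    """Chi-square with Haber's correction (Eq.2) for a 2x2 table.

    table is [[f11, f12], [f21, f22]].
    """
    rows = [sum(r) for r in table]                         # R1, R2
    cols = [table[0][j] + table[1][j] for j in range(2)]   # C1, C2
    n = sum(rows)

    # Locate the smallest row and column totals (first one wins on ties).
    i = rows.index(min(rows))
    j = cols.index(min(cols))

    f_hat = rows[i] * cols[j] / n   # smallest expected frequency
    f = table[i][j]                 # assumption: observed count in that cell
    d = abs(f - f_hat)

    if f <= 2 * f_hat:
        # Largest multiple of 0.5 strictly below d, floored at zero.
        D = max((math.ceil(2 * d) - 1) / 2, 0.0)
    else:
        D = d - 0.5

    return n ** 3 * D ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])

example = haber_chi_square([[13, 39], [10, 38]])
print(f"Haber-corrected chisq = {example:.4f}")   # about 0.2262
```

The resulting value is compared against the chi-square distribution with one degree of freedom in the same way as the value from Eq.1.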