The mutual information of two random variables, X and Y, is defined to be

I(X,Y) = Sum( Pr(x,y) * log( Pr(x,y) / (Pr(x)*Pr(y)) ) )

Where Sum() is the sum over the alphabets of both X and Y. Alternatively, the mutual information can be expressed more intuitively as
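
To make the definition concrete, here is a small Python sketch that evaluates this sum directly from a joint probability table. The function name, the example joint distribution, and the choice of base-2 logarithms (so the result is in bits) are illustrative choices, not taken from the text:

    import numpy as np

    def mutual_information(joint):
        """I(X,Y) in bits, given joint[i, j] = Pr(x_i, y_j)."""
        joint = np.asarray(joint, dtype=float)
        px = joint.sum(axis=1, keepdims=True)   # marginal Pr(x)
        py = joint.sum(axis=0, keepdims=True)   # marginal Pr(y)
        mask = joint > 0                        # skip zero-probability terms
        return float((joint[mask] * np.log2(joint[mask] / (px * py)[mask])).sum())

    # Example: X is a fair bit and Y is X passed through a 10% bit-flip.
    joint = np.array([[0.45, 0.05],
                      [0.05, 0.45]])
    print(mutual_information(joint))   # ~0.531 bits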

I(X,Y) = H(X) - H(X|Y)

Where H(X) is the entropy of X and H(X|Y) is the conditional entropy of X given Y. This can be interpreted as a measure of the reduction in the uncertainty of X when we know the value of Y; in other words, a measure of how much we learn about X by finding out the value of Y. The mutual information is symmetric, so we can equivalently write

I(X,Y) = H(Y) - H(Y|X)

which may be easier to evaluate in some cases.
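
The same example joint table can be used to check the entropy form numerically. The sketch below (again purely illustrative) computes H(X), H(Y), and the joint entropy H(X,Y), and uses the standard chain-rule identity H(X|Y) = H(X,Y) - H(Y) to obtain the conditional entropies:

    import numpy as np

    def entropy(p):
        """Shannon entropy in bits of a probability array."""
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    # Same example joint table: a fair bit X observed through a 10% bit-flip.
    joint = np.array([[0.45, 0.05],
                      [0.05, 0.45]])
    H_X  = entropy(joint.sum(axis=1))   # H(X)
    H_Y  = entropy(joint.sum(axis=0))   # H(Y)
    H_XY = entropy(joint)               # H(X,Y)

    print(H_X - (H_XY - H_Y))   # H(X) - H(X|Y), ~0.531 bits as before
    print(H_Y - (H_XY - H_X))   # H(Y) - H(Y|X), identical by symmetry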

The capacity of a noisy channel is defined to be the maximum value of the mutual information of the input and output variables, taken with respect to the input probability distribution.
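
As a rough illustration of this maximisation, the sketch below estimates the capacity of a binary symmetric channel with crossover probability 0.1. The channel and the simple grid search over input distributions are choices made for the example only; in practice one would use convex optimisation or the Blahut-Arimoto algorithm:

    import numpy as np

    def mutual_information(joint):
        """I(X,Y) in bits from a joint probability table joint[x, y]."""
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        mask = joint > 0
        return float((joint[mask] * np.log2(joint[mask] / (px * py)[mask])).sum())

    # Binary symmetric channel: Pr(y|x) as a row-stochastic matrix,
    # with bit-flip (crossover) probability 0.1.
    channel = np.array([[0.9, 0.1],
                        [0.1, 0.9]])

    # Maximise I(X,Y) over input distributions Pr(X=1) = q by grid search.
    best = 0.0
    for q in np.linspace(0.0, 1.0, 1001):
        px = np.array([1.0 - q, q])
        joint = px[:, None] * channel   # Pr(x, y) = Pr(x) * Pr(y|x)
        best = max(best, mutual_information(joint))

    print(best)   # ~0.531 bits, attained at the uniform input distribution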