The mutual information of two random variables, X and Y, is defined to be
I(X,Y) = Σ_{x,y} Pr(x,y) log( Pr(x,y) / (Pr(x) Pr(y)) )
Where the sum is taken over the alphabets of X and Y. Alternatively, the mutual information can be expressed more intuitively as
I(X,Y) = H(X) - H(X|Y)
Where H(X) is the entropy of X and H(X|Y) is the conditional entropy of X given Y. This can be interpreted as a measure of the reduction in the uncertainty of X when we know the value of Y; in other words, it measures how much we learn about X by finding out the value of Y. The mutual information is symmetric, so we can equivalently write
I(X,Y) = H(Y) - H(Y|X)
which may be easier to evaluate in some cases.
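To make the two forms concrete, here is a small Python sketch (the joint distribution is an arbitrary illustrative choice, not taken from the text) that computes I(X,Y) both directly from the sum and as H(X) - H(X|Y), and confirms that the two values agree.

import numpy as np

# Illustrative joint distribution Pr(x,y) for binary X and Y (rows: x, columns: y).
# The numbers are arbitrary and chosen only to demonstrate the calculation.
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)   # marginal Pr(x)
p_y = p_xy.sum(axis=0)   # marginal Pr(y)

# Direct form: sum over x, y of Pr(x,y) log( Pr(x,y) / (Pr(x) Pr(y)) )
mi_direct = sum(p_xy[i, j] * np.log2(p_xy[i, j] / (p_x[i] * p_y[j]))
                for i in range(2) for j in range(2) if p_xy[i, j] > 0)

# Entropy form: I(X,Y) = H(X) - H(X|Y)
h_x = -sum(p * np.log2(p) for p in p_x if p > 0)
h_x_given_y = -sum(p_xy[i, j] * np.log2(p_xy[i, j] / p_y[j])
                   for i in range(2) for j in range(2) if p_xy[i, j] > 0)
mi_entropy = h_x - h_x_given_y

print(mi_direct, mi_entropy)   # both are about 0.1245 bits for this distribution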
The capacity of a noisy channel is defined to be the maximum of the mutual information between the input and output variables, taken over all possible input probability distributions.
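As a rough numerical illustration of this definition (the binary symmetric channel with crossover probability 0.1 is an assumed example, and the grid search is only a sketch, not an efficient capacity algorithm such as Blahut-Arimoto), the snippet below maximizes the mutual information over input distributions and recovers the familiar BSC capacity of 1 - H(0.1), roughly 0.531 bits, attained at the uniform input.

import numpy as np

def mutual_information(p_x, channel):
    """I(X,Y) for an input distribution p_x and a channel matrix of Pr(y|x)."""
    p_xy = p_x[:, None] * channel                 # joint Pr(x,y)
    p_y = p_xy.sum(axis=0)                        # output marginal Pr(y)
    ratio = p_xy / (p_x[:, None] * p_y[None, :])  # Pr(x,y) / (Pr(x) Pr(y))
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(ratio[mask]))

# Example channel: binary symmetric channel with crossover probability 0.1.
eps = 0.1
bsc = np.array([[1 - eps, eps],
                [eps, 1 - eps]])                  # rows: x, columns: y, entries Pr(y|x)

# Crude maximization over input distributions: grid search on q = Pr(X = 0).
qs = np.linspace(0.001, 0.999, 999)
rates = [mutual_information(np.array([q, 1 - q]), bsc) for q in qs]
best = int(np.argmax(rates))

print(qs[best], rates[best])   # about 0.5 and 0.531 bits, i.e. 1 - H(0.1)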