More generally than a binary digit, a *bit* is a general unit for entropy or information. Given a random source with 2 equally probable states (a fair coin), the best possible compression of the source's current state is 1 bit. Here "best" refers to the best average rate you can achieve even over long runs of the source: if your source is `A` 99% of the time and `B` 1% of the time, you can clearly compress a run of 1000 `A`'s and `B`'s to far fewer than 1000 bits; for instance, 10 bits suffice to give the location of each `B` (or to say there are no more `B`'s), which gives an average of about 110 bits. So a fair coin toss gives 1 bit of information; its entropy is 1 bit.
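As a small sketch (the helper name `entropy_bits` is mine, not from the text), the entropy of both sources can be computed directly, and the skewed source indeed carries far less than 1 bit per symbol:

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit per toss.
print(entropy_bits([0.5, 0.5]))          # 1.0

# The skewed source above: `A` 99% of the time, `B` 1% of the time.
# Its entropy is about 0.081 bits per symbol, so a run of 1000 symbols
# carries only ~81 bits -- consistent with (and a bit below) the simple
# ~110-bit positional scheme described above.
print(entropy_bits([0.99, 0.01]) * 1000)
```

The positional scheme is not optimal, which is why it lands above the ~81-bit entropy floor.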

More generally still, it is customary to measure *any* log odds ratio (any logarithm of the ratio of two probabilities) in bits! See the Neyman-Pearson lemma for an example of the use of such an odds ratio; since odds ratios have a huge dynamic range, taking a logarithm is a very "natural" thing to do. And since the logarithm of a ratio is just the difference of the logarithms, *any* log of the inverse of a probability is *also* measured in bits. Phrased this way, the Neyman-Pearson lemma says that when trying to decide which of two probability distributions a sample came from, you should pick the distribution under which the sample gives *less* information. That explains the ratio appearing there. Of course, one of the distributions may be a lot more likely a priori than the other, so choosing it requires a lot less information; taking into account the added information you get for choosing one distribution over the other gives the constant which appears in the lemma.
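A minimal illustration of such a log odds ratio in bits, on a hypothetical example of my own (the names `info_bits` and `binom_pmf` are not from the text): we observe 8 heads in 10 tosses and ask whether it came from a fair coin P or a biased coin Q with heads probability 0.8.

```python
from math import comb, log2

def info_bits(p):
    """Information (surprisal), in bits, of an event of probability p."""
    return -log2(p)

def binom_pmf(k, n, p):
    """Probability of k successes in n trials, success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_sample = binom_pmf(8, 10, 0.5)   # probability of the sample under P
q_sample = binom_pmf(8, 10, 0.8)   # probability of the sample under Q

# Log odds ratio in bits: the difference of the two surprisals.
# It is positive here, meaning the sample carries *less* information
# under Q, so the rule described above picks Q.
llr = info_bits(p_sample) - info_bits(q_sample)
print(llr)
```

Adding a constant threshold to this comparison, to account for one distribution being a priori more likely than the other, recovers the form of the test in the lemma.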

When natural logarithms are used instead of base-2 logarithms, the resulting unit is the *nat*.
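The conversion between the two units is just a change of logarithm base: one nat is log2(e) ≈ 1.4427 bits. A quick sketch:

```python
import math

# Entropy of a fair coin toss, measured in nats: -ln(1/2) = ln 2.
fair_coin_nats = -math.log(0.5)
print(fair_coin_nats)                  # ln 2, about 0.693 nats

# Dividing by ln 2 converts nats back to bits.
print(fair_coin_nats / math.log(2))    # 1.0 bit, as above
```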