More generally than a binary digit, a bit is a unit of entropy or information. Given a random source with 2 equally probable states (a fair coin), the best possible compression of the source's current state is 1 bit. (Here "best" refers to the best average rate you can achieve, even over long runs of the source. A biased source can do better: if your source is A 99% of the time and B 1% of the time, you can compress a run of 1000 A's and B's into far fewer than 1000 bits. For instance, 10 bits suffice to give the position of each B (2^10 = 1024 covers all 1000 positions), and one spare 10-bit code word can say there are no more B's. A run contains 10 B's on average, so this costs on average (10 + 1) × 10 = 110 bits.) So a fair coin toss gives 1 bit of information; its entropy is 1 bit.
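The arithmetic above can be checked with a short Python sketch. It computes the Shannon entropy H = -Σ p log2 p for the fair coin and for the 99%/1% source, plus the average cost of the position-coding scheme described above. The name entropy_bits is just an illustrative choice. The entropy of the biased source comes out to about 81 bits per run of 1000, so the 110-bit scheme is good but not optimal.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin: two equally probable states.
fair = entropy_bits([0.5, 0.5])

# The biased source: A 99% of the time, B 1% of the time.
biased = entropy_bits([0.99, 0.01])

# The position scheme for a run of n = 1000 symbols:
# each B's position costs ceil(log2(1000)) = 10 bits, plus one
# extra 10-bit code word meaning "no more B's".
n = 1000
bits_per_position = math.ceil(math.log2(n))
expected_bs = n * 0.01
scheme_bits = (expected_bs + 1) * bits_per_position

print(f"fair coin: {fair} bits per toss")
print(f"biased source: {biased:.4f} bits per symbol, {n * biased:.1f} bits per run")
print(f"position scheme: {scheme_bits:.0f} bits per run on average")
```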

More generally still, it is customary to measure any log odds ratio (any logarithm of the ratio of 2 probabilities) in bits! See the Neyman-Pearson lemma for an example of such an odds ratio in use; since odds ratios have a huge dynamic range, taking a logarithm is a very "natural" thing to do. And the logarithm of a ratio is just the difference of the logarithms, so the log of the inverse of a probability, log(1/p) = log 1 - log p, is also measured in bits. Phrased this way, the Neyman-Pearson lemma says that when trying to decide which of two probability distributions a sample came from, you should pick the distribution under which the sample carries less information (that is, the one under which the sample is less surprising). That explains the ratio appearing there: its logarithm is exactly the difference between the information the sample carries under each distribution. Of course, one of the distributions may be a lot more likely than the other, so choosing it requires a lot less information; accounting for this extra information in favor of one distribution over the other gives the threshold constant that appears in the lemma.
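A minimal Python sketch of this likelihood-ratio rule measured in bits, assuming two hypothetical coins: H0 is a fair coin and H1 has P(heads) = 0.8. The helper names (surprisal_bits, log_ratio_bits, choose) and the threshold_bits parameter are illustrative, not from any library. threshold_bits plays the role of the base-2 logarithm of the lemma's constant; in the lemma itself that constant is chosen to achieve a desired significance level, while here it is simply passed in.

```python
import math

def surprisal_bits(seq, p_heads):
    """Information, in bits, carried by a sequence of 'H'/'T' tosses from a
    coin with P(H) = p_heads: the sum of -log2 P(toss) over the tosses."""
    return sum(-math.log2(p_heads if t == "H" else 1.0 - p_heads) for t in seq)

def log_ratio_bits(seq, p0, p1):
    """log2 of the likelihood ratio P(seq | H1) / P(seq | H0).
    Since the log of a ratio is a difference of logs, this equals
    the surprisal under H0 minus the surprisal under H1."""
    return surprisal_bits(seq, p0) - surprisal_bits(seq, p1)

def choose(seq, p0, p1, threshold_bits=0.0):
    """Pick H1 when the sample carries more than threshold_bits
    less information under H1 than under H0; otherwise pick H0."""
    return "H1" if log_ratio_bits(seq, p0, p1) > threshold_bits else "H0"

sample = "HHHTHHHHTH"    # 8 heads, 2 tails
fair, biased = 0.5, 0.8  # H0: fair coin, H1: coin with P(H) = 0.8

print(surprisal_bits(sample, fair))
print(surprisal_bits(sample, biased))
print(log_ratio_bits(sample, fair, biased))
print(choose(sample, fair, biased))
print(choose(sample, fair, biased, threshold_bits=5.0))
```

For this sample, the tosses carry 10 bits under the fair coin and about 7.22 bits under the biased one, a log ratio of about 2.78 bits, so with no threshold the rule picks H1. Demanding 5 bits (as if H0 were 32 times more likely to begin with, since log2 32 = 5) flips the choice to H0.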

When natural logarithms are used, the result is in nats.
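Switching units only rescales by a constant, since log2 x = ln x / ln 2. A short Python sketch (the function name entropy is illustrative) that computes the entropy of the 99%/1% source from the first paragraph in both units:

```python
import math

def entropy(probs, base):
    """Shannon entropy of a discrete distribution, using logarithms of the given base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

source = [0.99, 0.01]
h_bits = entropy(source, 2)       # base-2 logarithms: bits
h_nats = entropy(source, math.e)  # natural logarithms: nats

# One nat equals 1 / ln 2 bits, so the two results differ by exactly that factor.
bits_per_nat = 1 / math.log(2)
print(f"{h_bits:.4f} bits = {h_nats:.4f} nats")
print(f"1 nat = {bits_per_nat:.4f} bits")
```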