As Sam Clemens once wrote, "There are lies, damn lies, and statistics." In this HOWTO, we'll teach you how to make him roll over in his grave by abusing statistics.
There are a number of techniques to consider in lying with statistics:
- Use a small, biased sample
- 4 out of 5 doctors surveyed recommend using our product!
- Of course, there are 4 doctors on our board. We asked them first. Then we called one at random from the AMA's member directory. Any correlation between our survey method and the results is pure serendipity.
- Use a self-selecting population
- 4 out of 5 of our female subscribers who responded to our survey cheated on their husbands!
- Sure, but you have two self-selections. First, you only surveyed your subscribers, not a random sample of all women. Second, since people had to choose to respond, the ones most likely to respond were the ones who had cheated.
- Tie one data set to an earlier one, implying causality
- 100% of all crack addicts drank water before becoming addicted to crack. Water kills!
- The more interesting (and likely truthful) statistic is "How many water drinkers become addicted to crack?" By tying two unrelated (or even semi-related) groups together, nearly anything can be proved. Try these out for size:
- Use a lower confidence to gain a higher probability
- Over 80% of America watches our show!
- Sample sets are merely predictors of the population at large. If 80% of a sample set has some attribute, then there is a confidence level associated with saying, "at least 80% of the population has this attribute." The higher the percentage you claim, the lower your confidence in the claim; the lower the percentage you claim, the higher your confidence.
- Use obscure definitions and data sets
- 50% of Yankees are let go from their jobs at least once a year. The Yankee work ethic makes it hard to keep a job.
- First, what's a Yankee? An American, a New Englander, a Vermonter, a woodchuck? What does "let go" mean? Perhaps this statistic started as "50% of rural Vermonters have a second job in a seasonal industry, which supplements their annual income."
- Compare a statistic that affects most of the population to one that affects a small portion of the population
- You are more likely to be hit by lightning thrice than attacked by a shark.
- Well, let's see. Who is vulnerable to lightning strikes? Just about everyone in the world. Who is vulnerable to shark attack? Only those who swim in shark-infested waters. Suppose 100% of a population has a 0.05% chance of event A happening to them, and 5% of that population has a 2% chance of event B happening to them. Then the following three facts are true:
- Event B happens with greater frequency than event A (to 0.1% of the whole population versus 0.05%).
- People in the 5% group are more likely to have event B happen than event A.
- People in the other 95% are more likely to have event A happen than event B.
Unless you know which group someone is in, you can't really predict which is more likely to happen.
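The arithmetic behind the event A / event B comparison is easy to check by hand, but a few lines of Python make it harder to fudge. This is only a sketch using the made-up percentages from the example, not real lightning or shark data. The constant and function names are invented for illustration, and it assumes the other 95% of the population has no chance at all of event B.

```python
# Hypothetical figures from the example above; not real lightning or shark data.
P_A = 0.0005          # event A: 0.05% chance, applies to everyone
P_B_EXPOSED = 0.02    # event B: 2% chance, applies only to the exposed group
EXPOSED_SHARE = 0.05  # 5% of the population is exposed to event B


def population_rates():
    """Return the population-wide rates of events A and B."""
    rate_a = 1.0 * P_A                    # everyone is at risk of A
    rate_b = EXPOSED_SHARE * P_B_EXPOSED  # only the exposed group is at risk of B
    return rate_a, rate_b


def more_likely_event(exposed):
    """Return 'A' or 'B', whichever is more likely for a single person."""
    # Assumption: people outside the exposed group have zero chance of B.
    p_b = P_B_EXPOSED if exposed else 0.0
    return "B" if p_b > P_A else "A"


rate_a, rate_b = population_rates()
print(f"A across the whole population: {rate_a:.4%}")
print(f"B across the whole population: {rate_b:.4%}")
print("More likely for the exposed 5%:", more_likely_event(True))
print("More likely for the other 95%:", more_likely_event(False))
```

The population-wide rate is the number a headline would quote, while the per-group answer is the one that matters to any particular person, which is why the comparison misleads.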
This is just a primer. There is a statistics textbook by the same name as this node* for some more suggestions. Or, next time you hear an implausible statistic, try to figure out the fact behind the fancy.
*This is a hint that maybe the next wu here should be a book review.