regression effect - Everything2.com

This is a simple fact of statistics which is often misunderstood and can lead to the regression fallacy.

In its simplest form, the regression effect says that if you measure something that can vary, split the subjects into groups by how they scored, and then measure them again, the mean of the group that measured lower will go up, and the mean of the group that measured higher will go down. All other things being equal, of course.

We may be measuring anything about any group of subjects: IQ test scores of people, the weight of geese, or the temperature at a number of different locations. All that matters is that the value can vary somewhat between measurements. The less it varies from one measurement to the next, the smaller the regression effect.

This makes sense if you think about it a bit. It's related to regression to the mean. If the measurement can vary from test to test, any group chosen by their below-average scores will contain some "average" subjects that are "having a bad test." When these subjects are re-tested, they score near the mean, and bring up the mean of the group. Similarly, some high scorers were just lucky, and score average the next time around, bringing down the higher group's mean. Another way to think of it is: a low score is more likely to vary upwards, since there are "more" possible scores above it than below it.

Let's imagine a measurement that is basically random for a given subject. Say, the number rolled by each die in a sample of fair six-sided dice. Obviously, if we "remeasure" any subgroup by rolling its dice again, that group's average will be about 3.5. If we remeasure the group that rolled 1s (whose mean is 1), we will find their mean to have gone up by about 2.5 points. Similarly, the mean of the group that rolled a six will have dropped by about 2.5 points. That, in its simplest form, is the regression effect.
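The dice example is easy to check with a quick simulation. What follows is only a sketch: the function name remeasure_dice, the sample size and the seed are made up for illustration. It rolls a batch of dice twice, groups them by the first roll, and reports the mean of the second roll for each group.

```python
import random

def remeasure_dice(n=60_000, seed=0):
    """Roll n fair dice twice; return {first roll: mean of the second roll}."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    first = [rng.randint(1, 6) for _ in range(n)]
    second = [rng.randint(1, 6) for _ in range(n)]
    means = {}
    for face in range(1, 7):
        # "Remeasure" only the dice that showed this face the first time.
        rerolls = [s for f, s in zip(first, second) if f == face]
        means[face] = sum(rerolls) / len(rerolls)
    return means

for face, mean in remeasure_dice().items():
    print(f"first roll {face}: remeasured mean {mean:.2f}")
```

Every group should come out close to 3.5, so the group that rolled 1s moves up by roughly 2.5 and the group that rolled 6s moves down by roughly 2.5.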

The regression effect also applies to measurements that are strongly correlated over tests (that is, they are consistent from one measurement to the next for the same subject). Let's say we're "measuring" phone numbers of people. Most people's numbers won't change if we measure them again a month later. Some of them will have moved or gotten new numbers, though. If we look at the people who had low numbers the first time we asked, they will still have mostly low numbers the second time around. The ones that have changed numbers, though, are more likely to have had their numbers go up than down (since there are more numbers above theirs than there are below). So the new mean of this group will be higher. How much higher depends on how many phone numbers change between measurements (the correlation between samples, in statistics lingo) and how far from the mean the group was originally.
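The same thing can be sketched for a measurement that is mostly stable. The model here is an assumption, not part of the phone number example: each subject gets a fixed "true" value (normal, mean 100, standard deviation 15) plus independent noise on every test, and the name retest_shift is hypothetical. A smaller noise_sd means the two tests are more strongly correlated.

```python
import random
import statistics

def retest_shift(noise_sd, n=50_000, seed=1):
    """Return (first-test mean, retest mean) for the lowest-scoring 10%."""
    rng = random.Random(seed)
    # Assumed model: score = stable true value + fresh noise on each test.
    true = [rng.gauss(100, 15) for _ in range(n)]
    test1 = [t + rng.gauss(0, noise_sd) for t in true]
    test2 = [t + rng.gauss(0, noise_sd) for t in true]
    # Pick the group purely by its low scores on the first test.
    cutoff = sorted(test1)[n // 10]
    low = [(a, b) for a, b in zip(test1, test2) if a < cutoff]
    return (statistics.fmean(a for a, _ in low),
            statistics.fmean(b for _, b in low))

for noise_sd in (0, 2, 10):
    before, after = retest_shift(noise_sd)
    print(f"noise {noise_sd}: low group went from {before:.1f} to {after:.1f}")
```

With no noise the low group's mean does not move at all. As the noise grows (and the correlation between tests drops), the group's retest mean climbs further back toward 100, though under this model it stays below it.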

It is important that the groups be chosen only by their low or high scores on the measurement itself. Simply picking a group that happens to have a low mean (say, women in a measurement of height) will not cause the effect. Picking all people who measured between 5' and 5'2" will, though.
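Here is a rough sketch of that distinction. All the numbers are assumptions chosen for illustration: two groups labelled A and B with average heights of 64 and 69 inches, and about an inch of measurement noise. It compares a group picked by membership (everyone in A) with a group picked by score (everyone who first measured between 60 and 62 inches, that is, 5' to 5'2").

```python
import random
import statistics

def selection_demo(n=100_000, seed=2):
    """Return the mean change on remeasurement for (group A, the 60-62 in band)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = rng.choice("AB")
        # Assumed numbers: group A averages 64 in, group B averages 69 in.
        true_height = rng.gauss(64 if group == "A" else 69, 2.5)
        m1 = true_height + rng.gauss(0, 1)  # first measurement
        m2 = true_height + rng.gauss(0, 1)  # second measurement
        people.append((group, m1, m2))

    def shift(rows):
        # Average change from the first measurement to the second.
        return statistics.fmean(m2 - m1 for _, m1, m2 in rows)

    by_group = [p for p in people if p[0] == "A"]       # chosen by membership
    by_score = [p for p in people if 60 <= p[1] < 62]   # chosen by first score
    return shift(by_group), shift(by_score)

group_shift, score_shift = selection_demo()
print(f"group A shift: {group_shift:+.2f} in")
print(f"60-62 in band shift: {score_shift:+.2f} in")
```

Group A has a low mean but was not selected by its scores, so its average should barely move. The 60 to 62 inch band was selected by its scores, so its average should drift upward on remeasurement.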
