In the early 1950s the linguist Morris Swadesh proposed dating language change by taking a list of one hundred basic concepts, assumed to be expressed in all languages, and counting how many of the corresponding words differ between related languages.
The approach he founded is called glottochronology, or sometimes lexicostatistics. It makes two key assumptions. The first is that these basic terms (earth, man, egg, sit, bite, I, this) are more conservative than the rest of the vocabulary: they are less likely to be replaced by borrowings or meaning shifts than more cultural items are. The Swadesh 100 list would therefore track the oldest and most stable core of a language.
The second assumption is that these items are replaced at a fairly constant, identifiable rate, so that the amount of difference between two languages is a direct measure of the time since they split from their common ancestor. Swadesh calibrated his list on the best-established dates then known, essentially those of the Indo-European languages, and arrived at a figure of 14% replacement per millennium. A corollary is that language relationships can be traced back no further than about 7000 years: beyond that point, two languages resemble each other no more than random noise would, even if they are in fact related.
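The dating arithmetic behind this second assumption can be sketched briefly. This is a minimal illustration, assuming the usual glottochronological formula t = ln(C) / (2 ln r), where C is the fraction of list items two languages still share as cognates and r is the retention rate per millennium (0.86, the complement of the 14% replacement figure above). The factor of 2 reflects the assumption that both daughter languages lose items independently at the same rate. The function and constant names are hypothetical, chosen for this sketch.

```python
import math

# Hypothetical constant: retention per millennium implied by 14% replacement.
RETENTION_PER_MILLENNIUM = 0.86


def divergence_millennia(shared_fraction: float,
                         retention: float = RETENTION_PER_MILLENNIUM) -> float:
    """Estimate time since the split, in millennia, from the shared-cognate fraction.

    Assumes both daughters independently keep each item with probability
    `retention` per millennium, so shared_fraction = retention ** (2 * t).
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    return math.log(shared_fraction) / (2 * math.log(retention))


def expected_shared(millennia: float,
                    retention: float = RETENTION_PER_MILLENNIUM) -> float:
    """Inverse of divergence_millennia: expected shared fraction after a split."""
    return retention ** (2 * millennia)


if __name__ == "__main__":
    for t in (1, 2, 4, 7):
        print(f"{t} millennia apart -> about {expected_shared(t):.0%} shared")
    print(f"74% shared -> about {divergence_millennia(0.74):.1f} millennia")
```

Under these assumptions, two languages separated for seven millennia would be expected to share only about 12% of the list, which gives a sense of why the signal becomes hard to separate from chance look-alikes at roughly that depth.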
Unfortunately, this second assumption is wrong: languages change at variable rates. They all change continuously, but word replacement need not proceed at the same rate as change in grammar or pronunciation. For example, Icelandic has changed in pronunciation at a normal rate over the last thousand years, yet its Swadesh 100 list has stayed almost identical over that period. English, by contrast, has changed greatly in the same thousand years: the massive influx of Norman French vocabulary affects about half the dictionary, though only a few words in the basic 100 list come from French (mountain, person, round).
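To see why variable rates undermine the dating, suppose, purely hypothetically, that one daughter language keeps its basic vocabulary almost intact (as Icelandic has) while its sister replaces items at the standard rate. The sketch below assumes independent per-branch retention rates r1 and r2, so that the shared fraction after t millennia is (r1 * r2) ** t, computes the overlap such a pair would actually show, and then dates it with the fixed-rate formula t = ln(C) / (2 ln r). The rates and function names are illustrative assumptions, not measurements.

```python
import math

# The single fixed rate glottochronology assumes for every branch.
STANDARD_RETENTION = 0.86


def shared_with_rates(millennia: float, r1: float, r2: float) -> float:
    """Expected shared fraction when each branch has its own retention rate."""
    return (r1 * r2) ** millennia


def fixed_rate_estimate(shared_fraction: float,
                        retention: float = STANDARD_RETENTION) -> float:
    """Date the split as if both branches used the same fixed rate."""
    return math.log(shared_fraction) / (2 * math.log(retention))


if __name__ == "__main__":
    true_age = 1.0  # millennia; hypothetical
    # Hypothetical rates: a very conservative branch (0.98) and a standard one.
    shared = shared_with_rates(true_age, 0.98, STANDARD_RETENTION)
    estimate = fixed_rate_estimate(shared)
    print(f"true age {true_age} millennia, shared {shared:.0%}, "
          f"estimated {estimate:.2f} millennia")
```

With these made-up rates, a split that is really one millennium old would be dated to a little under 600 years ago; a branch that replaces words unusually fast would push the estimate in the opposite direction.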
The first assumption may well be sound. Comparing Swadesh lists probably does give a rough measure of the degree of relatedness: we can see that English and German are more closely related to each other than either is to Danish, that all three are equally distant from French or Russian, and that they are unrelated to Arabic. It is only the calibration to specific time depths that has been discredited, and very few linguists, if any, still try to use it for dating these days.
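The part of the method that survives, the comparison itself, amounts to counting, for each pair of languages, the proportion of list items whose words fall into the same cognate set. A minimal sketch, assuming cognate judgements have already been made by a linguist and encoded as set labels; the language names and labels below are invented placeholders, not real data, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical cognate-set labels per meaning: equal labels mean "judged cognate".
# None marks a missing entry, which is skipped rather than counted as a difference.
LISTS = {
    "Lang_A": {"hand": 1, "water": 1, "bird": 1, "stone": 1, "night": 1},
    "Lang_B": {"hand": 1, "water": 1, "bird": 2, "stone": 1, "night": 1},
    "Lang_C": {"hand": 1, "water": 1, "bird": 3, "stone": 2, "night": None},
    "Lang_D": {"hand": 2, "water": 2, "bird": 4, "stone": 3, "night": 1},
}


def shared_fraction(a: dict, b: dict) -> float:
    """Fraction of meanings present in both lists whose words are cognate."""
    meanings = [m for m in a if a.get(m) is not None and b.get(m) is not None]
    if not meanings:
        raise ValueError("no comparable meanings")
    same = sum(1 for m in meanings if a[m] == b[m])
    return same / len(meanings)


def ranked_pairs(lists: dict) -> list:
    """All language pairs with their shared fraction, most similar first."""
    pairs = [((x, y), shared_fraction(lists[x], lists[y]))
             for x, y in combinations(sorted(lists), 2)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    for (x, y), frac in ranked_pairs(LISTS):
        print(f"{x}-{y}: {frac:.0%}")
```

Only the ranking is used here, which is the relative-relatedness reading the paragraph above defends; no conversion into years is attempted.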
The European languages have long been written down, and spelling disguises differences: in print, English 'all' and German 'alle' look almost identical. But the Swadesh list is mainly useful for much less well-known languages, such as unwritten Amazonian ones, whose history we want to reconstruct, and for those we have only pronunciations to go on. So for a fair comparison we should set English [ɔːl] against German [alə].
The list of 100 basic words:
The number 100 is arbitrary; there are also lists of about 200 words, and these contain a few terms, like 'rain' and 'snow', that won't be found everywhere. In translating the lists, the principle is to find a general term rather than a specific one (so 'woman' rather than 'wife'). Note also that several of the English words are ambiguous, and only one of their senses is the one to be translated: 'child' means a young human, not offspring; 'man' means an adult male; 'skin' means human skin; 'know' means knowing a fact, not a person; 'you' is singular.
You should also use the ordinary word, rather than hunting out less usual near-synonyms that you know to be cognate (for instance, English 'hound' to match German 'Hund', when the ordinary English word is 'dog'), because this would distort the list in favour of the known relationships. In the nature of things, we calibrate these lists on language groups (like Indo-European) for which we have some independent idea of time depths, but we use them to try to extract time-depth information from groups we have no history for. So we want to avoid a bias towards cognates whose meanings have drifted apart.
A recent issue of Nature (vol. 426, 27 November 2003) contained a brief paper, "Language Tree Divergence Times Support the Anatolian Theory of Indo-European Origin", by two biologists, Russell Gray and Quentin Atkinson, claiming to have used improved statistical techniques to extract usable dates from Indo-European comparisons. On the whole, the linguistic community reacted with puzzlement and scepticism: the paper did not describe its techniques clearly enough for readers to judge whether they really controlled for the random fuzziness and the variable rates of change across the family.
For more on the Gray--Atkinson paper see http://itre.cis.upenn.edu/~myl/languagelog/archives/000208.html