From statistical linguistics: Yule's Characteristic (K) is a word frequency measurement for large blocks of text. It measures the likelihood of two nouns, chosen at random from the text, being the same. Thus, it is a measure of the complexity of the text, as well as its repetitiveness. Yule's Characteristic measurements are given in the form of a positive integer which represents the ratio of misses to hits for the block of text. That is, a K value of 300 means that for any pair of nouns chosen at random from the given text, there's a 1 in 300 chance that they will be the same.
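The idea of "two nouns chosen at random being the same" can be sketched in a few lines of Python. This is only an illustration of the description above, not Yule's published procedure: the function name repeat_odds is hypothetical, the input is assumed to be a list of nouns already extracted from the text, and published formulations of K apply a scaling constant, so figures from this sketch need not match K values quoted elsewhere.

```python
from collections import Counter


def repeat_odds(tokens):
    """Chance that two tokens drawn at random (without replacement) match.

    Hypothetical helper for illustration. Returns (p, one_in), where p is
    the match probability and one_in is its reciprocal ("1 in one_in").
    """
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")

    counts = Counter(tokens)
    # Ordered pairs of distinct positions that hold the same token.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    # All ordered pairs of distinct positions.
    total_pairs = n * (n - 1)

    p = same_pairs / total_pairs
    one_in = float("inf") if same_pairs == 0 else total_pairs / same_pairs
    return p, one_in


# Assumed input: nouns already pulled out of the text.
p, one_in = repeat_odds(["cat", "dog", "cat", "bird"])
print(f"about 1 in {one_in:.0f}")  # prints: about 1 in 6
```

A more repetitive text gives a larger match probability and a smaller "1 in N" figure; a text that never repeats a noun gives a probability of zero.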
Yule's Characteristic was invented in 1938 by George Udny Yule, a Cambridge statistician. It was most famously used as yet more fuel for the "Bacon was Shakespeare" debate in a 1957 paper. The paper showed that K measurements of Shakespeare's works were a useful metric for judging style, and varied predictably based on which act of each play was analyzed. However, in the author's own words, he "should not care to suggest that the characteristic is going to provide an infallible test of authorship."