A linguistic corpus compiled at Brown University in the 1960s by W. Nelson Francis and Henry Kučera (the name comes from the university, not from the psycholinguist Roger Brown). The Brown corpus consists of roughly a million words of written American English, all first published in 1961, sampled from genres such as press reportage, editorials, government documents, learned writing, and several kinds of fiction. It contains no transcribed speech. A common complaint about the corpus is that its language is more formal or stylized than ordinary speech, so conclusions about everyday usage drawn from studies of this sample are questionable at best.
