The complete Chomsky hierarchy consists of the following sets of languages, described by the appropriate grammar, and accepted by the machine:

Every formal language in the list is a proper subset of all the languages that come after it in the list, every grammar can express constructions of any of the grammars before it, and every machine in the list can be simulated by the machines that come after it. Turing machines can be constructed, of course, that are capable of simulating any arbitrary Turing machine as well (universal Turing machines).

Natural languages are in general not context-free, but linguists have found that while the context-sensitive languages seem to be a large enough class to describe natural languages, they also seem to contain a much bigger class of formal languages than that. Even worse, CSL's can take up to exponential time to simulate on ordinary computers (this is true even on a hypothetical computer endowed with nondeterminism), making them totally unworkable for practical use. Ongoing research on computational linguistics has thus focused on formulating other classes of languages that are "mildly" context-sensitive and can be simulated in polynomial time, such as the tree adjoining grammars, coupled context-free languages, and linear context-free rewriting systems. The languages described by these formalisms properly lie between the context-free and context-sensitive languages.

The tree adjoining grammars mentioned above are enjoying a renaissance of research because not only do they seem to be useful as a tool for the study of natural language, but they also seem to be natural for characterizing the structure of RNA secondary structures in molecular biology and virology.


Thanks to alfimp for suggestions on the relationship between natural languages and the Chomsky hierarchy, and for looking for some resources on the topic.