Let L be a regular language. That is, L is a set of words which is recognised by a finite state automaton (or, equivalently, by some regular expression). Consider a DFA recognising L, and suppose L is infinite. Suppose a DFA D recognising L has k states. We may write the steps D takes when recognising a word w=w1w2...wn above the letters:
s1s2...sn
w1w2...wn
This just means that after reading w
1...w
k, D is in
state s
k.
Then for a word w in L of length >k, due to the pigeonhole principle, some state s must repeat itself: s=si=sj, i < j. Then D must also accept
w1w2...wiwj+1...wn
w1w2...wi...wjwi+1...wj...wn
and so on. In fact, if we write x=w
1...w
i, y=w
i+1...w
j, z=w
j+1...w
n (x and y may both be
empty), then w=xyz, and for all
nonnegative integers m, the word xy
mz is also in L.
We've just proved
[The pumping] lemma:
Let L be an infinite regular language. Then there exists some k such that if w∈L has more than k letters, then w=xyz where
For all j≥0, xykz∈L.
The pumping lemma is useful for proving certain languages are not regular. This approach is usually taken in introductory computation courses (see e..g pumping lemma proof that the balanced braces language is not regular). However, in practice the Myhill / Nerode theorem is much more useful for proving that languages are not regular.
Pumping lemmas also exist for CFGs (context free grammars); this is noded below.