Kleene's theorem - Everything2.com

A language L over the alphabet Σ is regular if and only if there is a finite automaton that accepts L

(If this is new to you, perhaps browse some of the nodes at formal language theory first.)

Kleene's theorem states that in order to determine whether or not a language is regular, it is sufficient to determine whether or not an FA can be constructed to accept it. Similarly, given any FA, the language which it accepts is regular. Since we know that NFA's, NFA-Λ's¹, and FA's are all equivalent (explanations for this are under those individual nodes), and since NFA-Λ's have the "loosest" definition, it is easiest just to prove the theorem for NFA-Λ's. The proof takes two parts:

I. Given a regular language L over the alphabet Σ, there exists an NFA-Λ which accepts L

We prove this by induction. A regular language is defined recursively by saying that the language containing zero elements (L = {∅}) is regular, any language with the individual element {x} where x ∈ Σ is regular, and any language L is regular if it consists of other regular languages L₁ ... L_n and the operations of concatenation (x ∈ L₁L₂ if x = ab such that a ∈ L₁ and b ∈ L₂), union (x ∈ L₁ ∪ L₂ if x ∈ L₁ or x ∈ L₂), and Kleene * (x ∈ L* if x = a₁a₂...a_n (n ≥ 0) such that a_n ∈ L for all n).

In order to prove that there exists an NFA-Λ for every regular language L, we then need to prove that we can construct an NFA-Λ M which accepts any given member of the basis set of languages (L = {∅} and all languages L = {x} where x ∈ Σ), and that given two regular languages L₁ and L₂, each of which is accepted by an FA (M₁ and M₂, respectively), we can construct an NFA-Λ M which accepts L₁L₂, L₁ ∪ L₂, and L₁* (or L₂*).

The first part of this proof is simple: if L = {∅}, the NFA-Λ M which we construct has a single state q₀, no accepting states, and a transition function δ(q₀,a) = {∅} for all a. If L = {λ}, M has a single state q₀, which is also the only accepting state and the initial state, and δ(q₀,a) = {∅} for all a. If L = {x}, where x ∈ Σ, M has two states: an initial state q₀ and q₁, which is the only accepting state, δ(q₀,a) = {∅} for all a ≠ x, δ(q₀,x) = {q₁}, and δ(q₁,a) = {∅} for all a.

Next we must be able to construct an NFA-Λ M which accepts the concatenation of two regular languages L₁ and L₂, each accepted by their own NFA-Λ M_n = {Q_n,Σ,q_n, A_n,δ_n}. To simplify things, we relabel Q₁ and Q₂ if necessary so that Q₁ ∩ Q₂ = {∅}. In order for a language to be accepted by M, it must first be accepted by M₁, and then by M₂, so we define M by joining M₁ and M₂ with a Λ-transition, making the accepting states of M₂ the accepting states of M. Formally, we define M to have states Q = Q₁ ∪ Q₂, initial state q = q₁, accepting states A = A₂, and a transition function δ defined as follows:

δ(x,a) = {

δ₁(x,a) for all a where x ∈ Q₁ and x ∉ A₁,

δ₂(x,a) for all a where x ∈ Q₂,

δ₁(x,a) where a ≠ Λ where x ∈ A₁,

δ₁(x,Λ) ∪ {q₂} where a = Λ and x ∈ A₁

}

It is easy to see (or at least, it should be easy to see) that this NFA-Λ accepts L₁L₂.

Next, we must construct an NFA-Λ M which accepts L = L₁ ∪ L₂. In this case, a string x is in the new language L if x ∈ L₁ or x ∈ L₂. The new NFA-Λ will contain a new initial state q with Λ-transitions to each of M₁ and M₂, and M will be in an accepting state if either M₁ or M₂ is in an accepting state. Formally:

Q = Q₁ ∪ Q₂ ∪ q

A = A₁ ∪ A₂

δ(q,Λ) = {q₁,q₂}

δ(q,a) = {∅} for all a ≠ Λ

and δ(p,a) = δ₁(p,a) or δ₂(p,a) depending on whether a ∈ Q₁ or Q₂

(As before, we relabel Q₁ and Q₂ if necessary so that Q₁ ∩ Q₂ = ∅). Once again, it should be fairly intuitive that M accepts L₁ ∪ L₂.

Last, we must construct an NFA-Λ M which accepts L₁*, given an NFA-Λ M₁ which accepts L₁. Λ is a valid string in this language, so the initial state q must also be an accepting state. We construct M by basically taking M₁ and looping it back in on itself with Λ-transitions. M has one accepting state, q (which is also the initial state). Q = Q₁, and:

δ(q,Λ) = {q₁},

δ(q,a) = {∅} for a ≠ Λ,

δ(p,a) = δ₁(p,a) for all a except where p ∈ A₁ and a = Λ in which case

δ(p,Λ) = δ₁(p,Λ) ∪ q

For the third time, the rest of this part of the proof is intuitive. Now, we must prove the theorem in the other direction:

II. Any language which is accepted by an NFA-Λ is regular.

As before, it is easiest to prove this using induction. First, though, we have to reduce the problem to as simple a form as possible. Given an NFA-Λ M, the set of languages recognized by M corresponds to the set of all strings x such that δ*(q,x) ∈ A.² We can break A down into individual components, and since the set of regular languages is closed over the union operator, it is sufficient to prove that for any p ∈ A, the language L = { x | δ*(q,x) = p } is regular. In other words, we need to prove that every path between q and p is regular. It might make sense to you (or not, depending on how you think) to use induction on the length of these paths. Since a path between q and p could possibly loop back on itself an infinite number ot times, though, it makes more sense to talk about the number of distinct states that a path between p and q passes through.

If a path x between p and q passes through an intermediate state r, we can split x up into three parts: y, which goes from p to r; w, which may or may not loop back through other states before returning to r; and z, which goes from r to q. Formally, x = ywz; where δ*(p,y) = r, δ*(r,w) = r, and δ*(r,z) = q. To make life easier, we number the states of the NFA-Λ in order from 1 to n, where n is the number of states. We then define X(p,q,k) to be the set of all paths between p and q which pass through no states numbered higher than k. If k = n, X(p,q,k) is simply the set of all paths between p and q, so if we use induction on k we can prove that for any p and q, every path between p and q is regular.

We first consider the case where k = 0. In this case, X(p,q,0) is the set of all paths between p and q which pass through no nodes at all, so X(p,q,0) = {∅}. From the definition of a regular language, we know that {∅} is regular. Next, we must show that if X(p,q,k) is regular, X(p,q,k+1) is also regular. It is once again sufficient to consider only an individual path x ∈ X. If the path doesn't contain the node labelled k+1, then it is also in X(p,q,k), and is therefore regular. So, we only consider the case where x passes through the node labelled k+1. As before, we can break the path down into three segments.

The first segment y connects p and the node labelled k+1, the second segment w (which may loop back any number of times) connects k+1 with k+1, and the third segment z connects k+1 and q. Therefore, x = yw*z, and since neither y, w, nor z pass through k+1, all three are contained in X(p,q,k) and are (from our inductive hypothesis) regular. Therefore, X(p,q,k) is regular for any k ≥ 0, so every path between any two states p and q is regular, and the entire language accepted by the NFA-Λ is therefore regular.

A nondeterministic finite automaton with Λ-transitions is simply an NFA where null-transitions are allowed. This means that there can be places where the NFA transitions between two states with no input at all, that is with null input. An NFA-Λ M₁ can be converted to a plain old NFA M by (along with some other technicalities) setting the transition function δ of M as the Λ-closure of δ₁. The Λ-closure of a set of states over δ is defined as Λ(X) = {x | x ∈ δ*(a,Λ*), for some a ∈ X}, or in plain english the set of states that are reachable from X with only null-transitions (including the original set X). (This is probably not the standard definition, but it suffices to explain the concept.)

For those of you to whom this operator is new, δ*(q,x) is the set of all states that can be reached from q by following the string x. Formally, δ*(q,a) = δ(q,a) where a ∈ Σ, and δ*(q,xa) = δ(δ*(q,x),a). In other words, δ* is the "string version" of δ.

For more information:
John Martin, Introduction to Languages and the Theory of Computation

formal language theory	Lemma of Choice	March 5, 2004	Kleene Closure
Kleenex	Wolff-Parkinson-White syndrome	nondeterministic finite automaton	Finite automaton
Halting problem	regular language	Chomsky hierarchy