Kleene's Theorem

Next: Converting a NFA to Up: Non-Deterministic Finite-State Automata Previous: Converting a regular expression Contents

Kleene's Theorem

We have shown how to convert a regular expression to an NFA; the proof that these represent the same language is straightforward (by induction). However, we have asserted that there is a correspondence in both directions; this is formalised in Kleene's Theorem:

KLEENE'S THEOREM, PART 1: To each regular expression there corresponds a NFA
Strategy: The corresponding NFA is the one constructed by Thompson's Algorithm; proof that it is equivalent is by induction over regular expressions.
Proof: Boring!

KLEENE'S THEOREM, PART 2: To each NFA there corresponds a regular expression
Strategy: Proof by induction over the states of the NFA (sort of).
The language represented by a NFA can be partitioned into the union of a number of smaller languages, each defined as follows:

Let the states of the NFA be numbered from 1 to .
Let and be states, and let be a number such that $0 \leq J \leq N$ .
Then:

$\begin{displaymath}L(p,q,J) = \left\{ \begin{array}{c} x \in \, \Sigma^* \, \mi... ...through no state numbered as high as $J$}\end{array} \right\} \end{displaymath}$

Note that if

is the start state, then the union of all

for all finish states

is the language accepted by the NFA.

Equally, if we can show that has a corresponding regular expression for all , and , we will have proved the theorem. The proof of this will proceed by induction over .

Proof: (Kleene's Theorem, part 2)

Base Case:
We must be able to get from to without passing through any states; therefore either =, or we go directly in one transition from to . The only strings that can be recognised in this case are $\varepsilon$ and single characters from $\Sigma$ . The rules and give us a regular expression corresponding to these situations.
Inductive Step:
The induction hypothesis is that for some such that $1 \leq K \leq N$ , the language has a corresponding regular expression. Now we must prove that, based on this assumption, has a corresponding regular expression.
Suppose the machine consumes some string ; we must find a regular expression corresponding to . Now suppose also that it passes through state on its way (if it doesn't, then is in , which has a corresponding regular expression by the induction hypothesis). Since the machine can ``loop'' on arbitrarily many times, we can split up into three substrings:

a

which moves the machine from state to state

b

which causes the machine to ``loop'' on state

c

which moves the machine from state to state

We note that while any of the above strings may cause us to start or finish at state , none of them actually cause us to move through . Thus we have:

$a\; \in \; L(p,K,K)$

$b\; \in \; L(K,K,K)$

$c\; \in \; L(K,q,K)$

By the induction hypothesis each of these have corresponding regular expressions, say A, B and C respectively, and thus the regular expression for is A(B*)C.
Since our choice of was arbitrary, we have proved the theorem.

${\cal QED.}$

Next: Converting a NFA to Up: Non-Deterministic Finite-State Automata Previous: Converting a regular expression Contents

James Power 2002-11-29