Construction of an NFA from a Regular Expression Computer Science 332 Compiler Construction
A constructive algorithm (proof) : cases of algorithm follow cases in definition Base case: NFA recognizing a in Σ , or ∈
3.7 From a Regular Expression to an NFA
Inductive cases: Disjunction Concatenation Kleene closure
Each case introduces no more than 2 states, so size of NFA is linear in size of reg. exp.
Thompson's Construction
Thompson's Construction
First parse r.e. into constituent subexpressions; e.g., (a | b)*abb parses to
Next build NFA from depth-first traversal of parse, according to following constructive rules:
r5 ( r1 a
r4 r3 |
) r2 b
r7 *
r9 r6 a
r11 r8
1. For ∈ construct the NFA r10
start
∈
b
b
2. For a in Σ construct the NFA start
a
Thompson's Construction 3. Suppose N(s) and N(t) are NFA's for regular expressions s and t a) For the reg. exp. (s | t) construct the following composite NFA N(s | t): ∈
N(s)
∈
start ∈
3. Suppose N(s) and N(t) are NFA's for regular expressions s and t b) For the reg. exp. st construct the following composite NFA N(st): start
N(s)
N(t)
∈ N(t)
I.e., merge final state of N(s) and initial state of N(t)
Thompson's Construction 3. Suppose N(s) and N(t) are NFA's for regular expressions s and t c) For the reg. exp. s* construct the following NFA N(s*): ∈ ∈
start
Thompson's Construction
∈ N(s)
∈
Thompson's Construction 3. Suppose N(s) and N(t) are NFA's for regular expressions s and t c) For the reg. exp. (s) use the existing NFA N(s)
Thompson's Construction New NFA has at most twice as many states as all of its component NFA's. New NFA has exactly one start state and one accepting state. Each state of new NFA has either one outgoing transition on a symbol in Σ or at most two outgoing ∈ transitions.
Thompson's Construction Let's build an NFA for r.e. (a | b)*abb with parse tree
r5 ( r1 a
Simulating an NFA
|
)
r6
r11 r8
r10 b
b
a
r2 b
Goal: Given reg. exp. r and input string x, determine whether x is in L(r) Method #1: Build NFA N from r using Thompson's construction, then run previous algorithm
S := ∈-closure({s0}) a := nextchar while a ≠ eos do begin S := ∈-closure(move(S, a)) a := nextchar end if S ∩ F ≠ ∅ then return ''yes'' else return ''no''
r3
*
r9
Time-Space Tradeoffs
Can use the following deterministic algorithm to simulate an NFA without building its DFA:
r4
r7
Can construct NFA in O(|r|) time. N has at most twice as many states as |r|, and at most two transitions from each state, so transition table is O(|r|) space. Previous algorithm accepts or rejects x in O(|r|×|x|) time.
Time-Space Tradeoffs
Goal: Given reg. exp. r and input string x, determine whether x is in L(r) Method #2: Build NFA N from r using Thompson's construction, then DFA D from N using subset construction; then use DFA algorithm from last time for accepting/rejecting x D can have up to 2k states, where k = # states in N. ''Worstcase'' string (a | b)* a (a | b)(a | b)...(a | b) : why? DFA acceptance algorithm accepts or rejects x in O(|x|)
Tradeoffs Summary: Automaton
Build
Run
NFA
O(|r|)
O(|r|×|x|)
DFA
O(2|r|)
O(|x|)
So use first method (NFA) for quick search over short text strings (e.g., emacs r.e. search) Use second method (DFA) for longer searches over long text strings (e.g., Unix grep on multiple files) ''Lazy'' DFA method builds DFA transition table on the fly, caching transitions as state/input pairs are encountered.