Appendix C. Computing Regular Expression for the Language Recognized by an FA a b 1
2
a
a,b
3
ε
a*b((a+b) + (a+b)a*b)*
b
1
Computing Regular Expression for the Language Recognized by an FA In section 7.2, we learned how to compute a regular expression which expresses the language recognized by a given FA. The idea is to eliminated the states one at a time until we can read off a regular expression from the reduced transition graph that denotes the language accepted by one of the accepting states of the FA. This is a manual technique, which is not easy to implement into a program. Here we shall show an algorithm known as the Cocke-Younger-Kasami (shortly CYK) algorithm, developed based on the dynamic programming technique.
2
Computing Regular Expression Let M = ( Q, Σ , δ , q0 , F ) be an FA, where Q = {1, 2, . . ., n } and q0 = 1. We define the following notation. • Rij (0) : a regular expression which denotes the set of the transition path labels going from state i to state j (with no intermediate states). • Rij (k) : a regular expression which denotes the set of transition path labels from state i to state j going through the states whose id’s are less than or equal to k. The figure below shows an example for computing Rij (0) . Notice that for each state i, there is an implicit self-loop with label ε . That is why every Rii (0) contains ε . a b R11 (0) = a+ε R12 (0) = b R13 (0) = ∅ 1 2 a R21 (0) = ∅ R22 (0) = ε R23 (0) = ε a,b ε 3 R31 (0) = a+b R32 (0) = a R33 (0) = b+ε b 3
Computing Regular Expression With Rij (0) computed, we can recursively compute Rij (k) , for all k ≥ 1, using the following recurrence. The figure below illustrates how this recurrence is derived. Rij (k) = Rij (k-1) + Rik (k-1) (Rkk (k-1) )*Rkj (k-1) Rij (k-1) j
i Rik (k-1)
k
Rkj (k-1)
Rkk (k-1) Given the matrix of size n × n for Rij (k-1) , we can compute Rij (k) using the above recurrence. It follows that starting with Rij (0) , we can iteratively compute Rij (n) . From this matrix for Rij (n) we collect R1f (n) , for every f ∈ F, combine them with the union operator + and finally construct a regular expression which denotes the language recognized by the FA. ( Recall that 1 is the start state, and hence, R1f (n) is a regular expression which expresses the language recognized by the FA in state f ∈ F.)
4
Computing Regular Expression
An example:
Rij i
(0)
j 1
2
3
1
a+ε
b
∅
2
∅
ε
ε
3
a+b
a
b+ε
a b 1
a,b Rij (1) = Rij (0) + Ri1 (0) (R11 (0) )*R1j (0)
Rij (1) 1 2 3
1
(a+ε )+(a+ ε )*(a+ε ) = a* ∅ + ∅ (a+ ε )*(a+ε ) =∅ (a+b)+(a+b)(a+ε )*(a+ε ) = (a+b)(ε +a*) = (a+b)a*
2
a ε
3
b 2
3
b+(a+ε ) ∅ +(a+ ε )(a+ε )* ∅ (a+ε )*b = b+a*b =∅ = a*b ε + ∅ (a+ε )*b = ε + ∅ (a+ε )* ∅ = ε ε a+(a+b)(a+ε )*b (b+ε )+(a+b)(a+ε )* ∅ = a+(a+b)a*b = (b+ε ) 5
Computing Regular Expression
An example Rij (1) i
j 1
2
3
a
1
a
ab
∅
2
∅
ε
ε
1
3
(a+b)a*
b+ε
a,b
*
*
a+(a+b)a*b
b
1
1 a*+(a*b)ε * ∅ = a*
2
∅ + ε ε *∅ =∅
3
(a+b)a*+ (a+(a+b)a*b)ε *∅ = (a+b)a*
2 (a*b) + (a*b)ε *ε = b+a*b = a*b ε +ε ε *ε = ε (a+(a+b)a*b)+ (a+ (a+b)a*b)ε *ε = a+ (a+b)a*b
3
ε
b
Rij (2) = Rij (1) + Ri2 (1) (R22 (1) )*R2j (1)
Rij (2)
2
a
3 ∅ + (a*b)ε *ε = a*b ε +ε ε *ε = ε (b+ε )+ (a+ (a+b)a*b)ε *ε = (a+b+ε )+(a+b)a*b 6
Computing Regular Expression
An example
Rij (2)
j 1
2
3
i 1
a*
a*b
a*b
2
∅
ε
ε
3
(a+b)a*
a+(a+b)a*b
a b
(a+b+ε )+(a+b)a*b
1 a,b
Rij (3)
Rij (3) = Rij (2) + Ri3 (2) (R33 (2) )*R3j (2)
3
ε
b
1
2
R11 (2) + R13 (2) (R33 (2) )*R31 (2)
R12 (2) + R13 (2) (R33 (2) )*R32 (2)
R13 (2) + R13 (2) (R33 (2) )*R33 (2)
2
R21 (2) + R23 (2) (R33 (2) )*R31 (2)
R22 (2) + R23 (2) (R33 (2) )*R32 (2)
R23 (2) + R23 (2) (R33 (2) )*R33 (2)
3
R31 (2) + R33 (2) (R33 (2) )*R31 (2)
R32 (2) + R33 (2) (R33 (2) )*R32 (2)
R33 (2) + R33 (2) (R33 (2) )*R33 (2)
1
3
2
a
7
Computing Regular Expression
An example Rij (2)
j 1
a
3
2
b
i 1
a*
a*b
a*b
1
2
∅
ε
ε
3
a,b
(a+b)a
*
a+(a+b)a b *
2
a 3
(a+b+ε )+(a+b)a b *
ε
b To save space in the last matrix for Rij (3) , we put the recurrences instead of the regular expressions. Since state 3 is the only accepting state of the given FA, the entry at R13 (3) should contain a regular expression that denote the language. Using Rij (2) above and the recurrence in R13 (3) we can compute the regular expression as follows. R13 (3) = R13 (2) + R13 (2) (R33 (2) )*R33 (2) = (a*b) + (a*b)((a+b+ε )+(a+b)a*b)*((a+b+ε )+(a+b)a*b)
8
Computing Regular Expression R13 (3) = R13 (2) + R13 (2) (R33 (2) )*R33 (2) = (a*b) + (a*b)((a+b+ε )+(a+b)a*b)*((a+b+ε )+(a+b)a*b) Even with the intermitent simplification of the regular expressions while computing the recurrence, this final result still appears rather complex for the simple FA. Can we further simplify this regular expression? Unfortunately, there is no algorithmic approach for simplifying regular expressions. If we use the state elimination technique given in Chapter 7 for the given FA, we can get a simpler expression as shown below. Finally, we present an algorithm on the next slide, that can be easily understood based on our discussion. a
a b
1 a,b
2
a 3
1
ε
Eliminate state 2
a+b
b r3 = (r11 )*r13 (r33 + r31 (r11 )*r13 )* = a*b((a+b) + (a+b)a*b)*
b 3 a+b
9
CYK Algorithm
Computing Regular Expression
//input: state transition graph of M, output: regular expression for L(M) for (i = 1; i <= n; i++ ) for (j = 1; j <= n; j++ ) //compute R(0) [i][j] if (i == j) if ( there are m >= 0 labels a1, . . ., am on i to i loop edge) R(0) [i][j] = “ε +a1+ . . .+am” ; // “ε “, if m = 0 else // (i ≠ j ) if (there are m > 0 labels a1, . . ., am on i to j edge) R(0) [i][j]= “a1+ . . .+am” ; else R(0)[i][j]= “∅” ; for ( k = 1; k <= n; k++ ) for (i = 1; i <= n; i++ ) for (j = 1; j <= n; j++ ) R(k) [i][j] = R(k-1) [i][j] + R(k-1) [i][k]( R(k-1) [k][k] )*R(k-1) [k][j] ; output ∑ f∈F R(n) [1][f] ; // F is the set of accepting states 10