[Figure: (a) the crossing information (CIs) for the first input symbols a and b; (b) the state transitions of the 1-way DFA M']
2-way FA → 1-way FA Algorithm
Since M is deterministic, and given a pair …

[Figure: (a) a 2-way NFA; (b) computing the CI; (c) transitions of the 1-way NFA]
The previous example is somewhat misleading, because the state transition graphs of the two automata happen to be isomorphic. This is not true in general. Consider the 2-way NFA in figure (a) below, which has a self-looping transition on state t. The 1-way NFA constructed by the algorithm does not necessarily have a self-loop on the corresponding state, because the pair <s, Cj> …
[Figure: (a) 2-FA transitions, including the self-loop (a, L) on state t]
The 1-way FA model has many attractive properties: it is on-line (it computes while receiving the input string), it is real-time (it takes exactly |x| steps on input string x), and it is easier to implement in hardware than the 2-way model. In contrast, the 2-way FA model is off-line, and its computing time may exceed the length of the input. Because of the 2-way head motion, 2-way FA are not easy to analyze or manipulate. For example, consider the 2-way FA shown below. Though the automaton appears simple, it is not easy to figure out the language L(M) recognized by the machine.

[Figure: a 2-way FA with states 1, 2, 3 and moves (a, R), (b, R), (a, L), (b, L); L(M) = (a+b)ba*]
Appendix C. Computing Regular Expression for the Language Recognized by an FA

[Figure: an FA with states 1, 2, 3 and transitions on a, b; the regular expression for its language is a*b((a+b) + (a+b)a*b)*]
Computing Regular Expression for the Language Recognized by an FA

In Section 7.2, we learned how to compute a regular expression that denotes the language recognized by a given FA. The idea is to eliminate the states one at a time until we can read off, from the reduced transition graph, a regular expression denoting the language accepted at one of the accepting states of the FA. This is a manual technique, which is not easy to implement as a program. Here we present a dynamic-programming algorithm in the style of the Cocke-Younger-Kasami (CYK) algorithm.
Let M = (Q, Σ, δ, q0, F) be an FA, where Q = {1, 2, . . ., n} and q0 = 1. We define the following notation.

• Rij(0) : a regular expression denoting the set of labels on the transitions going directly from state i to state j (with no intermediate states).
• Rij(k) : a regular expression denoting the set of labels of the transition paths from state i to state j that pass through no intermediate state numbered greater than k.

The figure below shows an example of computing Rij(0). Notice that for each state i, there is an implicit self-loop with label ε. That is why every Rii(0) contains ε.
[Figure: the example FA with states 1, 2, 3]

R11(0) = a+ε      R12(0) = b      R13(0) = ∅
R21(0) = ∅        R22(0) = ε      R23(0) = ε
R31(0) = a+b      R32(0) = a      R33(0) = b+ε
With Rij(0) computed, we can recursively compute Rij(k), for all k ≥ 1, using the following recurrence. A path from i to j through states numbered at most k either avoids state k altogether (the first term) or decomposes into segments that visit state k (the second term).

Rij(k) = Rij(k-1) + Rik(k-1)(Rkk(k-1))*Rkj(k-1)

Given the n × n matrix for Rij(k-1), we can compute the matrix for Rij(k) using the above recurrence. It follows that, starting with Rij(0), we can iteratively compute Rij(n). From the matrix for Rij(n) we collect R1f(n) for every f ∈ F, combine them with the union operator +, and finally construct a regular expression which denotes the language recognized by the FA. (Recall that 1 is the start state; hence, R1f(n) is a regular expression for the set of strings that take the FA from the start state to the accepting state f ∈ F.)
An example. The matrix of Rij(0):

         j = 1     j = 2    j = 3
i = 1    a+ε       b        ∅
i = 2    ∅         ε        ε
i = 3    a+b       a        b+ε

Rij(1) = Rij(0) + Ri1(0)(R11(0))*R1j(0):

R11(1) = (a+ε)+(a+ε)(a+ε)*(a+ε) = a*
R12(1) = b+(a+ε)(a+ε)*b = b+a*b = a*b
R13(1) = ∅+(a+ε)(a+ε)*∅ = ∅
R21(1) = ∅+∅(a+ε)*(a+ε) = ∅
R22(1) = ε+∅(a+ε)*b = ε
R23(1) = ε+∅(a+ε)*∅ = ε
R31(1) = (a+b)+(a+b)(a+ε)*(a+ε) = (a+b)(ε+a*) = (a+b)a*
R32(1) = a+(a+b)(a+ε)*b = a+(a+b)a*b
R33(1) = (b+ε)+(a+b)(a+ε)*∅ = b+ε
The matrix of Rij(1):

         j = 1        j = 2           j = 3
i = 1    a*           a*b             ∅
i = 2    ∅            ε               ε
i = 3    (a+b)a*      a+(a+b)a*b      b+ε

Rij(2) = Rij(1) + Ri2(1)(R22(1))*R2j(1):

R11(2) = a*+(a*b)(ε)*∅ = a*
R12(2) = a*b+(a*b)(ε)*ε = a*b
R13(2) = ∅+(a*b)(ε)*ε = a*b
R21(2) = ∅+ε(ε)*∅ = ∅
R22(2) = ε+ε(ε)*ε = ε
R23(2) = ε+ε(ε)*ε = ε
R31(2) = (a+b)a*+(a+(a+b)a*b)(ε)*∅ = (a+b)a*
R32(2) = (a+(a+b)a*b)+(a+(a+b)a*b)(ε)*ε = a+(a+b)a*b
R33(2) = (b+ε)+(a+(a+b)a*b)(ε)*ε = (a+b+ε)+(a+b)a*b
The matrix of Rij(2):

         j = 1        j = 2           j = 3
i = 1    a*           a*b             a*b
i = 2    ∅            ε               ε
i = 3    (a+b)a*      a+(a+b)a*b      (a+b+ε)+(a+b)a*b

Rij(3) = Rij(2) + Ri3(2)(R33(2))*R3j(2). The matrix of Rij(3), with the recurrences left unevaluated:

i = 1:  R11(2)+R13(2)(R33(2))*R31(2)    R12(2)+R13(2)(R33(2))*R32(2)    R13(2)+R13(2)(R33(2))*R33(2)
i = 2:  R21(2)+R23(2)(R33(2))*R31(2)    R22(2)+R23(2)(R33(2))*R32(2)    R23(2)+R23(2)(R33(2))*R33(2)
i = 3:  R31(2)+R33(2)(R33(2))*R31(2)    R32(2)+R33(2)(R33(2))*R32(2)    R33(2)+R33(2)(R33(2))*R33(2)
To save space in the last matrix, for Rij(3) we wrote the recurrences instead of the regular expressions. Since state 3 is the only accepting state of the given FA, the entry R13(3) contains a regular expression that denotes the language. Using the matrix for Rij(2) above and the recurrence for R13(3), we can compute the regular expression as follows.

R13(3) = R13(2) + R13(2)(R33(2))*R33(2) = (a*b) + (a*b)((a+b+ε)+(a+b)a*b)*((a+b+ε)+(a+b)a*b)
Even with intermittent simplification of the regular expressions while computing the recurrence, the final result above still appears rather complex for such a simple FA. Can we simplify this regular expression further? Unfortunately, there is no algorithmic approach for simplifying regular expressions. If we instead use the state elimination technique given in Chapter 7 on the given FA, we get a simpler expression, as shown below. Finally, we present the algorithm on the next slide, which can be easily understood based on our discussion.

[Figure: state elimination on the example FA. Eliminating state 2 and reading off the expression gives
r3 = (r11)*r13(r33 + r31(r11)*r13)* = a*b((a+b) + (a+b)a*b)*]
CYK Algorithm for Computing the Regular Expression

//input: state transition graph of M; output: regular expression for L(M)
for (i = 1; i <= n; i++)
    for (j = 1; j <= n; j++)                    // compute R(0)[i][j]
        if (i == j)
            if (there are m >= 0 labels a1, . . ., am on the i-to-i loop edge)
                R(0)[i][j] = "ε+a1+ . . .+am";  // "ε", if m = 0
        else                                    // i ≠ j
            if (there are m > 0 labels a1, . . ., am on the i-to-j edge)
                R(0)[i][j] = "a1+ . . .+am";
            else R(0)[i][j] = "∅";
for (k = 1; k <= n; k++)
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            R(k)[i][j] = R(k-1)[i][j] + R(k-1)[i][k]( R(k-1)[k][k] )*R(k-1)[k][j];
output ∪f∈F R(n)[1][f];                         // F is the set of accepting states
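The pseudocode above translates into a short runnable sketch. In the Python rendering below (the function names and the encoding of the FA are illustrative assumptions, not from the text), ∅ is represented by None and ε by the empty string, and the output uses Python re syntax with '|' in place of '+', so the result can be verified by matching. The example machine's edge set, including the ε edge from state 2 to state 3, is reconstructed from the R(0) table of this appendix and is likewise an assumption.

```python
import re

def fa_to_regex(n, edges, accept, start=1):
    """Dynamic-programming construction of a regular expression for L(M).
    States are 1..n; edges maps (i, j) to an alternation in Python re
    syntax, e.g. 'a|b' ('' encodes an epsilon edge).  None stands for
    the empty set.  Implements R_ij(k) = R_ij(k-1) + R_ik(k-1)(R_kk(k-1))*R_kj(k-1)."""

    def union(x, y):
        if x is None: return y
        if y is None: return x
        return f"(?:{x})|(?:{y})"

    def concat(x, y):
        if x is None or y is None: return None
        return f"(?:{x})(?:{y})"

    def star(x):
        return "" if x is None else f"(?:{x})*"   # the star of the empty set is epsilon

    # R[i][j] = R_ij(0): direct edges, plus an implicit epsilon self-loop.
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            lab = edges.get((i, j))
            if i == j:                            # every R_ii(0) contains epsilon
                lab = "" if lab is None else f"(?:{lab})?"
            R[i][j] = lab

    for k in range(1, n + 1):
        R = [[union(R[i][j], concat(R[i][k], concat(star(R[k][k]), R[k][j])))
              for j in range(n + 1)]
             for i in range(n + 1)]

    out = None
    for f in accept:                              # union of R_1f(n) over f in F
        out = union(out, R[start][f])
    return out

# The three-state example FA of this appendix (reconstructed edge set):
edges = {(1, 1): "a", (1, 2): "b", (2, 3): "",
         (3, 1): "a|b", (3, 2): "a", (3, 3): "b"}
rx = fa_to_regex(3, edges, accept=[3])
```

The resulting (unsimplified) expression matches exactly the strings over {a, b} that contain at least one b, in agreement with the simplified form a*b((a+b) + (a+b)a*b)*.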
Appendix D. Properties of Deterministic Context-free Languages

1. A CFL that cannot be recognized by any DPDA
2. Closure property of DCFL's under complementation
3. Making a DPDA read the input up to the end of the input
Properties of DCFL's

1. A CFL that cannot be recognized by a DPDA

Let LCFL and LDCFL be, respectively, the classes of CFL's and DCFL's. The theorem below shows that LDCFL ⊊ LCFL. In other words, there is a CFL that cannot be recognized by any DPDA, but can be recognized by an NPDA. (Recall that, in contrast, every language recognized by an NFA can also be recognized by a DFA.) This theorem can be proved in two ways, both of which are interesting.

Theorem 1. There is a CFL that cannot be recognized by a DPDA.

Proof (non-constructive). The complement of every DCFL is also a DCFL. (We will show this in the proof of Theorem 2 below.) In Chapter 9, we showed a CFL whose complement is not a CFL, which implies the theorem. •
Normal Form of PDA

We need the following lemma for the constructive proof of Theorem 1. This lemma, which simplifies the PDA model, will also be used in the proof of Theorem 2.

Lemma 1 (Normal form of PDA). Every CFL can be recognized by a PDA which satisfies the following conditions: (1) the PDA never empties the stack (i.e., it does not pop Z0), (2) when it pushes, the machine pushes exactly one symbol, and (3) it never changes the stack-top symbol.

Proof. Let M = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA, where p, q ∈ Q, A, B ∈ Γ and a ∈ Σ ∪ {ε}. Notice that conditions (2) and (3) do not allow a pushing move such as δ(p, a, A) = (q, BC), where the original stack-top A is changed to C. This normal form applies to all PDA's, deterministic or nondeterministic. In the rumination section at the end of Chapter 4, we showed that condition (2) does not affect the language recognized by a PDA. Here we show that the lemma holds for conditions (1) and (3).
Suppose that a PDA M does not satisfy condition (1), that is, it has a move which pops the bottom-of-stack symbol Z0, as shown in figure (a) below. Since the PDA cannot make any move with an empty stack, we simply let it push a new stack symbol, say X0, on top of Z0 instead of popping Z0, as shown in figure (b). The modified PDA M' recognizes the same language.

[Figure: (a) PDA M with popping moves (·, Z0/ε); (b) PDA M' with those moves replaced by (·, Z0/X0Z0)]
Now suppose that PDA M satisfies conditions (1) and (2), but not condition (3). We convert M to a PDA M' which keeps the stack-top symbol of M in its finite state control and simulates M, as illustrated in the following figure. Notice that when the stack of M holds only Z0, M' keeps a copy of Z0 in its finite state control. PDA M' never rewrites its stack top and recognizes the same language. • (By keeping the stack top in the finite state control, we increase the number of states of the PDA.)

[Figure: (a) stack contents of M; (b) the corresponding stack contents of M', whose topmost symbol is held in the finite state control]
Proof of Theorem 1

Proof of Theorem 1 (constructive). We show that no DPDA recognizes the palindrome language L = { wwR | w ∈ {a, b}+ }. (This language is context-free, because we can easily construct a CFG, or an NPDA, as shown in Section 5.2.) Toward a contradiction, suppose that there is a DPDA M which recognizes L. Let qx and Ax be, respectively, the state of M and the stack-top symbol at the point where M has read the input up to some prefix x of the input string; M may read an additional input segment z before it pops Ax (see the figure below). By [qx, Ax] we denote such a pair.

[Figure: the configuration of M after reading prefix x: state qx, with Ax on top of the stack above Z0, and the remaining input z]
Since M is a DPDA, for a given string x there is a unique pair [qx, Ax]. For the proof of the theorem, we first show that if M recognizes the palindrome language L, there are two different strings x and y for which [qx, Ax] = [qy, Ay]. For such strings x and y, we can easily find a string z such that xz ∈ L and yz ∉ L. Let us examine what happens when M runs on the inputs xz and yz. After reading up to x and up to y, respectively, the machine is in the same state (i.e., qx = qy) with the same stack top (i.e., Ax = Ay), which it never pops while reading the remaining part z of the input. It follows that M must either accept both xz and yz or reject both, and we have a contradiction because M is deterministic.
Proof of Theorem 1 (continued)

For an arbitrary input string u ∈ {a, b}+, let γu be the content of the stack at the point where M has read up to the last symbol of u (figure (a)). Let v ∈ {a, b}* be a string such that, given uv as input, the machine has reduced the stack to its minimum height (γuv in figure (b)) by the time it reads the last symbol of uv.

[Figure: (a) the stack γu after reading u; (b) the minimized stack γuv after reading uv]

In other words, v is a string which, appended to u, makes M shrink the stack from γu to the minimum γuv; thereafter, no further string v' appended to uv and given as input can make M pop into the content of γuv. Notice that, depending on u, there can be more than one v that minimizes γuv. As a special case it is also possible that v = ε, i.e., γu = γuv. Clearly, for every string u there exists such a string v.
Let [quv, Auv] be the pair of the state and the stack-top symbol (the top of γuv) at the point where M reads the last symbol of the input string uv, as figure (b) above illustrates, and define the following sets S and T.

S = { [quv, Auv] | u ∈ {a, b}+, and v ∈ {a, b}* gives the shortest γuv }
T = { uv | u ∈ {a, b}+, and v ∈ {a, b}* gives the shortest γuv }

Since the number of states of M and the size of the stack alphabet are finite, the set S is finite. However, T is infinite, because it contains a string uv for every string u ∈ {a, b}+. Clearly, for every string x ∈ T there is a pair [qx, Ax] ∈ S. It follows that there are two distinct strings x, y ∈ T that share one pair [qx, Ax] = [qy, Ay] in S.
Now, with the two strings x, y ∈ T for which [qx, Ax] = [qy, Ay], we find a string z such that xz ∈ L and yz ∉ L as follows.

(1) If |x| = |y|, we let z = xR. Then clearly xz = xxR ∈ L and, since x ≠ y, yz = yxR ∉ L.

(2) If |x| ≠ |y|, we construct z as follows. Suppose that |x| < |y| (the same logic applies the other way around). Let y1 be the prefix of y such that |y1| = |x|, and write y = y1y2. Find a string w such that |w| = |y2| and w ≠ y2, and construct z = wwRxR. Clearly, xz = xwwRxR ∈ L and yz = y1y2wwRxR ∉ L. (Because of the three conditions |y1| = |x|, |w| = |y2| and w ≠ y2, the string yz cannot have the palindrome form required by L.)

Now, let us examine what happens when the DPDA M runs on the two input strings xz and yz. We know that [qx, Ax] = [qy, Ay], so while reading z the machine computes identically in both runs, starting in the same state qx (= qy) with the same stack top Ax (= Ay), which is never popped. Hence M must either accept both xz and yz or reject both. This contradicts the supposition that the DPDA M recognizes L. It follows that L is a CFL that is not recognizable by any DPDA. •
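The construction of z in cases (1) and (2) can be checked mechanically. The sketch below (helper names are illustrative) builds z from two distinct prefixes x and y and verifies that xz is an even-length palindrome in L while yz is not.

```python
def in_L(s):
    """Membership in L = { w w^R | w in {a, b}+ }: a nonempty,
    even-length palindrome."""
    return len(s) >= 2 and len(s) % 2 == 0 and s == s[::-1]

def make_z(x, y):
    """Given distinct strings x, y with |x| <= |y|, build z such that
    xz is in L and yz is not, following the case analysis above."""
    if len(x) == len(y):
        return x[::-1]                        # case (1): z = x^R
    if len(x) > len(y):
        raise ValueError("assume |x| < |y|")
    y2 = y[len(x):]                           # y = y1 y2 with |y1| = |x|
    # pick w with |w| = |y2| and w != y2 (flip the first symbol of y2)
    w = ("b" if y2[0] == "a" else "a") + y2[1:]
    return w + w[::-1] + x[::-1]              # case (2): z = w w^R x^R
```

For example, with x = "ab" and y = "abab", make_z returns a z for which "ab" + z is a palindrome while "abab" + z is not.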
2. Closure Properties of DCFL's

Theorem 2. Let L1 and L2 be arbitrary DCFL's, and let R be a regular language.
(1) L1 ∩ R is also a DCFL.
(2) The complement of L1 is also a DCFL.
(3) L1 ∪ L2 and L1 ∩ L2 are not necessarily DCFL's. In other words, DCFL's are not closed under union and intersection.

Proof. We assume that every DFA and DPDA reads the input string up to the last symbol, without rejecting the input in the middle. Showing that we may make this assumption requires a long and complex proof, which we defer toward the end of this appendix.
Proof (1). L1 ∩ R is also a DCFL.

Let M and A be, respectively, a DPDA and a DFA which recognize L1 and R. With the two automata M and A, we construct a DPDA M' which recognizes the language L1 ∩ R as follows. Keeping the transition functions of M and A in its finite state control, the DPDA M' simulates both M and A, tracking the pair of their current states. M' advances the simulation of A only when M reads an input symbol. (Recall that PDA's can make ε-moves, in which they do not read the input.) DPDA M' enters an accepting state if and only if both M and A are simultaneously in accepting states. Since M and A both read the input up to the end, clearly M' recognizes the language L1 ∩ R.
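The bookkeeping behind M' is the classical product construction. The sketch below shows it for the simpler DFA-and-DFA case (all names are illustrative; both DFAs are assumed total, in line with the convention adopted for this proof): the product tracks a pair of states and accepts iff both components accept.

```python
def product_intersection(d1, d2, alphabet):
    """Product construction: the result accepts iff both DFAs accept.
    Each DFA is (start, accepting_set, delta) with delta[(q, a)] -> q',
    and delta is assumed total over the alphabet."""
    (s1, f1, t1), (s2, f2, t2) = d1, d2
    q1 = {q for (q, _) in t1}
    q2 = {q for (q, _) in t2}
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)])
             for p in q1 for q in q2 for a in alphabet}
    accept = {(p, q) for p in f1 for q in f2}
    return ((s1, s2), accept, delta)

def run(dfa, w):
    state, accept, delta = dfa
    for a in w:
        state = delta[(state, a)]
    return state in accept

# Example (illustrative): d1 accepts strings with an even number of a's,
# d2 accepts strings ending in b; the product accepts their intersection.
t1 = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
t2 = {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"}
prod = product_intersection((0, {0}, t1), ("n", {"y"}, t2), "ab")
```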
Proof (2). The complement of L1 is also a DCFL.

Let M be a DPDA recognizing L1. Unfortunately, we cannot use the simple technique of swapping accepting and non-accepting states that we used to prove the closure of regular languages under complementation. Let us see why. Suppose that M takes a sequence of ε-moves, in which the machine computes only with the stack, without reading the input, as illustrated in the following figure. (In the figure, the heavy circle denotes an accepting state.) If the input symbol a, read as the machine enters state p, is the last input symbol, then the input string is accepted, because the machine enters an accepting state after (not necessarily right after) reading the last input symbol.

[Figure: state p, entered on input a, followed by a chain of ε-moves that passes through an accepting state]
Let us see what happens if we swap the accepting and the non-accepting states, as shown in figure (b) below. The machine still accepts the input, because along the same chain of ε-moves it again enters some accepting state after reading the last symbol a. To solve this problem, we use a simulation technique.

[Figure: (a) the chain of ε-moves in M; (b) the same chain with accepting and non-accepting states swapped, which still passes through an accepting state]
We construct a DPDA M' which simulates M to recognize the complement of L(M), as follows. M' keeps the transition function of M in its finite state control and uses its own stack on behalf of M. The simulation distinguishes two cases, (a) and (b), depending on whether M enters an accepting state between two input-reading moves.

(a) If M, after reading an input symbol (a in the figure), does not enter an accepting state before it reads the next input symbol (b in the figure), then M' reads a, enters an accepting state, and then simulates M reading the next input symbol b. (The ε-transitions in between are skipped.) Notice that M' is simulating M in order to recognize the complement of L(M): if the symbol a that M reads is the last symbol of the input string x, then x is rejected by M, so to have x accepted by M' we let M' enter an accepting state right after reading a.

(b) If M does enter an accepting state between two reading moves (i.e., non-ε-transitions), M' simulates the two reading moves of M without entering an accepting state.

[Figure: a chain of ε-moves between state p, entered on input a, and the next reading move on input b]
Proof (3). L1 ∪ L2 and L1 ∩ L2 are not necessarily DCFL's.

As shown on the following page, the languages L1 and L2 below are DCFL's. However, in Section 12.4 we proved that the intersection L1 ∩ L2 = {aibici | i ≥ 1} is a CSL which is not context-free.

L1 = {aibick | i, k ≥ 1}
L2 = {aibkck | i, k ≥ 1}

Now consider the union of the complements. If L = L̄1 ∪ L̄2 were a DCFL, the following language L' would also be a DCFL by properties (1) and (2), because {aibjck | i, j, k ≥ 1} is regular (see the DFA recognizing this language on the following page):

L' = L̄ ∩ {aibjck | i, j, k ≥ 1} = {aibici | i ≥ 1}

However, we know that {aibici | i ≥ 1} is not context-free (see Section 12.5).
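The two witness languages are easy to check mechanically. The sketch below (helper names are illustrative) confirms by brute force that L1 ∩ L2 = {aibici} on all short strings over {a, b, c}.

```python
import re
from itertools import product

def parse_abc(s):
    """Return (i, j, k) if s = a^i b^j c^k with i, j, k >= 1, else None."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return tuple(len(g) for g in m.groups()) if m else None

def in_L1(s):                      # L1 = { a^i b^i c^k | i, k >= 1 }
    t = parse_abc(s)
    return t is not None and t[0] == t[1]

def in_L2(s):                      # L2 = { a^i b^k c^k | i, k >= 1 }
    t = parse_abc(s)
    return t is not None and t[1] == t[2]

# Exhaustively confirm L1 ∩ L2 = { a^i b^i c^i } on strings up to length 9.
for n in range(1, 10):
    for w in map("".join, product("abc", repeat=n)):
        t = parse_abc(w)
        assert (in_L1(w) and in_L2(w)) == (t is not None and t[0] == t[1] == t[2])
```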
[Figure: (a) a DPDA accepting L1 = {aibick | i, k ≥ 1}, with moves including (a, Z0/aZ0), (a, a/aa), (b, a/ε), (c, Z0/Z0) and (ε, Z0/Z0); (b) a DPDA accepting L2 = {aibkck | i, k ≥ 1}, with moves including (a, Z0/Z0), (b, Z0/bZ0), (b, b/bb), (c, b/ε) and (c, Z0/Z0); (c) a DFA accepting {aibjck | i, j, k ≥ 1}]
3. Making every DPDA and DFA read up to the last input symbol

In this section we prove the following lemma, which we deferred while proving Theorem 2.

Lemma 2. Every DCFL and every regular language can be recognized, respectively, by a DPDA and by a DFA which reads the input up to the last symbol.

Proof. By convention, when we define the transition function (or the transition graph) of an automaton, we may leave it undefined for some tape symbols (and stack tops, in the case of a PDA). We assume that on entering a state from which no transition is defined, the automaton rejects the input immediately, without reading the input further. To make a DFA read up to the last input symbol, we explicitly introduce a dead state and direct every undefined transition to it. The machine then reads off all the remaining input symbols in the dead state, as the following example shows. (See also Section 9.1.)

[Figure: a DFA completed with a dead state d that loops on a, b]
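The dead-state completion can be sketched in a few lines (a minimal illustration; the DFA encoding and names are assumptions, not from the text):

```python
def complete_dfa(states, alphabet, delta, dead="d"):
    """Make a partial DFA total: every undefined transition goes to a new
    dead state, which loops to itself on every symbol, so the machine
    always reads the input up to the last symbol."""
    total = dict(delta)
    for q in list(states) + [dead]:
        for a in alphabet:
            total.setdefault((q, a), dead)
    return states | {dead}, total

def run(delta, start, accept, w):
    q = start
    for a in w:
        q = delta[(q, a)]          # defined for every (state, symbol)
    return q in accept

# Example: a partial DFA for a*b (state 2 has no outgoing transitions).
states, delta = complete_dfa({1, 2}, "ab", {(1, "a"): 1, (1, "b"): 2})
```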
Making a DPDA read the last input symbol

Let M = (Q, Σ, Γ, δ, q0, Z0, F) be a DPDA. Recall that for every p ∈ Q, a ∈ Σ and A ∈ Γ, each of δ(p, a, A) and δ(p, ε, A) gives at most one move, and if δ(p, a, A) is defined then δ(p, ε, A) is not defined, and vice versa. The problem of making M read up to the last input symbol is not as simple as for DFA's. The automaton may hit an undefined transition, as a DFA can, or it may end up in a cycle of ε-moves in the middle of the computation without consuming the input. For undefined transitions we can use the same dead-state approach as for DFA's. Here is an example. (Notice that this DPDA accepts the language {aibi | i ≥ 1}.)

[Figure: a DPDA for {aibi | i ≥ 1}, with Σ = {a, b} and Γ = {A, Z0}, completed with a dead state d; every undefined combination (X, Y), X ∈ {a, b}, Y ∈ {A, Z0}, leads to d, where the moves (X, Y/Y) consume the remaining input]
Now we study the problem of converting M to a DPDA M' which consumes the input without entering a cycle of ε-moves. The figure below shows a part of the transition graph of M with such a loop. Notice that on entering the loop the machine does not necessarily stay in it; depending on the stack contents it may exit. Given the state transition graph of M, our objective is to find every state q and stack-top symbol Z such that the machine cyclically re-enters q with the same stack top Z, that is, (q, ε, Zγ) ⊢+ (q, ε, αZγ) for some α, γ ∈ Γ*.

[Figure: a transition graph fragment with an ε-cycle through states 2, 3 and 4, with moves such as (ε, A/CA), (ε, D/ED), (ε, E/ε), (ε, C/ε), (ε, B/B), (ε, D/ε), (ε, A/BA) and (b, A/BA)]
Since the graph is finite and the transitions are deterministic, we can effectively identify every entering transition which remains in the cycle, detach it from the cycle, and redirect it to the dead state, where the remaining input is consumed. If the cycle involves an accepting state, we let the detached transition pass through an accepting state before sending it to the dead state.

[Figure: the ε-cycle of the previous figure with the offending entering transition redirected, via moves (X, Y/Y), to the dead state d]
For a more detailed treatment of the conversion, refer to J. Hopcroft and J. Ullman, Introduction to Automata Theory, Languages and Computation, Section 10.2, Addison-Wesley, 1979, or M. Harrison, Introduction to Formal Language Theory, Section 5.6, Addison-Wesley, 1978.
Appendix E. A CFL Satisfying the Pumping Lemma for Regular Languages

We showed that every infinite regular language satisfies the pumping lemma, and we proved the lemma with no regard to other classes of languages. As an application of the lemma we showed that the context-free language {aibi | i > 0} does not satisfy it and hence is not regular. We may ask the following: does every non-regular context-free language fail the lemma, or is there a non-regular language which satisfies it? Here we show a non-regular context-free language that satisfies the pumping lemma, giving a positive answer to the latter question. (This example was given by one of the author's colleagues, Professor Robert McNaughton at RPI.)

L1 = {x | x ∈ {a, b}*, x has aa or bb as a substring}
L2 = {x | x ∈ {a, b}*, |x| = p, p is a prime number}
L̄1 = {a, b}* − L1      // complement of L1
L = (L̄1 ∩ L2) ∪ L1     // L is a non-regular language satisfying the lemma
Proof

We will prove that L satisfies the lemma. First, L1 is regular: its complement L̄1, the set of strings with no aa or bb as a substring, is denoted by the regular expression (ab)* + b(ab)* + (ab)*a + (ba)* and is recognized by the NFA sketched below, and regular languages are closed under complementation.

[Figure: an NFA recognizing L̄1 = {x | x ∈ {a, b}*, x has no aa or bb as a substring}]
We first show that L is not regular. Suppose L is regular. Since L1 is regular, by the closure properties of regular languages L − L1 = L̄1 ∩ L2 must also be regular. Since this language is infinite, it should satisfy the pumping lemma. Let n be the constant of the pumping lemma, and choose a string z ∈ L̄1 ∩ L2 whose length p = |z| is a prime number greater than n. Let z = uvw with |uv| ≤ n and |v| ≥ 1. Now we pump z to obtain z' = uvpj+1w, where j = |v|. The length of z' is |z'| = p + |v|pj = p(1 + j2), which is not a prime number. It follows that z' ∉ L̄1 ∩ L2, contradicting the pumping lemma. The language L̄1 ∩ L2 is therefore not regular, implying that L is not regular.
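The arithmetic at the heart of the argument, that the pumped length p(1 + j2) is never prime, is easy to sanity-check:

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# |z'| = p + (p*j)*j = p*(1 + j*j) for prime p = |z| and j = |v| >= 1;
# both factors are >= 2, so the pumped length is always composite.
for p in [2, 3, 5, 7, 11, 13, 17]:
    for j in range(1, 8):
        assert not is_prime(p * (1 + j * j))
```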
Now we prove that the language L satisfies the pumping lemma. For the proof, we must show that there exists a constant n such that for every string z ∈ L with |z| ≥ n, there exist strings u, v and w satisfying the following conditions.

(i) z = uvw
(ii) |uv| ≤ n
(iii) |v| ≥ 1
(iv) uviw ∈ L for all i ≥ 0

Let z = c1c2c3x be a string in L, where ci ∈ {a, b}, x ∈ {a, b}*, |x| ≥ n − 3.
Let n = 3, and consider the following three cases.

Case 1: c1 ≠ c2 and c2 ≠ c3 (i.e., z = abax or z = babx). Let u = c1, v = c2 and w = c3x. Then for every i ≥ 0, z' = uviw ∈ L. (Specifically, if i = 1 then z' = z ∈ L. Otherwise z' contains a double letter: for i = 0 we get c1c3 = c1c1, and for i ≥ 2 we get c2c2; in either case z' ∈ L1.)

Case 2: c1 = c2 (i.e., z ∈ L1). Let u = c1c2, v = c3 and w = x. Then for all i ≥ 0, z' = uviw ∈ L (specifically, z' ∈ L1, since it still begins with c1c1).

Case 3: c2 = c3 (again z ∈ L1). Let u = ε, v = c1 and w = c2c3x. Then again, for all i ≥ 0, z' = uviw ∈ L (z' always contains the double letter c2c3).

It follows that the language L satisfies the pumping lemma.
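The three cases can be verified mechanically. The sketch below (helper names are illustrative) implements membership in L and the n = 3 decomposition, and pumps a few sample members of L:

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def in_L(x):
    """L = (comp(L1) ∩ L2) ∪ L1 over {a, b}: either x has a double letter
    (x in L1), or x is alternating and |x| is prime.  Since 'no double
    letter' means alternating, this reduces to: double letter or prime length."""
    return "aa" in x or "bb" in x or is_prime(len(x))

def decompose(z):
    """The n = 3 decomposition z = uvw from the three cases of the proof."""
    c1, c2, c3, x = z[0], z[1], z[2], z[3:]
    if c1 != c2 and c2 != c3:          # case 1: u = c1, v = c2, w = c3 x
        return c1, c2, c3 + x
    if c1 == c2:                       # case 2: u = c1 c2, v = c3, w = x
        return c1 + c2, c3, x
    return "", c1, c2 + c3 + x         # case 3 (c2 = c3): u = ε, v = c1

# Pumping any sampled member of L (|z| >= 3) stays inside L:
for z in ["aba", "bab", "babab", "aab", "abba", "abaab"]:
    assert in_L(z)
    u, v, w = decompose(z)
    for i in range(5):
        assert in_L(u + v * i + w)
```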
Appendix F. CYK Algorithm for the Membership Test for CFL

The membership test problem for context-free languages is, for a given arbitrary CFG G, to decide whether a string w is in the language L(G) or not; if it is, the problem commonly also requires a sequence of the rules applied to derive w. A brute-force technique is to generate all possible parse trees yielding a string of length |w| and check whether any tree yields w. This approach takes too much time to be practical. Here we present the well-known CYK algorithm (named for Cocke, Younger and Kasami, who first developed it). This algorithm, which takes O(n3) time, is based on the dynamic programming technique, and it assumes that the given CFG is in Chomsky normal form (CNF).

Let w = a1a2 . . . an, let wij = aiai+1 . . . aj (so wii = ai), and let Vij be the set of nonterminal symbols that can derive the string wij, i.e.,

Vij = { A | A ⇒* wij, A is a nonterminal symbol of G }
Construct an upper triangular matrix whose entries are the sets Vij, as shown below. In the matrix, j corresponds to the position of the input symbol, and i corresponds to the diagonal number.

w =     a1     a2     a3     a4     a5     a6
        V11    V22    V33    V44    V55    V66
        V12    V23    V34    V45    V56
        V13    V24    V35    V46
        V14    V25    V36
        V15    V26
        V16

Clearly, by definition, if S ∈ V16 then the string w ∈ L(G).
The entry Vij can be computed from previously computed entries, pairing Vik with V(k+1)j for k = i, i+1, . . ., j−1, along the two directions indicated in the figure. If A ∈ Vii (which implies that A can derive ai), B ∈ V(i+1)j (implying that B can derive ai+1 . . . aj) and there is a rule C → AB, then put C in the set Vij. If D ∈ Vi(i+1) (which implies that D can derive aiai+1), E ∈ V(i+2)j (implying that E can derive ai+2 . . . aj) and there is a rule F → DE, then put F in the set Vij, and so on.
For example, with A, B and C denoting nonterminals of G (recall that G is in CNF), the set V25 is computed as follows.

V25 = { A | B ∈ V22, C ∈ V35, and A → BC }
    ∪ { B | C ∈ V23, A ∈ V45, and B → CA }
    ∪ { C | B ∈ V24, A ∈ V55, and C → BA }
In general,

Vij = ∪ i ≤ k ≤ j−1 { A | B ∈ Vik, C ∈ V(k+1)j and A → BC }
Example: Consider the CFG G with rules S → aSb | aDb and D → aD | a, converted to the following CNF grammar.

S → AB | AC      B → SB | b      C → DB      D → AD | a      A → a

For w = aaaabb, the algorithm fills the table as follows.

w =     a        a        a        a        b      b
        {A,D}    {A,D}    {A,D}    {A,D}    {B}    {B}
        {D}      {D}      {D}      {S,C}    {}
        {D}      {D}      {S,C}    {B}
        {D}      {S,C}    {S,B,C}
        {S,C}    {S,B,C}
        {S,B,C}

For instance, V45 contains S and C by S → AB and C → DB, and V16 contains S, B and C by S → AB, S → AC, C → DB and B → SB. Since S ∈ V16, we have w ∈ L(G).
Here is pseudocode for the algorithm.

// Input: w = a1a2 . . . an. Initially all sets V[i][j] are empty.
for (i = 1; i <= n; i++)
    V[i][i] = { A | A → ai };
for (j = 2; j <= n; j++)
    for (i = j-1; i >= 1; i--)
        for (k = i; k <= j-1; k++)
            V[i][j] = V[i][j] ∪ { A | B ∈ V[i][k], C ∈ V[k+1][j] and A → BC };
if (S ∈ V[1][n]) output "yes"; else output "no";

The number of sets V[i][j] is O(n2), and it takes O(n) steps to compute each one. Thus the time complexity of the algorithm is O(n3).
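The pseudocode translates directly into Python. The sketch below (function and nonterminal names are illustrative) tests membership for a small CNF grammar, here one for {anbn | n ≥ 1} chosen so the result is easy to check by eye:

```python
def cyk(rules, lexical, w, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.
    rules:   dict A -> list of (B, C) pairs for rules A -> BC
    lexical: dict A -> set of terminals for rules A -> a
    Returns True iff `start` derives w."""
    n = len(w)
    if n == 0:
        return False
    # V[i][j]: nonterminals deriving w[i..j] (0-indexed, inclusive)
    V = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):
        V[i][i] = {A for A, ts in lexical.items() if a in ts}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for A, bodies in rules.items():
                    for B, C in bodies:
                        if B in V[i][k] and C in V[k + 1][j]:
                            V[i][j].add(A)
    return start in V[0][n - 1]

# CNF grammar for { a^n b^n | n >= 1 }:  S -> AB | AX,  X -> SB,  A -> a,  B -> b
rules = {"S": [("A", "B"), ("A", "X")], "X": [("S", "B")]}
lexical = {"A": {"a"}, "B": {"b"}}
```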
References for Further Reading

A. V. Aho and J. D. Ullman, The Theory of Parsing, Translation, and Compiling, Vol. 1: Parsing, Englewood Cliffs, N.J.: Prentice-Hall, 1972.
P. J. Denning, J. B. Dennis, and J. E. Qualitz, Machines, Languages, and Computation, Englewood Cliffs, N.J.: Prentice-Hall, 1978.
H. Ehrig, M. Nagl, G. Rozenberg, and A. Rosenfeld (Eds.), Graph Grammars and Their Application to Computer Science, Lecture Notes in Computer Science #291, Springer-Verlag, 1986.
M. A. Harrison, Introduction to Formal Language Theory, Reading, Mass.: Addison-Wesley, 1978.
J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Second Ed., Addison-Wesley, 2001.
P. Linz, An Introduction to Formal Languages and Automata, Third Ed., Jones and Bartlett Publishers, 2001.
D. W. Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Lab. Press, 2001.
F. P. Preparata and R. T. Yeh, Introduction to Discrete Structures for Computer Science and Engineering, Reading, Mass.: Addison-Wesley, 1973.
S. Wolfram, Theory and Applications of Cellular Automata, World Scientific, 1986.
Index 2-D tape TM , 165 2-dimensional(2D)-tape FA , 177 2-head FA, 178 2-sided L-system , 62 2-way FA , 176, 515 2-way PDA , 169 Accept, 81 Accepting by empty stack , 172 Accepting configuration , 370, 383 Accepting state , 94 Algebraic properties, 73 Ambiguous CFG , 294 Amino acid , 428 Associative law, 73 Associativity, 301 Axiom, 63 Base, 15 Bottom of the stack symbol, 101, 104 Bottom-up left-to-right traversal, 356 Bottom-up parsing, 379 CA, 179 Cell, 81
Cellular automata, 179 Characterization, 189, 191, 471 Chomsky hierarchy, 185, 187, 190, 309, 310, 435 Chomsky normal form (CNF), 277, 278 Church's hypothesis, 181 Closure, 5 Closure property, 248 Closure property of DCFL's, 543 Coding region, 430 Codon, 428, 430 Commutative law, 73 Complement, 240 Complementary strand, 426 Composite symbol, 3, 480 Computational capability, 160 Computing regular expression, 532 Concatenate, 2 Conditional connective, 22 Configuration, 360 Constructive proof, 17 Containment, 185, 188, 190, 309, 310 Context-free grammar, 51
Context-sensitive grammar, 51 Countable, 438 Crossing information (CI), 518 Crossing state pair, 518 CYK algorithm, 352, 533, 569 Declarative statement, 6 DeMorgan's law, 4 Deterministic finite automata (DFA), 111 Deterministic linear bounded automata (DLBA), 98 Deterministic L-system, 62 Deterministic pushdown automata (DPDA), 100 Deterministic Turing machine (DTM), 81 Diagonalization, 21, 453, 465 Distributive law, 73 Document type definition (DTD), 414 Encoding function, 444 Enumerable set, 437 Enumerable languages, 450 Enumeration, 437 Enumerator, 437 epsilon-move, 102 epsilon-transition, 146
Equivalent state, 224 Exclusive OR, 4 Existential quantification, 13, 346 Extensible markup language (XML), 414, 418 FA array, 179 Finite state control, 81 Formal language, 30, 31, 48 Front nonterminal, 280 Front recursion, 280 Gene code, 425 Gene grammar, 425 Genome, 425 Grammar, 30, 33 Greibach normal form (GNF), 277, 280 Hidden Markov model (HMM), 180 Homomorphism, 444 Hyper text markup language (HTML), 414, 415 Hypothesis, 15 Implication, 9 Induction, 15 Induction hypothesis, 15 Inherently ambiguous, 305
Input alphabet, 94
Intersection, 240
Intron, 427, 430
Kleene closure, 5
Kleene star, 240
Language class, 309
Left associativity, 304
Left linear, 57
Leftmost derivation, 354
Lex, 352, 404
Lexical ambiguity, 288
Lexical analyzer, 404
Linear, 57
Lindenmayer, 62
LL(k) grammar, 365, 378
Look-ahead, 359, 368, 373
Look-ahead LR parser (LALR), 406
LR(k) grammar, 386, 403
LR(k) parser, 379
L-system, 62
Membership test for CFL, 569
Minimize, 224, 226
Moore automata, 161
Morphemes, 49
Move, 81
Multi-head FA, 178
Multi-stack PDA, 170
Multi-tape TM, 164
Multi-track TM, 162
NFA, 126
Non-constructive proof, 17
Nondeterministic algorithm, 143
Nondeterministic automata, 124, 154
Nondeterministic choose, 141
Nondeterministic computing, 140
Nondeterministic finite automata (NFA), 126
Nondeterministic linear bounded automata (NLBA), 139
Nondeterministic pushdown automata (NPDA), 132
Nondeterministic Turing machine (NTM), 139
Non-front terminal, 280
Nonterminal alphabet, 49
Normal form, 277
Normal form of PDA, 544
Null string, 2
Ogden's lemma, 336
Parallel rewriting, 56, 62
Parse table, 366, 385
Parse tree, 290, 354
Parsing, 352
Pascal syntax flow graph, 508
Path label, 317
Phrase structured grammar, 49
Pigeonhole principle, 18
Pop, 100
Power set, 5
Precedence, 293, 301
Product, 240
Production rule, 54
Proof by cases, 11
Proof by contradiction, 12
Proof by contrapositive, 11
Proof by counting, 20
Proof by example, 13
Proof by generalization, 14
Proof by induction, 15
Proof by pigeonhole principle, 18
Proof technique, 10
Proper containment, 309, 312
Property list, 3
Proposition, 6
Pumping lemma, 316, 322, 564
Pumping lemma for CFL, 328
Purine, 431
Push, 100
Pushdown stack, 100
Pyrimidine, 431
Read only, 100
Recursive language, 460
Recursive TM, 459, 460
Recursively enumerable (R.E.), 50, 446
Reduced DFA, 229
Reduction table, 385
Regular expression, 70
Regular grammar, 52
Reversal, 240
Rewriting rule, 33, 54
Ribosome, 427
Right associativity, 304
Right linear, 57
Rightmost derivation, 354
RNA transcription, 426
Rule, 31, 33
Semantic ambiguity, 288
Sentential form, 50
Set complement, 4
Set intersection, 4
Set operation, 4
Set product, 4
Set properties, 3
Set specification, 3
Set subtraction, 4, 240
Set union, 4
Shift-in, 394, 400
Simulate, 448
Splicing, 427
Spontaneous transition, 146
Stack alphabet, 104
Stack head, 101
Stack-top, 100
Stack-top erasing set, 489
Start codon, 428, 430
Start state, 94
Start symbol, 49
State, 81
State minimization, 224
State partitioning technique, 226
State transition function, 82, 92, 94
State transition graph, 82, 90
State transition profile, 126
State transition table, 93
Stop codon, 428, 430
String, 2
Sum-of-subset problem, 142
Symbol, 2
Syntactic ambiguity, 288
Syntactic category, 49
Syntax, 290
Syntax diagram, 67
Syntax flow graph, 67
Tape alphabet, 94
Tape head, 81
Terminal alphabet, 49
Token definition, 404
Token description, 404
Top-down left-to-right traversal, 355
Total alphabet, 49
Transcribe, 426
Transcript, 429
Transducer, 161
Transition path, 317
Translation, 427
Truth table, 6
Turing machine, 80
Type 0 grammar, 49
Type 1 grammar, 51
Type 2 grammar, 51
Type 3 grammar, 52
Union, 240
Unit production rule, 262, 264
Universal quantification, 13, 346
Universal set, 4
Universal Turing machine, 448
Useless symbol, 266
User-written code, 405, 408
Variable, 40
YACC, 352, 406
Yield, 291