Models of Language Generation: Grammars
1
11. Ambiguity of Context-free Grammars Ambiguity in a language occurs either when a symbol or an expression has more than one meaning (e.g., story), or when an expression can be (grammatically) parsed in two different ways. The former is called lexical (or semantic) ambiguity, and the later syntactic (or structural) ambiguity. For example, in natural language, the sentence “A man entered the room with a picture” can be interpreted (i.e., parsed) into two different grammatical structures as follows.
man A
entered
room the
with
picture a
man entered room A with picture the a This sentence is syntactically ambiguous. With no further information, it is impossible to know which way the sentence should be translated. In formal language, given a grammar G and a sentence x (i.e., a string, in the formal language jargon), parsing shows how x can be derived by the grammar. If x can be derived in two different ways, grammar G is ambiguous. Parsing is one of the main functions of the compiler of a programming language. In this chapter we will study syntactic ambiguity.
2
Ambiguity 11.1 Parse tree 290 11.2 Parse Tree and Ambiguity 293 11.3 Eliminating Ambiguity of an ambiguous CFG 295 Using parenthesis, Fixing the order of rule applications Eliminating redundant rules Setting up precedence and associativity Rumination 305 Exercises 306
Dear Dad & Dear Son
Break Time
Dear Dad, $chool i$ really great. I am making lot$ of friend$ and $tudying very hard. With all my $tuff, I $imply can`t think of anything I need, $o if you would like, you can ju$t $end me a card, a$ I would love to hear from you. Love, Your $on The Reply: Dear Son, I kNOw that astroNOmy, ecoNOmics, and oceaNOgraphy are eNOugh to keep even an hoNOr student busy. Do NOt forget that the pursuit of kNOwledge is a NOble task, and you can never study eNOugh. Love, Dad - Adrea -
3
Ambiguity
11.1 Parse Tree In the formal language, the syntax (i.e., structure) of a string generated by a grammar depends on the rules applied as well as their order of application. Syntax provides critical information for the compiler to translate the string (i.e., a program) into an object code. Thus, if a string can be derived in two different ways, it is impossible to give a unique translation. A parse tree is an efficient data structure to represent the syntax of a string derived by the grammar and to translate it by the compiler. For example, figure (b) below shows a parse tree for string p∨q∧r generated by grammar G in figure (a). S
S ∨
A
G: S → S∨S | S∧S | ∼ S | A A →p | q | r (a)
S S ∧ S
p
A
A
q
r
(b) 4
Ambiguity
Parse Tree
Given a derivation (i.e., sequence of rules applied) for a string w, the parse tree for w with respect to the derivation is constructed as follows. First put the root node with the start symbol S. Then, for each leaf node on the current tree with a nonterminal label, say A, recursively expand the tree as follows: Suppose that A → β is a rule applied next to derive w. For each symbol X appearing in β , in the order from left to right, a child node with label X is add to A. This procedure repeats until the tree has no leaf nodes (labeled with a nonterminal symbol) to expand. Reading all the leaf nodes left to right on the final tree should give the string w. This string of terminal symbols is called the yield of the parse tree. S
S ∨
A
G: S → S∨S | S∧S | ∼ S | A A →p | q | r (a)
S S ∧ S
p
A
A
q
r
(b) 5
Ambiguity
parse tree
In general, given a source code in programming environments, the compiler constructs a parse tree of the code and then traversing the tree bottom up, left to right, generates an object code (machine language program). For example, suppose that for the string p∨q∧r the compiler has generated the parse tree in figure (b) below. The compiler generates machine language instructions that will access the values of variables q and r, compute q∧r and store the result (usually in a register), access the value of p, and finally execute the OR operation with the stored result of q∧r. Because of the way of traversing the tree, bottom up, left to right, to generate the object code, the order of logical operations depends on the tree. For the example, the compiler generates an object code that will execute the logical expression p∨q∧r in the order of p∨( q∧r ). S S G: S → S∨S | S∧S | ∼ S | A
A
S ∧ S
p
A →p | q | r (a)
S
∨
(b)
A
A
q
r 6
Ambiguity
11.2 Parse Tree and Ambiguity The two parse trees in figures (b) and (c) below show two parse trees yielding the same string p∨q∧r. In other words, it can be derived by grammar G in figure (a) in two ways. Consequently, the two parse trees imply that the expression p∨q∧r can be evaluated in two different ways, i.e., p∨(q∧r) and (p∨)q∧r. This implies that for grammar G, the operator precedence between the two logical operations ∨ (OR) and ∧ (AND) is ambiguous. S S A G: S → S∨S | S∧S | ∼ S | A A →p | q | r (a)
p
∨
S S
S
S ∧S
S ∨ S
A
A
A
A
q
r
p
q
(b) p∨(q∧r)
∧
S A r
(c) (p∨q)∧r
7
Parse Tree and Ambiguity
Ambiguity
As we saw in the previous example, the existence of two parse trees yielding the same string is a problematic property, called ambiguity, of a CFG that should be eliminated. In real application, we cannot expect the correct result from a program written in the language of an ambiguous grammar. Before we discuss how to eliminate ambiguity from a CFG, we need a formal definition of it. Definition (Ambiguity): A CFG G is ambiguous if there is a string x ∈L(G) for which there are two parse trees yielding x. Unfortunately, it is an unsolvable problem to decide whether an arbitrary CFG is ambiguous or not. Also, there is no algorithm available that given an ambiguous CFG, converts it to an unambiguous grammar. However, for a certain restricted construct, it is possible to solve the problems. In this section we will present several techniques with some examples.
8
Ambiguity
11.3 Eliminating Ambiguity of a CFG (1) Binding with parenthesis. Example: We know that the CFG G1 below is ambiguous because there are two parse trees yielding the same string p∨q∧r. The ambiguity occurs because it can generate the same string by applying S → S∨S followed by S → S∧S, or vice versa, as shown, respectively, in figure (a) and figure (b). S S G1: S → S∨S | S∧S | ∼ S | A A →p | q | r
A p
∨
S S
S
S ∧S
S ∨ S
A
A
A
A
q
r
p
q
(a)
∧
S A r
(b ) 9
Eliminating Ambiguity
Ambiguity
This ambiguity can be eliminated by parenthesizing the right side of those two rules as shown in G2 below. The parentheses make the yields of the two parse trees different as shown in figures (a) and (b). Ambiguous
S → S∨S | S∧S | ∼ S | A
G1:
A →p | q | r
Unambiguous G2: S → (S∨S) | (S∧S) | ∼ S | A S
(
S A p
∨
)
S
( S ∧S A
A
q
r
(a): (p ∨(q ∧r))
)
A →p | q | r S
(
S
)
S ( S ) ∧ A S ∨ r A A p
q
(b): ((p ∨q ) ∧r) 10
Ambiguity
Eliminating Ambiguity
The parenthesizing technique is simple, but has a serious drawback, because we are altering the language by adding new terminal symbols, i.e., the parentheses. However, this is a popular technique used in programming languages. Instead of the parentheses they use other notations, for example, in Pascal “begin” and “end,” and in C and C++, the braces ‘{’ and ‘}’. (2) Fixing the order of applying rules. Example 1. The language generated by CFG G3 below is {bicbj | i, j ≥ 0}. This grammar is ambiguous because, for example, the string bcb can be derived by generating the left side b first then the right side b, or vice versa, as shown below.
b G3 : S → bS | Sb | c
S
S
S
S
S c
b
b
b
S c 11
Ambiguity
Eliminating Ambiguity
We can simply modify the grammar G3 to G4 as shown below such that left side b’s, if any, are always generated first. Figure (b) is the only parse tree for string bcb. Grammar G4 is unambiguous. Ambiguous
G3 : S → bS | Sb | c
Unambiguous G4 : S → bS | A
b
A → Ab | c
S
S
S
S
S
b
b
c (a)
b b
S S A
S
A
c
c
b
(b) 12
Ambiguity
Eliminating Ambiguity
Example 2. Using the technique of fixing the order of derivation, the ambiguous CFG G1 that we have examined can be converted to an unambiguous grammar G5 shown in figure (a). Notice that this grammar generates the operators left to right in the order they appear in the string. S
Ambiguous: G1 :
S → S∨S | S∧S | ∼ S | A
A
A →p | q | r Unambiguous: G5 : S → A∨S | A∧S | ∼ S | A A →p | q | r (a)
S
∼
∨ S A ∧ S
p q
A ∨ S q
(b): ∼ p∨q∧q∨p
A p
13
Ambiguity
Eliminating Ambiguity (3) Eliminating redundant rules
The CFG G6 below is ambiguous because it can generate ab either by B or D. We can simply delete one of the two and make the grammar unambiguous (see G7). G6 : S → BD
B → abb
D → abd
Unambiguous G7 : S → BD
B → abb
D →d
Ambiguous
The CFG G8 below is ambiguous because it can generate ε in two ways. Applying the technique for minimizing the number of ε -production rules, we can convert it to an unambiguous grammar G9. Ambiguous G8 : S → BD B → bBcε Unambiguous G9 : S → BDε B → bBcbc D → dDede
D → dDeε
14
Ambiguity
Eliminating Ambiguity (4) Implementing operator precedence and associativity
Operator precedence and associativity are important rules for evaluating mathematical expressions. In programming languages, the rules are defined by the grammar. As we know, multiplication (*) and division (/) are given higher precedence than addition (+) and subtraction (-). The assignment operator (=) is given the lowest. Operator = is right associative, and all the others are left associative. According to this order of precedence and associativity, the mathematical expression in figure (a) will be evaluated as shown in figure (b).
a=b+c*d–e/f (a)
a = ((b + (c * d)) – (e / f)) (5) (3) (1) (4) (2) (b)
15
Ambiguity
Eliminating Ambiguity
Example. Assume that the logical operators ∼ , ∧, and ∨are right associative with precedence given in that order (i.e., ∼ is at the top followed by ∧, and ∨at the bottom). The ambiguous CFG G1 (repeated below) can be modified into an unambiguous G10 by implementing the precedence and associativity into the production rules. Notice that every OR (∨) operator in the string should be generated by S → D∨S before generating others. Then AND (∧) operators, if any, are generated by D → C∧D, and finally NOT (∼ ) operators. Also notice that with D, there is no way to derive ∨, and with C neither ∨nor ∧can be derived. Ambiguous G1: S → S∨SS∧S∼ SA A → pqr Unambiguous G10 : S → D∨SD
D → C∧DC
C → ∼ CA
A → pqr
(a)
S D ∨ S ∨ S C D D ~ C C C ∧ D C A A A r A p q p (b)
16
Ambiguity
Eliminating Ambiguity
This fixed order of rule applications facilitates the compiler generating an object code to evaluate the string (i.e., a logical expression) according to the operator precedence and associativity. For example, because NOT (∼ ) operators are generated last, they appear on a subtree of the parent node of a leaf node labeled with ∧or ∨. Hence, the compiler, traversing the tree bottom up, left to right, generates the instructions which will execute the NOT (∼ ) operators before the instructions executing other operators. Similarly, we can see how the compiler generates the instructions to execute AND (∧) operators before OR (∨) operators. S D ∨ S ∨ S G10 : S → D∨SD D → C∧DC C D D C → ∼ CA A → pqr ~ C C C ∧ D C A A A ∼ p ∨q ∨r ∧p ⇒ ((∼ p) ∨(q ∨(r ∧p))) (1) (4) (3) (2)
(a)
p
r
q (b)
A p 17
Eliminating Ambiguity
Ambiguity
Now, for the associativity, notice that identical operators are generated left to right in the order they appear in the string. For example, all OR (∨) operators are derived by applying S → D∨S recursively. Thus, the compiler, traversing the parse tree bottom up, left to right, will generate an object code to evaluate the logic expression following the order of right associativity. If we want to implement left associativity for an operator used in the language, we can simply reverse the right side of the rule which derives it. For example, we use S → S∨D instead to make operator ∨left associative. S D ∨ S ∨ S C D D G10 : S → D∨SD D → C∧DC ~ C C C ∧ D C → ∼ CA A → pqr C A A A ∼ p ∨q ∨r ∧p ⇒ ((∼ p) ∨(q ∨(r ∧p))) (1) (4) (3) (2) r A p q
(a)
(b)
p 18
Ambiguity Rumination (1): ambiguity We learned that for a simple CFG, it is possible to analyze it to decide whether it is ambiguous or not, and if it is ambiguous, convert it to an unambiguous one. However, as we mentioned before, most of the problems concerning ambiguity of CFG’s in general are very difficult or even unsolvable. Here are some interesting facts. • There are some languages that can only be generated by ambiguous context-free grammars. Such languages are call inherently ambiguous. The following language L is an example. L = {anbncm | m, n ≥ 1 } ∪ {anbmcm | m, n ≥ 1} • It is an unsolvable problem to tell whether an arbitrary CFG is ambiguous or not. • It is an unsolvable problem to convert an arbitrary ambiguous CFG, whose language is not inherently ambiguous, to an unambiguous CFG.
Today’s Quote
Break Time
Be slow in choosing a friend, slower in changing. - Benjamin Franklin -
19
Exercises
Ambiguity
11.1 Show that each of the following CFG’s is ambiguous, and convert it to an unambiguous CFG. (a) S → Sa | A | a
A → a | bA
(b) S → AB | ε
A→ a|ε
B→ b|ε
11.2 (a) Show that the following CFG is ambiguous. G: S → S + S | S * S | T
T→ a|b
(b) In the CFG G above, let + and * be the addition and multiplication operators, respectively, and a and b be integer variables. Convert G to a CFG G’ that satisfies the following conditions. (i) L(G) = L(G’) (ii) Operator * has higher precedence than operator +. (iii) Both operators are left associative. You should also present a parse tree showing that your grammar meets the required order of operator precedence and associativity.
20
Ambiguity
Exercises
The following problem is concerned with the ambiguity of the if-statement, which appears often in the text. Question (a) is easy, but question (b) is challenging. 11.3 The syntax flow graph below defines a simplified of the Pascal programming language. Following the convention, symbols in a circle or oval are terminals, and the words in rectangles correspond to nonterminal symbols. (a) Transform the flow graph to a CFG G and show that G is ambiguous. (b) Convert grammar G to an unambiguous CFG and explain in detail how you got your answer.
IF
<statement>
(
a-b a+b
)
then
<statement>
else
<statement>
c d
21