Simplifications of Context-Free Grammars
1
A Substitution Rule
S → aB A → aaA A → abBc B → aA B→b
Equivalent grammar
Substitute
B→b
S → aB | ab A → aaA A → abBc | abbc B → aA 2
A Substitution Rule S → aB | ab A → aaA A → abBc | abbc B → aA
Substitute
B → aA
S → aB | ab | aaA A → aaA A → abBc | abbc | abaAc
Equivalent grammar 3
In general:
A → xBz B → y1 Substitute
B → y1
A → xBz | xy1z
equivalent grammar 4
Nullable Variables
λ − production :
A→λ
Nullable Variable:
A⇒⇒ λ
5
Removing Nullable Variables Example Grammar:
S → aMb M → aMb M →λ Nullable variable 6
Final Grammar
S → aMb M → aMb M →λ
Substitute
M →λ
S → aMb S → ab M → aMb M → ab
7
Unit-Productions
Unit Production:
A→ B
(a single variable in both sides)
8
Removing Unit Productions Observation:
A→ A Is removed immediately
9
Example Grammar:
S → aA A→a A→ B B→A B → bb 10
S → aA A→a A→ B B→A B → bb
Substitute
A→ B
S → aA | aB A→a B → A| B B → bb
11
S → aA | aB A→a B → A| B B → bb
Remove
B→B
S → aA | aB A→a B→A B → bb
12
S → aA | aB A→a B→A B → bb
Substitute
B→A
S → aA | aB | aA A→a B → bb
13
Remove repeated productions
Final grammar
S → aA | aB | aA A→a B → bb
S → aA | aB A→a B → bb
14
Useless Productions
S → aSb S →λ S→A A → aA
Useless Production
Some derivations never terminate...
S ⇒ A ⇒ aA ⇒ aaA ⇒ ⇒ aa aA ⇒ 15
Another grammar:
S→A A → aA A→λ B → bA
Useless Production
Not reachable from S
16
contains only terminals
In general: if
S ⇒ ⇒ xAy ⇒ ⇒ w
w∈ L(G ) then variable
A
is useful
otherwise, variable
A
is useless
17
A production A → x is useless if any of its variables is useless
Variables useless useless useless
S → aSb S →λ S→A A → aA B→C C→D
Productions useless useless useless useless 18
Removing Useless Productions Example Grammar:
S → aS | A | C A→a B → aa C → aCb
19
First:
find all variables that can produce strings with only terminals
S → aS | A | C A→a B → aa C → aCb
Round 1:
{ A, B} S→A
Round 2:
{ A, B, S } 20
Keep only the variables that produce terminal symbols:
{ A, B, S }
(the rest variables are useless)
S → aS | A | C A→a B → aa C → aCb
S → aS | A A→a B → aa
Remove useless productions 21
Second: Find all variables reachable from
S
Use a Dependency Graph
S → aS | A A→a B → aa
S
A
B not reachable 22
Keep only the variables reachable from S (the rest variables are useless)
S → aS | A A→a B → aa
Final Grammar
S → aS | A A→a
Remove useless productions 23
Removing All
Step 1: Remove Nullable Variables Step 2: Remove Unit-Productions Step 3: Remove Useless Variables
24
Normal Forms for Context-free Grammars
25
Chomsky Normal Form Each productions has form:
A → BC variable
or variable
A→a terminal
26
Examples:
S → AS S →a A → SA A→b Chomsky Normal Form
S → AS S → AAS A → SA A → aa Not Chomsky Normal Form 27
Convertion to Chomsky Normal Form Example:
S → ABa A → aab B → Ac Not Chomsky Normal Form
28
Introduce variables for terminals:
Ta , Tb , Tc
S → ABTa S → ABa A → aab B → Ac
A → TaTaTb B → ATc Ta → a Tb → b Tc → c 29
Introduce intermediate variable:
S → ABTa A → TaTaTb B → ATc Ta → a Tb → b Tc → c
V1
S → AV1 V1 → BTa A → TaTaTb B → ATc Ta → a Tb → b Tc → c 30
Introduce intermediate variable:
S → AV1 V1 → BTa A → TaTaTb B → ATc Ta → a Tb → b Tc → c
V2
S → AV1 V1 → BTa A → TaV2 V2 → TaTb B → ATc Ta → a Tb → b Tc → c
31
Final grammar in Chomsky Normal Form:
S → AV1 V1 → BTa Initial grammar
S → ABa A → aab B → Ac
A → TaV2 V2 → TaTb B → ATc Ta → a Tb → b Tc → c
32
In general: From any context-free grammar (which doesn’t produce λ ) not in Chomsky Normal Form we can obtain: An equivalent grammar in Chomsky Normal Form 33
The Procedure First remove: Nullable variables Unit productions
34
Then, for every symbol
a: Ta → a
Add production
In productions: replace
New variable:
a
with
Ta
Ta 35
Replace any production with
A → C1C2 Cn
A → C1V1 V1 → C2V2 Vn− 2 → Cn −1Cn
New intermediate variables:
V1, V2 , ,Vn−2
36
Theorem:
For any context-free grammar (which doesn’t produce λ ) there is an equivalent grammar in Chomsky Normal Form
37
Observations • Chomsky normal forms are good for parsing and proving theorems
• It is very easy to find the Chomsky normal form for any context-free grammar 38
Greinbach Normal Form
All productions have form:
A → a V1V2 Vk symbol
k ≥0
variables
39
Observations • Greinbach normal forms are very good for parsing
• It is hard to find the Greinbach normal form of any context-free grammar 40
Compilers
41
Machine Code
Program v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } ......
Compiler
Add v,v,0 cmp v,5 jmplt ELSE THEN: add x, 12,v ELSE: WHILE: cmp x,3 ... 42
Compiler Lexical analyzer
input program
parser
output machine code 43
A parser knows the grammar of the programming language
44
Parser
PROGRAM → STMT_LIST STMT_LIST → STMT; STMT_LIST | STMT; STMT → EXPR | IF_STMT | WHILE_STMT | { STMT_LIST } EXPR → EXPR + EXPR | EXPR - EXPR | ID IF_STMT → if (EXPR) then STMT | if (EXPR) then STMT else STMT WHILE_STMT→ while (EXPR) do STMT 45
The parser finds the derivation of a particular input derivation input 10 + 2 * 5
Parser E -> E + E |E*E | INT
E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5 46
derivation tree derivation E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5
E E 10
+ E 2
E *
E 5 47
derivation tree E E 10
machine code
+ E 2
E *
mult a, 2, 5 add b, 10, a E 5 48
Parsing
49
input string
Parser grammar
derivation
50
Example:
Parser
input
aabb
S → SS S → aSb S → bSa S →λ
derivation ?
51
Exhaustive Search
S → SS | aSb | bSa | λ Phase 1:
S ⇒ SS S ⇒ aSb S ⇒ bSa S ⇒λ
Find derivation of
aabb
All possible derivations of length 1 52
S ⇒ SS S ⇒ aSb S ⇒ bSa S ⇒λ
aabb
53
Phase 2
Phase 1
S ⇒ SS S ⇒ aSb
S → SS | aSb | bSa | λ
S ⇒ SS ⇒ SSS S ⇒ SS ⇒ aSbS S ⇒ SS ⇒ bSaS S ⇒ SS ⇒ S S ⇒ aSb ⇒ aSSb S ⇒ aSb ⇒ aaSbb S ⇒ aSb ⇒ abSab S ⇒ aSb ⇒ ab
aabb
54
S → SS | aSb | bSa | λ
Phase 2
S ⇒ SS ⇒ SSS S ⇒ SS ⇒ aSbS S ⇒ SS ⇒ S
aabb
S ⇒ aSb ⇒ aSSb S ⇒ aSb ⇒ aaSbb Phase 3
S ⇒ aSb ⇒ aaSbb ⇒ aabb 55
Final result of exhaustive search (top-down parsing) Parser
input
aabb
S → SS S → aSb S → bSa S →λ derivation
S ⇒ aSb ⇒ aaSbb ⇒ aabb 56
Time complexity of exhaustive search Suppose there are no productions of the form
A→λ A→ B Number of phases for string
w
: approx. |w|
57
For grammar with
k rules
Time for phase 1:
k k possible derivations
58
Time for phase 2:
k
2
k
2
possible derivations
59
Time for phase |w| is 2|w| :
A total of 2|w| possible derivations
60
Total time needed for string
k + k ++ k 2
phase 1
phase 2
w: | w|
phase |w|
Extremely bad!!! 61
For general context-free grammars: There exists a parsing algorithm that parses a string | w | 3 in time | w |
The CYK parser
62