lecture 6 - 480/580 (compiler construction) top down parsing (continued) last lecture we talked about recursive descent parsing, which i said was the method of choice if you don't have an automated parsing tool (such as yacc) to do the dirty work for you. recall that in order for rd to work, our grammar had to be ll(1), which meant that any time we had a choice of several alternatives, we must be able to decide upon one alternative based only on the next input token. that is, if we have a choice, such as a -> alpha | beta | gamma | delta we must be able to tell which choice to take based on only one token. let us today examine some implications of this. if the correct alternative to take at this point is an alpha, then either (1) the next token must be something that can begin an alpha, or (2) it must be legal for alpha to match nothing, and the next token must be something that can follow an a. possibility (1) leds us into defining a relation, which we will call first. first(alpha) will be the set of symbols that can begin a string derived starting from alpha (remember alpha may contain tokens and/or nonterminals). possibility (2) leds us into defining a relation follow(a), which is the set of symbols that can legally follow an a. note that in this case, unlike the first, the "argument" is a single nonterminal. i want to quickly go over the algorithm to compute first and follow. really, the details are not all that important. what you should know is (1) what first and follow are, and (2) how they are used. the details of how to compute first and follow are given in the book in much more detail, if you really want to see it. the initial step is to discover all the nonterminals that can derive nothing. let us call this the epsilon set. clearly anything that has an epsilon production on the right hand side is in the epsilon set, so we first scan the productions and add these symbols. next, we repeatedly do the following: scan the productions, if anything has all nonterminals on the right hand side, and if all those nonterminals are in the epsilon set, then add the left hand side to the epsilon set. we do this until we have scanned the productions at least once without adding anything to the epsilon set. now to construct first(alpha), there are three cases: alpha begins with a terminal - x , then first(alpha) consists of only x. alpha begins with a nonterminal, x we go compute first(x), and assign this to first(alpha).
now if x is in the epsilon set, we also add first(remainder) to first(alpha). alpha is epsilon - add epsilon to first set. to compute the follow set of a nonterminal x, look at all the places x is used in the grammer. compute what could come next (using the first set information) after we have seen the x. this is the follow set. (depending upon time, give railroad diagram form for grammar, and intuition about first and follow in terms of railroad charts). once we have first and follow, we can rephrase the ll(1) requirement, as follows: when we are faced with a choice a -> alpha | beta | gamma | delta the first sets of the choices must be distinct, and if any choice can be empty than these must also be distinct from the follow set for a. we can also rephrase our construction rule for building a procedure for recognzing nonterminals. in this example, our procedure would be boolean procedure a() if next token in first set for alpha then recognize alpha else if next token in first set for beta then recognize beta else if next token in first set for gamma then recognize gamma else if next token in first set for delta then recognize delta else error the error is replaced by "return true", if epsilon is a legal choice for this nonterminal. this all seems very straightforward. we can encode our choices in a table, indexed by nonterminals and terminals and containing right hand sides. m[a,x] is "if you are looking for an a, and the next token is an x, then this is the production to use." procedure is given on page 225. how do we use this? build a push down automata with a stack full of goals. algorithm given on page 226. this is our first example of a table driven parser. advantages of table driven parsers: (1) small and (2) fast disadvantages: (1) hard to debug if you are doing it by hand.