next up previous contents
Next: Implementing a Parser Up: Lookahead Previous: Lookahead   Contents

FIRST and FOLLOW

We formalise the task of picking a production rule using two functions, FIRST and FOLLOW.

FIRST is applied to the r.h.s. of a production rule, and tells us all the terminal symbols that can start sentences derived from that r.h.s. It is defined as:

  1. For any terminal symbol a, $FIRST({\bf a})$ = $\{{\bf a}\}$.
    Also, $FIRST(\varepsilon)$ = $\{\varepsilon\}$.
  2. For any non-terminal $A$ with production rules $A\; \rightarrow\; \alpha_1\; \mid\; \alpha_2\; \mid\; \ldots\; \mid\; \alpha_n$, $FIRST(A)$ = $FIRST(\alpha_1) \cup FIRST(\alpha_2) \cup \ldots \cup FIRST(\alpha_n)$
  3. For any r.h.s. of the form: $\beta_1 \beta_2 \ldots \beta_n$ (where each $\beta_i$ is a terminal or a non-terminal) we have:
    1. $FIRST(\beta_1)$ is in $FIRST(\beta_1 \beta_2 \ldots \beta_n)$
      if $\beta_1$ can derive $\varepsilon$, then $FIRST(\beta_2)$ is also in $FIRST(\beta_1 \beta_2 \ldots \beta_n)$
      if both $\beta_1$ and $\beta_2$ can derive $\varepsilon$, then $FIRST(\beta_3)$ is also in $FIRST(\beta_1 \beta_2 \ldots \beta_n)$
      $\cdots$
      if $\beta_1 \beta_2 \ldots \beta_i$ can derive $\varepsilon$, then $FIRST(\beta_{i+1})$ is also in $FIRST(\beta_1 \beta_2 \ldots \beta_n)$
    2. $\varepsilon$ is in $FIRST(\beta_1 \beta_2 \ldots \beta_n)$ only if $\varepsilon$ is in $FIRST(\beta_i)$, for all $0 \leq i \leq n$

FIRST can be applied to any r.h.s., and returns a set of terminal symbols.

Thus, if $X$ is the current non-terminal (ie. leftmost in the current sentential form), and a is the next symbol on the input, then we want to look at the r.h.s. of the production rules for $X$ and choose the one whose FIRST set contains a.

FOLLOW is used only if the current non-terminal can derive $\varepsilon$; then we're interested in what could have followed it in a sentential form. (NB: A string can derive $\varepsilon$ if and only if $\varepsilon$ is in its FIRST set.)

  1. If $S$ is the start symbol, then put $\bot$ into $FOLLOW(S)$
  2. Examine all rules of the form $A\; \rightarrow\; \alpha X \beta$; then
    1. $FIRST(\beta)$ is in $FOLLOW(X)$
    2. If $\beta$ can derive the empty string then put $FOLLOW(A)$ into $FOLLOW(X)$

The ``$\bot$'' is a symbol which is used to mark the end of the input tape.

FOLLOW can be applied to a single non-terminal only, and returns a set of terminals.

Thus, if $X$ is the current non-terminal, a is the next symbol on the input, and we have a production rule for $X$ which allows it to derive $\varepsilon$, then we apply this rule only if a is in the FOLLOW set for$X$.

FIRST and FOLLOW help us to pick a rule when we have a choice between two or more r.h.s. by predicting the first symbol that each r.h.s. can derive. Even if there is only one r.h.s. we can still use them to tell us whether or not we have an error - if the current input symbol cannot be derived from the only r.h.s. available, then we know immediately that the sentence does not belong to the grammar, without having to (attempt to) finish the parse.


next up previous contents
Next: Implementing a Parser Up: Lookahead Previous: Lookahead   Contents
James Power 2002-11-29