next up previous contents
Next: Eliminating Immediate Left Recursion Up: Modifying Grammars Previous: Left Factoring   Contents

Eliminating Ambiguity

A grammar is said to be ambiguous if it can produce more than one parse tree for a particular sentence; this occurs when two different sequences of leftmost (or rightmost) derivations can produce the same sentence from the same start symbol.

Consider the following example:

\begin{displaymath}\begin{array}{rcllr}
STAT & \rightarrow & {\bf if}\; EXPR\; {...
...\\
EXPR & \rightarrow & {\bf num} & & Rule\; 5 \\
\end{array}\end{displaymath}

Suppose we wanted to parse the sentence: if num then if num then st else st

We could try to derive this (in the manner of a leftmost, top-down parse) as follows:

Rules Sentential Form
  $STAT$
Rule 1 if $EXPR$ then STAT ESTAT
Rule 5 if num then STAT ESTAT
Rule 1 if num then if EXPR STAT ESTAT ESTAT
Rule 5 if num then if num then STAT ESTAT ESTAT
Rule 2 if num then if num then st ESTAT ESTAT
   

So far, so good. But in order to finish the derivation, we could choose either of two routes:

  1. We apply rule 4, 3 and 2 to get:
    Rules Sentential Form
    Rule 4 if num then if num then st $ESTAT$
    Rule 3 if num then if num then st else $STAT$
    Rule 2 if num then if num then st else st
    =1.00mm
    \begin{picture}(130.00,80.00)(0,50)
\put(75.00,120.00){\makebox(0,0)[cc]{STAT}}
...
...00){\vector(0,-1){14.67}}
\put(112.33,98.00){\vector(1,-1){15.00}}
\end{picture}

  2. Alternatively, we could apply rule 3 first, and then 2 and 4 to get:
    Rules Sentential Form
    Rule 3 if num then if num then st else STAT ESTAT
    Rule 2 if num then if num then st else st ESTAT
    Rule 4 if num then if num then st else st
    =1.00mm
    \begin{picture}(121.00,100.00)(0,35)
\put(75.00,120.00){\makebox(0,0)[cc]{STAT}}...
...67){\vector(4,-3){17.67}}
\put(121.00,57.67){\vector(0,-1){13.67}}
\end{picture}

In both cases the correct sentence is derived, but the difference in the structure of the parse trees reflects the fact that the grammar is ambiguous. In this case, the ambiguity arises because we have not specified whether the else is to be associated with the first if (as in the first derivation) or with the second if (as in the second); ie. which of the following it is semantically equivalent to:

  1. if num then { if num then st } else st
  2. if num then { if num then st else st }

The latter is preferable, since in most programming languages we will want to be able to match each else with the closest previous unmatched if.

In order to incorporate this convention into our grammar, we must change the rules so that we can tell the difference between matched statements (an equal number of if and else terminals) and unmatched statements. This gives us:


\begin{displaymath}\begin{array}{rcllr}
STAT & \rightarrow & MSTAT \\
& \mid &...
...lse}\; USTAT \\
EXPR & \rightarrow & {\bf num} \\
\end{array}\end{displaymath}


Unlike left-factoring, we cannot provide any general procedure which will eliminate ambiguity; instead, it will depend on our knowledge of the semantics of the language.

In fact, it is not possible to write an algorithm which will take an arbitrary grammar and decide whether or not it is ambiguous. Even if we do identify ambiguity, we may not always be able to remove it, since some languages cannot be described by a grammar which is unambiguous.


next up previous contents
Next: Eliminating Immediate Left Recursion Up: Modifying Grammars Previous: Left Factoring   Contents
James Power 2002-11-29