A grammar is said to be ambiguous if it can produce more than one parse tree for a particular sentence; this occurs when two different sequences of leftmost (or rightmost) derivations can produce the same sentence from the same start symbol.
Consider the following example:
Suppose we wanted to parse the sentence:
if num then if num then st else st
We could try to derive this (in the manner of a leftmost, top-down parse) as follows:
Rules | Sentential Form |
Rule 1 | if then STAT ESTAT |
Rule 5 | if num then STAT ESTAT |
Rule 1 | if num then if EXPR STAT ESTAT ESTAT |
Rule 5 | if num then if num then STAT ESTAT ESTAT |
Rule 2 | if num then if num then st ESTAT ESTAT |
So far, so good. But in order to finish the derivation, we could choose either of two routes:
Rules | Sentential Form |
Rule 4 | if num then if num then st |
Rule 3 | if num then if num then st else |
Rule 2 | if num then if num then st else st |
Rules | Sentential Form |
Rule 3 | if num then if num then st else STAT ESTAT |
Rule 2 | if num then if num then st else st ESTAT |
Rule 4 | if num then if num then st else st |
In both cases the correct sentence is derived, but the difference in the structure of the parse trees reflects the fact that the grammar is ambiguous. In this case, the ambiguity arises because we have not specified whether the else is to be associated with the first if (as in the first derivation) or with the second if (as in the second); ie. which of the following it is semantically equivalent to:
The latter is preferable, since in most programming languages we will
want to be able to match each else with the closest previous
unmatched if.
In order to incorporate this convention into our grammar, we must change the rules so that we can tell the difference between matched statements (an equal number of if and else terminals) and unmatched statements. This gives us:
Unlike left-factoring, we cannot provide any general procedure which
will eliminate ambiguity; instead, it will depend on our knowledge of
the semantics of the language.
In fact, it is not possible to write an algorithm which will take an
arbitrary grammar and decide whether or not it is ambiguous. Even if
we do identify ambiguity, we may not always be able to remove it,
since some languages cannot be described by a grammar which is
unambiguous.