next up previous contents
Next: Leftmost and Rightmost Derivations Up: CONTEXT-FREE LANGUAGES Previous: BNF Notation   Contents

Derivations and Parse Trees

Parsing is the process where we take a particular sequence of words and check to see if they correspond to the language defined by some grammar.

Basically what we have to do is to show how we can get:

from the start symbol of the grammar
to the sequence of words that supposedly belong to the language
by using the production rules

We can do this by constructing the appropriate parse tree for the sequence of words, which is simply a graphical way of listing the production rules that we have used.

Considering the following simple grammar for arithmetic expressions:

\begin{displaymath}\begin{array}{rcl}
E & \rightarrow & E\; OP\; E\; \;\mid\; (\...
...ightarrow & + \mid\; -\; \mid\; *\; \mid\; \div \\
\end{array}\end{displaymath}

Here there are just two non-terminals, $E$ and $OP$ (where $E$ is the start symbol); the terminal symbols are: +, -, *, $\div$, (, ), num.

Suppose the lexical analysis phase has given us the sequence of tokens which correspond to: (num + num) $\div$ num. We can parse this as follows:


\begin{displaymath}\begin{array}{rcl\vert r}
\hline
\multicolumn{3}{c}{Rule\; Ap...
...f num} & {\bf ( num + num )}\; \div\; {\bf num} \\
\end{array}\end{displaymath}


We say that $E$ derives (num + num) $\div$ num, and write:

$E$ $\rightarrow^*$ (num + num) $\div$ num

We can represent the above derivation graphically by means of a parse tree. The root of the tree is the start symbol $E$, and its leaves are the terminal symbols in the sentence which has been derived. Each internal node is labelled with a non-terminal, and the children are the symbols obtained by applying one of the production rules.

=1.00mm
\begin{picture}(95.00,100.00)
\put(70.00,100.00){\makebox(0,0)[cc]{E}}
\put(70.0...
...55.00,37){\vector(0,-1){13.5}}
\put(40.00,37){\vector(0,-1){13.5}}
\end{picture}

There are two ways of constructing a parse tree whose root is the start symbol and whose leaves are the sentence:

Top-Down
As we did earlier, we start with the start symbol and apply the production rules (replacing l.h.s. with r.h.s.) until we derive the sentence

Bottom-Up
We start with the sentence, and apply the production rules in reverse (ie. replacing r.h.s. of the rule with the l.h.s.) until we derive the start symbol.

Both of these methods will be discussed in detail later.



Subsections
next up previous contents
Next: Leftmost and Rightmost Derivations Up: CONTEXT-FREE LANGUAGES Previous: BNF Notation   Contents
James Power 2002-11-29