BNF Notation

Next: Derivations and Parse Trees Up: Grammars for Programming Languages Previous: Grammars for Programming Languages Contents

BNF Notation

One of the earliest forms of notation used for describing the syntax of a programming language was Backus-Naur Form (BNF); this is basically just a variant of a context-free grammar, with the symbol ``::='' used in place of `` $\rightarrow$ '' to mean ``is defined as''. Additionally, non-terminals are usually written in angle-brackets and terminals in quotes.

For example, we could describe a block of statements in Pascal as:

$\langle$ block $\rangle$	::=	``BEGIN'' $\langle$ opt-stats $\rangle$ ``END''.
$\langle$ opt-stats $\rangle$	::=	$\langle$ stats-list $\rangle$ $\mid$ $\varepsilon$ .
$\langle$ stats-list $\rangle$	::=	$\langle$ statement $\rangle$ $\mid$ $\langle$ statement $\rangle$ ``;'' $\langle$ stats-list $\rangle$ .

An extension of this notation, known as extended BNF (EBNF), allows us to use regular- expression-like constructs on the right-hand-side of the rules: for any sequence of symbols $\alpha$ , we write ``[ $\alpha$ ]'' to say that $\alpha$ is optional, and ``{ $\alpha$ }'' to denote 0 or more repetitions of $\alpha$ (ie. the Kleene closure of $\alpha$ ).

Thus we could re-write the above rules in EBNF as:

$\langle$ block $\rangle$	::=	``BEGIN'' [ $\langle$ stats-list $\rangle$ ] ``END''.
$\langle$ stats-list $\rangle$	::=	{ $\langle$ statement $\rangle$ ";"} $\langle$ statement $\rangle$ .

Using EBNF has the advantage of simplifying the grammar by reducing the need to use recursive rules.

We could describe the complete syntax of a programming language using EBNF (or equivalently a CFG), but it tends to make things easier if we break it up into two sections:

The lexical syntax, which defines which patterns of characters make up which words. This can be described using regular expressions
The context-free syntax, which defines which sequences of words which make up phrases from the programming language; we describe this using a context-free grammar.

Thus the first task in writing a parser for a language is breaking up its syntax into the ``regular'' and ``context-free'' part. The choice is arbitrary but a simple guideline is that:

characters from the input should not appear in the context-free syntax;
these should be tokenised by the lexical phase.

As well as modularising our task, this approach will facilitate debugging in parser generators such as yacc.

Next: Derivations and Parse Trees Up: Grammars for Programming Languages Previous: Grammars for Programming Languages Contents

James Power 2002-11-29