One of the earliest forms of notation used for describing the syntax
of a programming language was Backus-Naur Form (BNF); this is
basically just a variant of a context-free grammar, with the symbol
``::='' used in place of ``
'' to mean ``is defined as''.
Additionally, non-terminals are usually written in angle-brackets and
terminals in quotes.
For example, we could describe a block of statements in Pascal as:
| ::= | ``BEGIN'' |
|
| ::= | ||
| ::= |
An extension of this notation, known as extended BNF (EBNF),
allows us to use regular- expression-like constructs on the
right-hand-side of the rules: for any sequence of symbols
, we
write ``[
]'' to say that
is optional, and
``{
}'' to denote 0 or more repetitions of
(ie. the
Kleene closure of
).
Thus we could re-write the above rules in EBNF as:
| ::= | ``BEGIN'' [ |
|
| ::= | { |
Using EBNF has the advantage of simplifying the grammar by reducing
the need to use recursive rules.
We could describe the complete syntax of a programming language using EBNF (or equivalently a CFG), but it tends to make things easier if we break it up into two sections:
Thus the first task in writing a parser for a language is breaking up its syntax into the ``regular'' and ``context-free'' part. The choice is arbitrary but a simple guideline is that: