One of the earliest forms of notation used for describing the syntax
of a programming language was Backus-Naur Form (BNF); this is
basically just a variant of a context-free grammar, with the symbol
``::='' used in place of ``'' to mean ``is defined as''.
Additionally, non-terminals are usually written in angle-brackets and
terminals in quotes.
For example, we could describe a block of statements in Pascal as:
![]() ![]() |
::= | ``BEGIN'' ![]() ![]() |
![]() ![]() |
::= | ![]() ![]() ![]() ![]() |
![]() ![]() |
::= | ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
An extension of this notation, known as extended BNF (EBNF),
allows us to use regular- expression-like constructs on the
right-hand-side of the rules: for any sequence of symbols , we
write ``[
]'' to say that
is optional, and
``{
}'' to denote 0 or more repetitions of
(ie. the
Kleene closure of
).
Thus we could re-write the above rules in EBNF as:
![]() ![]() |
::= | ``BEGIN'' [![]() ![]() |
![]() ![]() |
::= | {![]() ![]() ![]() ![]() |
Using EBNF has the advantage of simplifying the grammar by reducing
the need to use recursive rules.
We could describe the complete syntax of a programming language using EBNF (or equivalently a CFG), but it tends to make things easier if we break it up into two sections:
Thus the first task in writing a parser for a language is breaking up its syntax into the ``regular'' and ``context-free'' part. The choice is arbitrary but a simple guideline is that: