Top-down parsing is one of the two main approaches to parsing (the other being bottom-up parsing). In top-down parsing, the parser starts at the top of the parse tree (beginning with the start symbol) and attempts to construct the tree down to the leaves (tokens or terminal symbols). The parser recursively applies production rules from the grammar to match the input string. This process is known as derivation.
Top-down parsing is widely used in compilers and interpreters due to its simplicity and ease of implementation.
A recursive descent parser is a straightforward and simple method of top-down parsing. It is implemented by creating a set of recursive procedures or functions, where each function corresponds to a non-terminal in the grammar.
Given the grammar:
E -> T + E | T
T -> F * T | F
F -> ( E ) | id
You would define a function for each non-terminal:
T + E or just T.F * T or just F.( E ) or id.In pseudocode, it might look like this:
def E():
if lookahead == "id" or lookahead == "(":
T()
if lookahead == "+":
consume("+")
E()
def T():
if lookahead == "id" or lookahead == "(":
F()
if lookahead == "*":
consume("*")
T()
def F():
if lookahead == "id":
consume("id")
elif lookahead == "(":
consume("(")
E()
consume(")")
Left Recursion: Recursive descent parsers cannot handle left-recursive grammars. For example, a grammar like:
A -> A α | β
will cause infinite recursion in a recursive descent parser. To overcome this issue, left recursion must be eliminated from the grammar.
Backtracking: If a wrong choice is made (i.e., a production rule doesn't match), backtracking is required to try a different rule, which can lead to inefficiency.
An LL parser is a more efficient top-down parsing approach. It uses lookahead to predict which rule to apply without needing to backtrack.
LL(k) parsers use left-to-right scanning of the input, and they build a leftmost derivation using k tokens of lookahead.
k tokens (though in practice, k is often 1 for simplicity).In LL(1) parsing, the grammar is transformed into a table that specifies, for each non-terminal and lookahead token, which production rule to apply.
For example, if the grammar is:
E -> T + E | T
T -> F * T | F
F -> ( E ) | id
The LL(1) parsing table might look like:
| Non-terminal | lookahead = id |
lookahead = ( |
lookahead = + |
lookahead = * |
|---|---|---|---|---|
| E | T + E |
T |
||
| T | F * T |
F |
||
| F | id |
( E ) |
E -> E + T | T is problematic.Top-down parsing is a fundamental approach used in compilers and interpreters. It is intuitive and easy to implement, but it does have limitations, especially when dealing with left recursion and complex grammars. Here’s a summary:
Both approaches are crucial in compiler design, and their use depends on the specific characteristics of the grammar and language being parsed.
Open this section to load past papers