DC-322›Top-Down Parsing

Compiler ConstructionTopic 8 of 14

Top-Down Parsing

6 minread

1,012words

Intermediatelevel

Top-Down Parsing

Top-down parsing is one of the two main approaches to parsing (the other being bottom-up parsing). In top-down parsing, the parser starts at the top of the parse tree (beginning with the start symbol) and attempts to construct the tree down to the leaves (tokens or terminal symbols). The parser recursively applies production rules from the grammar to match the input string. This process is known as derivation.

Top-down parsing is widely used in compilers and interpreters due to its simplicity and ease of implementation.

Key Concepts of Top-Down Parsing

Start Symbol: The parser begins with the start symbol of the grammar.
Production Rules: The grammar consists of production rules that describe how non-terminal symbols can be expanded into sequences of terminals and other non-terminals.
Recursive Expansion: The parser recursively tries to match the input string with the right-hand side of the grammar's production rules.
Leftmost Derivation: Top-down parsers typically build a leftmost derivation, meaning that they expand the leftmost non-terminal symbol first.
Lookahead: In many top-down parsing techniques (e.g., LL parsers), the parser uses lookahead to decide which production rule to apply based on the next token in the input.

Types of Top-Down Parsers

Recursive Descent Parsing
LL Parsers (Predictive Parsers)

1. Recursive Descent Parsing

A recursive descent parser is a straightforward and simple method of top-down parsing. It is implemented by creating a set of recursive procedures or functions, where each function corresponds to a non-terminal in the grammar.

How Recursive Descent Parsing Works

For every non-terminal in the grammar, you create a recursive function.
Each function will attempt to match the corresponding non-terminal with the input string, using the production rules.
If a match is found, the function proceeds to the next token or non-terminal.
If no match is found, the function backtracks (i.e., tries a different alternative or backtracks to a previous decision point).

Example of Recursive Descent Parser

Given the grammar:

E -> T + E | T
T -> F * T | F
F -> ( E ) | id

You would define a function for each non-terminal:

E() will try to match T + E or just T.
T() will try to match F * T or just F.
F() will try to match ( E ) or id.

In pseudocode, it might look like this:

def E():
    if lookahead == "id" or lookahead == "(":
        T()
        if lookahead == "+":
            consume("+")
            E()
            
def T():
    if lookahead == "id" or lookahead == "(":
        F()
        if lookahead == "*":
            consume("*")
            T()

def F():
    if lookahead == "id":
        consume("id")
    elif lookahead == "(":
        consume("(")
        E()
        consume(")")

Advantages of Recursive Descent Parsing:

Simplicity: Recursive descent parsers are easy to implement and understand.
Flexibility: It is straightforward to modify or extend the grammar.

Limitations of Recursive Descent Parsing:

Left Recursion: Recursive descent parsers cannot handle left-recursive grammars. For example, a grammar like:
```
A -> A α | β
```
will cause infinite recursion in a recursive descent parser. To overcome this issue, left recursion must be eliminated from the grammar.
Backtracking: If a wrong choice is made (i.e., a production rule doesn't match), backtracking is required to try a different rule, which can lead to inefficiency.

2. LL Parsers (Predictive Parsers)

An LL parser is a more efficient top-down parsing approach. It uses lookahead to predict which rule to apply without needing to backtrack.

LL(k) parsers use left-to-right scanning of the input, and they build a leftmost derivation using k tokens of lookahead.
- LL(1) parsers look ahead one token to decide which rule to apply.
- LL(k) parsers look ahead k tokens (though in practice, k is often 1 for simplicity).

How LL Parsers Work

Predictive Parsing: LL parsers predict which production rule to apply based on the current non-terminal and the lookahead token.
Parsing Table: An LL parser typically uses a parsing table to guide its decisions. The table helps decide which production rule to apply for each combination of non-terminal and lookahead token.

LL(1) Parsing Table

In LL(1) parsing, the grammar is transformed into a table that specifies, for each non-terminal and lookahead token, which production rule to apply.

For example, if the grammar is:

E -> T + E | T
T -> F * T | F
F -> ( E ) | id

The LL(1) parsing table might look like:

Non-terminal	lookahead = `id`	lookahead = `(`
E	`T + E`	`T`
T	`F * T`	`F`
F	`id`	`( E )`

The first column represents the non-terminals.
The other columns represent possible lookahead tokens.
The cells of the table contain the corresponding production rule to apply.

Advantages of LL Parsers:

Efficiency: LL(1) parsers do not require backtracking, making them more efficient than recursive descent parsers in practice.
Predictive Parsing: They use lookahead to make parsing decisions, which allows for fast and unambiguous parsing.

Limitations of LL Parsers:

Left Recursion: Like recursive descent parsers, LL parsers cannot handle left-recursive grammars. For example, the grammar E -> E + T | T is problematic.
Limited to Context-Free Grammars: LL parsers are limited to a subset of context-free grammars and may not handle more complex grammars.

Conclusion: Top-Down Parsing Summary

Top-down parsing is a fundamental approach used in compilers and interpreters. It is intuitive and easy to implement, but it does have limitations, especially when dealing with left recursion and complex grammars. Here’s a summary:

Recursive Descent Parsing is simple and works well for simple grammars but cannot handle left recursion directly.
LL Parsing (especially LL(1)) is more efficient and avoids backtracking by using a parsing table and one-token lookahead, but it is also restricted to grammars that do not have left recursion.

Both approaches are crucial in compiler design, and their use depends on the specific characteristics of the grammar and language being parsed.

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

def E(): if lookahead == "id" or lookahead == "(": T() if lookahead == "+": consume("+") E() def T(): if lookahead == "id" or lookahead == "(": F() if lookahead == "*": consume("*") T() def F(): if lookahead == "id": consume("id") elif lookahead == "(": consume("(") E() consume(")")

Non-terminal

lookahead = id

lookahead = (

lookahead = +

lookahead = *

T + E

T

F * T

F

id

( E )