Normal Form Grammars and Parsing
In formal language theory, normal form grammars are special kinds of context-free grammars (CFGs) that follow a specific set of rules, making them easier to handle in parsing algorithms. These normal forms provide a structured way to represent languages and simplify the process of designing efficient parsers.
Normal Forms for Context-Free Grammars (CFGs)
The two most common normal forms for context-free grammars are Chomsky Normal Form (CNF) and Greibach Normal Form (GNF). Each of these normal forms has specific rules that govern the structure of the production rules.
1. Chomsky Normal Form (CNF)
A CFG is in Chomsky Normal Form (CNF) if all of its production rules satisfy one of the following two conditions:
- A→BC, where A is a non-terminal and B,C are non-terminals.
- A→a, where A is a non-terminal and a is a terminal symbol.
This means that each production either has exactly two non-terminals on the right-hand side or just one terminal symbol.
Properties of CNF:
- Each production is binary or a single terminal, which simplifies parsing algorithms like CYK (Cocke-Younger-Kasami).
- CNF ensures that every derivation in the grammar has a very specific structure, which can be leveraged for efficient parsing and computations.
Converting a CFG to CNF:
The process of converting a CFG to CNF generally involves these steps:
- Eliminate ϵ-productions: Remove any rules that generate the empty string ϵ.
- Eliminate unit productions: Remove productions of the form A→B, where both A and B are non-terminals.
- Remove productions with more than two non-terminals on the right-hand side: For productions like A→X1X2X3, split them into binary rules.
- Ensure that terminal symbols appear only in the form A→a: If a production contains both terminals and non-terminals, introduce a new non-terminal to handle the terminal.
Example of CNF:
Consider the following CFG:
S→aSb∣SS∣ϵ
-
Eliminate ϵ-productions:
- S→ϵ is removed, and new productions are added to account for this, like S→aSb and S→ab.
-
Eliminate unit productions:
- The rule S→SS remains as is since it's already in the form A→BC.
-
Convert rules like S→aSb into binary rules:
- Introduce a new non-terminal X→Sb, and rewrite the rule as S→aX.
The resulting CNF might look like:
S→aX∣ab
X→Sb
2. Greibach Normal Form (GNF)
A CFG is in Greibach Normal Form (GNF) if every production is of the form:
A→aα
Where:
- A is a non-terminal,
- a is a terminal symbol, and
- α is a (possibly empty) string of non-terminals.
Properties of GNF:
- GNF ensures that the right-hand side of every production starts with a terminal symbol, which is helpful for top-down parsers like LL parsers.
- GNF makes it possible to generate strings in a left-to-right manner, which is efficient for recursive descent parsing.
Converting a CFG to GNF:
The steps to convert a CFG to GNF generally include:
- Ensure that every production starts with a terminal.
- If a production contains non-terminals, transform it into a sequence starting with a terminal symbol.
Example of GNF:
Consider the grammar:
S→A∣b
A→c
This is already in GNF because each production has the form A→aα.
However, if a production has the form A→B, you would need to ensure the right-hand side starts with a terminal. This may involve introducing new non-terminals to generate terminals in the right order.
Parsing and Normal Form Grammars
Parsing is the process of analyzing a string of symbols based on a given grammar and determining if it can be derived from the start symbol of the grammar. Parsing involves determining the structure of the string according to the rules of the grammar.
Parsing with CNF (CYK Algorithm)
One of the most notable algorithms for parsing context-free languages in CNF is the CYK (Cocke-Younger-Kasami) algorithm. It is a dynamic programming algorithm that checks if a string can be generated by a CFG in CNF.
CYK Algorithm Steps:
- Create a parse table that will store which non-terminals can generate each substring of the input string.
- Start with substrings of length 1 and fill in the table using the CNF production rules.
- Gradually increase the length of the substrings, using the table entries of smaller substrings to build longer ones.
- Check if the start symbol can generate the entire input string.
Since CNF ensures that productions only have binary or terminal forms, the CYK algorithm is efficient in terms of space and computation, making it a powerful tool for parsing in CNF.
Example of CYK Algorithm:
Consider the grammar in CNF and the string "ab":
S→aX∣ab
X→Sb
The CYK algorithm would fill the parse table by:
- Checking substrings of length 1 ("a", "b").
- Using the grammar's rules to combine substrings and fill the table for longer substrings.
Parsing with GNF (Recursive Descent Parser)
For a grammar in Greibach Normal Form (GNF), a recursive descent parser can be used. A recursive descent parser is a top-down parsing technique where each non-terminal in the grammar corresponds to a function that attempts to match the terminal symbols of the input string.
Advantages of GNF for Parsing:
- Since every production in GNF begins with a terminal symbol, the parser can immediately attempt to match the input symbol, making it naturally suited for top-down parsing.
- It avoids the complications that arise in left-recursive grammars, which can be problematic for top-down parsers.
Example of a Recursive Descent Parser for GNF:
Consider a grammar in GNF:
S→aA
A→b
The recursive descent parser would:
- Start with S, expecting the input to start with "a".
- Upon matching "a", it calls the A function, which expects "b".
- If the parser successfully matches both "a" and "b", the string is accepted.
Conclusion
-
Normal Form Grammars (like Chomsky Normal Form (CNF) and Greibach Normal Form (GNF)) provide structured representations of context-free languages, making parsing algorithms more efficient.
- CNF is ideal for bottom-up parsing techniques such as the CYK algorithm.
- GNF is suitable for top-down parsers like recursive descent parsers.
-
Parsing is the process of determining if a string can be generated by a given grammar, and different normal forms can make the parsing process more efficient, either by simplifying the structure of the grammar or ensuring that parsing can be done in a systematic and predictable manner.
By transforming a grammar into a normal form, we can take advantage of efficient parsing algorithms, making the process of syntax analysis in compilers and interpreters faster and more reliable.