ScholarQuill logoScholarQuillUniversity Notes
  • Notes
  • Past Papers
  • Blogs
  • Todo
Login
ScholarQuill logoScholarQuillUniversity Notes
Login
NotesPast PapersBlogsTodo
More
SubjectsDiscussionCGPA CalculatorGPA CalculatorStudent PortalCourse Outline
About
About usPrivacy PolicyReportContact
Notes
Past Papers
Blogs
Todo
Analytics
    Current Subject
    🧩
    Theory of Automata
    COMP3148
    Progress0 / 25 topics
    Topics
    1. Finite State Models2. Language definitions preliminaries3. Regular expressions/Regular languages4. Finite automata (FAs)5. Transition graphs (TGs)6. NFAs7. Kleene's theorem8. Transducers (automata with output)9. Pumping lemma and non-regular language10. Grammars and PDA11. CFGs12. Derivations, derivation trees and ambiguity13. Simplifying CFLs14. Normal form grammars and parsing15. Decidability16. Context sensitive languages17. Grammars and linear bounded automata (LBA)18. Chomsky's hierarchy of grammars19. Turing Machines Theory20. Turing machines21. Post machine22. Variations on TM23. TM encoding24. Universal Turing Machine25. Defining Computers by TMs
    COMP3148›Normal form grammars and parsing
    Theory of AutomataTopic 14 of 25

    Normal form grammars and parsing

    8 minread
    1,390words
    Intermediatelevel

    Normal Form Grammars and Parsing

    In formal language theory, normal form grammars are special kinds of context-free grammars (CFGs) that follow a specific set of rules, making them easier to handle in parsing algorithms. These normal forms provide a structured way to represent languages and simplify the process of designing efficient parsers.

    Normal Forms for Context-Free Grammars (CFGs)

    The two most common normal forms for context-free grammars are Chomsky Normal Form (CNF) and Greibach Normal Form (GNF). Each of these normal forms has specific rules that govern the structure of the production rules.

    1. Chomsky Normal Form (CNF)

    A CFG is in Chomsky Normal Form (CNF) if all of its production rules satisfy one of the following two conditions:

    • A→BCA \to BCA→BC, where AAA is a non-terminal and B,CB, CB,C are non-terminals.
    • A→aA \to aA→a, where AAA is a non-terminal and aaa is a terminal symbol.

    This means that each production either has exactly two non-terminals on the right-hand side or just one terminal symbol.

    Properties of CNF:
    • Each production is binary or a single terminal, which simplifies parsing algorithms like CYK (Cocke-Younger-Kasami).
    • CNF ensures that every derivation in the grammar has a very specific structure, which can be leveraged for efficient parsing and computations.
    Converting a CFG to CNF:

    The process of converting a CFG to CNF generally involves these steps:

    1. Eliminate ϵ\epsilonϵ-productions: Remove any rules that generate the empty string ϵ\epsilonϵ.
    2. Eliminate unit productions: Remove productions of the form A→BA \to BA→B, where both AAA and BBB are non-terminals.
    3. Remove productions with more than two non-terminals on the right-hand side: For productions like A→X1X2X3A \to X_1 X_2 X_3A→X1​X2​X3​, split them into binary rules.
    4. Ensure that terminal symbols appear only in the form A→aA \to aA→a: If a production contains both terminals and non-terminals, introduce a new non-terminal to handle the terminal.
    Example of CNF:

    Consider the following CFG:

    S→aSb∣SS∣ϵS \to aSb \mid SS \mid \epsilonS→aSb∣SS∣ϵ
    1. Eliminate ϵ\epsilonϵ-productions:

      • S→ϵS \to \epsilonS→ϵ is removed, and new productions are added to account for this, like S→aSbS \to aSbS→aSb and S→abS \to abS→ab.
    2. Eliminate unit productions:

      • The rule S→SSS \to SSS→SS remains as is since it's already in the form A→BCA \to BCA→BC.
    3. Convert rules like S→aSbS \to aSbS→aSb into binary rules:

      • Introduce a new non-terminal X→SbX \to SbX→Sb, and rewrite the rule as S→aXS \to aXS→aX.

    The resulting CNF might look like:

    S→aX∣abS \to aX \mid abS→aX∣ab X→SbX \to SbX→Sb

    2. Greibach Normal Form (GNF)

    A CFG is in Greibach Normal Form (GNF) if every production is of the form:

    A→aαA \to a \alphaA→aα

    Where:

    • AAA is a non-terminal,
    • aaa is a terminal symbol, and
    • α\alphaα is a (possibly empty) string of non-terminals.
    Properties of GNF:
    • GNF ensures that the right-hand side of every production starts with a terminal symbol, which is helpful for top-down parsers like LL parsers.
    • GNF makes it possible to generate strings in a left-to-right manner, which is efficient for recursive descent parsing.
    Converting a CFG to GNF:

    The steps to convert a CFG to GNF generally include:

    1. Ensure that every production starts with a terminal.
    2. If a production contains non-terminals, transform it into a sequence starting with a terminal symbol.
    Example of GNF:

    Consider the grammar:

    S→A∣bS \to A \mid bS→A∣b A→cA \to cA→c

    This is already in GNF because each production has the form A→aαA \to a \alphaA→aα.

    However, if a production has the form A→BA \to BA→B, you would need to ensure the right-hand side starts with a terminal. This may involve introducing new non-terminals to generate terminals in the right order.

    Parsing and Normal Form Grammars

    Parsing is the process of analyzing a string of symbols based on a given grammar and determining if it can be derived from the start symbol of the grammar. Parsing involves determining the structure of the string according to the rules of the grammar.

    Parsing with CNF (CYK Algorithm)

    One of the most notable algorithms for parsing context-free languages in CNF is the CYK (Cocke-Younger-Kasami) algorithm. It is a dynamic programming algorithm that checks if a string can be generated by a CFG in CNF.

    CYK Algorithm Steps:
    1. Create a parse table that will store which non-terminals can generate each substring of the input string.
    2. Start with substrings of length 1 and fill in the table using the CNF production rules.
    3. Gradually increase the length of the substrings, using the table entries of smaller substrings to build longer ones.
    4. Check if the start symbol can generate the entire input string.

    Since CNF ensures that productions only have binary or terminal forms, the CYK algorithm is efficient in terms of space and computation, making it a powerful tool for parsing in CNF.

    Example of CYK Algorithm:

    Consider the grammar in CNF and the string "ab":

    S→aX∣abS \to aX \mid abS→aX∣ab X→SbX \to SbX→Sb

    The CYK algorithm would fill the parse table by:

    • Checking substrings of length 1 ("a", "b").
    • Using the grammar's rules to combine substrings and fill the table for longer substrings.

    Parsing with GNF (Recursive Descent Parser)

    For a grammar in Greibach Normal Form (GNF), a recursive descent parser can be used. A recursive descent parser is a top-down parsing technique where each non-terminal in the grammar corresponds to a function that attempts to match the terminal symbols of the input string.

    Advantages of GNF for Parsing:
    • Since every production in GNF begins with a terminal symbol, the parser can immediately attempt to match the input symbol, making it naturally suited for top-down parsing.
    • It avoids the complications that arise in left-recursive grammars, which can be problematic for top-down parsers.
    Example of a Recursive Descent Parser for GNF:

    Consider a grammar in GNF:

    S→aAS \to aAS→aA A→bA \to bA→b

    The recursive descent parser would:

    • Start with SSS, expecting the input to start with "a".
    • Upon matching "a", it calls the AAA function, which expects "b".
    • If the parser successfully matches both "a" and "b", the string is accepted.

    Conclusion

    • Normal Form Grammars (like Chomsky Normal Form (CNF) and Greibach Normal Form (GNF)) provide structured representations of context-free languages, making parsing algorithms more efficient.

      • CNF is ideal for bottom-up parsing techniques such as the CYK algorithm.
      • GNF is suitable for top-down parsers like recursive descent parsers.
    • Parsing is the process of determining if a string can be generated by a given grammar, and different normal forms can make the parsing process more efficient, either by simplifying the structure of the grammar or ensuring that parsing can be done in a systematic and predictable manner.

    By transforming a grammar into a normal form, we can take advantage of efficient parsing algorithms, making the process of syntax analysis in compilers and interpreters faster and more reliable.

    Previous topic 13
    Simplifying CFLs
    Next topic 15
    Decidability

    Past Papers

    Open this section to load past papers

    Click on Show Past Papers to see past papers.
    On This Page
      Reading Stats
      Est. reading time8 min
      Word count1,390
      Code examples0
      DifficultyIntermediate