ScholarQuill logoScholarQuillUniversity Notes
  • Notes
  • Past Papers
  • Blogs
  • Todo
Login
ScholarQuill logoScholarQuillUniversity Notes
Login
NotesPast PapersBlogsTodo
More
SubjectsDiscussionCGPA CalculatorGPA CalculatorStudent PortalCourse Outline
About
About usPrivacy PolicyReportContact
Notes
Past Papers
Blogs
Todo
Analytics
    Current Subject
    🧩
    Theory of Automata
    COMP3148
    Progress0 / 25 topics
    Topics
    1. Finite State Models2. Language definitions preliminaries3. Regular expressions/Regular languages4. Finite automata (FAs)5. Transition graphs (TGs)6. NFAs7. Kleene's theorem8. Transducers (automata with output)9. Pumping lemma and non-regular language10. Grammars and PDA11. CFGs12. Derivations, derivation trees and ambiguity13. Simplifying CFLs14. Normal form grammars and parsing15. Decidability16. Context sensitive languages17. Grammars and linear bounded automata (LBA)18. Chomsky's hierarchy of grammars19. Turing Machines Theory20. Turing machines21. Post machine22. Variations on TM23. TM encoding24. Universal Turing Machine25. Defining Computers by TMs
    COMP3148›CFGs
    Theory of AutomataTopic 11 of 25

    CFGs

    10 minread
    1,734words
    Intermediatelevel

    Context-Free Grammars (CFGs)

    A Context-Free Grammar (CFG) is a formal grammar used to describe context-free languages (CFLs). These grammars are a powerful tool for defining the syntax of programming languages, arithmetic expressions, and other hierarchical structures.

    Components of a Context-Free Grammar (CFG)

    A CFG is formally defined by a 4-tuple:

    G=(V,Σ,R,S)G = (V, \Sigma, R, S)G=(V,Σ,R,S)

    Where:

    • V is a set of non-terminal symbols.
    • Σ\SigmaΣ is a set of terminal symbols.
    • R is a set of production rules (or production rules).
    • S is the start symbol (a special non-terminal symbol from which the derivation starts).
    1. Non-terminal symbols (V):
    • These are symbols that are used to define the structure of the language and are not part of the strings generated by the grammar.
    • Non-terminals are placeholders or variables that can be replaced by other non-terminals or terminal symbols.
    • Commonly represented by uppercase letters like A,B,SA, B, SA,B,S, etc.
    2. Terminal symbols (Σ\SigmaΣ):
    • These are the actual symbols that appear in the strings generated by the grammar.
    • They represent the alphabet of the language.
    • Terminals are the "building blocks" of the strings in the language and cannot be replaced.
    • They are usually represented by lowercase letters like a,b,xa, b, xa,b,x, etc.
    3. Production rules (R):
    • These are the rules used to replace non-terminals with other non-terminals or terminals.
    • A production rule is of the form: A→αA \to \alphaA→α where AAA is a non-terminal and α\alphaα is a string of non-terminals and/or terminals (which can be empty).
    • For example, a production rule could be S→aSbS \to aSbS→aSb or S→abS \to abS→ab.
    4. Start symbol (S):
    • This is the non-terminal from which derivations begin.
    • The start symbol is where the generation process starts.

    Derivation and Language Generation

    The primary purpose of a context-free grammar is to generate strings in the language it defines. This is done through a derivation process where we start with the start symbol and repeatedly apply production rules to generate a string of terminal symbols.

    Example of Derivation:

    Consider the following grammar GGG:

    V={S}V = \{ S \}V={S} Σ={a,b}\Sigma = \{ a, b \}Σ={a,b} R={S→aSb,S→ab}R = \{ S \to aSb, S \to ab \}R={S→aSb,S→ab} S=SS = SS=S
    • Start with the start symbol SSS.
    • Apply the production rule S→aSbS \to aSbS→aSb: S→aSbS \to aSbS→aSb
    • Now, replace the SSS in aSbaSbaSb with ababab (using S→abS \to abS→ab): aSb→aab baSb \to aab \, baSb→aabb This results in the string ababab, which is a valid string in the language generated by this CFG.

    So, the string ababab can be derived from this CFG.

    Properties of Context-Free Grammars (CFGs)

    1. Generative Power:

      • CFGs are capable of generating a wide range of languages, especially those with recursive or nested structures, like programming languages, mathematical expressions, or even natural languages (with some limitations).
      • Context-free languages (CFLs) can be recognized by pushdown automata (PDA) and are a broader class than regular languages, but a subset of recursive enumerable languages.
    2. Ambiguity:

      • A CFG is ambiguous if there is more than one way to derive a string in the language. That is, the same string can be generated by the grammar in more than one way, leading to multiple leftmost or rightmost derivations or parse trees.
      • Ambiguity in grammars can lead to challenges in designing compilers and interpreters because it can result in multiple interpretations of a program or expression.

      Example of Ambiguity: Consider the CFG for arithmetic expressions:

      E→E+E∣E→E⋅E∣(E)∣idE \to E + E \quad | \quad E \to E \cdot E \quad | \quad ( E ) \quad | \quad idE→E+E∣E→E⋅E∣(E)∣id

      The expression id+id⋅idid + id \cdot idid+id⋅id can be derived in two ways:

      1. (id+id)⋅id(id + id) \cdot id(id+id)⋅id
      2. id+(id⋅id)id + (id \cdot id)id+(id⋅id)

      The different derivations correspond to different ways of interpreting the order of operations.

    3. Closure Properties:

      • The class of context-free languages is closed under:
        • Union: If L1L_1L1​ and L2L_2L2​ are CFLs, then L1∪L2L_1 \cup L_2L1​∪L2​ is also a CFL.
        • Concatenation: If L1L_1L1​ and L2L_2L2​ are CFLs, then L1⋅L2L_1 \cdot L_2L1​⋅L2​ is a CFL.
        • Kleene star: If LLL is a CFL, then L∗L^*L∗ (zero or more repetitions of strings in LLL) is a CFL.
      • However, context-free languages are not closed under intersection or difference with regular languages.
    4. Normal Forms: A normal form is a specific representation of a grammar that can be useful for simplifying analysis or proving properties. The two most important normal forms for CFGs are:

      • Chomsky Normal Form (CNF): A CFG is in CNF if every production is of the form:

        • A→BCA \to BCA→BC, where BBB and CCC are non-terminals.
        • A→aA \to aA→a, where aaa is a terminal.
        • A→ϵA \to \epsilonA→ϵ (only if the language includes the empty string).
      • Greibach Normal Form (GNF): A CFG is in GNF if every production is of the form:

        • A→aαA \to a\alphaA→aα, where aaa is a terminal and α\alphaα is a (possibly empty) string of non-terminals.

    Examples of Context-Free Grammars (CFGs)

    1. Balanced Parentheses Grammar:

    Consider the grammar that generates balanced parentheses:

    V={S}V = \{ S \}V={S} Σ={(,)}\Sigma = \{ (, ) \}Σ={(,)} R={S→(S) ∣ SS ∣ ϵ}R = \{ S \to (S) \, | \, SS \, | \, \epsilon \}R={S→(S)∣SS∣ϵ} S=SS = SS=S

    This CFG generates strings like:

    • ϵ\epsilonϵ
    • ()()()
    • (())(())(())
    • ()()()()()()
    • ((()))((()))((())), etc.

    2. Arithmetic Expressions Grammar:

    Here is a simple CFG for arithmetic expressions with addition and multiplication:

    V={E}V = \{ E \}V={E} Σ={+,⋅,(,),id}\Sigma = \{ +, \cdot, (, ), id \}Σ={+,⋅,(,),id} R={E→E+E ∣ E⋅E ∣ (E) ∣ id}R = \{ E \to E + E \, | \, E \cdot E \, | \, (E) \, | \, id \}R={E→E+E∣E⋅E∣(E)∣id} S=ES = ES=E

    This grammar can generate expressions like:

    • id+idid + idid+id
    • id⋅(id+id)id \cdot (id + id)id⋅(id+id)
    • (id+id)⋅id(id + id) \cdot id(id+id)⋅id

    Applications of Context-Free Grammars

    1. Programming Languages:

      • CFGs are used to define the syntax of programming languages. For example, the syntax of expressions, statements, and blocks in most programming languages can be specified using CFGs.
    2. Compilers:

      • CFGs play a crucial role in parsing, where a source code is analyzed to check if it adheres to the grammar of the programming language.
    3. Mathematical Expressions:

      • CFGs are used to define the syntax of mathematical expressions involving operators like addition, multiplication, etc., with appropriate precedence and associativity.
    4. Natural Language Processing (NLP):

      • CFGs can be used to describe the grammatical structure of natural languages, such as sentence structures in English, though more advanced models (like dependency grammars or tree-adjoining grammars) are typically used for more complex tasks.

    Conclusion

    A Context-Free Grammar (CFG) is a formal system used to generate context-free languages (CFLs), which are essential for defining the syntax of many computational systems, especially programming languages. CFGs are composed of non-terminals, terminals, production rules, and a start symbol, and they allow recursive definitions that are ideal for expressing hierarchical structures. Understanding CFGs is foundational to fields such as compilers, formal language theory, and natural language processing.

    Previous topic 10
    Grammars and PDA
    Next topic 12
    Derivations, derivation trees and ambiguity

    Past Papers

    Open this section to load past papers

    Click on Show Past Papers to see past papers.
    On This Page
      Reading Stats
      Est. reading time10 min
      Word count1,734
      Code examples0
      DifficultyIntermediate