ScholarQuill University Notes
    Compiler Construction (COMP3149), Topic 3 of 32

    Lexical Analyzer and Input Buffering

    📘 1. Lexical Analyzer (Scanner)

    ✅ Definition

    A Lexical Analyzer (scanner) is the first phase of a compiler. It reads the source code character by character and groups the characters into tokens.

    👉 It acts as an interface between source code and the parser (syntax analyzer).


    🔄 Basic Working

    📊 Diagram

    Source Code → [Lexical Analyzer] → Tokens → Syntax Analyzer
    

    📌 What are Tokens?

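    A token is the smallest meaningful unit in a program. The exact character sequence that forms a token in the source (such as x or 10) is called its lexeme.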

    Example:

    int x = 10;
    

    👉 Tokens:

    int | x | = | 10 | ;
    

    🧠 Components of a Token

    Each token has:

    • Token Name (Type) → keyword, identifier, operator, number
    • Attribute Value → the literal value, or a pointer to the symbol-table entry

    Example:

    Token (lexeme)   Type         Attribute
    int              keyword      (none)
    x                identifier   pointer to symbol-table entry
    10               number       value = 10
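
The table above can be reproduced with a small sketch. This is an illustrative tokenizer for a tiny C-like language, not a full scanner: the keyword set, the token categories, and the names `TOKEN_SPEC` and `tokenize` are assumptions made for this example. Keywords carry their lexeme as the attribute here (the table leaves it empty), and identifiers carry an index into a list that stands in for the symbol table.

```python
import re

# Assumed keyword set for a tiny C-like language (not a complete definition).
KEYWORDS = {"int", "float", "if", "else", "while", "return"}

# Each token type is described by a pattern (a regular expression).
TOKEN_SPEC = [
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator",   r"[=+\-*/]"),
    ("punctuator", r";"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table); each token is a (type, attribute) pair."""
    symbol_table = []   # identifiers in order of first appearance
    tokens = []
    # Note: finditer silently skips characters that match no pattern,
    # so this sketch does not report lexical errors.
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "identifier" and lexeme in KEYWORDS:
            tokens.append(("keyword", lexeme))
        elif kind == "identifier":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # Attribute = index of the symbol-table entry.
            tokens.append(("identifier", symbol_table.index(lexeme)))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("int x = 10;")
print(tokens)  # [('keyword', 'int'), ('identifier', 0), ('operator', '='), ('number', 10), ('punctuator', ';')]
print(table)   # ['x']
```

A real compiler would store more than the name in each symbol-table entry (type, scope, and so on); the list index only plays the role of the "pointer" in the table.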

    🔧 Functions of Lexical Analyzer

    1. Reads input characters
    2. Groups them into tokens
    3. Removes whitespace and comments
    4. Identifies keywords, identifiers, constants
    5. Adds identifiers to Symbol Table
    6. Reports lexical errors

    ⚠️ Lexical Errors

    Examples:

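    • Invalid symbols → characters outside the language's alphabet, such as @ or # in many languages
    • Malformed identifiers → 2abc (an identifier cannot start with a digit)
    • Unclosed string literals → "hello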
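
The three kinds of error listed above can be detected with a short hand-written scan. This is a minimal sketch under stated assumptions: the function name `find_lexical_errors`, the set of allowed symbols, and the error messages are all invented for illustration, and a production scanner would usually also try to recover and keep producing tokens.

```python
def find_lexical_errors(source):
    """Scan `source` and return a list of (position, message) pairs."""
    allowed_symbols = set("=+-*/;(){}<>!,")   # assumed operator/punctuator set
    errors = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier or keyword: letters, digits, underscores.
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
        elif ch.isdigit():
            start = i
            while i < n and source[i].isdigit():
                i += 1
            # Digits immediately followed by a letter, as in 2abc.
            if i < n and (source[i].isalpha() or source[i] == "_"):
                while i < n and (source[i].isalnum() or source[i] == "_"):
                    i += 1
                errors.append((start, f"malformed identifier {source[start:i]!r}"))
        elif ch == '"':
            start = i
            i += 1
            while i < n and source[i] not in '"\n':
                i += 1
            if i < n and source[i] == '"':
                i += 1                        # consume the closing quote
            else:
                errors.append((start, "unclosed string literal"))
        elif ch in allowed_symbols:
            i += 1
        else:
            errors.append((i, f"invalid symbol {ch!r}"))
            i += 1
    return errors


for position, message in find_lexical_errors('int 2abc = @; s = "hi'):
    print(position, message)
# 4 malformed identifier '2abc'
# 11 invalid symbol '@'
# 18 unclosed string literal
```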

    ⭐ Key Points (Exam Focus)

    👉 Converts characters → tokens
    👉 Ignores spaces and comments
    👉 Works using patterns (regular expressions)


    🧠 2. Role of Lexical Analyzer in Compiler

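    • First phase → foundation of compilation
    • Simplifies syntax analysis → the parser sees tokens, not raw characters, whitespace, or comments
    • Improves efficiency → fast buffering and scanning techniques can be applied to the character-reading task alone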

    🔍 3. Input Buffering

    ✅ Definition

    Input Buffering is a technique in which the lexical analyzer reads the input in large blocks into memory buffers and scans characters from there, instead of fetching one character at a time from the file.


    ❓ Why is Input Buffering Needed?

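    Reading character by character from a file is slow, because every read is a separate I/O operation. The scanner also often needs to look ahead: after seeing <, it must check the next character to tell < from <=.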

    👉 Solution: Use buffers to read large chunks at once.
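
The saving can be made concrete by counting read operations. The sketch below is an illustration only: `CountingReader` and `count_reads` are hypothetical helpers, and an in-memory `io.StringIO` stands in for a real file (Python's own file objects already buffer internally, so the counts model the idea rather than actual system calls).

```python
import io


class CountingReader:
    """Wraps a text stream and counts how many read() calls are made."""

    def __init__(self, text):
        self._stream = io.StringIO(text)   # stand-in for a source file
        self.calls = 0

    def read(self, size):
        self.calls += 1
        return self._stream.read(size)


def count_reads(text, chunk_size):
    """Consume `text` in chunks of `chunk_size`; return the number of read() calls."""
    reader = CountingReader(text)
    while reader.read(chunk_size):         # an empty string signals end of input
        pass
    return reader.calls


source = "count = count + 1;\n" * 100      # 1900 characters
print(count_reads(source, 1))              # 1901: one call per character, plus the final empty read
print(count_reads(source, 4096))           # 2: one call for the whole input, plus the final empty read
```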


    🔄 4. Types of Input Buffering

    🔹 1. Single Buffer Scheme

    📊 Diagram

    [ Input Buffer ]
    | a | b | c | d | e | EOF |
    

    ❌ Problems

    • Frequent I/O operations
    • Slow performance
    • Difficult to handle lookahead → a lexeme that crosses the buffer boundary is lost when the buffer is refilled

    🔹 2. Double Buffer Scheme (Important)

    ✅ Definition

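    Uses two buffers of the same size that are filled alternately, so the start of a lexeme is still in memory when the next block of input is loaded.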


    📊 Diagram

    Buffer 1         Buffer 2
    [ a b c d ]      [ e f g h ]
        ↑                ↑
     forward pointer     reloaded
    

    🔧 Working (Step-by-Step)

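    1. Two buffers of equal size are used
    2. Each buffer has a sentinel (EOF marker) at its end
    3. The lexical analyzer reads from the first buffer using the forward pointer
    4. When the forward pointer reaches the sentinel at the end of the first buffer → the second buffer is loaded and scanning continues there
    5. When the second buffer ends → the first buffer is reloaded and the forward pointer wraps around to it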

    📌 Important Concept: Sentinels

    ✅ Definition

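    A sentinel is a special character (usually EOF) that cannot appear in the source program. It is placed at the end of each buffer to mark the buffer boundary, and after the last input character to mark the real end of input.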


    ⭐ Advantages

    • No separate end-of-buffer check for every character → the sentinel test is the only test needed in the common case
    • Faster scanning
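
A minimal sketch of the double buffer scheme with sentinels follows. The class name `DoubleBuffer`, the use of `"\0"` as the sentinel, and the tiny buffer size are assumptions chosen for illustration; the sketch only advances the forward pointer and does not track the lexeme begin pointer or handle retraction.

```python
import io

SENTINEL = "\0"   # assumed: a character that never appears in the source text


class DoubleBuffer:
    """Two buffer halves of `size` characters, each followed by a sentinel slot."""

    def __init__(self, stream, size=4):
        self.stream = stream
        self.size = size
        # Layout: [half 0: size chars][sentinel][half 1: size chars][sentinel]
        self.buf = [SENTINEL] * (2 * size + 2)
        self.forward = 0
        self._load(0)

    def _load(self, half):
        """Refill one half; a short read leaves a sentinel right after the data."""
        start = half * (self.size + 1)
        chunk = self.stream.read(self.size)
        for i in range(self.size):
            self.buf[start + i] = chunk[i] if i < len(chunk) else SENTINEL

    def next_char(self):
        """Return the next input character, or None at end of input."""
        ch = self.buf[self.forward]
        if ch != SENTINEL:                       # common case: a single comparison
            self.forward += 1
            return ch
        if self.forward == self.size:            # sentinel that ends half 0
            self._load(1)
            self.forward = self.size + 1
            return self.next_char()
        if self.forward == 2 * self.size + 1:    # sentinel that ends half 1
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                              # sentinel inside a half: real end of input


db = DoubleBuffer(io.StringIO("count = count + 1;"), size=4)
chars = []
while (ch := db.next_char()) is not None:
    chars.append(ch)
print("".join(chars))   # count = count + 1;
```

The extra position checks run only when a sentinel is seen, which is why the common path needs one comparison per character instead of two.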

    🔑 Pointers Used

    • Lexeme Begin Pointer → marks the start of the current lexeme
    • Forward Pointer → scans ahead until a complete lexeme has been matched

    📊 Pointer Movement Example

    int x = 10;
    ^  ^
    LB FP
    

    (LB = Lexeme Begin, FP = Forward Pointer. After the lexeme int has been scanned, FP rests on the first character past it.)
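
The movement of the two pointers can be traced with a short sketch. It is a simplification made for illustration: `scan_lexemes` is a hypothetical helper that works on a plain string rather than a buffer pair, treats any run of letters, digits, and underscores as one lexeme, and treats every other non-space character as a single-character lexeme.

```python
def scan_lexemes(text):
    """Yield (lexeme_begin, forward, lexeme) for each lexeme in `text`."""
    lexeme_begin = 0
    n = len(text)
    while lexeme_begin < n:
        if text[lexeme_begin].isspace():
            lexeme_begin += 1                 # skip whitespace between lexemes
            continue
        forward = lexeme_begin + 1
        if text[lexeme_begin].isalnum() or text[lexeme_begin] == "_":
            # Advance forward until the first character past the lexeme.
            while forward < n and (text[forward].isalnum() or text[forward] == "_"):
                forward += 1
        yield lexeme_begin, forward, text[lexeme_begin:forward]
        lexeme_begin = forward                # the next lexeme starts where forward stopped


for lb, fp, lexeme in scan_lexemes("int x = 10;"):
    print(lb, fp, repr(lexeme))
# 0 3 'int'
# 4 5 'x'
# 6 7 '='
# 8 10 '10'
# 10 11 ';'
```

Applied to `count = count + 1;`, the same function yields the lexemes count, =, count, +, 1 and ; in that order.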


    🧪 5. Example (Token Recognition with Buffering)

    Input:

    count = count + 1;
    

    Steps:

    1. Buffer loads input

    2. LB marks the start of each lexeme and FP moves character by character until the lexeme ends

    3. Tokens identified:

      count | = | count | + | 1 | ;
      

    🎯 6. Important Exam Concepts

    👉 Frequently asked:

    • Definition of lexical analyzer
    • Functions of lexical analyzer
    • What are tokens?
    • Input buffering techniques
    • Double buffer scheme diagram
    • Sentinel concept
    • Lexeme begin & forward pointer

    📝 7. Short Notes (Quick Revision)

    Lexical Analyzer

    • First phase
    • Converts characters → tokens
    • Removes spaces/comments
    • Uses symbol table

    Input Buffering

    • Improves reading speed
    • Uses buffers instead of char-by-char reading

    Double Buffer

    • Two buffers
    • Uses sentinel
    • Efficient and fast

    📊 8. Final Summary Table

    Topic             Description                     Key Feature           Advantage                 Exam Importance
    Lexical Analyzer  Converts source code to tokens  First compiler phase  Simplifies parsing        Very High
    Token             Smallest unit                   Type + attribute      Easy processing           High
    Input Buffering   Efficient input reading         Uses memory buffer    Faster than char reading  High
    Single Buffer     One buffer                      Simple                Slow (drawback)           Medium
    Double Buffer     Two buffers                     Uses sentinel         Fast, efficient           Very High
    Sentinel          End marker                      EOF symbol            Avoids checks             Very High
    Pointers          Track positions                 LB & Forward          Helps scanning            High

    ✅ Final Conclusion

    • The Lexical Analyzer is crucial because it prepares clean input for the parser.
    • Input Buffering, especially double buffering, makes the process fast and efficient.
    • Concepts like tokens, lexemes, sentinel, and pointers are very important for exams.
