📘 1. Lexical Analyzer (Scanner)
✅ Definition
A Lexical Analyzer is the first phase of a compiler that reads the source code character by character and converts it into tokens.
👉 It acts as an interface between source code and the parser (syntax analyzer).
🔄 Basic Working
📊 Diagram
Source Code → [Lexical Analyzer] → Tokens → Syntax Analyzer
📌 What are Tokens?
A token is the smallest meaningful unit in a program. The actual character sequence in the source that matches a token is called its lexeme.
Example:
int x = 10;
👉 Tokens:
int | x | = | 10 | ;
🧠 Components of a Token
Each token has:
- Token Name (Type) → keyword, identifier, operator
- Attribute Value → the actual value, or a pointer to the symbol table entry
Example:
| Token | Type | Attribute |
|---|---|---|
| int | keyword | (none) |
| x | identifier | pointer to symbol table |
| 10 | number | value = 10 |
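👉 The type + attribute idea can be sketched in a few lines of Python. This is only an illustration, not a standard implementation: the `Token` class, the `KEYWORDS` subset, the group names in `TOKEN_RE`, and the use of a list index as the "pointer" into the symbol table are all assumptions made for this example. Unlike the table above, the sketch stores the keyword's spelling as its attribute so the parser can tell keywords apart.

```python
import re
from dataclasses import dataclass

KEYWORDS = {"int", "float", "if", "else", "while", "return"}  # assumed subset

# One named alternative per token class (names are illustrative).
TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<operator>[=+\-*/])
  | (?P<punct>[;(){}])
  | (?P<skip>\s+)
""", re.VERBOSE)

@dataclass
class Token:
    type: str
    attribute: object = None

def tokenize(source):
    """Return (tokens, symbol_table) for a tiny C-like fragment."""
    symbol_table = []   # identifier names, in order of first appearance
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "number":
            tokens.append(Token("number", int(lexeme)))
        elif kind == "word" and lexeme in KEYWORDS:
            tokens.append(Token("keyword", lexeme))
        elif kind == "word":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # The attribute is the identifier's index in the symbol table.
            tokens.append(Token("identifier", symbol_table.index(lexeme)))
        else:
            tokens.append(Token(kind, lexeme))
    return tokens, symbol_table

tokens, table = tokenize("int x = 10;")
for tok in tokens:
    print(tok)
```

Here `x` becomes an identifier token whose attribute is `0`, its position in `table`.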
🔧 Functions of Lexical Analyzer
- Reads input characters
- Groups them into tokens
- Removes whitespace and comments
- Identifies keywords, identifiers, constants
- Adds identifiers to Symbol Table
- Reports lexical errors
⚠️ Lexical Errors
Examples:
- Invalid symbols → @, #
- Malformed identifiers → 2abc (starts with a digit)
- Unclosed strings
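👉 A rough sketch of how a scanner might detect these three kinds of error. The rule names, the set of allowed symbols, and the function `find_lexical_errors` are hypothetical and chosen only for this example; a real language defines its own alphabet and its own recovery strategy.

```python
import re

# Order matters: Python tries alternatives left to right, so the
# error patterns are placed before the rules they would otherwise shadow.
RULES = [
    ("string",     r'"[^"\n]*"'),
    ("bad_string", r'"[^"\n]*'),          # opening quote never closed on the line
    ("bad_ident",  r"\d+[A-Za-z_]\w*"),   # digits run straight into letters, e.g. 2abc
    ("number",     r"\d+"),
    ("ident",      r"[A-Za-z_]\w*"),
    ("symbol",     r"[=+\-*/;(){}]"),     # assumed set of legal symbols
    ("space",      r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))

def find_lexical_errors(source):
    """Return a list of (position, message) pairs."""
    errors = []
    pos = 0
    while pos < len(source):
        m = SCANNER.match(source, pos)
        if m is None:
            errors.append((pos, f"invalid character {source[pos]!r}"))
            pos += 1                      # skip the character and keep scanning
            continue
        if m.lastgroup == "bad_string":
            errors.append((pos, "unclosed string literal"))
        elif m.lastgroup == "bad_ident":
            errors.append((pos, f"malformed identifier {m.group()!r}"))
        pos = m.end()
    return errors

for position, message in find_lexical_errors('int 2abc = @ "hi;'):
    print(position, message)
```

Skipping one character after an invalid symbol is a simple recovery choice; it lets the scanner report several errors in one pass instead of stopping at the first.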
⭐ Key Points (Exam Focus)
👉 Converts characters → tokens
👉 Ignores spaces and comments
👉 Works using patterns (regular expressions)
🧠 2. Role of Lexical Analyzer in Compiler
- First phase → foundation of compilation
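- Simplifies syntax analysis → the parser works on tokens and never sees whitespace or comments
- Improves efficiency → character-level work can use specialized techniques such as input buffering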
🔍 3. Input Buffering
✅ Definition
Input Buffering is a technique used by the lexical analyzer to read input efficiently by storing it in memory buffers instead of reading one character at a time.
❓ Why Input Buffering is Needed?
Reading from a file one character at a time is slow, because every read can cost a separate I/O operation.
👉 Solution: Use buffers to read large chunks at once.
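👉 The difference can be made visible by counting read requests. The sketch below is only an illustration: `CountingReader` and `read_all` are made-up helpers, an in-memory `io.StringIO` stands in for a real file, and the chunk size 4096 is an arbitrary choice.

```python
import io

class CountingReader:
    """Wraps a text stream and counts how many read() calls reach it."""
    def __init__(self, text):
        self._stream = io.StringIO(text)
        self.calls = 0

    def read(self, n):
        self.calls += 1
        return self._stream.read(n)

def read_all(reader, chunk_size):
    """Pull the whole input through read(chunk_size) and return it."""
    parts = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:          # empty string signals end of input
            break
        parts.append(chunk)
    return "".join(parts)

source = "count = count + 1;" * 100      # 1800 characters

one_at_a_time = CountingReader(source)
read_all(one_at_a_time, 1)

buffered = CountingReader(source)
read_all(buffered, 4096)

print("char-by-char reads:", one_at_a_time.calls)
print("buffered reads:    ", buffered.calls)
```

With a real file, each of those read requests may become an operating-system call, which is why fewer, larger reads are preferred.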
🔄 4. Types of Input Buffering
🔹 1. Single Buffer Scheme
📊 Diagram
[ Input Buffer ]
| a | b | c | d | e | EOF |
❌ Problems
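- Buffer must be refilled every time it runs out → frequent I/O operations
- Scanning stops during each refill → slow performance
- Difficult to handle lookahead → a lexeme that crosses the buffer boundary loses its beginning when the buffer is overwritten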
🔹 2. Double Buffer Scheme (Important)
✅ Definition
Uses two buffers to improve efficiency.
📊 Diagram
Buffer 1         Buffer 2
[ a b c d ]      [ e f g h ]
      ↑               ↑
forward pointer   reloaded
🔧 Working (Step-by-Step)
- Two buffers of equal size are used
- Each buffer has a sentinel (EOF marker) at end
- Lexical analyzer reads from one buffer
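- When the forward pointer reaches the end of one buffer → the other buffer is reloaded and scanning continues there
- The buffers keep alternating in this way until the input is exhausted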
📌 Important Concept: Sentinels
✅ Definition
A sentinel is a special character that cannot appear in the source program (usually EOF), placed at the end of each buffer to mark the buffer boundary.
⭐ Advantages
- No separate end-of-buffer test for every character → a single comparison with the sentinel catches the buffer boundary
- Faster scanning
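👉 A minimal sketch of the double buffer scheme with sentinels, under several assumptions: the class name `DoubleBuffer`, the method `next_char`, the tiny buffer size of 4, and the use of `"\0"` as the sentinel (assumed never to occur in the source) are all choices made for this example. It also ignores the case of a lexeme longer than one buffer.

```python
import io

SENTINEL = "\0"   # assumed never to appear in the source text

class DoubleBuffer:
    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        # Layout: n chars, sentinel slot, n chars, sentinel slot.
        self.buf = [SENTINEL] * (2 * n + 2)
        self.forward = 0
        self._load(0)

    def _load(self, start):
        """Fill one half starting at index `start`."""
        chunk = self.stream.read(self.n)
        self.buf[start:start + len(chunk)] = chunk
        # After a short read the sentinel lands right after the last
        # real character, which later signals end of input.
        self.buf[start + len(chunk)] = SENTINEL

    def next_char(self):
        """Return the next character, or None at end of input."""
        while True:
            ch = self.buf[self.forward]
            self.forward += 1
            if ch != SENTINEL:
                return ch                  # common case: one comparison
            at = self.forward - 1          # index where the sentinel was found
            if at == self.n:               # end of first half
                self._load(self.n + 1)     # forward already points at second half
            elif at == 2 * self.n + 1:     # end of second half
                self._load(0)
                self.forward = 0
            else:                          # sentinel inside a half: real end
                self.forward = at
                return None

def read_everything(text, n=4):
    db = DoubleBuffer(io.StringIO(text), n)
    out = []
    while True:
        ch = db.next_char()
        if ch is None:
            break
        out.append(ch)
    return "".join(out)

print(read_everything("int x = 10;"))
```

The extra checks on the sentinel's position run only when a sentinel is actually seen, so the ordinary path costs one test per character.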
🔑 Pointers Used
- Lexeme Begin Pointer → start of token
- Forward Pointer → scans ahead
📊 Pointer Movement Example
int x = 10;
^  ^
LB FP
(LB = Lexeme Begin, FP = Forward Pointer)
🧪 5. Example (Token Recognition with Buffering)
Input:
count = count + 1;
Steps:
- Buffer loads input
- FP moves character by character
- Tokens identified:
count | = | count | + | 1 | ;
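👉 The steps above can be sketched with two indices playing the roles of LB and FP. This is a simplified illustration: the function name `scan` is made up, the whole input is held in one string instead of real buffers, and every symbol is treated as a one-character lexeme.

```python
def scan(source):
    """Split source into lexemes using lexeme-begin and forward indices."""
    lexemes = []
    n = len(source)
    lexeme_begin = 0
    while lexeme_begin < n:
        if source[lexeme_begin].isspace():
            lexeme_begin += 1              # whitespace is skipped, not emitted
            continue
        forward = lexeme_begin
        ch = source[forward]
        if ch.isalpha() or ch == "_":
            # identifier or keyword: letters, digits, underscores
            while forward < n and (source[forward].isalnum() or source[forward] == "_"):
                forward += 1
        elif ch.isdigit():
            while forward < n and source[forward].isdigit():
                forward += 1
        else:
            forward += 1                   # single-character operator or punctuation
        # The lexeme is everything between LB and FP.
        lexemes.append(source[lexeme_begin:forward])
        lexeme_begin = forward             # LB jumps to where FP stopped
    return lexemes

print(" | ".join(scan("count = count + 1;")))
```

Each time FP stops, the text between LB and FP is one lexeme, and LB is then moved up to FP to begin the next one.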
🎯 6. Important Exam Concepts
👉 Frequently asked:
- Definition of lexical analyzer
- Functions of lexical analyzer
- What are tokens?
- Input buffering techniques
- Double buffer scheme diagram
- Sentinel concept
- Lexeme begin & forward pointer
📝 7. Short Notes (Quick Revision)
Lexical Analyzer
- First phase
- Converts characters → tokens
- Removes spaces/comments
- Uses symbol table
Input Buffering
- Improves reading speed
- Uses buffers instead of char-by-char reading
Double Buffer
- Two buffers
- Uses sentinel
- Efficient and fast
📊 8. Final Summary Table
| Topic | Description | Key Feature | Advantage | Exam Importance |
|---|---|---|---|---|
| Lexical Analyzer | Converts source code to tokens | First compiler phase | Simplifies parsing | Very High |
| Token | Smallest unit | Type + attribute | Easy processing | High |
| Input Buffering | Efficient input reading | Uses memory buffer | Faster than char reading | High |
| Single Buffer | One buffer | Simple | Easy to implement, but slow | Medium |
| Double Buffer | Two buffers | Uses sentinel | Fast, efficient | Very High |
| Sentinel | End marker | EOF symbol | Avoids checks | Very High |
| Pointers | Track positions | LB & Forward | Helps scanning | High |
✅ Final Conclusion
- The Lexical Analyzer is crucial because it prepares clean input for the parser.
- Input Buffering, especially double buffering, makes the process fast and efficient.
- Concepts like tokens, lexemes, sentinel, and pointers are very important for exams.