Compiler Construction: Topic 3 of 32

Lexical Analyzer and Input Buffering

📘 1. Lexical Analyzer (Scanner)

✅ Definition

The Lexical Analyzer (scanner) is the first phase of a compiler. It reads the source code character by character and groups the characters into tokens.

👉 It acts as an interface between source code and the parser (syntax analyzer).

🔄 Basic Working

📊 Diagram

Source Code → [Lexical Analyzer] → Tokens → Syntax Analyzer

📌 What are Tokens?

A token is the smallest meaningful unit in a program. The actual character sequence in the source that matches a token's pattern is called a lexeme.

Example:

int x = 10;

👉 Tokens:

int | x | = | 10 | ;

🧠 Components of a Token

Each token has:

Token Name (Type) → the category, such as keyword, identifier, or operator
Attribute Value → the actual value, or a pointer to the token's symbol table entry

Example:

Token	Type	Attribute
int	keyword	—
x	identifier	pointer to symbol table
10	number	value = 10
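
The type/attribute pairing in the table can be sketched in code. This is a minimal illustration, not a full scanner: the `Token` class, the `KEYWORDS` subset, the regular expression, and the use of a list index as the "pointer" into the symbol table are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

KEYWORDS = {"int", "float", "if", "else", "while", "return"}  # illustrative subset

# One named group per lexeme class; the first alternative that matches wins.
TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<operator>[=+\-*/])
  | (?P<punctuation>[;(){},])
  | (?P<skip>\s+)
""", re.VERBOSE)


@dataclass
class Token:
    type: str                 # token name: keyword, identifier, number, ...
    lexeme: str               # the matched source text
    attribute: Optional[int]  # number value or symbol-table index, else None


def tokenize(text: str, symbol_table: List[str]) -> List[Token]:
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace produces no token
        if kind == "number":
            tokens.append(Token("number", lexeme, int(lexeme)))
        elif kind == "word" and lexeme in KEYWORDS:
            tokens.append(Token("keyword", lexeme, None))
        elif kind == "word":
            # Assumption: a list index stands in for the symbol-table pointer.
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(Token("identifier", lexeme, symbol_table.index(lexeme)))
        else:
            tokens.append(Token(kind, lexeme, None))
    return tokens


symbols: List[str] = []
for token in tokenize("int x = 10;", symbols):
    print(token)
```

Storing the lexeme alongside the type keeps the example easy to inspect; a real scanner would more likely give each keyword its own token name.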

🔧 Functions of Lexical Analyzer

Reads input characters
Groups them into tokens
Removes whitespace and comments
Identifies keywords, identifiers, constants
Adds identifiers to Symbol Table
Reports lexical errors

⚠️ Lexical Errors

Examples:

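Invalid symbols → characters outside the language's alphabet, such as a stray @ or # in the middle of code
Malformed identifiers → 2abc (an identifier cannot start with a digit)
Unclosed strings → a string literal with no closing quote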
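
The three error kinds above can be detected with a small pattern table. This is a sketch under assumptions: the function name `find_lexical_errors`, the set of legal symbols, and the rule that strings are double-quoted and end on the same line are all invented for the example. Some real scanners would instead split 2abc into a number and an identifier and leave the complaint to the parser.

```python
import re
from typing import List, Tuple

# Every character of the input matches exactly one alternative, so the
# matches produced by finditer() cover the input without gaps.
LEXICAL_RULES = re.compile(r"""
    (?P<string>"[^"\n]*")
  | (?P<unclosed_string>"[^"\n]*)
  | (?P<bad_identifier>\d+[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<identifier>[A-Za-z_]\w*)
  | (?P<symbol>[=+\-*/;(){}<>,])
  | (?P<space>\s+)
  | (?P<invalid>.)
""", re.VERBOSE)


def find_lexical_errors(source: str) -> List[Tuple[int, str]]:
    """Return (position, message) pairs for each lexical error found."""
    errors = []
    for match in LEXICAL_RULES.finditer(source):
        kind = match.lastgroup
        if kind == "unclosed_string":
            errors.append((match.start(), "unclosed string literal"))
        elif kind == "bad_identifier":
            errors.append((match.start(), f"malformed identifier {match.group()!r}"))
        elif kind == "invalid":
            errors.append((match.start(), f"invalid symbol {match.group()!r}"))
    return errors


for position, message in find_lexical_errors('int 2abc = 5 @ 3;'):
    print(position, message)
```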

⭐ Key Points (Exam Focus)

👉 Converts characters → tokens
👉 Ignores spaces and comments
👉 Works using patterns (regular expressions)

🧠 2. Role of Lexical Analyzer in Compiler

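First phase → every later phase works on its output
Simplifies syntax analysis → the parser sees tokens, not raw characters, whitespace, or comments
Improves efficiency → character reading can be optimized separately with techniques such as input buffering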

🔍 3. Input Buffering

✅ Definition

Input Buffering is a technique used by the lexical analyzer to read input efficiently by storing it in memory buffers instead of reading one character at a time.

❓ Why Input Buffering is Needed?

Reading a file one character at a time is slow, because each read can trigger a separate I/O operation.

👉 Solution: Use buffers to read large chunks at once.
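
The difference can be made visible by counting read requests. In this sketch the `CountingReader` class is a hypothetical stand-in for the operating system: each call to its `read` method represents one I/O request. The buffer size of 4096 is an arbitrary choice. (Python's own file objects already buffer internally, so the counter is only a model of the cost.)

```python
import io


class CountingReader:
    """Wraps a text stream and counts the read requests that reach it."""

    def __init__(self, text: str):
        self._stream = io.StringIO(text)
        self.calls = 0

    def read(self, n: int) -> str:
        self.calls += 1
        return self._stream.read(n)


def read_char_by_char(reader: CountingReader) -> str:
    chars = []
    while True:
        ch = reader.read(1)  # one request per character
        if ch == "":
            break
        chars.append(ch)
    return "".join(chars)


def read_buffered(reader: CountingReader, buffer_size: int = 4096) -> str:
    chunks = []
    while True:
        chunk = reader.read(buffer_size)  # one request per buffer-full
        if chunk == "":
            break
        chunks.append(chunk)
    return "".join(chunks)


source = "count = count + 1;\n" * 1000
slow, fast = CountingReader(source), CountingReader(source)
read_char_by_char(slow)
read_buffered(fast)
print("char-by-char requests:", slow.calls)
print("buffered requests:    ", fast.calls)
```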

🔄 4. Types of Input Buffering

🔹 1. Single Buffer Scheme

📊 Diagram

[ Input Buffer ]
| a | b | c | d | e | EOF |

❌ Problems

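Frequent refills (I/O operations) when the buffer is small
Slow performance as a result
Difficult to handle lookahead: a lexeme that crosses the buffer boundary loses its beginning when the buffer is refilled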

🔹 2. Double Buffer Scheme (Important)

✅ Definition

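Uses two buffers that are filled alternately, so scanning can continue across a buffer boundary without losing the part of the lexeme already read.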

📊 Diagram

Buffer 1         Buffer 2
[ a b c d ]      [ e f g h ]
    ↑                ↑
 forward pointer     reloaded

🔧 Working (Step-by-Step)

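Two buffers of equal size are used
Each buffer has a sentinel (EOF marker) at its end
The lexical analyzer reads from one buffer using the forward pointer
When the forward pointer reaches the sentinel at the end of a buffer → the other buffer is reloaded and scanning continues there
The two buffers keep alternating this way until the input itself ends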
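
The steps above can be sketched as a small class. This is an illustration under assumptions, not a reference implementation: the name `DoubleBuffer`, the tiny default buffer size, and the choice of the NUL character as the sentinel (assumed never to occur in source text) are all made up for the example. The lexeme begin pointer is left out to keep the focus on buffer switching.

```python
import io
from typing import Optional

SENTINEL = "\0"  # assumption: this character never appears in the source


class DoubleBuffer:
    def __init__(self, stream, size: int = 4):
        self.stream = stream
        self.size = size
        # Two halves, each with `size` character slots plus one sentinel slot.
        self.buf = [SENTINEL] * (2 * (size + 1))
        self.forward = 0
        self._load(0)

    def _load(self, half: int) -> None:
        """Fill one half from the stream and end it with a sentinel."""
        start = half * (self.size + 1)
        chunk = self.stream.read(self.size)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # After a short read the sentinel lands early and marks end of input.
        self.buf[start + len(chunk)] = SENTINEL

    def next_char(self) -> Optional[str]:
        """Return the next character, or None once the input is exhausted."""
        ch = self.buf[self.forward]
        if ch != SENTINEL:
            self.forward += 1
            return ch
        if self.forward == self.size:            # end of first half
            self._load(1)
            self.forward = self.size + 1
            return self.next_char()
        if self.forward == 2 * self.size + 1:    # end of second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None  # sentinel inside a half: real end of input


def read_all(buffer: DoubleBuffer) -> str:
    chars = []
    ch = buffer.next_char()
    while ch is not None:
        chars.append(ch)
        ch = buffer.next_char()
    return "".join(chars)


print(read_all(DoubleBuffer(io.StringIO("count = count + 1;"), size=4)))
```

Note that the common path in `next_char` performs a single comparison; the position checks run only when a sentinel is actually seen.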

📌 Important Concept: Sentinels

✅ Definition

A sentinel is a special character (usually EOF) placed at the end of each buffer to mark the buffer boundary. It must be a character that cannot appear in the source program.

⭐ Advantages

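No separate end-of-buffer test on every character: the check is folded into the normal test on the current character
Faster scanning, because advancing the forward pointer usually costs one test instead of two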
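
The saving is easiest to see by comparing two loops that scan an identifier. Both function names are hypothetical, and the NUL character is again only an assumed sentinel. The first loop tests the buffer bound and the character on every step; the second relies on the sentinel failing the character test, so it needs only one test per step.

```python
SENTINEL = "\0"  # assumption: never a letter or digit, never in the source


def scan_identifier_checked(buf: str, start: int) -> int:
    """Without a sentinel: two tests per character (bound, then character)."""
    forward = start
    while forward < len(buf) and buf[forward].isalnum():
        forward += 1
    return forward


def scan_identifier_sentinel(buf: str, start: int) -> int:
    """With a sentinel at the end of buf: one test per character."""
    forward = start
    while buf[forward].isalnum():  # the sentinel fails this test and stops the loop
        forward += 1
    return forward


text = "count"
print(scan_identifier_checked(text, 0))
print(scan_identifier_sentinel(text + SENTINEL, 0))
```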

🔑 Pointers Used

Lexeme Begin Pointer → marks the start of the current lexeme
Forward Pointer → scans ahead until the end of the lexeme is found

📊 Pointer Movement Example

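int x = 10;
^  ^
LB FP

(LB = Lexeme Begin, FP = Forward Pointer. FP has stopped one position past the lexeme int, so the lexeme is the text from LB up to, but not including, FP.)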

🧪 5. Example (Token Recognition with Buffering)

Input:

count = count + 1;

Steps:

Buffer loads input
FP moves character by character
Tokens identified:
```
count | = | count | + | 1 | ;
```
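
The steps above can be sketched with explicit lexeme begin and forward pointers. This is a simplified illustration: the function name `scan`, the NUL sentinel, and the rule that any character other than a letter, digit, underscore, or whitespace forms a one-character lexeme are assumptions for the example. The whole input sits in a single sentinel-terminated buffer to keep the pointer logic in view.

```python
from typing import List

SENTINEL = "\0"  # assumption: never appears in the source text


def scan(source: str) -> List[str]:
    buf = source + SENTINEL  # step 1: the buffer loads the input
    lexemes = []
    lexeme_begin = 0
    while buf[lexeme_begin] != SENTINEL:
        first = buf[lexeme_begin]
        if first.isspace():
            lexeme_begin += 1  # whitespace is skipped, not returned
            continue
        forward = lexeme_begin + 1  # step 2: FP moves character by character
        if first.isalpha() or first == "_":
            while buf[forward].isalnum() or buf[forward] == "_":
                forward += 1
        elif first.isdigit():
            while buf[forward].isdigit():
                forward += 1
        # Any other character is treated as a one-character lexeme.
        lexemes.append(buf[lexeme_begin:forward])  # step 3: lexeme is [LB, FP)
        lexeme_begin = forward
    return lexemes


print(" | ".join(scan("count = count + 1;")))
```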

🎯 6. Important Exam Concepts

👉 Frequently asked:

Definition of lexical analyzer
Functions of lexical analyzer
What are tokens?
Input buffering techniques
Double buffer scheme diagram
Sentinel concept
Lexeme begin & forward pointer

📝 7. Short Notes (Quick Revision)

Lexical Analyzer

First phase
Converts characters → tokens
Removes spaces/comments
Uses symbol table

Input Buffering

Improves reading speed
Uses buffers instead of char-by-char reading

Double Buffer

Two buffers
Uses sentinel
Efficient and fast

📊 8. Final Summary Table

Topic	Description	Key Feature	Advantage	Exam Importance
Lexical Analyzer	Converts source code to tokens	First compiler phase	Simplifies parsing	Very High
Token	Smallest unit	Type + attribute	Easy processing	High
Input Buffering	Efficient input reading	Uses memory buffer	Faster than char reading	High
Single Buffer	One buffer	Simple	Easy to implement, but slow	Medium
Double Buffer	Two buffers	Uses sentinel	Fast, efficient	Very High
Sentinel	End marker	EOF symbol	Avoids a separate end-of-buffer check	Very High
Pointers	Track positions	LB & Forward	Helps scanning	High

✅ Final Conclusion

The Lexical Analyzer is crucial because it prepares clean input for the parser.
Input Buffering, especially double buffering, makes the process fast and efficient.
Concepts like tokens, lexemes, sentinel, and pointers are very important for exams.
