COMP3147›VLIW Pipelines: Local scheduling

Computer Architecture, Topic 10 of 24

VLIW Pipelines: Local scheduling

⭐ VLIW Pipelines: Local Scheduling

⭐ 1. What is VLIW?

VLIW (Very Long Instruction Word) is a CPU architecture that allows multiple independent operations to be encoded in a single long instruction word.

Each VLIW instruction contains multiple operations (like ALU ops, load/store ops, or branches).
All operations in the same VLIW instruction issue together in the same cycle, each on its own functional unit.

Example: A VLIW instruction may contain 4 independent operations:

ADD R1, R2, R3 | MUL R4, R5, R6 | LOAD R7, 0(R8) | STORE R9, 0(R10)

Key point: VLIW relies on the compiler, not the hardware, to find independent operations and schedule them; the hardware simply issues each instruction word as it is given.
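To make the idea of an instruction word concrete, here is a minimal Python sketch that models the 4-operation example above as a tuple of slots and checks that the operations are independent. The `Op` class, the `is_independent` helper, and the choice of R9 as the stored value are illustrative assumptions, not part of any real ISA or toolchain. The check is deliberately conservative: it looks only at registers and ignores possible memory conflicts between the LOAD and the STORE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    """One operation inside a VLIW word (hypothetical model)."""
    opcode: str
    dest: Optional[str]        # register written, or None (e.g. STORE)
    srcs: Tuple[str, ...]      # registers read


# The 4-slot word from the example: ADD | MUL | LOAD | STORE
word = (
    Op("ADD", "R1", ("R2", "R3")),
    Op("MUL", "R4", ("R5", "R6")),
    Op("LOAD", "R7", ("R8",)),
    Op("STORE", None, ("R9", "R10")),   # assumed: stores R9 to address in R10
)


def is_independent(ops):
    """True if no op reads or writes a register that another op in the word writes."""
    ops = [op for op in ops if op.opcode != "NOP"]
    for i, a in enumerate(ops):
        for b in ops[i + 1:]:
            if a.dest is not None and (a.dest == b.dest or a.dest in b.srcs):
                return False
            if b.dest is not None and b.dest in a.srcs:
                return False
    return True


print(is_independent(word))  # True: no register is shared between a writer and another op
```

If the MUL were changed to read R1, the check would fail, and the compiler would have to move it to a later word.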

⭐ 2. What is Local Scheduling in VLIW?

Definition:

Local scheduling is the compiler’s technique of arranging instructions within a single basic block to maximize parallel execution in a VLIW pipeline while avoiding hazards.

Focuses on instruction-level parallelism (ILP) within a block of straight-line code.
Ensures that dependent instructions are ordered correctly, and independent instructions are grouped into the same VLIW instruction.

Local scheduling is “local” because it only considers one basic block at a time, not the entire program.

⭐ 3. Why Local Scheduling is Needed

VLIW pipelines execute multiple operations per instruction. To fully utilize the functional units:

Independent instructions must be packed together.
Dependent instructions must be placed in later instruction words, far enough apart to cover the producer's latency.
Data hazards (RAW, WAR, WAW), structural hazards, and control hazards must be handled at compile time.

Without local scheduling:

Many functional units remain idle
Issue slots go unused, so the machine runs well below its peak throughput

⭐ 4. Steps in Local Scheduling

Analyze instruction dependencies
- RAW (Read After Write)
- WAW (Write After Write)
- WAR (Write After Read)
Identify functional units available in the VLIW machine
- Example: ALU, FP-MUL, Load/Store, Branch
Pack independent instructions into one VLIW instruction
- Assign each instruction to a free functional unit of the matching type
Insert NOPs where no independent instruction is available
- Fills the empty slots, and delays a dependent instruction until its operands are ready
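The steps above can be sketched as a small greedy scheduler. This is only an illustration under stated assumptions, not the algorithm of any particular compiler: every operation is assumed to have single-cycle latency, dependencies are tracked on registers only (memory dependencies are ignored), and WAR dependencies are treated as conservatively as RAW and WAW. The function names, the instruction tuples, and the unit list are hypothetical.

```python
def depends(earlier, later):
    """True if `later` must stay after `earlier` (RAW, WAW, or WAR on a register)."""
    _, _, e_dest, e_srcs = earlier
    _, _, l_dest, l_srcs = later
    raw = e_dest is not None and e_dest in l_srcs
    waw = e_dest is not None and e_dest == l_dest
    war = l_dest is not None and l_dest in e_srcs
    return raw or waw or war


def local_schedule(block, units):
    """Greedily pack one basic block into VLIW words, one slot per functional unit."""
    for name, unit, _, _ in block:
        if unit not in units:
            raise ValueError(f"{name} needs a {unit} unit the machine lacks")
    # Step 1: dependency analysis (predecessors of each instruction)
    preds = {
        j: [i for i in range(j) if depends(block[i], block[j])]
        for j in range(len(block))
    }
    word_of = {}   # instruction index -> index of the word it was placed in
    words = []
    while len(word_of) < len(block):
        slots = ["NOP"] * len(units)          # Step 4: empty slots stay NOP
        for j, (name, unit, _, _) in enumerate(block):
            if j in word_of:
                continue
            # ready only if every predecessor sits in an earlier word
            ready = all(p in word_of and word_of[p] < len(words) for p in preds[j])
            # Steps 2 and 3: find a free functional unit of the right type
            free = [s for s, u in enumerate(units) if u == unit and slots[s] == "NOP"]
            if ready and free:
                slots[free[0]] = name
                word_of[j] = len(words)
        words.append(slots)
    return words


# (name, unit, destination register, source registers)
block = [
    ("L1", "MEM", "R1", ("R8",)),        # R1 = load [R8]
    ("A1", "ALU", "R2", ("R1", "R3")),   # R2 = R1 + R3
    ("M1", "MUL", "R4", ("R5", "R6")),   # R4 = R5 * R6
    ("S1", "MEM", None, ("R2", "R9")),   # store R2 to [R9]
]
units = ["ALU", "MUL", "MEM", "BR"]

for word in local_schedule(block, units):
    print(" | ".join(word))
```

Here M1 is independent and joins L1 in the first word, while A1 and S1 form a chain behind L1 and each need a word of their own, with the remaining slots filled by NOPs.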

⭐ 5. Example of Local Scheduling

Original code (basic block)

I1: R1 = R2 + R3
I2: R4 = R5 * R6
I3: R7 = R1 - R8
I4: R9 = R10 + R11

Dependencies

I3 reads R1, which I1 writes (a RAW dependency) → I3 cannot execute in the same VLIW word as I1.
I1, I2, and I4 share no registers, so they are independent → can execute together.
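The dependency analysis for this block can be checked mechanically. The sketch below is an illustration only: the `block` dictionary and the `hazards` helper are hypothetical names, and each instruction is reduced to its destination register and source registers.

```python
from itertools import combinations

# instruction -> (destination register, source registers)
block = {
    "I1": ("R1", ("R2", "R3")),
    "I2": ("R4", ("R5", "R6")),
    "I3": ("R7", ("R1", "R8")),
    "I4": ("R9", ("R10", "R11")),
}


def hazards(earlier, later):
    """List the register dependencies of `later` on `earlier`."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    kinds = []
    if e_dest in l_srcs:
        kinds.append("RAW")   # later reads what earlier writes
    if e_dest == l_dest:
        kinds.append("WAW")   # both write the same register
    if l_dest in e_srcs:
        kinds.append("WAR")   # later overwrites what earlier reads
    return kinds


found = {}
for a, b in combinations(block, 2):   # pairs in program order
    kinds = hazards(block[a], block[b])
    if kinds:
        found[(a, b)] = kinds

print(found)  # {('I1', 'I3'): ['RAW']}
```

Only the pair (I1, I3) is reported, which matches the analysis above.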

Scheduled VLIW Instructions

VLIW Instruction	Slot 1	Slot 2	Slot 3	Slot 4
VLIW1	I1 (ALU)	I2 (MUL)	I4 (ALU)	NOP
VLIW2	I3 (ALU)	NOP	NOP	NOP

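I1, I2, I4 execute in parallel (this assumes the machine has at least two ALUs and one multiplier)
I3 issues in the next word, after I1 has produced R1 (this assumes single-cycle ALU latency)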

Local scheduling ensures parallelism is exploited without violating dependencies.
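One way to quantify the result is slot utilization: the fraction of issue slots that hold a real operation rather than a NOP. The short sketch below computes it for the schedule above and for a naive one-operation-per-word layout. The `slot_utilization` helper is a hypothetical name used only for this illustration.

```python
def slot_utilization(words):
    """Fraction of slots that hold a real operation rather than a NOP."""
    total = sum(len(w) for w in words)
    useful = sum(1 for w in words for op in w if op != "NOP")
    return useful / total


# The schedule from the table above
schedule = [
    ["I1", "I2", "I4", "NOP"],    # VLIW1
    ["I3", "NOP", "NOP", "NOP"],  # VLIW2
]

# Naive layout: one operation per word, no packing
naive = [[op, "NOP", "NOP", "NOP"] for op in ["I1", "I2", "I3", "I4"]]

print(f"scheduled: {len(schedule)} words, {slot_utilization(schedule):.0%} of slots used")
print(f"naive:     {len(naive)} words, {slot_utilization(naive):.0%} of slots used")
```

The scheduled version needs 2 words and fills 4 of 8 slots (50%), while the naive version needs 4 words and fills 4 of 16 slots (25%).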

⭐ 6. Characteristics of Local Scheduling in VLIW

Compiler-driven: Hardware does not perform dynamic scheduling.
Intra-block only: Only instructions in the same basic block are considered.
Hazard-free: Compiler avoids RAW/WAR/WAW hazards.
Slot utilization: Maximizes the use of functional units per VLIW instruction.
May insert NOPs: When insufficient independent instructions exist.

⭐ 7. Local Scheduling vs Global Scheduling

Feature	Local Scheduling	Global Scheduling
Scope	Single basic block	Across multiple blocks
Complexity	Low	High
Performance gain	Moderate	High (more ILP)
Hazard handling	Easy (within block)	Hard (requires analysis across blocks)
NOP insertion	Common	Less common

⭐ 8. Advantages of Local Scheduling in VLIW

Exploits instruction-level parallelism
Reduces idle functional units
Ensures hazard-free execution
Simplifies hardware: no dynamic scheduling needed

⭐ 9. Limitations

Cannot exploit inter-block parallelism (needs global scheduling)
Depends on program structure: a block with many dependent instructions offers little parallelism
May introduce NOPs, which waste issue slots and increase code size
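The effect of dependencies on parallelism can be illustrated with the longest dependency chain in a block, which gives a lower bound on the number of VLIW words needed no matter how wide the machine is. The sketch below assumes single-cycle latency and register dependencies only. The `min_words` helper and both example blocks are hypothetical.

```python
def min_words(block):
    """Longest dependency chain: a lower bound on the VLIW words needed."""
    depth = []   # depth[j] = length of the longest chain ending at instruction j
    for j, (dest_j, srcs_j) in enumerate(block):
        d = 1
        for i in range(j):
            dest_i, srcs_i = block[i]
            # RAW, WAW, or WAR between instruction i and instruction j
            if dest_i in srcs_j or dest_i == dest_j or dest_j in srcs_i:
                d = max(d, depth[i] + 1)
        depth.append(d)
    return max(depth, default=0)


# (destination register, source registers)
chain = [
    ("R1", ("R2", "R3")),
    ("R4", ("R1", "R5")),   # needs R1
    ("R6", ("R4", "R7")),   # needs R4
    ("R8", ("R6", "R9")),   # needs R6
]
independent = [
    ("R1", ("R2", "R3")),
    ("R4", ("R5", "R6")),
    ("R7", ("R8", "R9")),
    ("R10", ("R11", "R12")),
]

print(min_words(chain))        # 4: every instruction waits for the previous one
print(min_words(independent))  # 1: all four could share one word
```

On a 4-slot machine the chained block fills only 4 of 16 slots, and the other 12 hold NOPs, while the independent block could fit in a single word if enough functional units of the right type are available.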

⭐ 10. Exam-Focused Summary

VLIW (Very Long Instruction Word): Multiple operations per instruction, executed in parallel.
Local scheduling: Compiler reorders instructions within a basic block to maximize parallel execution and avoid hazards.
Goal: Fill all functional units in a VLIW instruction with independent instructions.
Steps: Analyze dependencies → assign instructions to functional units → insert NOPs if needed.
Advantage: Exploits ILP with simple hardware, since no dynamic scheduling logic is needed.
Limitation: Only exploits parallelism within a basic block.
