⭐ Trace Scheduling
1. Definition
Trace scheduling is a compiler optimization technique used to increase instruction-level parallelism (ILP) across basic blocks by reordering instructions along likely execution paths (traces).
It is mainly used in VLIW (Very Long Instruction Word) and superscalar architectures to produce statically scheduled code that exploits parallelism beyond a single basic block.
2. Key Concepts
-
Trace
- A trace is a sequence of basic blocks that is likely to execute consecutively during program execution.
- Usually determined using profiling (branch probabilities).
-
Goal
- Schedule instructions along the entire trace to maximize parallel execution and keep functional units busy.
- Reduces pipeline stalls and idle cycles.
-
Difference from Local Scheduling
| Feature |
Local Scheduling |
Trace Scheduling |
| Scope |
Single basic block |
Across multiple blocks (trace) |
| Parallelism |
Limited (intra-block) |
Higher (inter-block) |
| Complexity |
Low |
High |
| Hardware Support |
Not needed |
Still static, VLIW-friendly |
3. Steps in Trace Scheduling
-
Trace Selection
- Identify the most likely path of execution using branch probabilities.
- Example: if
if branch taken 90% of the time, include it in trace.
-
Instruction Scheduling Along the Trace
- Reorder instructions from multiple basic blocks in the trace to exploit parallelism.
- Use pipeline hazards analysis to prevent RAW/WAR/WAW conflicts.
-
Compensating Code
- Instructions moved outside their original blocks may need extra code in other paths to maintain program correctness.
- Example: If an instruction is scheduled before a branch, add fix-up code in the path not taken.
-
Register Allocation and Finalization
- After scheduling, assign registers carefully to avoid conflicts caused by moving instructions across blocks.
4. Example
Original code:
if (x > 0) {
A = B + C;
} else {
D = E + F;
}
G = H + I;
Likely trace: if (x > 0) path
- Trace includes:
{A = B + C} → {G = H + I}
Trace scheduling:
- Instructions from the else-block
{D = E + F} may be moved or compensated in the unlikely path.
- Instructions are reordered to maximize parallel execution along the likely trace.
Result:
- More instructions executed per cycle on a VLIW or superscalar pipeline.
- Less pipeline idle time.
5. Advantages
- Exploits ILP across basic blocks → higher performance than local scheduling.
- Improves VLIW pipeline utilization.
- Static scheduling → simpler hardware (no dynamic scheduling needed).
6. Limitations
- Complex compiler algorithm – selecting traces and inserting compensating code is tricky.
- Code size increase – compensating code may duplicate instructions in unlikely paths.
- Accuracy depends on profiling – mispredicted traces can reduce performance.
- Best suited for frequently executed paths.
7. Comparison: Local Scheduling vs Trace Scheduling vs Software Pipelining
| Feature |
Local Scheduling |
Trace Scheduling |
Software Pipelining |
| Scope |
Single basic block |
Multiple basic blocks (trace) |
Single loop, across iterations |
| Parallelism |
Limited |
High (inter-block) |
High (loop iterations) |
| Complexity |
Low |
High |
Medium to High |
| Hardware Requirements |
None |
VLIW or statically scheduled |
None (compiler-based) |
| Code Size Overhead |
Low |
Medium/High (compensating code) |
Low to Medium |
8. Exam Summary
- Trace Scheduling: A compiler technique to maximize ILP across basic blocks by scheduling instructions along likely execution paths (traces).
- Key idea: Move instructions across blocks, use compensating code for correctness.
- Used in: VLIW and statically scheduled superscalar architectures.
- Advantages: High ILP, better pipeline utilization.
- Limitations: Compiler complexity, code size increase, dependent on profiling accuracy.