Loop unrolling is a compiler optimization technique where the body of a loop is replicated multiple times to reduce the overhead of branch instructions and increase instruction-level parallelism (ILP).
Essentially, the compiler transforms a loop to do multiple iterations in a single loop pass.
for (i = 0; i < 8; i++) {
A[i] = B[i] + C[i];
}
for (i = 0; i < 8; i += 2) {
A[i] = B[i] + C[i];
A[i+1] = B[i+1] + C[i+1];
}
for (i = 0; i < 8; i += 4) {
A[i] = B[i] + C[i];
A[i+1] = B[i+1] + C[i+1];
A[i+2] = B[i+2] + C[i+2];
A[i+3] = B[i+3] + C[i+3];
}
Benefits:
Trade-offs:
Software pipelining is a compiler technique that reorders instructions from different iterations of a loop to maximize parallelism and pipeline utilization.
Think of it as overlapping successive loop iterations, similar to how hardware pipelines overlap stages of instructions.
Instead of executing one iteration at a time:
Goal: Achieve steady-state execution, where each clock cycle executes some part of an iteration without stalls.
for (i = 0; i < N; i++) {
load R1, A[i]
add R2, R1, #1
store B[i], R2
}
| Cycle | Instruction |
|---|---|
| 1 | load A[0] |
| 2 | add R2, R1, #1 (i=0), load A[1] |
| 3 | store B[0], R2; add R2, R1, #1 (i=1); load A[2] |
| 4 | store B[1], R2; add R2, R1, #1 (i=2); load A[3] |
| … | … |
Explanation:
| Feature | Loop Unrolling | Software Pipelining |
|---|---|---|
| Purpose | Reduce branch overhead, increase ILP | Overlap iterations to improve pipeline utilization |
| How it works | Duplicate loop body multiple times | Reorder instructions from different iterations |
| Effect on pipeline | More instructions per iteration | Keeps pipeline stages busy continuously |
| Code size | Increases significantly | Slightly increases, often less than unrolling |
| Dependency handling | Within iteration only | Across iterations |
Open this section to load past papers