Pipelining is a technique used in computer architecture to improve the performance and efficiency of a processor. It allows the processor to work on multiple instructions simultaneously by dividing the processing of instructions into distinct stages. Each stage performs a part of the overall instruction processing, and different instructions can be processed in different stages at the same time, similar to an assembly line in a factory. This way, pipelining increases instruction throughput—the number of instructions completed per unit of time—without speeding up the execution of individual instructions.
To better understand the general principles of pipelining, let’s break it down:
Pipelining works by dividing instruction execution into several stages, with each stage performing a different task. For example, a common five-stage breakdown is fetch, decode, execute, memory access, and write-back.
In a pipelined processor, instead of waiting for one instruction to fully complete before starting the next, the processor fetches the next instruction as soon as the current instruction moves to the next stage. Multiple instructions are processed in parallel, but each instruction is in a different stage.
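To put rough numbers on this overlap, here is a minimal Python sketch of the cycle counts for an idealized pipeline. It assumes every stage takes exactly one clock cycle and that nothing ever stalls; the function names are illustrative and not taken from any library.

```python
def sequential_cycles(num_instructions, num_stages):
    """Cycles when each instruction finishes every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Cycles for an ideal pipeline: fill it once, then complete one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def speedup(num_instructions, num_stages):
    """Ratio of sequential to pipelined cycles (assumes at least one instruction)."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    return sequential_cycles(num_instructions, num_stages) / pipelined_cycles(
        num_instructions, num_stages
    )
```

Under these assumptions, 3 instructions on 5 stages take 7 cycles instead of 15, and the speedup approaches the number of stages as the instruction stream grows longer.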
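Think of pipelining like a car assembly line: while one car is being painted, the car behind it is having its engine fitted, and a third is just entering the line. No single car is built any faster, but a finished car rolls off the line far more often.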
Similarly, in pipelining, the processor works on several instructions at once, each in a different stage of execution.
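Pipelining exploits instruction-level parallelism (ILP): because consecutive instructions are often independent of one another, they can occupy different stages at the same time. The more ILP a program has, the closer the pipeline comes to completing one instruction per cycle. Programs with many independent instructions benefit the most from pipelining.
Each instruction is split into several stages, and each stage is processed by a different piece of hardware. Typical stages include instruction fetch (read the instruction from memory), decode (identify the operation and read its register operands), execute (perform the arithmetic or compute an address), memory access (load or store data), and write-back (write the result to a register).
The stages are designed to take roughly the same amount of time so that the pipeline stays balanced, but some instructions need longer in a particular stage (a memory access that misses the cache, for example), which complicates the design.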
One of the challenges in pipelining is dealing with pipeline hazards, which occur when the flow of instructions is disrupted. There are three main types of hazards:
Structural hazards occur when two or more instructions need the same hardware resource at the same time. For example, if the processor has only one memory access port and two instructions try to access memory simultaneously, one of them must wait.
Solution: Adding more hardware resources (like separate memory access ports) can reduce structural hazards.
Data hazards happen when one instruction depends on the result of a previous instruction that hasn’t finished yet.
Example:
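add %eax, %ebx # Instruction 1: %ebx = %ebx + %eax (AT&T syntax, destination last)
sub %ebx, %ecx # Instruction 2: reads %ebx, so it depends on the result of Instruction 1
Instruction 2 needs the value that add %eax, %ebx writes to %ebx, but the addition has not yet written its result back by the time the second instruction reads its operands.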
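Solution: Forwarding (also called bypassing) routes a result from a later pipeline stage directly to the instruction that needs it, instead of waiting for write-back. When forwarding is not enough, the pipeline stalls the dependent instruction until the value is ready.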
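Before a pipeline can forward or stall, it has to notice the dependency. The following is a minimal sketch of that check, not a model of any real processor. It assumes a classic five-stage pipeline in which registers are read in decode and written in write-back, with a write and a read of the same register allowed in the same cycle; the `Instr` type and both function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: str                 # register written
    sources: Tuple[str, ...]  # registers read


def raw_hazards(program, window=2):
    """Find read-after-write pairs at most `window` instructions apart.

    Returns (producer_index, consumer_index, register) tuples.
    """
    found = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in consumer.sources:
                found.append((i, j, producer.dest))
            if consumer.dest == producer.dest:
                break  # a newer write hides the producer from later readers
    return found


def stalls_without_forwarding(distance):
    """Bubble cycles needed when the consumer is `distance` instructions behind
    the producer and there is no forwarding (assumed five-stage timing)."""
    return max(0, 3 - distance)


program = [
    Instr("add %eax, %ebx", dest="ebx", sources=("eax", "ebx")),
    Instr("sub %ebx, %ecx", dest="ecx", sources=("ebx", "ecx")),
]
hazards = raw_hazards(program)  # one hazard: instruction 1 reads ebx written by instruction 0
```

With these timing assumptions, a consumer immediately behind its producer needs two bubble cycles without forwarding, and a consumer three or more instructions behind needs none.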
Control hazards occur when the processor encounters a branch (e.g., an if statement or jump) and doesn’t know which instruction to fetch next.
Example:
cmp %eax, %ebx # Compare
je label # Jump to label if the two compared values are equal
Until the cmp instruction is completed, the processor doesn't know whether to fetch the next sequential instruction or jump to the target label.
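Solution: Branch prediction guesses the outcome of the branch and keeps fetching along the predicted path; if the guess turns out to be wrong, the wrongly fetched instructions are discarded. Simpler designs just stall until the branch is resolved.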
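One common textbook way to guess a branch outcome is a 2-bit saturating counter. The sketch below shows that particular scheme as an illustration only; the text above does not say which predictor a given processor uses, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state=1):
        self.state = state  # start at "weakly not taken" (an arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None):
    """Replay a sequence of actual branch outcomes and count wrong guesses."""
    if predictor is None:
        predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

For a loop branch that is taken nine times and then falls through, this predictor is wrong on the first iteration and on the exit. If the same loop runs again, only the exit is mispredicted, because a single not-taken outcome does not flip a strongly-taken counter.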
While pipelining can significantly increase instruction throughput, it introduces certain complexities:
The number of stages in a pipeline is referred to as pipeline depth. A deeper pipeline (with more stages) does less work per stage, which permits a shorter clock cycle and lets more instructions be in flight at once, so peak throughput rises. However, deeper pipelines also increase the cost of hazards: a mispredicted branch, for instance, discards more partially completed instructions.
To maximize efficiency, each pipeline stage should take roughly the same amount of time. Every stage advances on the same clock, so the slowest stage sets the clock period and the faster stages sit idle for part of each cycle. This is called a pipeline imbalance.
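The cost of an imbalance can be sketched with a few lines of Python. The stage latencies used in the note below are made-up numbers in picoseconds, chosen only so that both pipelines do the same total amount of work; the function names are illustrative.

```python
def clock_period(stage_latencies_ps, latch_overhead_ps=0):
    """The clock must be long enough for the slowest stage plus any latch overhead."""
    return max(stage_latencies_ps) + latch_overhead_ps


def idle_time_per_cycle(stage_latencies_ps):
    """Time each stage spends waiting for the slowest stage in every clock cycle."""
    slowest = max(stage_latencies_ps)
    return [slowest - latency for latency in stage_latencies_ps]
```

With hypothetical latencies of 160 ps in every stage, the clock period is 160 ps. Redistributing the same 800 ps of work as 200, 100, 200, 200, and 100 ps raises the period to 200 ps, and the two short stages idle for 100 ps of every cycle.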
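If a branch is mispredicted, the pipeline must be flushed, meaning the partially completed instructions fetched along the wrong path are discarded and the processor restarts from the correct instruction. A data hazard that forwarding cannot resolve does not flush the pipeline; it stalls it, inserting idle cycles (bubbles) until the needed value is available. Both flushes and stalls temporarily reduce performance.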
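A rough way to estimate what flushes cost is to fold them into the average cycles per instruction (CPI). The formula below is a standard first-order approximation, and the function name, parameters, and numbers in the note are assumptions for illustration rather than measurements.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction once branch-misprediction flushes are counted.

    branch_fraction: share of instructions that are branches (0 to 1)
    mispredict_rate: share of those branches that are guessed wrong (0 to 1)
    flush_penalty_cycles: cycles lost each time the pipeline is flushed
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles
```

If 20% of instructions are branches and 10% of those are mispredicted, a 3-cycle flush penalty gives a CPI of about 1.06, while a 15-cycle penalty (more typical of a deep pipeline) gives about 1.3. This is why deeper pipelines depend more heavily on accurate prediction.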
Imagine executing the following set of instructions in a simple 5-stage pipeline:
1. add %eax, %ebx # Fetch -> Decode -> Execute -> Mem -> Write-back
2. sub %ecx, %edx # Fetch -> Decode -> Execute -> Mem -> Write-back
3. imul %eax, %ebx # Fetch -> Decode -> Execute -> Mem -> Write-back
In a sequential processor, each instruction would execute one after the other, meaning you’d have to wait for add to complete all its stages before starting sub.
In a pipelined processor, the second instruction (sub) can be fetched while the first instruction (add) is in the decode stage, and so on. By the time the first instruction is in the execute stage, the second one is in decode, and the third one is in fetch, allowing them to overlap.
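The overlap described above can be drawn as a cycle-by-cycle table. This minimal sketch assumes an ideal pipeline with one cycle per stage and no stalls; `I1` to `I3` stand for the three instructions, and the stage abbreviations and function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory, write-back


def schedule(num_instructions):
    """Return a grid where grid[i][c] is the stage of instruction i in cycle c, or None."""
    total_cycles = len(STAGES) + num_instructions - 1 if num_instructions else 0
    grid = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters fetch in cycle i
            in_flight = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_flight else None)
        grid.append(row)
    return grid


def render(grid):
    """Format the grid as text: one row per instruction, one column per cycle."""
    lines = []
    for i, row in enumerate(grid, start=1):
        cells = " ".join((stage or ".").ljust(3) for stage in row)
        lines.append(f"I{i}: {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(schedule(3)))
```

In the third column of the printed table, I1 is in execute, I2 is in decode, and I3 is in fetch. All three instructions finish within 7 cycles rather than the 15 a sequential processor would need.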
Pipelining is a powerful technique used to improve the performance of processors by allowing multiple instructions to be processed at once, each in a different stage of execution. While it significantly boosts throughput, it also introduces complexities such as hazards, stalls, and the need for techniques like forwarding and branch prediction. Understanding these principles helps developers and engineers design efficient systems and optimize software performance on modern processors.