DC-221: Computer Organization and Assembly Language, Topic 34 of 35

General Principles of Pipelining

Pipelining is a technique used in computer architecture to improve the performance and efficiency of a processor. It allows the processor to work on multiple instructions simultaneously by dividing the processing of instructions into distinct stages. Each stage performs a part of the overall instruction processing, and different instructions can be processed in different stages at the same time, similar to an assembly line in a factory. This way, pipelining increases instruction throughput—the number of instructions completed per unit of time—without speeding up the execution of individual instructions.

To better understand the general principles of pipelining, let’s break it down:

1. Pipelining Basics

Pipelining works by dividing instruction execution into several stages, with each stage performing a different task. For example, a common breakdown of stages is as follows:

Fetch (IF): Retrieve the instruction from memory.
Decode (ID): Decode the instruction and identify the operands.
Execute (EX): Perform the operation (arithmetic, logic, etc.) using the ALU (Arithmetic Logic Unit).
Memory (MEM): Access memory to load or store data (if necessary).
Write-Back (WB): Write the result of the instruction back to a register.

In a pipelined processor, instead of waiting for one instruction to fully complete before starting the next, the processor fetches the next instruction as soon as the current instruction moves to the next stage. Multiple instructions are processed in parallel, but each instruction is in a different stage.
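The cycle counts implied by this overlap can be sketched in Python. This is a simplified model, assuming every stage takes exactly one clock cycle and no hazards occur; the function names are hypothetical.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to run n instructions when each must finish before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to run n instructions on an ideal pipeline (no stalls).

    The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    for n in (1, 3, 100):
        seq = sequential_cycles(n, 5)
        pipe = pipelined_cycles(n, 5)
        print(f"{n:>3} instructions: sequential {seq:>3} cycles, pipelined {pipe:>3} cycles")
```

For 100 instructions on five stages the model gives 500 versus 104 cycles, so the speedup approaches the number of stages as the instruction count grows.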

2. Analogy: Assembly Line

Think of pipelining like a car assembly line:

Each stage of the assembly line works on a different part of the car, such as the chassis, engine, and paint job.
A different car is at each stage of the assembly process simultaneously.
This means that while one car is getting its engine installed, another car might be getting its paint, and a third car might be having its wheels attached.

Similarly, in pipelining, the processor works on several instructions at once, each in a different stage of execution.

3. Key Principles of Pipelining

3.1. Instruction-Level Parallelism (ILP)

Pipelining exploits instruction-level parallelism (ILP): it overlaps the execution of multiple instructions, each in a different stage. The more ILP a program has, that is, the more of its instructions are independent of one another, the fewer stalls the pipeline suffers and the closer it comes to completing one instruction per cycle.

3.2. Pipeline Stages

The work of executing each instruction is split into several stages, each handled by its own hardware, with pipeline registers between consecutive stages to hold each instruction's intermediate results. Typical stages include:

Fetch (IF): Load an instruction from memory.
Decode (ID): Figure out what the instruction does.
Execute (EX): Perform the operation (e.g., addition, subtraction, or computing a memory address).
Memory (MEM): Load or store data in memory.
Write-Back (WB): Write results to a register.

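Designers aim to give the stages roughly equal delays, because every stage advances on the same clock and the slowest stage sets the clock period. Some instructions need extra time in one stage (for example, a multiply in execute or a cache miss in memory access), which forces the pipeline to stall or requires extra hardware.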

3.3. Throughput vs. Latency

Throughput: The number of instructions that can be completed in a given amount of time. Pipelining increases throughput because multiple instructions are processed at once.
Latency: The time it takes for a single instruction to pass through all stages of the pipeline. Pipelining does not reduce the latency of an individual instruction; it usually increases it slightly, because each pipeline register adds delay and every stage must take as long as the slowest one. What improves is the number of instructions completed per unit of time.
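The trade-off between throughput and latency can be made concrete with a small Python model. The delays (300 ps of logic, 20 ps per pipeline register) are illustrative assumptions, not figures from any particular processor, and `pipeline_timing` is a hypothetical helper.

```python
def pipeline_timing(stage_delays_ps, register_delay_ps):
    """Return (clock period, latency, throughput) for a pipeline.

    stage_delays_ps: combinational logic delay of each stage, in picoseconds.
    register_delay_ps: delay added by the pipeline register after each stage.
    """
    # The clock must accommodate the slowest stage plus its register.
    cycle_ps = max(stage_delays_ps) + register_delay_ps
    # Every instruction spends one clock period in each stage.
    latency_ps = cycle_ps * len(stage_delays_ps)
    # One instruction completes per cycle; 1000 ps = 1 ns, so this is in
    # GIPS (billions of instructions per second).
    throughput_gips = 1000 / cycle_ps
    return cycle_ps, latency_ps, throughput_gips


if __name__ == "__main__":
    print(pipeline_timing([300], 20))            # unpipelined
    print(pipeline_timing([100, 100, 100], 20))  # three balanced stages
```

In this model, splitting the logic into three stages raises throughput from 3.125 to about 8.3 GIPS, while the latency of each instruction grows from 320 ps to 360 ps.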

4. Pipeline Hazards

One of the challenges in pipelining is dealing with pipeline hazards, which occur when the flow of instructions is disrupted. There are three main types of hazards:

4.1. Structural Hazards

These occur when two or more instructions need the same hardware resource at the same time. For example, if the processor has only one memory access port, and two instructions try to access memory simultaneously, this causes a structural hazard.

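Solution: Duplicate the contested resource, for example by using separate instruction and data memories (or caches) so that a fetch and a data access can happen in the same cycle. Otherwise, one of the instructions must stall until the resource is free.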
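The single-memory-port case can be sketched in Python. This assumes a 5-stage pipeline in which instruction i is fetched in cycle i (counting from 0) and ignores the stalls that would resolve each conflict; `fetch_data_conflicts` is a hypothetical name.

```python
def fetch_data_conflicts(is_memory_op):
    """Return the clock cycles in which a data access collides with an
    instruction fetch on a single shared memory port.

    is_memory_op[i] is True if instruction i loads or stores data.
    Assumes IF, ID, EX, MEM, WB with instruction i fetched in cycle i.
    """
    n = len(is_memory_op)
    conflicts = []
    for i, uses_memory in enumerate(is_memory_op):
        mem_cycle = i + 3  # IF in cycle i, so MEM falls three cycles later
        # In that same cycle, instruction number mem_cycle is being fetched.
        if uses_memory and mem_cycle < n:
            conflicts.append(mem_cycle)
    return conflicts


if __name__ == "__main__":
    # A load, three ALU instructions, a store, then three more ALU instructions.
    program = [True, False, False, False, True, False, False, False]
    print(fetch_data_conflicts(program))
```

For this program the sketch reports conflicts in cycles 3 and 7. With separate instruction and data memories the two accesses would no longer share a port, so no such conflicts would arise.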

4.2. Data Hazards

Data hazards happen when one instruction depends on the result of a previous instruction that hasn’t finished yet.

Example:

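add %ebx, %eax   # Instruction 1: %eax = %eax + %ebx
sub %ecx, %eax   # Instruction 2 reads %eax, which Instruction 1 writes

In this AT&T syntax the destination is the second operand. Instruction 2 needs the new value of %eax produced by add %ebx, %eax, but by the time sub reaches the decode stage and reads its operands, the addition has not yet written its result back to the register file.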

Solution:

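Forwarding (bypassing): The result of an operation is passed directly from the stage that produced it to a later instruction that needs it, before it has been written back to the register file.
Stalling: The dependent instruction is held in place (a "bubble" is inserted) until the necessary data is available. Forwarding cannot remove every stall: a value loaded from memory is not ready until the end of the memory stage, so an instruction that uses it immediately must still wait one cycle.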
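How much stalling each solution needs can be sketched in Python. This assumes a 5-stage pipeline where the register file is written at the end of write-back and read in decode, ALU results can be forwarded from the end of execute, and load data from the end of memory access; `stall_cycles` and its parameters are hypothetical.

```python
def stall_cycles(distance, producer_is_load, forwarding):
    """Bubbles needed before a dependent instruction can use a result.

    distance: how many instructions later the consumer comes (1 = next one).
    producer_is_load: True if the producer gets its result from memory.
    forwarding: True if the pipeline has bypass paths.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        # The consumer's decode must come after the producer's write-back,
        # which happens 4 cycles after the producer is fetched.
        return max(0, 4 - distance)
    if producer_is_load and distance == 1:
        # Load data appears at the end of MEM, one cycle too late for the
        # next instruction's execute stage: the load-use stall.
        return 1
    return 0


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(d, stall_cycles(d, False, False), stall_cycles(d, False, True))
```

For a consumer that immediately follows its producer, as in the example above, the model gives 3 bubbles without forwarding and none with it, unless the producer is a load.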

4.3. Control Hazards

Control hazards occur when the processor encounters a branch (e.g., an if statement or jump) and doesn’t know which instruction to fetch next.

Example:

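cmp %eax, %ebx   # Compare: sets the condition codes
je label         # Jump to label if %eax == %ebx

The processor wants to fetch a new instruction every cycle, but until the condition codes from cmp are available and je has been evaluated, it doesn't know whether to fetch the next sequential instruction or the instruction at label.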

Solution:

Branch Prediction: The processor guesses (predicts) the outcome of a branch to keep the pipeline full. If the prediction is wrong, it discards the incorrect instructions and fetches the correct ones (this is called flushing the pipeline).
Stalling: The processor can also pause fetching new instructions until the branch is resolved, but this leads to lost performance.
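The text does not name a prediction scheme. As one common illustration (swapped in here, not taken from the source), the Python sketch below models a 2-bit saturating counter for a single branch and counts how many flushes it would trigger on a loop-closing branch. The class and function names and the 2-cycle flush penalty are hypothetical.

```python
class TwoBitPredictor:
    """A 2-bit saturating counter for one branch.

    States 0-1 predict "not taken", states 2-3 predict "taken". A wrong
    outcome moves the counter only one step, so a single surprise does not
    flip a strongly biased prediction.
    """

    def __init__(self, state=2):  # start in "weakly taken"
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # a real pipeline would flush here
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # Loop-closing branch: taken 9 times, then falls through; loop runs twice.
    outcomes = ([True] * 9 + [False]) * 2
    misses = count_mispredictions(outcomes, TwoBitPredictor())
    penalty = 2  # assumed cycles lost per flush
    print(f"{misses} mispredictions, {misses * penalty} cycles lost")
```

Only the loop exits are mispredicted, so 2 of the 20 branches cause a flush.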

5. Performance Considerations

While pipelining can significantly increase instruction throughput, it introduces certain complexities:

5.1. Pipeline Depth

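The number of stages in a pipeline is referred to as pipeline depth. A deeper pipeline divides the work into more, shorter stages, so the clock can run faster and peak throughput rises. However, deeper pipelines also increase the impact of hazards: a mispredicted branch discards more partially completed instructions, and the fixed delay of each pipeline register becomes a larger share of every cycle, so the returns diminish.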
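This trade-off can be illustrated with a deliberately simple toy model in Python. Every parameter is an assumption: 1000 ps of logic split evenly into `depth` stages, 30 ps per pipeline register, 0.05 mispredictions per instruction, and a misprediction penalty of `depth - 1` cycles, as if branches were resolved only in the last stage. The function name is hypothetical.

```python
def time_per_instruction_ps(depth, total_logic_ps=1000.0, register_ps=30.0,
                            mispredicts_per_instr=0.05):
    """Average time per instruction in a toy model of pipeline depth.

    Assumptions: logic split evenly into `depth` stages, each followed by a
    register; each misprediction costs (depth - 1) cycles.
    """
    cycle_ps = total_logic_ps / depth + register_ps
    cycles_per_instr = 1 + mispredicts_per_instr * (depth - 1)
    return cycle_ps * cycles_per_instr


if __name__ == "__main__":
    for depth in (1, 5, 10, 25, 40, 60):
        print(f"depth {depth:>2}: {time_per_instruction_ps(depth):7.1f} ps per instruction")
    best = min(range(1, 61), key=time_per_instruction_ps)
    print("best depth in this model:", best)
```

Under these assumptions the time per instruction falls steeply at first, bottoms out at a depth of 25, and then rises again as misprediction penalties dominate. Different parameters move the optimum, but the shape of the curve is the point.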

5.2. Pipeline Balancing

To maximize efficiency, every pipeline stage should take roughly the same time. Because all stages advance on a common clock, the slowest stage determines the clock period, and the faster stages sit idle for the rest of each cycle. This is called pipeline imbalance.
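The cost of imbalance can be sketched in Python. The stage delays and the 20 ps register delay are illustrative assumptions, and `clock_and_idle` is a hypothetical helper.

```python
def clock_and_idle(stage_delays_ps, register_delay_ps=20):
    """Clock period and per-stage idle fraction for a pipeline.

    Every stage advances on the same clock, so the slowest stage sets the
    period. idle[i] is the fraction of the slowest stage's logic delay that
    stage i spends waiting after finishing its own work.
    """
    slowest = max(stage_delays_ps)
    period = slowest + register_delay_ps
    idle = [1 - d / slowest for d in stage_delays_ps]
    return period, idle


if __name__ == "__main__":
    balanced = [70, 70, 70, 70, 70]
    unbalanced = [50, 50, 150, 50, 50]  # same 350 ps of total logic
    for delays in (balanced, unbalanced):
        period, idle = clock_and_idle(delays)
        print(period, [round(x, 2) for x in idle])
```

Both designs contain 350 ps of logic, but the unbalanced one needs a 170 ps clock instead of 90 ps, and four of its five stages sit idle for two thirds of their logic time, roughly halving throughput.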

5.3. Pipeline Flushes

When a branch is mispredicted, the pipeline must be flushed: the partially completed instructions fetched along the wrong path are discarded, and the processor resumes fetching from the correct address. The cycles those discarded instructions spent in the pipeline are lost, so frequent flushes noticeably reduce performance.

6. Example of Pipelining in Action

Imagine executing the following set of instructions in a simple 5-stage pipeline:

add %eax, %ebx    # Fetch -> Decode -> Execute -> Mem -> Write-back
sub %ecx, %edx    # Fetch -> Decode -> Execute -> Mem -> Write-back
imul %esi, %edi   # Fetch -> Decode -> Execute -> Mem -> Write-back

In a sequential processor, each instruction would execute one after the other, meaning you’d have to wait for add to complete all its stages before starting sub.

In a pipelined processor, the second instruction (sub) is fetched while the first (add) is being decoded, and so on. In cycle 3, the first instruction is in execute, the second is in decode, and the third is in fetch. Once the pipeline is full, one instruction completes every cycle, so, assuming one cycle per stage, the three instructions finish in 7 cycles (5 + 3 - 1) instead of the 15 a sequential processor would need.
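The overlap described above is usually drawn as a pipeline diagram. The Python sketch below builds one for an ideal 5-stage pipeline with no stalls; rows labeled I1 to I3 stand for the three instructions in order, and `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions):
    """Stage occupied by each instruction in each cycle of an ideal
    5-stage pipeline with no stalls ("" when the instruction is not in flight)."""
    total_cycles = len(STAGES) + n_instructions - 1
    chart = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters IF in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        chart.append(row)
    return chart


if __name__ == "__main__":
    chart = pipeline_chart(3)
    print("      " + "".join(f"{c + 1:>5}" for c in range(len(chart[0]))))
    for i, row in enumerate(chart):
        print(f"I{i + 1:<5}" + "".join(f"{cell:>5}" for cell in row))
```

Reading down the column for cycle 3 shows I1 in EX, I2 in ID and I3 in IF, and all three instructions finish by cycle 7.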

7. Advantages of Pipelining

Increased Throughput: Pipelining increases the number of instructions that can be completed per unit of time.
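Efficient Use of Hardware: Each stage's hardware works on some instruction in most cycles instead of sitting idle while a single instruction passes through the other stages.
Scalability: Pipelining can be extended by adding more stages, which raises clock frequency and throughput up to the point where hazard penalties and pipeline-register overhead outweigh the gains.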

8. Disadvantages of Pipelining

Complexity: Pipelines introduce complexity in terms of handling hazards (data, control, and structural) and balancing the stages.
Pipeline Stalls: Hazards can cause delays (stalls) or require flushing, which reduces performance.
Branch Prediction Failures: Incorrect branch predictions waste time by causing pipeline flushes.

Conclusion

Pipelining is a powerful technique used to improve the performance of processors by allowing multiple instructions to be processed at once, each in a different stage of execution. While it significantly boosts throughput, it also introduces complexities such as hazards, stalls, and the need for techniques like forwarding and branch prediction. Understanding these principles helps developers and engineers design efficient systems and optimize software performance on modern processors.
