COMP3147 Computer Architecture, Topic 19 of 24

Thread-Level Parallelism: Cache coherency

⭐ Thread-Level Parallelism (TLP) and Cache Coherency

1. Thread-Level Parallelism (TLP) – Definition

Thread-Level Parallelism (TLP) refers to the ability of a computer system to execute multiple threads simultaneously, typically on multiple cores or processors, to improve overall throughput.

While Instruction-Level Parallelism (ILP) exploits parallelism within a single thread, TLP exploits parallelism across multiple threads.

Key Goals of TLP:

Improve CPU utilization when some threads are stalled (e.g., memory access).
Increase overall system throughput by running multiple threads concurrently.
Exploit multi-core and multi-processor architectures.

2. Forms of TLP

Multithreading (MT)
- Fine-grained multithreading: Switch threads every cycle → one thread's stalls are hidden by instructions issued from the other threads.
- Coarse-grained multithreading: Switch threads on long-latency events (like cache misses).
Simultaneous Multithreading (SMT)
- Multiple threads share a single core pipeline simultaneously, issuing instructions in the same cycle.
- Example: Intel Hyper-Threading Technology.
Multi-core Processing
- Each core executes independent threads → true parallel execution.
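
The two multithreading policies above can be compared with a small simulation. This is only a sketch under simplifying assumptions that are not from the notes: a single-issue core, a fixed miss latency, zero thread-switch cost, and a hypothetical `run` function and workload encoding invented for illustration.

```python
def run(threads, policy, miss_latency=4):
    """Cycles for a single-issue core to issue every op of every thread.

    threads: list of strings; 'c' is a 1-cycle compute op, 'm' is a load that misses.
    policy:  'fine' rotates to another thread every cycle,
             'coarse' stays on one thread until it misses.
    """
    n = len(threads)
    pc = [0] * n          # index of the next op in each thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    cycle, current = 0, 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        start = (current + 1) % n if policy == "fine" else current
        order = [(start + i) % n for i in range(n)]
        chosen = next((t for t in order
                       if pc[t] < len(threads[t]) and ready_at[t] <= cycle), None)
        if chosen is not None:
            if threads[chosen][pc[chosen]] == "m":
                ready_at[chosen] = cycle + 1 + miss_latency  # stalled until data returns
            pc[chosen] += 1
            current = chosen
        cycle += 1            # a cycle with no ready thread is an idle issue slot
    return cycle

workload = ["cmcc", "cmcc"]
serial = sum(run([t], "coarse") for t in workload)   # one thread at a time
fine = run(workload, "fine")
coarse = run(workload, "coarse")
print(serial, fine, coarse)
```

In this toy model both policies finish sooner than running the threads back to back, because one thread's miss latency overlaps with the other thread's work. A real coarse-grained design would also pay a pipeline refill cost on each switch, which the sketch leaves out.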

3. Challenges in TLP

Shared resources contention: Cores or threads may compete for CPU caches, memory bandwidth, or functional units.
Synchronization: Threads often need to share data, requiring careful coordination.
Cache Coherency: Copies of the same data held in different cores' private caches must be kept consistent.
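
As an illustration of the synchronization challenge listed above, the following sketch has four threads update one shared counter under a lock. The counter, the worker function, and the iteration counts are made up for the example. Note that cache coherency on its own does not make a read-modify-write sequence atomic, which is why software coordination such as a lock is still needed.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(iterations):
        with lock:            # only one thread at a time may read-modify-write
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock held around every increment, no update can be lost, so the final count equals the total number of increments.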

4. Cache Coherency – Definition

Cache coherency is a hardware mechanism that ensures that all processors or cores see a consistent view of memory when multiple caches store copies of the same data.

Without coherency, different cores may read stale or inconsistent data, leading to incorrect program behavior.

Key Problems Addressed

Write Propagation
- A write made by one processor must eventually become visible to every other cache that holds a copy.
Transaction Ordering (Write Serialization)
- All processors must observe writes to the same memory location in the same order.
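
The write-propagation problem can be reproduced with a toy model of two private write-back caches that have no coherence mechanism at all. The `PrivateCache` class and the address name `"X"` are hypothetical, chosen only to illustrate the failure.

```python
class PrivateCache:
    """Write-back cache with no coherence: it never hears about other caches."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value              # write-back: memory is not updated yet

memory = {"X": 0}
core1, core2 = PrivateCache(memory), PrivateCache(memory)

core2.read("X")          # core 2 caches X = 0
core1.write("X", 5)      # core 1 updates only its own copy
stale = core2.read("X")  # core 2 hits in its cache and never sees the write
print(stale, core1.read("X"), memory["X"])
```

Core 2 keeps hitting on its old copy, so the write by core 1 never propagates. A coherence protocol exists to rule out exactly this outcome.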

5. Cache Coherency Protocols

a) Write-Invalidate Protocol

When a processor writes to a cache line, all other caches with a copy invalidate it.
Example: MESI Protocol (Modified, Exclusive, Shared, Invalid)

b) Write-Update (Write-Broadcast) Protocol

When a processor writes to a cache line, the new value is broadcast to other caches that have it.
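
The difference between the two protocol families can be sketched with a toy shared bus. The assumptions here are for illustration only and are not from the notes: write-through caches (to keep the model short), one word per line, and hypothetical `Bus`, `Cache`, and `demo` names.

```python
class Bus:
    def __init__(self, policy):
        self.policy = policy      # "invalidate" or "update"
        self.memory = {}
        self.caches = []

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        self.misses = 0
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.misses += 1
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value              # write-through (toy assumption)
        for other in self.bus.caches:
            if other is not self and addr in other.lines:
                if self.bus.policy == "invalidate":
                    del other.lines[addr]          # other copy is discarded
                else:
                    other.lines[addr] = value      # other copy is refreshed

def demo(policy):
    bus = Bus(policy)
    a, b = Cache(bus), Cache(bus)
    b.read("X")             # b caches X (first miss)
    a.write("X", 5)
    value = b.read("X")     # misses again only under write-invalidate
    return value, b.misses

print(demo("invalidate"), demo("update"))
```

Both policies return the new value to the second cache. Write-invalidate costs the reader an extra miss, while write-update avoids that miss by sending the data on every write, whether or not the other cache reads it again.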

c) MESI Protocol States

State	Meaning
M (Modified)	Only this cache holds the line; it is dirty (memory is stale)
E (Exclusive)	Only this cache holds the line; it is clean (matches memory)
S (Shared)	Other caches may also hold the line; it is clean (matches memory)
I (Invalid)	Line holds no valid data and must be fetched before use
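
The state table can be read as a small state machine for one cache line in one cache. The sketch below is a simplified transition function. The event names (`local_read`, `remote_write`, and so on) and the `others_have_copy` flag are illustrative labels rather than terms from the notes, and data movement and write-backs are omitted.

```python
def next_state(state, event, others_have_copy=False):
    """Simplified MESI transition for a single line in a single cache."""
    if event == "local_read":
        if state == "I":                      # read miss: E if alone, else S
            return "S" if others_have_copy else "E"
        return state                          # M, E, S: read hit, no change
    if event == "local_write":
        return "M"                            # from S or I, other copies are invalidated first
    if event == "remote_read":
        # A valid copy drops to S (an M line also supplies or writes back its data).
        return "S" if state in ("M", "E", "S") else "I"
    if event == "remote_write":
        return "I"                            # another core takes ownership
    raise ValueError(f"unknown event: {event}")

print(next_state("I", "local_read"))          # read miss with no sharers
print(next_state("E", "local_write"))         # silent upgrade, no bus traffic needed
print(next_state("M", "remote_read"))         # dirty line becomes shared
```

One design point this shows is the purpose of the E state: a line that only one cache holds can be written without any bus transaction.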

6. Example Scenario

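Core 1: Writes X = 5 → invalidates the copy in Core 2’s cache, then updates its own cache line holding X (now Modified).
Core 2: Reads X → misses on its invalidated line and fetches the updated value 5 from Core 1’s cache, or from memory once Core 1 has written it back.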

This ensures that all threads see a consistent value for X.
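
The scenario can be traced with a minimal two-core model that tracks the MESI state of the line holding X. This is a sketch under assumptions added for illustration: Core 2 reads X once beforehand so that it has a copy to invalidate, the line holds a single word, and the helper names `read` and `write` are hypothetical.

```python
memory = {"X": 0}
state = {1: "I", 2: "I"}      # MESI state of X's line in each core's cache
value = {1: None, 2: None}    # cached value of X in each core

def read(core):
    other = 2 if core == 1 else 1
    if state[core] == "I":                    # read miss
        if state[other] == "M":               # owner writes back its dirty data
            memory["X"] = value[other]
        if state[other] in ("M", "E", "S"):
            state[other] = "S"
            state[core] = "S"
        else:
            state[core] = "E"
        value[core] = memory["X"]
    return value[core]

def write(core, new_value):
    other = 2 if core == 1 else 1
    if state[other] == "M":
        memory["X"] = value[other]            # write back before invalidating
    state[other], value[other] = "I", None    # invalidate the other copy
    state[core], value[core] = "M", new_value

trace = []
read(2)
trace.append(dict(state))     # Core 2 caches X
write(1, 5)
trace.append(dict(state))     # Core 1 writes X = 5
got = read(2)
trace.append(dict(state))     # Core 2 reads X again
print(got, trace)
```

After the write, Core 2's line is Invalid, so its next read misses and is served with the value 5, leaving both cores in the Shared state.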

7. Importance of Cache Coherency in TLP

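Correctness: Ensures a thread's reads observe the most recent write to each location, even when the data is cached on another core.
Performance: Lets cores keep shared data in fast private caches without software-managed flushes, at the cost of coherence traffic (invalidations and extra misses on contended lines).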
Scalability: Essential for multi-core systems where each core has private caches.

8. Exam-Friendly Summary

Concept	Definition / Purpose
Thread-Level Parallelism (TLP)	Execution of multiple threads concurrently to increase throughput
Forms of TLP	Multithreading, Simultaneous Multithreading (SMT), Multi-core execution
Cache Coherency	Ensures consistent memory view across multiple caches
Protocols	MESI (Modified, Exclusive, Shared, Invalid), write-invalidate, write-update
Challenges	Synchronization, stale data, resource contention
