⭐ Thread-Level Parallelism (TLP) and Cache Coherency
1. Thread-Level Parallelism (TLP) – Definition
Thread-Level Parallelism (TLP) refers to the ability of a computer system to execute multiple threads concurrently, either interleaved on a single core or truly simultaneously on multiple cores or processors, to improve overall throughput.
While Instruction-Level Parallelism (ILP) exploits parallelism within a single thread, TLP exploits parallelism across multiple threads.
Key Goals of TLP:
- Improve CPU utilization when some threads are stalled (e.g., memory access).
- Increase overall system throughput by running multiple threads concurrently.
- Exploit multi-core and multi-processor architectures.
2. Forms of TLP
- Multithreading (MT)
  - Fine-grained multithreading: The core switches threads every cycle → a stall in one thread is hidden by instructions from the others.
  - Coarse-grained multithreading: The core switches threads only on long-latency events (like cache misses).
- Simultaneous Multithreading (SMT)
  - Instructions from multiple threads are issued into a single core's pipeline in the same cycle, sharing its execution resources.
  - Example: Intel Hyper-Threading Technology.
- Multi-core Processing
  - Each core executes its own thread independently → true parallel execution.
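The latency-hiding effect of fine-grained multithreading can be sketched with a toy cycle counter. This is only an illustrative model under stated assumptions: a single-issue core, a fixed `MISS_LATENCY` of 4 cycles, and instruction streams written as lists of `"alu"` / `"miss"` labels. All of these names and numbers are hypothetical. Coarse-grained switching and SMT are not modeled.

```python
MISS_LATENCY = 4  # assumed stall length (cycles) after a cache miss


def run_alone(thread):
    """Cycles for one thread on a single-issue core with nothing else to run."""
    cycles = 0
    for op in thread:
        cycles += 1                    # one cycle to issue the instruction
        if op == "miss":
            cycles += MISS_LATENCY     # core sits idle until the data arrives
    return cycles


def run_fine_grained(threads):
    """Cycles when the core picks the next ready thread, round-robin, every cycle."""
    pc = [0] * len(threads)            # next instruction index per thread
    ready_at = [0] * len(threads)      # first cycle each thread may issue again
    cycle, start = 0, 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        for k in range(len(threads)):
            i = (start + k) % len(threads)
            if pc[i] < len(threads[i]) and ready_at[i] <= cycle:
                if threads[i][pc[i]] == "miss":
                    ready_at[i] = cycle + 1 + MISS_LATENCY
                pc[i] += 1
                start = (i + 1) % len(threads)
                break                  # single-issue: one instruction per cycle
        cycle += 1                     # an idle cycle if no thread was ready
    return max([cycle] + ready_at)     # count a trailing stall as well


t = ["alu", "miss", "alu", "alu"]
print(run_alone(t) * 2)          # two threads back to back: 16 cycles
print(run_fine_grained([t, t]))  # interleaved on one core: 11 cycles
```

In this model the two threads still stall on their misses, but part of each stall overlaps with useful work from the other thread, which is where the saved cycles come from.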
3. Challenges in TLP
- Shared-resource contention: Threads or cores may compete for caches, memory bandwidth, or functional units.
- Synchronization: Threads often need to share data, requiring careful coordination (e.g., locks or atomic operations).
- Cache coherency: Copies of the same data held in different cores' caches must be kept consistent.
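The synchronization challenge can be illustrated with a shared counter protected by a lock. This is a minimal sketch; the function name and parameters are made up for illustration. Note that in standard CPython the global interpreter lock prevents threads from executing bytecode in parallel, so this shows the coordination pattern rather than a speedup.

```python
import threading


def increment_shared_counter(num_threads=4, increments=10_000):
    """Have several threads update one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:           # only one thread may update at a time
                counter += 1     # read-modify-write on shared data

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()                # wait for every thread to finish
    return counter


print(increment_shared_counter())  # 4 threads x 10,000 increments = 40000
```

Without the lock, the read-modify-write in `counter += 1` from different threads could interleave and some updates could be lost.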
4. Cache Coherency – Definition
Cache coherency is a hardware mechanism that ensures that all processors or cores see a consistent view of memory when multiple caches store copies of the same data.
Without coherency, different cores may read stale or inconsistent data, leading to incorrect program behavior.
Key Problems Addressed
- Write Propagation
  - Changes made by one processor must eventually become visible to all other caches.
- Transaction Ordering (Serialization)
  - Writes to the same memory location must be seen in the same order by all processors.
5. Cache Coherency Protocols
a) Write-Invalidate Protocol
- When a processor writes to a cache line, all other caches with a copy invalidate it.
- Example: MESI Protocol (Modified, Exclusive, Shared, Invalid)
b) Write-Update (Write-Broadcast) Protocol
- When a processor writes to a cache line, the new value is broadcast to other caches that have it.
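The trade-off between the two policies can be sketched by counting bus messages for one cache line. This is a deliberately simplified model, not a real protocol: every miss costs one message, every write that must notify other holders costs one message, and the function name and the `(core, op)` access format are assumptions made for this example.

```python
def count_bus_messages(accesses, policy):
    """accesses: list of (core, "r" or "w"); policy: "invalidate" or "update"."""
    holders = set()        # cores whose cache currently holds a valid copy
    messages = 0
    for core, op in accesses:
        if core not in holders:
            messages += 1          # miss: fetch the line over the bus
            holders.add(core)
        if op == "w" and len(holders) > 1:
            messages += 1          # tell the other holders about the write
            if policy == "invalidate":
                holders = {core}   # the other copies become invalid
            # under "update" the others keep a refreshed copy
    return messages


warmup = [(0, "r"), (1, "r")]
repeated_writes = warmup + [(0, "w")] * 5
producer_consumer = warmup + [(0, "w"), (1, "r")] * 3

print(count_bus_messages(repeated_writes, "invalidate"))    # 3
print(count_bus_messages(repeated_writes, "update"))        # 7
print(count_bus_messages(producer_consumer, "invalidate"))  # 8
print(count_bus_messages(producer_consumer, "update"))      # 5
```

In this model, write-invalidate is cheaper when one core writes repeatedly without the others reading, while write-update is cheaper when every write is followed by a read on another core.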
c) MESI Protocol States
| State | Meaning |
|---|---|
| M (Modified) | Cache has the only valid copy; the line is dirty and memory is stale |
| E (Exclusive) | Cache has the only copy; the line is clean and matches memory |
| S (Shared) | Line may also be held by other caches; it is clean and matches memory |
| I (Invalid) | Cache line is invalid and must be fetched again before use |
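The state changes of a single cache line can be sketched as a small transition function. This follows the textbook MESI transitions in simplified form: write-backs and bus signals are left out, and the event names (`local_read`, `local_write`, `remote_read`, `remote_write`) and the `others_have_copy` flag are hypothetical labels chosen for this example.

```python
def next_state(state, event, others_have_copy=False):
    """Return the new MESI state of one line in one cache after an event."""
    if state not in ("M", "E", "S", "I"):
        raise ValueError(f"unknown state: {state}")
    if event == "local_read":
        if state == "I":                       # read miss
            return "S" if others_have_copy else "E"
        return state                           # read hit: no change
    if event == "local_write":
        return "M"                             # this cache now owns a dirty copy
    if event == "remote_read":
        return "I" if state == "I" else "S"    # another core now shares the line
    if event == "remote_write":
        return "I"                             # another core's write invalidates us
    raise ValueError(f"unknown event: {event}")


print(next_state("I", "local_read"))    # E
print(next_state("E", "local_write"))   # M
print(next_state("M", "remote_read"))   # S
print(next_state("S", "remote_write"))  # I
```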
6. Example Scenario
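- Core 1: Writes X = 5 → modifies the cache line holding X (state M) → invalidates the copy in Core 2’s cache (state I).
- Core 2: Reads X → misses on its invalidated line → fetches the updated value 5 from Core 1’s cache, or from memory once Core 1 has written it back.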
This ensures that all threads see a consistent value for X.
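The scenario can be replayed in a toy two-core model with a write-invalidate policy. This is a sketch under simplifying assumptions: the cache line holds only X, a write replaces the whole line, and the helper names (`make_system`, `read_x`, `write_x`) and core labels are invented for illustration.

```python
def make_system(initial_x):
    """Two private caches plus main memory, all tracking one variable X."""
    return {
        "memory": initial_x,
        "caches": {"core1": {"state": "I", "value": None},
                   "core2": {"state": "I", "value": None}},
    }


def read_x(system, core):
    line = system["caches"][core]
    if line["state"] == "I":                       # miss: go to the bus
        others = [c for n, c in system["caches"].items() if n != core]
        for other in others:
            if other["state"] == "M":
                system["memory"] = other["value"]  # owner writes back first
            if other["state"] in ("M", "E"):
                other["state"] = "S"               # owner now shares the line
        shared = any(o["state"] == "S" for o in others)
        line["value"] = system["memory"]
        line["state"] = "S" if shared else "E"
    return line["value"]


def write_x(system, core, value):
    for name, other in system["caches"].items():
        if name != core:
            other["state"] = "I"                   # invalidate other copies
    line = system["caches"][core]
    line["value"], line["state"] = value, "M"


system = make_system(0)
read_x(system, "core1")          # Core 1 caches X = 0
read_x(system, "core2")          # Core 2 caches X = 0 as well
write_x(system, "core1", 5)      # Core 1 writes; Core 2's copy becomes invalid
print(read_x(system, "core2"))   # 5, not the stale 0
```

If `write_x` skipped the invalidation step, Core 2 would hit on its old copy and keep reading 0.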
7. Importance of Cache Coherency in TLP
- Correctness: Ensures threads see the latest memory updates.
- Performance: Hardware keeps caches consistent automatically, so software does not have to flush or bypass caches in order to share data.
- Scalability: Essential for multi-core systems where each core has private caches.
8. Exam-Friendly Summary
| Concept | Definition / Purpose |
|---|---|
| Thread-Level Parallelism (TLP) | Execution of multiple threads concurrently to increase throughput |
| Forms of TLP | Multithreading, Simultaneous Multithreading (SMT), Multi-core execution |
| Cache Coherency | Ensures consistent memory view across multiple caches |
| Protocols | MESI (Modified, Exclusive, Shared, Invalid), write-invalidate, write-update |
| Challenges | Synchronization, stale data, resource contention |