Shared-memory programming is a type of parallel programming where multiple processors or cores have access to the same physical memory. In this model, all the processors can read and write to a shared global memory, which makes it different from distributed-memory systems, where each processor has its own local memory. Shared-memory programming is commonly used in multi-core systems, where multiple processors or cores work together to solve a problem more efficiently.
In shared-memory systems, the challenge is to coordinate access to the shared memory in a way that avoids conflicts (such as multiple processors trying to modify the same memory location at the same time), ensures correct results, and maximizes parallelism.
Let’s break down the key aspects of shared-memory programming:
In a shared-memory system, all processors or cores have access to a common memory space. This shared memory allows different processors to communicate by reading and writing to the same data structures.
Since multiple threads can access shared data, synchronization is critical to ensure that one thread doesn’t overwrite data that another thread is using or modifying at the same time. Some common synchronization mechanisms include:
Locks (Mutexes): A lock allows only one thread at a time to access a shared resource. When a thread acquires a lock, other threads that try to acquire it must wait until it is released. This prevents race conditions, in which two or more threads access the same data concurrently, at least one of them writes, and the result depends on the timing of the threads.
Example:
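// Example of a simple mutex lock in C (POSIX threads)
#include <pthread.h>
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; // A mutex must be initialized before use
int shared_resource = 0;
// Inside each thread's function:
pthread_mutex_lock(&lock); // Acquire lock (blocks while another thread holds it)
shared_resource++; // Modify shared resource
pthread_mutex_unlock(&lock); // Release lock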
Atomic Operations: Some operations can be performed atomically, meaning they are indivisible: no other thread can observe them half-finished or interleave with them. For example, an atomic increment lets a thread increment a counter without interference from other threads and without taking a lock.
Example in C++ using std::atomic:
#include <atomic>
std::atomic<int> counter(0);
counter++; // Atomic increment: an indivisible read-modify-write, no lock required
Barriers: A barrier is a synchronization mechanism where threads must wait for all other threads to reach a certain point before continuing. Barriers are used to synchronize phases in a parallel algorithm.
Example: In parallel algorithms that perform computations in stages, all threads might need to wait for others to finish their current stage before starting the next one.
There are several programming models and frameworks designed to work with shared-memory systems. Here are the most common:
OpenMP is an API that provides a simple and flexible way to parallelize code in C, C++, and Fortran. It lets you add parallelism to existing sequential programs with minimal code changes: you mark parallel regions with compiler directives (#pragma omp lines in C and C++, special !$omp comments in Fortran) that the compiler translates into multithreaded code.
Key features of OpenMP:
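Parallel regions: #pragma omp parallel starts a team of threads that all execute the enclosed block.
Work sharing: the iterations of for loops can be divided among the threads (#pragma omp for, or the combined #pragma omp parallel for).
Synchronization: #pragma omp critical for critical sections, and #pragma omp barrier for barriers.
Example of OpenMP code: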
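#include <omp.h>
#include <iostream>
int main() {
    int arr[100];
    // Parallel loop: OpenMP divides the iterations among the threads.
    // Each iteration writes a different element, so no synchronization is needed.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        arr[i] = i * 2;
    }
    std::cout << "arr[99] = " << arr[99] << std::endl;
    std::cout << "max threads: " << omp_get_max_threads() << std::endl;
    return 0;
}
// Compile with OpenMP enabled, e.g. g++ -fopenmp example.cpp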
Pthreads (POSIX Threads) is a low-level threading API available on Unix-like systems. It gives more control over threads than OpenMP but requires more manual effort to manage threads, synchronization, and communication.
Pthreads is useful for fine-grained control and when you need to implement custom threading models, but it can be more complex and error-prone than higher-level APIs like OpenMP.
Example of Pthreads:
#include <pthread.h>
#include <stdio.h>
void* thread_func(void* arg) {
    printf("Hello from thread\n");
    return NULL;
}
int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_func, NULL); // Start a thread running thread_func(NULL)
    pthread_join(thread, NULL); // Wait for thread to finish
    return 0;
}
The C++11 standard introduced a portable thread API in the <thread> header. It provides a higher-level, easier-to-use alternative to Pthreads for parallel programming in C++.
Example of C++ thread:
#include <iostream>
#include <thread>
void say_hello() {
    std::cout << "Hello from thread!" << std::endl;
}
int main() {
    std::thread t(say_hello); // Start thread
    t.join(); // Wait for thread to finish
    return 0;
}
Java provides a built-in threading model with the Thread class. You can extend the Thread class or implement the Runnable interface to create parallel threads. Synchronization can be managed using the synchronized keyword.
Example of Java threads:
class MyRunnable implements Runnable {
    @Override
    public void run() {
        System.out.println("Hello from thread!");
    }
}
public class Main {
    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(new MyRunnable());
        thread.start(); // Start the thread; run() executes in the new thread
        thread.join(); // Wait for the thread to finish
    }
}
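While shared-memory programming simplifies communication between threads, it comes with its own set of challenges:
Race conditions: unsynchronized concurrent accesses to the same data, at least one of them a write, make the result depend on the timing of the threads.
Deadlocks: two or more threads each wait for a lock held by another, so none of them can proceed.
False sharing: threads write to different variables that happen to sit on the same cache line, forcing that line to bounce between cores and slowing the program down even though no data is logically shared.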
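To achieve efficient and correct parallel programs in shared-memory systems, here are a few best practices:
Synchronize properly: protect every shared, mutable variable with a lock or an atomic operation, and keep critical sections short.
Avoid deadlocks by design: when a thread needs several locks, acquire them in the same order in every thread, or acquire them all at once.
Balance the load: divide the work so that all threads finish at roughly the same time; otherwise the slowest thread determines the running time.
Keep per-thread data apart: place data written by different threads on separate cache lines to avoid false sharing.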
Shared-memory programming allows for efficient parallel computing in systems with multiple processors or cores that share the same memory space. Key elements in shared-memory programming include the use of threads, synchronization mechanisms, and parallel programming models like OpenMP and Pthreads. While shared-memory systems simplify communication between threads, they come with challenges such as race conditions, deadlocks, and false sharing. By using proper synchronization, efficient load balancing, and careful design, shared-memory programming can provide significant performance improvements for parallel applications.