Parallel & Distributed Computing, Topic 19 of 33

Shared-Memory Programming


Shared-memory programming is a type of parallel programming where multiple processors or cores have access to the same physical memory. In this model, all the processors can read and write to a shared global memory, which makes it different from distributed-memory systems, where each processor has its own local memory. Shared-memory programming is commonly used in multi-core systems, where multiple processors or cores work together to solve a problem more efficiently.

In shared-memory systems, the challenge is to coordinate access to the shared memory in a way that avoids conflicts (such as multiple processors trying to modify the same memory location at the same time), ensures correct results, and maximizes parallelism.

Let’s break down the key aspects of shared-memory programming:

1. Shared Memory Model

In a shared-memory system, all processors or cores have access to a common memory space. This shared memory allows different processors to communicate by reading and writing to the same data structures.

Memory Access: All processors can access any part of the memory directly, meaning they can read from or write to any variable in memory.
Synchronization: Since multiple processors may access the same memory location simultaneously, synchronization mechanisms (like locks, barriers, or atomic operations) are needed to coordinate access to shared data and ensure correctness.
Communication: Communication between processors happens implicitly through the shared memory. Unlike in distributed systems, where processors communicate by passing messages over a network, in shared-memory systems, processors communicate by reading and writing to shared variables or data structures.
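
As a rough illustration of implicit communication through shared memory, the sketch below has several threads write into one shared vector, which the caller then reads directly. The function name `fill_squares` and the chunking scheme are assumptions made for this example. No explicit message passing takes place: the worker threads and the caller simply touch the same data.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: fills a shared vector using `num_threads` workers.
// Each worker writes only its own contiguous slice, so no lock is needed here.
std::vector<int> fill_squares(int n, int num_threads) {
    std::vector<int> shared(n);  // one array, visible to every thread
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        const int begin = static_cast<int>(static_cast<long long>(n) * t / num_threads);
        const int end = static_cast<int>(static_cast<long long>(n) * (t + 1) / num_threads);
        workers.emplace_back([&shared, begin, end] {
            for (int i = begin; i < end; ++i) {
                shared[i] = i * i;  // write directly into shared memory
            }
        });
    }
    // Joining a thread guarantees its writes are visible to the caller.
    for (auto& w : workers) w.join();
    return shared;
}
```

Calling `fill_squares(10, 3)` would return the squares of 0 through 9. Because the slices do not overlap, the threads never write to the same element, which is why this particular sketch gets away without synchronization beyond the final join.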

2. Key Concepts in Shared-Memory Programming

a. Threads

Thread: A thread is a unit of execution within a program. In shared-memory programming, multiple threads run concurrently on different processors or cores, each working on a part of the problem.
Thread-level Parallelism: Shared-memory programming typically focuses on thread-level parallelism, where tasks are divided into multiple threads that can run simultaneously on different processors.

b. Synchronization Mechanisms

Since multiple threads can access shared data, synchronization is critical to ensure that one thread doesn’t overwrite data that another thread is using or modifying at the same time. Some common synchronization mechanisms include:

Locks (Mutexes): A lock allows only one thread to access a shared resource at a time. When a thread acquires a lock, other threads must wait until it releases the lock. This prevents race conditions (where two or more threads attempt to modify the same data simultaneously).

Example:
```
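// Example of a simple mutex lock in C
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  // Statically initialized mutex
int shared_resource = 0;

void increment(void) {
    pthread_mutex_lock(&lock);    // Acquire lock
    shared_resource++;            // Modify shared resource
    pthread_mutex_unlock(&lock);  // Release lock
}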
```
Atomic Operations: Some operations can be performed atomically, meaning other threads observe them either as not yet started or as fully completed, never half-done. For example, an atomic increment lets a thread update a counter without interference from other threads and without taking a lock.

Example in C++ using std::atomic:
```
#include <atomic>

std::atomic<int> counter(0);
void bump() { counter++; }  // Atomic increment (a single read-modify-write)
```
Barriers: A barrier is a synchronization mechanism where threads must wait for all other threads to reach a certain point before continuing. Barriers are used to synchronize phases in a parallel algorithm.

Example: In parallel algorithms that perform computations in stages, all threads might need to wait for others to finish their current stage before starting the next one.
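
The staged pattern described above can be sketched with a small hand-written barrier built from a mutex and a condition variable. `SimpleBarrier` and `two_stage` are hypothetical names introduced for this example, not library types; C++20 also offers `std::barrier`, which would normally be preferred where available.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier (illustrative only, not a standard library class).
class SimpleBarrier {
public:
    explicit SimpleBarrier(int count)
        : threshold_(count), waiting_(0), generation_(0) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lk(m_);
        const int gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: open the barrier and wake the others.
            ++generation_;
            waiting_ = 0;
            cv_.notify_all();
        } else {
            // Sleep until the generation changes (guards against spurious wakeups).
            cv_.wait(lk, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int threshold_, waiting_, generation_;
};

// Stage 1 writes a[t]; stage 2 reads a value written by a different thread.
std::vector<int> two_stage(int num_threads) {
    std::vector<int> a(num_threads), b(num_threads);
    SimpleBarrier barrier(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            a[t] = t + 1;                            // stage 1
            barrier.arrive_and_wait();               // wait until every a[i] is written
            b[t] = a[t] + a[(t + 1) % num_threads];  // stage 2
        });
    }
    for (auto& w : workers) w.join();
    return b;
}
```

Without the barrier, a thread could reach stage 2 and read a neighbour's slot before that neighbour had finished stage 1.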

3. Parallel Programming Models for Shared Memory

There are several programming models and frameworks designed to work with shared-memory systems. Here are the most common:

a. OpenMP (Open Multi-Processing)

OpenMP is an API that provides a simple and flexible way to parallelize code in C, C++, and Fortran. It allows you to add parallelism to existing sequential programs with minimal code changes by using compiler directives (#pragma omp lines in C and C++, specially formatted comments in Fortran) to specify parallel regions.

Key features of OpenMP:
- Parallel Regions: You specify sections of code that can be executed in parallel.
- Work-sharing Constructs: These constructs divide tasks among threads, such as for loops (#pragma omp for).
- Synchronization Constructs: OpenMP provides mechanisms to handle synchronization, such as #pragma omp critical for critical sections, and #pragma omp barrier for barriers.
Example of OpenMP code:
```
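#include <omp.h>

int main() {
    int arr[100];
    // Parallel loop: iterations are divided among the threads in the team.
    // Each iteration writes a different element, so no synchronization is needed.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        arr[i] = i * 2;
    }
    return 0;
}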
```
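
When loop iterations all update one shared variable, OpenMP's `reduction` clause is a common alternative to a critical section. The sketch below is a minimal illustration; the function name `sum_range` is an assumption made for this example. Each thread accumulates into a private copy of `sum`, and the copies are combined when the loop ends.

```cpp
// Sums 0..n-1 using an OpenMP reduction.
// If the compiler is not run with OpenMP enabled (for example -fopenmp on
// GCC or Clang), the pragma is ignored and the loop runs sequentially with
// the same result.
long long sum_range(int n) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += i;  // each thread adds into its own private copy of sum
    }
    return sum;
}
```

Writing the same loop with a plain shared `sum` and no `reduction` clause would be a data race; wrapping the addition in `#pragma omp critical` would be correct but would serialize every iteration.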

b. Pthreads (POSIX Threads)

Pthreads is a low-level threading library available on Unix-like systems. It provides more control over threads than OpenMP but requires more manual effort to manage threads, synchronization, and communication.

Pthreads is useful for fine-grained control and when you need to implement custom threading models, but it can be more complex and error-prone than higher-level APIs like OpenMP.

Example of Pthreads:
```
#include <pthread.h>
#include <stdio.h>

void* thread_func(void* arg) {
    printf("Hello from thread\n");
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, thread_func, NULL);
    pthread_join(thread, NULL);  // Wait for thread to finish
    return 0;
}
```

c. C++ Threads

The C++11 standard introduced a portable thread API in the <thread> header. It provides a higher-level, easier-to-use alternative to Pthreads for parallel programming in C++.

Example of C++ thread:

```
#include <iostream>
#include <thread>

void say_hello() {
    std::cout << "Hello from thread!" << std::endl;
}

int main() {
    std::thread t(say_hello);  // Start thread
    t.join();  // Wait for thread to finish
    return 0;
}
```

d. Threads in Java

Java provides a built-in threading model with the Thread class. You can extend the Thread class or implement the Runnable interface to create parallel threads. Synchronization can be managed using the synchronized keyword.

Example of Java threads:

```
class MyRunnable implements Runnable {
    public void run() {
        System.out.println("Hello from thread!");
    }
}

public class Main {
    public static void main(String[] args) {
        Thread thread = new Thread(new MyRunnable());
        thread.start();  // Start the thread
    }
}
```

4. Challenges in Shared-Memory Programming

While shared-memory programming simplifies communication between threads, it comes with its own set of challenges:

a. Race Conditions

Race conditions occur when two or more threads access shared data concurrently, at least one of them writes to it, and the final result depends on the order of execution. This can lead to unpredictable or incorrect behavior that may only show up intermittently.
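
A classic case is a shared counter: an increment is really a load, an add, and a store, so two threads can both read the old value and one update is lost. The sketch below shows one way to rule that out with a mutex; `count_with_mutex` is a hypothetical name chosen for this example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Every thread increments the same counter; the mutex makes each
// load-add-store sequence indivisible with respect to the other threads.
long count_with_mutex(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);  // one thread at a time
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

With the lock in place the result is always `num_threads * increments_per_thread`. Removing the `lock_guard` line would turn this into a data race, and the count could come out lower on some runs.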

b. Deadlock

Deadlock happens when two or more threads are waiting for each other to release resources (e.g., locks), and none of them ever proceed. This can halt the entire program if not managed properly.
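
A typical deadlock arises when one thread locks A then B while another locks B then A. One common remedy in C++ is to acquire both locks together with `std::lock`, which uses a deadlock-avoidance algorithm. The `Account` type and function names below are assumptions made for this sketch.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared resource protected by its own mutex.
struct Account {
    std::mutex m;
    int balance;
    explicit Account(int b) : balance(b) {}
};

void transfer(Account& from, Account& to, int amount) {
    if (&from == &to) return;  // never lock the same mutex twice
    std::lock(from.m, to.m);   // acquire both locks without risking deadlock
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);  // release on scope exit
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, the pattern that would
// deadlock if each simply locked `from` first and `to` second.
int run_opposing_transfers() {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;  // total is conserved
}
```

Another widely used approach is to impose a fixed global order on lock acquisition, so that no two threads ever wait on each other in a cycle.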

c. False Sharing

False sharing occurs when threads on different processors or cores access different variables that happen to be located on the same cache line, and at least one of them writes. Even though the variables are independent, each write invalidates the line in the other cores' caches, so the line bounces between cores and performance degrades.
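
One way to keep per-thread data on separate cache lines is to align each element to the line size. The sketch below assumes a 64-byte cache line, which is common on x86-64 but not universal; `PaddedCounter`, `kThreads`, and `count_padded` are names invented for this example.

```cpp
#include <thread>
#include <vector>

// alignas(64) pads and aligns each counter to an assumed 64-byte cache line.
struct alignas(64) PaddedCounter {
    long value = 0;
};

constexpr int kThreads = 4;

long count_padded(long increments_per_thread) {
    PaddedCounter counters[kThreads];  // each element starts on its own 64-byte boundary
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i) {
                counters[t].value++;   // touches only this thread's cache line
            }
        });
    }
    for (auto& w : workers) w.join();
    long total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

With a plain `long counters[kThreads]` the four values would likely sit in one cache line; the result would be the same, but the threads could slow each other down. Whether padding helps in practice is best confirmed by measurement.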

d. Memory Consistency

Memory consistency refers to ensuring that all threads have a consistent view of memory. Updates to shared variables by one thread may not immediately be visible to other threads, or may become visible in a different order, due to caching, hardware reordering, or compiler optimizations. Ensuring proper visibility requires memory barriers, atomic operations, or locks.
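
A minimal sketch of the visibility problem and one fix: a producer writes ordinary data and then sets an atomic flag with release ordering, and a consumer waits on the flag with acquire ordering before reading the data. The function name `publish_and_read` and the value 42 are arbitrary choices for this example.

```cpp
#include <atomic>
#include <thread>

int publish_and_read() {
    int payload = 0;                 // ordinary (non-atomic) shared data
    std::atomic<bool> ready(false);  // flag used to publish the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();                 // wait until published
        }
        seen = payload;  // the acquire load guarantees the write to payload is visible
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If `ready` were a plain `bool`, the program would contain a data race: the compiler or hardware could reorder the two writes, or the consumer might never observe the flag change.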

5. Best Practices for Shared-Memory Programming

To achieve efficient and correct parallel programs in shared-memory systems, here are a few best practices:

Minimize Synchronization: While synchronization is necessary to avoid data races, it can introduce overhead. Where possible, minimize the use of locks, and try to reduce the scope of critical sections.
Use Fine-Grained Locks: Instead of locking entire data structures, use finer-grained locks that allow threads to operate concurrently on different parts of the data.
Avoid False Sharing: Ensure that threads do not operate on data that shares the same cache line unless necessary. Padding data structures can help.
Balance Workload: Try to distribute work evenly across threads to avoid idle time and inefficient parallelism.
Profile Performance: Use profiling tools to identify bottlenecks, synchronization issues, or memory access problems that may be hindering performance.
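
Several of these practices can be combined in one small sketch: the work is split into near-equal contiguous chunks, each thread accumulates into a private local variable, and the lock is taken only once per thread to merge the result. The function name `parallel_sum` is an assumption made for this example.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& data, int num_threads) {
    long long total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    const std::size_t n = data.size();
    for (int t = 0; t < num_threads; ++t) {
        // Near-equal contiguous chunks keep the workload balanced.
        const std::size_t begin = n * t / num_threads;
        const std::size_t end = n * (t + 1) / num_threads;
        workers.emplace_back([&, begin, end] {
            long long local = 0;  // private to this thread: no sharing, no lock
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            std::lock_guard<std::mutex> guard(m);  // one short critical section per thread
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Locking inside the inner loop would also be correct, but it would take the mutex once per element instead of once per thread, which is the kind of overhead the first practice warns about.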

Conclusion

Shared-memory programming allows for efficient parallel computing in systems with multiple processors or cores that share the same memory space. Key elements in shared-memory programming include the use of threads, synchronization mechanisms, and parallel programming models like OpenMP and Pthreads. While shared-memory systems simplify communication between threads, they come with challenges such as race conditions, deadlocks, and false sharing. By using proper synchronization, efficient load balancing, and careful design, shared-memory programming can provide significant performance improvements for parallel applications.
