    Parallel & Distributed Computing (COMP3139): Topic 19 of 33

    Shared-Memory Programming


    Shared-memory programming is a style of parallel programming in which multiple processors or cores access the same physical memory through a single shared address space. Every processor can read and write this global memory, unlike in distributed-memory systems, where each processor has its own local memory and data must be exchanged explicitly. The model is a natural fit for multi-core machines, where several cores cooperate to solve one problem more efficiently.

    In shared-memory systems, the challenge is to coordinate access to the shared memory in a way that avoids conflicts (such as multiple processors trying to modify the same memory location at the same time), ensures correct results, and maximizes parallelism.

    Let’s break down the key aspects of shared-memory programming:


    1. Shared Memory Model

    In a shared-memory system, all processors or cores have access to a common memory space. This shared memory allows different processors to communicate by reading and writing to the same data structures.

    • Memory Access: All processors can access any part of the memory directly, meaning they can read from or write to any variable in memory.
    • Synchronization: Since multiple processors may access the same memory location simultaneously, synchronization mechanisms (like locks, barriers, or atomic operations) are needed to coordinate access to shared data and ensure correctness.
    • Communication: Communication between processors happens implicitly through the shared memory. Unlike in distributed systems, where processors communicate by passing messages over a network, in shared-memory systems, processors communicate by reading and writing to shared variables or data structures.

    2. Key Concepts in Shared-Memory Programming

    a. Threads

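    • Thread: A thread is a unit of execution within a program. All threads of a process share its address space (global variables and heap), while each thread has its own stack and registers. In shared-memory programming, multiple threads run concurrently on different processors or cores, each working on a part of the problem.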
    • Thread-level Parallelism: Shared-memory programming typically focuses on thread-level parallelism, where tasks are divided into multiple threads that can run simultaneously on different processors.

    b. Synchronization Mechanisms

    Since multiple threads can access shared data, synchronization is critical to ensure that one thread doesn’t overwrite data that another thread is using or modifying at the same time. Some common synchronization mechanisms include:

    • Locks (Mutexes): A lock allows only one thread to access a shared resource at a time. When a thread acquires a lock, other threads that request it must wait until it is released. This prevents race conditions, where two or more threads access the same data at the same time and at least one of them modifies it.

      Example:

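      // Example of a simple mutex lock in C
      #include <pthread.h>

      pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  // Mutex must be initialized before use
      int shared_resource = 0;                           // Data protected by the lock

      void increment(void) {
          pthread_mutex_lock(&lock);    // Acquire lock
          shared_resource++;            // Modify shared resource
          pthread_mutex_unlock(&lock);  // Release lock
      }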
      
    • Atomic Operations: Some operations can be done atomically, meaning they complete without interruption. For example, atomic increment operations allow a thread to increment a counter without interference from other threads.

      Example in C++ using std::atomic:

      #include <atomic>

      std::atomic<int> counter(0);   // Shared counter
      void bump() { counter++; }     // Atomic increment, safe to call from any thread
      
    • Barriers: A barrier is a synchronization mechanism where threads must wait for all other threads to reach a certain point before continuing. Barriers are used to synchronize phases in a parallel algorithm.

      Example: In parallel algorithms that perform computations in stages, all threads might need to wait for others to finish their current stage before starting the next one.


    3. Parallel Programming Models for Shared Memory

    There are several programming models and frameworks designed to work with shared-memory systems. Here are the most common:

    a. OpenMP (Open Multi-Processing)

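    • OpenMP is an API that provides a simple and flexible way to parallelize code in C, C++, and Fortran. It allows you to add parallelism to existing sequential programs with minimal code changes by using compiler directives (#pragma omp lines in C and C++, specially formatted comments in Fortran) to specify parallel regions. A compiler without OpenMP support enabled generally ignores the directives, so the same source can still be built as a sequential program.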

      Key features of OpenMP:

      • Parallel Regions: You specify sections of code that can be executed in parallel.
      • Work-sharing Constructs: These constructs divide tasks among threads, such as for loops (#pragma omp for).
      • Synchronization Constructs: OpenMP provides mechanisms to handle synchronization, such as #pragma omp critical for critical sections, and #pragma omp barrier for barriers.

      Example of OpenMP code:

      #include <omp.h>
      #include <iostream>
      
      int main() {
          int arr[100];
          // Parallel loop: iterations are divided among the threads
          #pragma omp parallel for
          for (int i = 0; i < 100; i++) {
              arr[i] = i * 2;
          }
          std::cout << "arr[99] = " << arr[99] << std::endl;
          return 0;
      }
      

    b. Pthreads (POSIX Threads)

    • Pthreads is a low-level threading library available on Unix-like systems. It provides more control over threads than OpenMP but requires more manual effort to manage threads, synchronization, and communication.

      Pthreads is useful for fine-grained control and when you need to implement custom threading models, but it can be more complex and error-prone than higher-level APIs like OpenMP.

      Example of Pthreads:

      #include <pthread.h>
      #include <stdio.h>
      
      void* thread_func(void* arg) {
          printf("Hello from thread\n");
          return NULL;
      }
      
      int main() {
          pthread_t thread;
          pthread_create(&thread, NULL, thread_func, NULL);
          pthread_join(thread, NULL);  // Wait for thread to finish
          return 0;
      }
      

    c. C++ Threads

    • The C++11 standard introduced a simple thread API in the <thread> header of the standard library. This provides a higher-level, portable, and easier-to-use alternative to Pthreads for parallel programming in C++.

      Example of C++ thread:

      #include <iostream>
      #include <thread>
      
      void say_hello() {
          std::cout << "Hello from thread!" << std::endl;
      }
      
      int main() {
          std::thread t(say_hello);  // Start thread
          t.join();  // Wait for thread to finish
          return 0;
      }
      

    d. Threads in Java

    • Java provides a built-in threading model with the Thread class. You can extend the Thread class or implement the Runnable interface to create parallel threads. Synchronization can be managed using the synchronized keyword.

      Example of Java threads:

      class MyRunnable implements Runnable {
          public void run() {
              System.out.println("Hello from thread!");
          }
      }
      
      public class Main {
          public static void main(String[] args) {
              Thread thread = new Thread(new MyRunnable());
              thread.start();  // Start the thread
          }
      }
      

    4. Challenges in Shared-Memory Programming

    While shared-memory programming simplifies communication between threads, it comes with its own set of challenges:

    a. Race Conditions

    • Race conditions occur when two or more threads access shared data concurrently, at least one of them writes to it, and the final result depends on the order of execution. This can lead to unpredictable or incorrect behavior.

    b. Deadlock

    • Deadlock happens when two or more threads are waiting for each other to release resources (e.g., locks), and none of them ever proceed. This can halt the entire program if not managed properly.

    c. False Sharing

    • False sharing occurs when threads on different processors or cores access different variables that happen to be located on the same cache line. Even though the variables are independent, the processors repeatedly invalidate each other’s caches, which can lead to performance degradation.

    d. Memory Consistency

    • Memory consistency refers to ensuring that all threads have a consistent view of memory. Updates to shared variables by one thread may not immediately be visible to other threads, or may appear in a different order, because of caching, hardware reordering, or compiler optimizations. Ensuring proper visibility requires synchronization: locks, memory barriers (fences), or atomic operations with suitable memory ordering.

    5. Best Practices for Shared-Memory Programming

    To achieve efficient and correct parallel programs in shared-memory systems, here are a few best practices:

    • Minimize Synchronization: While synchronization is necessary to avoid data races, it can introduce overhead. Where possible, minimize the use of locks, and try to reduce the scope of critical sections.
    • Use Fine-Grained Locks: Instead of locking entire data structures, use finer-grained locks that allow threads to operate concurrently on different parts of the data.
    • Avoid False Sharing: Ensure that threads do not operate on data that shares the same cache line unless necessary. Padding data structures can help.
    • Balance Workload: Try to distribute work evenly across threads to avoid idle time and inefficient parallelism.
    • Profile Performance: Use profiling tools to identify bottlenecks, synchronization issues, or memory access problems that may be hindering performance.

    Conclusion

    Shared-memory programming allows for efficient parallel computing in systems with multiple processors or cores that share the same memory space. Key elements in shared-memory programming include the use of threads, synchronization mechanisms, and parallel programming models like OpenMP and Pthreads. While shared-memory systems simplify communication between threads, they come with challenges such as race conditions, deadlocks, and false sharing. By using proper synchronization, efficient load balancing, and careful design, shared-memory programming can provide significant performance improvements for parallel applications.
