
    Software Architectures: Distributed Shared Data (DSD)


    Distributed Shared Data (DSD) is an architecture in which multiple nodes of a distributed system access and modify shared data in a coordinated, consistent way. Whereas Distributed Shared Memory (DSM) creates a global address space for raw memory, DSD shares and manages higher-level data (objects, files, tables) across processes or nodes. Every node sees the same logical data, regardless of where that data is physically stored or how it is managed.

    1. What is Distributed Shared Data (DSD)?

    Distributed Shared Data refers to a design pattern in distributed systems where data is shared between multiple nodes (machines, processes, or services) across a network. These systems allow data to be accessed, modified, and synchronized from different locations without requiring every node to hold a complete local copy.

    In this architecture, data may be stored on multiple nodes, yet it is accessible in a shared manner across the whole system. The key challenges in DSD systems are ensuring data consistency, providing synchronization mechanisms, and handling network latency well enough to maintain the illusion of shared access while still using resources efficiently.


    2. Key Concepts of Distributed Shared Data (DSD)

    To better understand DSD, it is important to consider the core concepts involved in a distributed shared data architecture:

    1. Global Data Space

    • Global data space refers to the logical view of the data that is accessible from any node in the distributed system. This can be thought of as a unified data store where data is distributed across multiple nodes, but the system provides an interface for accessing the data in a consistent manner.

    2. Data Partitioning

    • Distributed data is often partitioned to improve performance and scalability. Partitioning involves breaking data into smaller, manageable chunks and distributing those chunks across different nodes. Each node may own and manage a subset of the global data, and queries to the data are routed to the correct node based on partitioning.
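
    The routing step described above can be sketched in a few lines. This is a minimal illustration, assuming simple hash partitioning; the node names, `owner_of`, and `PartitionedStore` are hypothetical and not taken from any particular system.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owner_of(key: str, nodes=NODES) -> str:
    """Return the node that owns `key` under simple hash partitioning."""
    # Use a stable hash (not Python's per-process salted hash()) so that
    # every node computes the same owner for the same key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class PartitionedStore:
    """Toy global data space: one dict per node, requests routed by key."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.partitions = {n: {} for n in self.nodes}

    def put(self, key, value):
        # Only the owning partition stores the key.
        self.partitions[owner_of(key, self.nodes)][key] = value

    def get(self, key):
        # Reads are routed to the same owner that handled the write.
        return self.partitions[owner_of(key, self.nodes)].get(key)
```

    A caller only ever uses `put` and `get`; which node holds a given key stays hidden behind the routing function, which is the point of the global data space abstraction.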

    3. Replication

    • Replication is the process of maintaining copies of data across multiple nodes. Replication helps with availability, fault tolerance, and load balancing. However, it also raises challenges related to ensuring consistency between replicas (the replication consistency problem).

    4. Consistency Models

    • Ensuring that the data accessed across distributed nodes is consistent is a critical part of DSD. There are various models of consistency, such as strong consistency, eventual consistency, and causal consistency, each with trade-offs in terms of performance, fault tolerance, and user experience.

    5. Concurrency Control

    • In a distributed shared data system, multiple nodes or processes may attempt to read or write the same data at the same time. Concurrency control mechanisms such as locks, optimistic concurrency, or transactional processing are needed to manage simultaneous access and prevent race conditions or data corruption.
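
    As one illustration of the optimistic approach, the sketch below attaches a version number to each value and rejects a write whose expected version is out of date. It assumes a single in-process store; `VersionedStore` and `VersionConflict` are hypothetical names, and a real distributed store would have to make the check-and-write step atomic across nodes.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the data."""


class VersionedStore:
    """Toy store for optimistic concurrency: key -> (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Unknown keys start at version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # Someone else wrote since the caller last read: reject.
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version
```

    Two clients that both read version 0 can both attempt a write, but only the first succeeds; the second receives `VersionConflict` and would normally re-read and retry. No lock is held between the read and the write.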

    6. Fault Tolerance

    • A distributed system must be designed to handle node failures without losing data or compromising availability. Fault tolerance mechanisms include data replication, distributed transaction logs, and failure detection and recovery protocols.
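
    Failure detection is often built on heartbeats with a timeout. The following is a minimal sketch under that assumption; `HeartbeatDetector` is a hypothetical name, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatDetector:
    """Suspect a node when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        # Record the most recent time we heard from this node.
        self.last_seen[node] = now

    def suspected(self, now: float) -> list:
        # A node is only suspected, not known to be dead: a slow network
        # looks the same as a crashed node from the outside.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

    A recovery protocol would react to the suspected list, for example by redirecting requests to a replica. Choosing the timeout is a trade-off: too short produces false suspicions, too long delays recovery.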

    3. Distributed Shared Data vs. Distributed Shared Memory (DSM)

    While both Distributed Shared Data and Distributed Shared Memory (DSM) abstract the idea of sharing data across nodes, there are key differences:

    Feature | Distributed Shared Data (DSD) | Distributed Shared Memory (DSM)
    Granularity | Typically at the data level (structured data, objects, tables, etc.) | Typically at the memory or page level (raw memory access)
    Access Model | Data is shared via higher-level abstractions (e.g., files, objects, tables, databases) | Memory is shared through a low-level memory access interface (i.e., virtual memory)
    Consistency | Focuses on data consistency (e.g., transactions, eventual consistency) | Focuses on memory consistency (coherency and synchronization of memory reads/writes)
    Common Use Cases | Databases, file systems, key-value stores | Parallel computing, scientific computing, and high-performance applications
    Synchronization | Data synchronization is done at the application or middleware level | Memory synchronization is managed by the underlying DSM protocol
    Communication | Uses message passing or distributed databases to sync data across nodes | Uses low-level memory communication protocols (often via a network or interconnect)

    4. Architectural Models for Distributed Shared Data (DSD)

    Several architectural approaches are used for implementing Distributed Shared Data systems, and these models differ in how they manage data distribution, consistency, and communication:

    1. Client-Server Architecture

    • In a client-server architecture, one or more server nodes manage the data and share it with multiple client nodes. The clients request data from the server, which processes the request and provides the data.
    • The server typically handles data consistency, updates, and synchronization.
    • Example: Traditional relational databases (like MySQL or PostgreSQL) operate on this model, where data resides on a central server, and clients query the data over a network.

    2. Peer-to-Peer (P2P) Architecture

    • In a P2P architecture, each node can act as both a client and a server, sharing and accessing data from any other node. There is no central server, and nodes communicate directly with each other to share data and maintain consistency.
    • Example: Peer-to-peer systems such as Freenet or BitTorrent, where each peer shares pieces of data with other peers.

    3. Master-Slave Architecture

    • A master-slave architecture (also called primary-replica) is a variant of client-server architecture in which the master node controls the data and manages replication, while slave nodes maintain copies of it. Slaves serve reads but apply modifications only as directed by the master.
    • Because all writes go through one node, this model gives tight control over the data and simplifies consistency, although a slave may briefly return stale data if replication is asynchronous.
    • Example: MySQL replication, where the master node handles writes and the slave nodes handle reads.
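
    The write path and read path of this model can be sketched as follows. This is a simplified illustration, not how MySQL implements replication; the `Master` and `Replica` classes and the explicit `replicate()` step are assumptions made to show asynchronous shipping of updates.

```python
class Replica:
    """Read-only copy: applies updates only when the master ships them."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Master:
    """Accepts all writes and ships them to replicas in order."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = list(replicas)
        self.pending = []  # updates accepted but not yet shipped

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # In a real system this runs in the background over the network.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()
```

    Between `write` and `replicate`, a read from a replica returns the old value. That window is the replication lag an application must tolerate when it sends reads to replicas.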

    4. Distributed Database Systems

    • Distributed databases provide an abstraction for sharing and managing data across multiple nodes. These systems typically include mechanisms for partitioning and replicating data across nodes, and many offer some form of ACID (Atomicity, Consistency, Isolation, Durability) guarantees for transactions, though the scope of those guarantees varies by system.
    • Example: Cassandra, HBase, and MongoDB are distributed database systems that implement DSD by allowing distributed nodes to share data.

    5. Distributed Caching Systems

    • Distributed caching systems provide shared data that is used for faster access to frequently used data. These caches are typically distributed across multiple nodes and can hold frequently queried data to reduce load on a database or server.
    • Example: Redis and Memcached are widely used distributed caching systems that allow data to be shared across multiple nodes for faster retrieval.
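
    A common way applications use such a cache is the cache-aside pattern: check the cache first, fall back to the database on a miss, and invalidate the cached entry on update. The sketch below assumes plain dictionaries standing in for both the cache and the database; `CacheAside` is a hypothetical name and no Redis or Memcached API is used.

```python
class CacheAside:
    """Cache-aside reads and invalidate-on-write, using dicts as stand-ins."""

    def __init__(self, backing_store: dict):
        self.backing_store = backing_store  # stands in for the database
        self.cache = {}                     # stands in for the cache cluster
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # fast path: served from cache
        self.misses += 1
        value = self.backing_store.get(key)  # slow path: go to the database
        if value is not None:
            self.cache[key] = value         # populate for later readers
        return value

    def update(self, key, value):
        self.backing_store[key] = value
        # Invalidate rather than overwrite, so the next read reloads
        # the fresh value from the database.
        self.cache.pop(key, None)
```

    Repeated reads of the same key reach the database only once until the key is updated, which is how the cache reduces load on the backing store.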

    5. Key Challenges in Distributed Shared Data (DSD)

    While DSD provides an efficient and scalable way to manage data in distributed systems, there are several challenges to address:

    1. Data Consistency

    • Maintaining consistency across distributed data copies is challenging. Different nodes might hold copies of the same data, and changes made to one copy must be propagated to all other copies to ensure that they remain consistent.
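    • Consistency Models: Strong consistency (every read reflects the most recent write), eventual consistency (replicas converge once updates stop arriving), and causal consistency (causally related updates are seen in the same order by all nodes) are different approaches to managing data consistency in DSD systems.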
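
    To make eventual convergence concrete, the sketch below merges two replica states with a last-writer-wins rule. This is one simple convergence strategy among several, chosen here only for illustration; the `merge` function and the timestamped state layout are assumptions, and values are assumed to be strings so that ties can be broken deterministically.

```python
def merge(a: dict, b: dict) -> dict:
    """Merge two replica states of the form key -> (timestamp, value).

    For each key the entry with the larger (timestamp, value) pair wins,
    so the result is the same whichever replica initiates the merge.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


# Two replicas that have diverged after accepting different updates.
replica_1 = {"x": (1, "old"), "y": (5, "keep")}
replica_2 = {"x": (2, "new")}

# After exchanging state in either direction, both hold the same data.
state_1 = merge(replica_1, replica_2)
state_2 = merge(replica_2, replica_1)
```

    Last-writer-wins silently discards the losing update, which is the price paid for not coordinating writes up front. Strong consistency avoids that loss by coordinating before a write is acknowledged.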

    2. Fault Tolerance and Availability

    • Distributed systems are vulnerable to node failures, network partitions, and other disruptions. A robust DSD system must be able to recover from failures without losing data or compromising availability.
    • Replication, combined with coordination techniques such as quorum-based reads and writes or leader election, is used to keep data available and consistent even during partial system failures.
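
    The quorum idea rests on a simple condition: with N replicas, a write quorum of W and a read quorum of R, every read overlaps every write when R + W > N. The sketch below illustrates that condition under simplifying assumptions (a single register, versions chosen from the replicas reached); `QuorumRegister` and `quorums_overlap` are hypothetical names.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum shares a replica with every write quorum."""
    return r + w > n


class QuorumRegister:
    """A single value replicated on n nodes, each storing (version, value)."""

    def __init__(self, n: int, w: int, r: int):
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("w and r must be between 1 and n")
        self.n, self.w, self.r = n, w, r
        self.replicas = [(0, None) for _ in range(n)]

    def write(self, value, reachable):
        """Write to the replicas whose indices are listed in `reachable`."""
        if len(reachable) < self.w:
            raise RuntimeError("write quorum not reached")
        version = max(self.replicas[i][0] for i in reachable) + 1
        for i in reachable:
            self.replicas[i] = (version, value)

    def read(self, reachable):
        """Return the highest-versioned value among the reachable replicas."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        _, value = max((self.replicas[i] for i in reachable), key=lambda t: t[0])
        return value
```

    With N=3, W=2, R=2, a write that reaches replicas 0 and 1 is visible to a read that reaches replicas 1 and 2, because replica 1 is in both quorums. With W=1 and R=1 the quorums need not overlap, and a read can miss the latest write.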

    3. Latency

    • Since DSD systems often involve communication over a network, latency can become a significant factor. When data is distributed across multiple nodes, accessing the data from a distant node may introduce delays, especially in high-latency networks.
    • Caching and data locality techniques are used to reduce the impact of latency by storing frequently accessed data closer to where it is needed.

    4. Scalability

    • As the number of nodes in a distributed system grows, managing the distribution, replication, and synchronization of data becomes more complex. Scalability requires careful partitioning of data, efficient synchronization algorithms, and mechanisms to balance the load across nodes.
    • Sharding and load balancing are common techniques used to manage data and ensure that the system can scale efficiently.
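
    One widely used sharding technique is consistent hashing, which limits how many keys move when a node joins. The sketch below is a minimal ring with virtual nodes; `HashRing`, the node names, and the choice of 64 virtual nodes are assumptions made for illustration.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a stable position on the ring."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each node appears at `vnodes` positions."""

    def __init__(self, nodes, vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Virtual nodes spread each physical node around the ring,
        # which evens out the share of keys each one receives.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # The owner is the first ring position at or after the key's
        # position, wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._ring, (_point(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

    When a node is added, the only keys that change owner are those that now fall to the new node; every other key stays where it was. Plain modulo hashing, by contrast, reassigns most keys whenever the node count changes.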

    5. Concurrency Control

    • With multiple processes or nodes potentially modifying the same data simultaneously, concurrency control mechanisms such as locking, optimistic concurrency, and transactional consistency are needed to prevent race conditions and data corruption.

    6. Advantages and Disadvantages of DSD

    Advantages:

    1. Scalability: Distributed shared data architectures can scale to accommodate large volumes of data and high levels