🌐 Distributed Database Systems (DDS)
1. What is a Distributed Database System?
A Distributed Database System is a collection of multiple interconnected databases that are distributed across different physical locations (different computers, networks, or sites) but function as a single logical database to users.
- Data is stored across multiple sites.
- Users can access and manipulate data transparently, without knowing its physical location.
- Supports concurrency, reliability, and fault tolerance.
2. Key Characteristics
| Feature | Description |
|---|---|
| Distribution | Data is stored at multiple, geographically separated sites. |
| Autonomy | Each site can operate independently while coordinating with the others. |
| Transparency | Users see a unified database; distribution details are hidden. |
| Replication | Copies of data are stored at multiple sites for reliability and performance. |
| Concurrency Control | Ensures consistency when multiple sites access data simultaneously. |
3. Types of Distributed Databases
| Type | Description |
|---|---|
| Homogeneous DDS | All sites use the same DBMS software and schema. |
| Heterogeneous DDS | Different sites may use different DBMS products and schemas; requires complex integration. |
4. Data Distribution Strategies
| Strategy | Description |
|---|---|
| Fragmentation | Dividing tables into smaller fragments to store across sites. |
| - Horizontal | Rows of a table are distributed (e.g., by region). |
| - Vertical | Columns of a table are distributed (e.g., sensitive columns separate). |
| Replication | Copying entire tables or fragments at multiple sites. |
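The two fragmentation styles can be illustrated with a minimal Python sketch. The `customers` table, its column names, and the helper functions are all hypothetical; lists of dicts stand in for tables. It assumes every vertical fragment carries the primary key so the original rows can be rebuilt.

```python
# Hypothetical customer table; column names and values are illustrative only.
customers = [
    {"id": 1, "name": "Ana", "region": "EU", "card_number": "1111"},
    {"id": 2, "name": "Ben", "region": "US", "card_number": "2222"},
    {"id": 3, "name": "Chloe", "region": "EU", "card_number": "3333"},
]

def horizontal_fragments(rows, key):
    """Group whole rows by the value of `key` (one fragment per site)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def vertical_fragment(rows, columns, primary_key="id"):
    """Keep only `columns`, always carrying the primary key for rejoining."""
    keep = [primary_key] + [c for c in columns if c != primary_key]
    return [{c: row[c] for c in keep} for row in rows]

def rejoin(left, right, primary_key="id"):
    """Rebuild full rows by joining two vertical fragments on the key."""
    right_by_key = {row[primary_key]: row for row in right}
    return [{**row, **right_by_key[row[primary_key]]} for row in left]

by_region = horizontal_fragments(customers, "region")
public = vertical_fragment(customers, ["name", "region"])
sensitive = vertical_fragment(customers, ["card_number"])

# The union of the horizontal fragments and the join of the vertical
# fragments should both recover the original table.
union = sorted(
    (row for fragment in by_region.values() for row in fragment),
    key=lambda row: row["id"],
)
restored = rejoin(public, sensitive)
```

Horizontal fragments are recombined with a union, vertical fragments with a join on the shared key. That is why the key is repeated in every vertical fragment.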
5. Advantages of Distributed Databases
- Improved reliability and availability (due to replication).
- Faster access to data by locating it closer to users.
- Scalability by adding more sites.
- Local autonomy while maintaining a global database.
- Parallel query processing can improve performance.
6. Challenges in Distributed Databases
| Challenge | Explanation |
|---|---|
| Data Integrity | Maintaining consistency across distributed copies. |
| Distributed Transactions | Coordinating transactions spanning multiple sites. |
| Concurrency Control | Handling concurrent access and locking across sites. |
| Query Processing | Optimizing queries over distributed data. |
| Network Issues | Handling latency, failures, and communication delays. |
| Security | Securing data across multiple sites and networks. |
7. Distributed Transaction Management
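- Ensures ACID properties (atomicity, consistency, isolation, durability) for transactions that span multiple sites.
- Uses an atomic commit protocol such as Two-Phase Commit (2PC): a coordinator first asks every site to prepare and vote, then tells all sites to commit only if every vote was yes.
- If any site votes no, the coordinator tells every site to roll back, so the sites never end up with a partial commit.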
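The commit-or-roll-back coordination can be sketched in a few lines of Python. This is a simplified, in-memory illustration: the `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch leaves out what a real protocol needs, such as durable logging, timeouts, and recovery after a coordinator crash.

```python
class Participant:
    """A hypothetical site that votes on a transaction and then applies it."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every site committed, False if every site rolled back."""
    # Phase 1 (voting): collect a vote from every site.
    votes = [p.prepare() for p in participants]
    decision = all(votes)

    # Phase 2 (completion): apply the same decision at every site.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision


healthy = [Participant("site_a"), Participant("site_b")]
committed = two_phase_commit(healthy)

one_failing = [Participant("site_a"), Participant("site_b", can_commit=False)]
aborted = not two_phase_commit(one_failing)
```

A single "no" vote in the second run forces every site, including the one that was ready, to roll back. This is the atomicity guarantee the protocol exists to provide.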
8. Query Processing in DDS
- Query decomposition into subqueries executed at local sites.
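- Data localization: rewriting the query to run against the specific fragments and replicas that hold the relevant data, which limits data transfer over the network.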
- Query optimization considers site capabilities and network cost.
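As a rough illustration of decomposition, the sketch below splits a global "total sales of one product" query into per-site subqueries and merges the partial results. The site names, the data, and the function names are made up for the example; in-memory lists stand in for the local databases.

```python
# Hypothetical per-site sales rows: (product, amount).
SITES = {
    "berlin": [("pen", 3.0), ("ink", 7.5), ("pen", 2.0)],
    "tokyo": [("pen", 4.0), ("pad", 6.0)],
    "austin": [("ink", 1.5)],
}

def local_subquery(rows, product):
    """Runs at one site: filter and aggregate before anything is shipped."""
    amounts = [amount for name, amount in rows if name == product]
    return sum(amounts), len(amounts)

def global_total(sites, product):
    """Runs at the coordinating site: merge one small partial per site."""
    partials = [local_subquery(rows, product) for rows in sites.values()]
    total = sum(subtotal for subtotal, _ in partials)
    matched = sum(count for _, count in partials)
    return total, matched

total, matched = global_total(SITES, "pen")

# Compare what crosses the network under the two plans.
rows_if_shipped_raw = sum(len(rows) for rows in SITES.values())
partials_shipped = len(SITES)  # one partial result per site
```

Pushing the filter and the aggregate down to each site means only one small partial result per site travels over the network, rather than every raw row. A cost-based optimizer looks for this kind of trade-off.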
9. Example Scenario
Imagine a retail company with databases at different branches worldwide:
- Each branch stores local sales data (horizontal fragmentation).
- A central site may hold replicated summary data.
- Users can query the distributed system seamlessly.
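The scenario can be modeled with a small Python sketch. The `RetailDatabase` class, its method names, and the branch names are hypothetical. A dict per branch stands in for each local fragment, and an explicit `refresh_summary` call stands in for whatever replication mechanism a real system would use.

```python
class RetailDatabase:
    """Hypothetical facade: callers name a branch or ask globally, never a site."""

    def __init__(self):
        self.branch_sales = {}  # branch -> sale amounts (local fragment)
        self.summary = {}       # replicated totals held at the central site

    def record_sale(self, branch, amount):
        self.branch_sales.setdefault(branch, []).append(amount)

    def refresh_summary(self):
        # Stand-in for replication: copy each branch total to the central site.
        self.summary = {
            branch: sum(amounts) for branch, amounts in self.branch_sales.items()
        }

    def branch_total(self, branch):
        # Served from the branch's own fragment, so it is always current.
        return sum(self.branch_sales.get(branch, []))

    def company_total(self):
        # Served from the central summary, which may lag behind the branches.
        return sum(self.summary.values())


db = RetailDatabase()
db.record_sale("paris", 20.0)
db.record_sale("lagos", 35.0)
db.refresh_summary()
db.record_sale("paris", 5.0)  # recorded locally, not yet in the summary

paris_now = db.branch_total("paris")
company_before_refresh = db.company_total()
db.refresh_summary()
company_after_refresh = db.company_total()
```

The caller never says where the data lives, which is the transparency the scenario describes. The gap between the two company totals shows the usual cost of a replicated summary: it is only as fresh as its last refresh.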
10. Summary Table
| Aspect | Description |
|---|---|
| Distributed Database | Database spread across multiple sites |
| Homogeneous vs Heterogeneous | Same or different DBMS at sites |
| Fragmentation | Dividing data horizontally or vertically |
| Replication | Storing copies of data for fault tolerance |
| Distributed Transactions | Maintaining ACID across sites |
| Query Processing | Decomposing and optimizing distributed queries |
| Challenges | Integrity, concurrency, network, security |
Why Distributed Databases?
- Support global applications requiring data sharing.
- Enhance fault tolerance and disaster recovery.
- Allow local control while enabling a global view.