COMP2114›Concurrency Control

Database SystemsTopic 31 of 34

Concurrency Control

9 minread

1,464words

Intermediatelevel

Concurrency Control in Database Systems

Concurrency control is a fundamental aspect of database management systems (DBMS) that ensures multiple transactions can be executed concurrently without violating the consistency of the database. In a multi-user environment, where many transactions are running simultaneously, concurrency control mechanisms are necessary to maintain the integrity of the database and prevent problems like data inconsistency, lost updates, and deadlocks.

Goals of Concurrency Control

The primary goals of concurrency control are:

Isolation: Transactions should execute as if they are the only transactions in the system, meaning that their intermediate results should not be visible to other transactions until they are committed.
Consistency: Concurrency control ensures that even when transactions run concurrently, the database remains in a consistent state after all transactions are completed.
Recoverability: After a failure, the database system should be able to recover to a consistent state, ensuring that the effects of partial or failed transactions are properly managed.

Types of Concurrency Problems

Several problems may arise when transactions are executed concurrently:

Lost Update: Occurs when two or more transactions update the same data item simultaneously, and one update is lost.
- Example: Transaction 1 updates a bank account balance to $100, and at the same time, Transaction 2 updates the same balance to$ 150. One of the updates may be overwritten, resulting in the loss of data.
Temporary Inconsistent Data: Occurs when one transaction reads data that is in the process of being updated by another transaction.
- Example: Transaction 1 is updating an account balance, while Transaction 2 reads the balance. If Transaction 2 reads the data before Transaction 1 completes, it might see an inconsistent state.
Uncommitted Data (Dirty Read): Occurs when a transaction reads data that is written by another transaction that has not yet committed.
- Example: Transaction 1 writes a value, but before it commits, Transaction 2 reads that value. If Transaction 1 is rolled back, Transaction 2 would have used incorrect data.
Non-repeatable Read: Occurs when a transaction reads the same data twice and the values are different each time because another transaction has updated the data in between.
- Example: Transaction 1 reads a value, but before it can process it, Transaction 2 updates the value, making the second read in Transaction 1 inconsistent.
Phantom Reads: Occurs when a transaction reads a set of rows based on a query condition, but another transaction inserts or deletes rows that affect the result set in between the reads.
- Example: Transaction 1 reads all accounts with a balance greater than $1000, but before it finishes, Transaction 2 inserts new records that change the set of records Transaction 1 originally read.

Concurrency Control Techniques

To manage concurrency control and avoid these problems, DBMSs use different techniques. These techniques can be broadly categorized into two main categories:

1. Lock-Based Concurrency Control

Locking is the most widely used method for ensuring that transactions do not interfere with each other when accessing shared data.
A lock is a mechanism that prevents other transactions from accessing a data item while one transaction is using it.

Types of Locks:

Shared Lock (S-lock): Allows a transaction to read a data item but prevents it from being updated by other transactions. Multiple transactions can hold a shared lock on the same data item simultaneously, but no transaction can acquire an exclusive lock while any shared lock is held.
Exclusive Lock (X-lock): Allows a transaction to both read and write a data item. No other transaction can acquire a shared or exclusive lock on the same data item when an exclusive lock is held.

Locking Protocols:

Two-Phase Locking (2PL): This protocol ensures serializability by requiring transactions to acquire all locks they need before releasing any locks. It has two phases:
1. Growing Phase: A transaction can acquire locks but cannot release any locks.
2. Shrinking Phase: A transaction can release locks but cannot acquire any new locks.
This guarantees that once a transaction releases a lock, it cannot obtain any further locks, thereby avoiding deadlocks and ensuring serializability.
Strict Two-Phase Locking (S2PL): A variant of 2PL where a transaction holds all its locks until it commits or aborts. This avoids dirty reads and ensures recoverability.

Deadlock Management:

Deadlock occurs when two or more transactions are waiting for each other to release locks, causing a circular wait. To prevent or manage deadlocks:
- Deadlock Prevention: Systems can use algorithms that ensure deadlock cannot occur (e.g., by ordering locks or disallowing circular wait).
- Deadlock Detection: The DBMS periodically checks for deadlocks and aborts one or more transactions involved in the deadlock.
- Deadlock Resolution: When a deadlock is detected, the system will usually abort one or more transactions to break the deadlock.

2. Timestamp-Based Concurrency Control

In timestamp-based concurrency control, each transaction is assigned a unique timestamp when it starts. This timestamp determines the transaction's priority.
Basic Idea: Transactions are executed in the order of their timestamps, which ensures serializability.

Two Common Types:

Basic Timestamp Ordering:
- Each data item has two timestamps: Read_TS and Write_TS.
- When a transaction tries to read or write a data item, the DBMS checks the transaction's timestamp against the data item's timestamps.
- If a transaction’s request conflicts with another transaction’s timestamp, it is either delayed or aborted to maintain consistency.
Thomas' Write Rule: A refinement of basic timestamp ordering, where a transaction's write operation can be ignored if the data item has already been written by a later transaction with a higher timestamp.

3. Optimistic Concurrency Control

Optimistic Concurrency Control (OCC) assumes that transactions will not conflict and executes them without locking resources.
Each transaction is divided into three phases:
1. Read Phase: The transaction reads the data and performs operations.
2. Validation Phase: Before committing, the transaction checks whether any conflicting transaction has modified the data.
3. Write Phase: If no conflict is found, the transaction writes its changes; otherwise, it is rolled back.

This technique is effective when conflicts are rare, as it minimizes locking and allows for higher throughput.

4. Multiversion Concurrency Control (MVCC)

MVCC allows multiple versions of a data item to exist simultaneously. Each transaction sees a snapshot of the data at the time it started, and any updates do not interfere with other transactions.
This technique is particularly useful in read-heavy environments, as it allows transactions to read uncommitted data without being blocked by write operations.
How MVCC Works:
- When a transaction modifies a data item, a new version of that item is created with a timestamp.
- The system maintains a list of versions, and each transaction accesses the version of data corresponding to the time it started.
- This avoids conflicts between transactions that read and write the same data simultaneously.

Transaction Isolation Levels

Transaction isolation levels define the extent to which one transaction’s operations are isolated from the operations of other transactions. SQL provides several standard isolation levels, each with different trade-offs in terms of concurrency and consistency:

Read Uncommitted: Transactions can read data that has been written by other transactions but not yet committed (dirty reads are allowed).
Read Committed: Transactions can only read committed data. Dirty reads are prevented, but non-repeatable reads may occur.
Repeatable Read: Transactions are guaranteed to see the same data each time they read it. Dirty reads and non-repeatable reads are prevented, but phantom reads can still occur.
Serializable: This is the highest level of isolation, where transactions are executed in such a way that the result is equivalent to executing them serially (one after the other). It prevents dirty reads, non-repeatable reads, and phantom reads, but it can result in lower concurrency.

Conclusion

Concurrency control is essential for managing simultaneous access to a database in a multi-user environment. It ensures that the database maintains its integrity and consistency even when multiple transactions are executed concurrently. Various techniques, such as lock-based concurrency control, timestamp ordering, optimistic concurrency control, and multiversion concurrency control, are used to address the challenges of concurrency, such as lost updates, dirty reads, and deadlocks. The choice of concurrency control mechanism and transaction isolation level depends on the specific requirements of the database system, balancing performance and consistency needs.

Previous topic 30

Grouping and Aggregation in SQL

Next topic 32

Database Backup and Recovery

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

COMP2114›Concurrency Control

Database SystemsTopic 31 of 34

Concurrency Control

9 minread

1,464words

Intermediatelevel

Concurrency Control in Database Systems

Goals of Concurrency Control

The primary goals of concurrency control are:

Isolation: Transactions should execute as if they are the only transactions in the system, meaning that their intermediate results should not be visible to other transactions until they are committed.
Consistency: Concurrency control ensures that even when transactions run concurrently, the database remains in a consistent state after all transactions are completed.
Recoverability: After a failure, the database system should be able to recover to a consistent state, ensuring that the effects of partial or failed transactions are properly managed.

Types of Concurrency Problems

Several problems may arise when transactions are executed concurrently:

Lost Update: Occurs when two or more transactions update the same data item simultaneously, and one update is lost.
- Example: Transaction 1 updates a bank account balance to $100, and at the same time, Transaction 2 updates the same balance to$ 150. One of the updates may be overwritten, resulting in the loss of data.
Temporary Inconsistent Data: Occurs when one transaction reads data that is in the process of being updated by another transaction.
- Example: Transaction 1 is updating an account balance, while Transaction 2 reads the balance. If Transaction 2 reads the data before Transaction 1 completes, it might see an inconsistent state.
Uncommitted Data (Dirty Read): Occurs when a transaction reads data that is written by another transaction that has not yet committed.
- Example: Transaction 1 writes a value, but before it commits, Transaction 2 reads that value. If Transaction 1 is rolled back, Transaction 2 would have used incorrect data.
Non-repeatable Read: Occurs when a transaction reads the same data twice and the values are different each time because another transaction has updated the data in between.
- Example: Transaction 1 reads a value, but before it can process it, Transaction 2 updates the value, making the second read in Transaction 1 inconsistent.
Phantom Reads: Occurs when a transaction reads a set of rows based on a query condition, but another transaction inserts or deletes rows that affect the result set in between the reads.
- Example: Transaction 1 reads all accounts with a balance greater than $1000, but before it finishes, Transaction 2 inserts new records that change the set of records Transaction 1 originally read.

Concurrency Control Techniques

To manage concurrency control and avoid these problems, DBMSs use different techniques. These techniques can be broadly categorized into two main categories:

1. Lock-Based Concurrency Control

Locking is the most widely used method for ensuring that transactions do not interfere with each other when accessing shared data.
A lock is a mechanism that prevents other transactions from accessing a data item while one transaction is using it.

Types of Locks:

Shared Lock (S-lock): Allows a transaction to read a data item but prevents it from being updated by other transactions. Multiple transactions can hold a shared lock on the same data item simultaneously, but no transaction can acquire an exclusive lock while any shared lock is held.
Exclusive Lock (X-lock): Allows a transaction to both read and write a data item. No other transaction can acquire a shared or exclusive lock on the same data item when an exclusive lock is held.

Locking Protocols:

Two-Phase Locking (2PL): This protocol ensures serializability by requiring transactions to acquire all locks they need before releasing any locks. It has two phases:
1. Growing Phase: A transaction can acquire locks but cannot release any locks.
2. Shrinking Phase: A transaction can release locks but cannot acquire any new locks.
This guarantees that once a transaction releases a lock, it cannot obtain any further locks, thereby avoiding deadlocks and ensuring serializability.
Strict Two-Phase Locking (S2PL): A variant of 2PL where a transaction holds all its locks until it commits or aborts. This avoids dirty reads and ensures recoverability.

Deadlock Management:

Deadlock occurs when two or more transactions are waiting for each other to release locks, causing a circular wait. To prevent or manage deadlocks:
- Deadlock Prevention: Systems can use algorithms that ensure deadlock cannot occur (e.g., by ordering locks or disallowing circular wait).
- Deadlock Detection: The DBMS periodically checks for deadlocks and aborts one or more transactions involved in the deadlock.
- Deadlock Resolution: When a deadlock is detected, the system will usually abort one or more transactions to break the deadlock.

2. Timestamp-Based Concurrency Control

In timestamp-based concurrency control, each transaction is assigned a unique timestamp when it starts. This timestamp determines the transaction's priority.
Basic Idea: Transactions are executed in the order of their timestamps, which ensures serializability.

Two Common Types:

Basic Timestamp Ordering:
- Each data item has two timestamps: Read_TS and Write_TS.
- When a transaction tries to read or write a data item, the DBMS checks the transaction's timestamp against the data item's timestamps.
- If a transaction’s request conflicts with another transaction’s timestamp, it is either delayed or aborted to maintain consistency.
Thomas' Write Rule: A refinement of basic timestamp ordering, where a transaction's write operation can be ignored if the data item has already been written by a later transaction with a higher timestamp.

3. Optimistic Concurrency Control

Optimistic Concurrency Control (OCC) assumes that transactions will not conflict and executes them without locking resources.
Each transaction is divided into three phases:
1. Read Phase: The transaction reads the data and performs operations.
2. Validation Phase: Before committing, the transaction checks whether any conflicting transaction has modified the data.
3. Write Phase: If no conflict is found, the transaction writes its changes; otherwise, it is rolled back.

This technique is effective when conflicts are rare, as it minimizes locking and allows for higher throughput.

4. Multiversion Concurrency Control (MVCC)

MVCC allows multiple versions of a data item to exist simultaneously. Each transaction sees a snapshot of the data at the time it started, and any updates do not interfere with other transactions.
This technique is particularly useful in read-heavy environments, as it allows transactions to read uncommitted data without being blocked by write operations.
How MVCC Works:
- When a transaction modifies a data item, a new version of that item is created with a timestamp.
- The system maintains a list of versions, and each transaction accesses the version of data corresponding to the time it started.
- This avoids conflicts between transactions that read and write the same data simultaneously.

Transaction Isolation Levels

Read Uncommitted: Transactions can read data that has been written by other transactions but not yet committed (dirty reads are allowed).
Read Committed: Transactions can only read committed data. Dirty reads are prevented, but non-repeatable reads may occur.
Repeatable Read: Transactions are guaranteed to see the same data each time they read it. Dirty reads and non-repeatable reads are prevented, but phantom reads can still occur.
Serializable: This is the highest level of isolation, where transactions are executed in such a way that the result is equivalent to executing them serially (one after the other). It prevents dirty reads, non-repeatable reads, and phantom reads, but it can result in lower concurrency.

Conclusion

Previous topic 30

Grouping and Aggregation in SQL

Next topic 32

Database Backup and Recovery

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.