CSI-308›Concurrency Control

Database SystemsTopic 11 of 22

Concurrency Control

9 minread

1,471words

Intermediatelevel

Concurrency Control in Database Systems

Concurrency control in database systems refers to the management of simultaneous access to the database by multiple transactions. Its primary goal is to ensure that concurrent execution of transactions preserves the ACID properties (Atomicity, Consistency, Isolation, and Durability), especially the Isolation property. Concurrency control ensures that the database remains consistent and that the operations of concurrent transactions do not interfere with each other in a way that would lead to incorrect results.

Concurrency control techniques are essential because, in multi-user environments, multiple transactions may be trying to access and modify the same data at the same time. Without proper management, this could result in problems like lost updates, temporary inconsistency, uncommitted data, or phantom reads.

Problems Caused by Concurrent Transactions

When multiple transactions are executed concurrently, the following issues can arise:

Dirty Read:
- A transaction reads data that has been written by another transaction but not yet committed. If the second transaction is rolled back, the first transaction has read data that never became permanent.
Example:
- Transaction A updates a value but hasn’t committed yet.
- Transaction B reads that uncommitted value.
- If Transaction A rolls back, Transaction B has read data that doesn’t exist.
Non-repeatable Read:
- A transaction reads a value, but another transaction modifies that value before the first transaction completes. When the first transaction reads the value again, it sees a different result.
Example:
- Transaction A reads a balance from an account.
- Transaction B updates that balance.
- Transaction A reads the balance again and sees a different value, even though it had already read it earlier.
Phantom Read:
- A transaction reads a set of rows that match a certain condition, but another transaction inserts or deletes rows that would change the result of the original query before the first transaction completes.
Example:
- Transaction A queries all accounts with a balance greater than $1000.
- Transaction B inserts a new account with a balance greater than $1000.
- When Transaction A re-queries the database, it sees the newly inserted account, which is known as a phantom read.
Lost Update:
- Two transactions concurrently update the same data, and one of the updates is overwritten or "lost" because the other transaction has committed its changes without considering the first.
Example:
- Transaction A reads a value, say the balance of an account, and updates it.
- Transaction B reads the same value, updates it, and commits.
- Transaction A then commits its changes, but Transaction B’s update is lost.

Concurrency Control Techniques

To prevent the above problems, database systems employ various concurrency control mechanisms. The two primary approaches are locking-based protocols and optimistic concurrency control.

1. Lock-Based Protocols

Lock-based protocols are the most common method of concurrency control. They rely on locks to manage access to data items, ensuring that only one transaction can access a specific data item at a time.

a. Types of Locks

Shared Lock (S-lock): Allows a transaction to read a data item but prevents any other transaction from modifying it. Other transactions can also obtain shared locks for reading, but not for writing.

Example: Multiple transactions can hold shared locks on the same data to read it, but no transaction can modify the data.
Exclusive Lock (X-lock): Allows a transaction to both read and write a data item, and prevents other transactions from either reading or writing that data item.

Example: If Transaction A holds an exclusive lock on a data item, no other transaction can read or write that item until Transaction A releases the lock.

b. Two-Phase Locking Protocol (2PL)

This is a popular locking protocol that ensures serializability, the highest isolation level. In 2PL, a transaction must hold all its locks until it commits or aborts and can release locks only after it has acquired all of them.
The protocol is called "two-phase" because it has two phases:
1. Growing Phase: The transaction can acquire locks but cannot release any locks.
2. Shrinking Phase: The transaction can release locks but cannot acquire any new locks.
Example: Transaction A can acquire locks during the growing phase and releases them only after the transaction is complete.

c. Deadlock in Locking

A deadlock occurs when two or more transactions are waiting indefinitely for locks to be released by the other transaction. This can result in a situation where no progress is made.

Example:
- Transaction A holds a lock on Resource 1 and is waiting for Resource 2.
- Transaction B holds a lock on Resource 2 and is waiting for Resource 1.
- Both transactions are now deadlocked.
Deadlock Handling:
- Deadlock Detection: The system periodically checks for cycles in the transaction wait-for graph (a graph where nodes represent transactions and edges represent lock dependencies). If a cycle is detected, one of the transactions is aborted to break the deadlock.
- Deadlock Prevention: The system ensures that transactions do not hold locks while waiting for others, or that a transaction must acquire all necessary locks at once.

2. Optimistic Concurrency Control

In optimistic concurrency control, transactions are allowed to execute without locking, assuming that conflicts will not occur. Instead, at the end of the transaction, the system checks if any other transaction has modified the data. If no conflicts are detected, the transaction is committed; otherwise, it is rolled back.

a. Phases of Optimistic Concurrency Control:

Read Phase: The transaction reads the data it needs and performs operations.
Validation Phase: The system checks if any conflicting changes were made by other transactions during the read phase.
Write Phase: If no conflicts are detected during the validation phase, the transaction commits; otherwise, it is rolled back.

Example: If two transactions are modifying different parts of a database, optimistic concurrency control allows them to execute without locks, but at commit time, the system checks for conflicts and aborts one if necessary.

b. Advantages of Optimistic Concurrency Control:

Better for environments with low contention (fewer conflicting transactions).
Lower overhead since transactions are not locked during execution.

c. Disadvantages:

Performance degradation when there is high contention, as many transactions may be rolled back.

3. Multiversion Concurrency Control (MVCC)

MVCC is a concurrency control method where multiple versions of a data item are maintained, allowing transactions to access older versions of data while new versions are being created by other transactions.

How it Works: When a transaction reads a data item, it is provided with the most recent version that was committed before the transaction started. Updates are performed on a new version of the data, and when the transaction commits, the new version is made visible to others.
Advantages: MVCC allows for greater concurrency by reducing the need for locks. Transactions do not block each other for reading since they can read older versions of data.
Example: In a database where transactions are frequent, MVCC allows one transaction to read a piece of data while another updates it, with the system keeping track of versions and making the updates visible only once the transaction commits.

4. Timestamp Ordering

Timestamp ordering is a concurrency control method that assigns each transaction a unique timestamp. Transactions are executed in timestamp order, with the system ensuring that older transactions take precedence over newer ones.

How it Works: Each transaction is assigned a timestamp at the start. During execution, the system ensures that no transaction performs operations that would violate the serializability order of transactions based on their timestamps. For example, if Transaction A has an earlier timestamp than Transaction B, Transaction A should be given priority for operations that might conflict.
Advantages: Timestamp ordering ensures that transactions are serialized without the need for locking, reducing contention between transactions.

5. Isolation Levels

The isolation level defines the degree to which a transaction is isolated from the effects of other concurrent transactions. The more isolated a transaction is, the fewer concurrency problems it will encounter, but this also comes with a performance cost.

Serializable: The highest isolation level, ensuring complete isolation between transactions. It prevents dirty reads, non-repeatable reads, and phantom reads.
Repeatable Read: Prevents dirty reads and non-repeatable reads but allows phantom reads.
Read Committed: Prevents dirty reads but allows non-repeatable reads.
Read Uncommitted: The lowest isolation level, allowing dirty reads, non-repeatable reads, and phantom reads.

Conclusion

Concurrency control is a critical aspect of database management, especially in multi-user environments. By ensuring that transactions are executed in a way that maintains database consistency and integrity, concurrency control helps prevent issues like dirty reads, lost updates, and non-repeatable reads. Methods such as lock-based protocols, optimistic concurrency control, MVCC, and timestamp ordering provide different strategies for managing concurrent access to the database, each with its own advantages and trade-offs.

Previous topic 10

Optimization Concepts

Next topic 12

Recovery Techniques

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

CSI-308›Concurrency Control

Database SystemsTopic 11 of 22

Concurrency Control

9 minread

1,471words

Intermediatelevel

Concurrency Control in Database Systems

Problems Caused by Concurrent Transactions

When multiple transactions are executed concurrently, the following issues can arise:

Dirty Read:
- A transaction reads data that has been written by another transaction but not yet committed. If the second transaction is rolled back, the first transaction has read data that never became permanent.
Example:
- Transaction A updates a value but hasn’t committed yet.
- Transaction B reads that uncommitted value.
- If Transaction A rolls back, Transaction B has read data that doesn’t exist.
Non-repeatable Read:
- A transaction reads a value, but another transaction modifies that value before the first transaction completes. When the first transaction reads the value again, it sees a different result.
Example:
- Transaction A reads a balance from an account.
- Transaction B updates that balance.
- Transaction A reads the balance again and sees a different value, even though it had already read it earlier.
Phantom Read:
- A transaction reads a set of rows that match a certain condition, but another transaction inserts or deletes rows that would change the result of the original query before the first transaction completes.
Example:
- Transaction A queries all accounts with a balance greater than $1000.
- Transaction B inserts a new account with a balance greater than $1000.
- When Transaction A re-queries the database, it sees the newly inserted account, which is known as a phantom read.
Lost Update:
- Two transactions concurrently update the same data, and one of the updates is overwritten or "lost" because the other transaction has committed its changes without considering the first.
Example:
- Transaction A reads a value, say the balance of an account, and updates it.
- Transaction B reads the same value, updates it, and commits.
- Transaction A then commits its changes, but Transaction B’s update is lost.

Concurrency Control Techniques

To prevent the above problems, database systems employ various concurrency control mechanisms. The two primary approaches are locking-based protocols and optimistic concurrency control.

1. Lock-Based Protocols

a. Types of Locks

Shared Lock (S-lock): Allows a transaction to read a data item but prevents any other transaction from modifying it. Other transactions can also obtain shared locks for reading, but not for writing.

Example: Multiple transactions can hold shared locks on the same data to read it, but no transaction can modify the data.
Exclusive Lock (X-lock): Allows a transaction to both read and write a data item, and prevents other transactions from either reading or writing that data item.

Example: If Transaction A holds an exclusive lock on a data item, no other transaction can read or write that item until Transaction A releases the lock.

b. Two-Phase Locking Protocol (2PL)

This is a popular locking protocol that ensures serializability, the highest isolation level. In 2PL, a transaction must hold all its locks until it commits or aborts and can release locks only after it has acquired all of them.
The protocol is called "two-phase" because it has two phases:
1. Growing Phase: The transaction can acquire locks but cannot release any locks.
2. Shrinking Phase: The transaction can release locks but cannot acquire any new locks.
Example: Transaction A can acquire locks during the growing phase and releases them only after the transaction is complete.

c. Deadlock in Locking

A deadlock occurs when two or more transactions are waiting indefinitely for locks to be released by the other transaction. This can result in a situation where no progress is made.

Example:
- Transaction A holds a lock on Resource 1 and is waiting for Resource 2.
- Transaction B holds a lock on Resource 2 and is waiting for Resource 1.
- Both transactions are now deadlocked.
Deadlock Handling:
- Deadlock Detection: The system periodically checks for cycles in the transaction wait-for graph (a graph where nodes represent transactions and edges represent lock dependencies). If a cycle is detected, one of the transactions is aborted to break the deadlock.
- Deadlock Prevention: The system ensures that transactions do not hold locks while waiting for others, or that a transaction must acquire all necessary locks at once.

2. Optimistic Concurrency Control

a. Phases of Optimistic Concurrency Control:

Read Phase: The transaction reads the data it needs and performs operations.
Validation Phase: The system checks if any conflicting changes were made by other transactions during the read phase.
Write Phase: If no conflicts are detected during the validation phase, the transaction commits; otherwise, it is rolled back.

Example: If two transactions are modifying different parts of a database, optimistic concurrency control allows them to execute without locks, but at commit time, the system checks for conflicts and aborts one if necessary.

b. Advantages of Optimistic Concurrency Control:

Better for environments with low contention (fewer conflicting transactions).
Lower overhead since transactions are not locked during execution.

c. Disadvantages:

Performance degradation when there is high contention, as many transactions may be rolled back.

3. Multiversion Concurrency Control (MVCC)

How it Works: When a transaction reads a data item, it is provided with the most recent version that was committed before the transaction started. Updates are performed on a new version of the data, and when the transaction commits, the new version is made visible to others.
Advantages: MVCC allows for greater concurrency by reducing the need for locks. Transactions do not block each other for reading since they can read older versions of data.
Example: In a database where transactions are frequent, MVCC allows one transaction to read a piece of data while another updates it, with the system keeping track of versions and making the updates visible only once the transaction commits.

4. Timestamp Ordering

How it Works: Each transaction is assigned a timestamp at the start. During execution, the system ensures that no transaction performs operations that would violate the serializability order of transactions based on their timestamps. For example, if Transaction A has an earlier timestamp than Transaction B, Transaction A should be given priority for operations that might conflict.
Advantages: Timestamp ordering ensures that transactions are serialized without the need for locking, reducing contention between transactions.

5. Isolation Levels

Serializable: The highest isolation level, ensuring complete isolation between transactions. It prevents dirty reads, non-repeatable reads, and phantom reads.
Repeatable Read: Prevents dirty reads and non-repeatable reads but allows phantom reads.
Read Committed: Prevents dirty reads but allows non-repeatable reads.
Read Uncommitted: The lowest isolation level, allowing dirty reads, non-repeatable reads, and phantom reads.

Conclusion

Previous topic 10

Optimization Concepts

Next topic 12

Recovery Techniques

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.