Concurrency control is a fundamental aspect of database management systems (DBMS) that ensures multiple transactions can be executed concurrently without violating the consistency of the database. In a multi-user environment, where many transactions are running simultaneously, concurrency control mechanisms are necessary to maintain the integrity of the database and prevent problems like data inconsistency, lost updates, and deadlocks.
The primary goals of concurrency control are:
Isolation: Transactions should execute as if they are the only transactions in the system, meaning that their intermediate results should not be visible to other transactions until they are committed.
Consistency: Concurrency control ensures that even when transactions run concurrently, the database remains in a consistent state after all transactions are completed.
Recoverability: After a failure, the database system should be able to recover to a consistent state, ensuring that the effects of partial or failed transactions are properly managed.
Several problems may arise when transactions are executed concurrently:
Lost Update: Occurs when two or more transactions read and then update the same data item concurrently, and a later write overwrites an earlier one without taking it into account.
Temporary Inconsistent Data: Occurs when one transaction reads data that is in the process of being updated by another transaction.
Uncommitted Data (Dirty Read): Occurs when a transaction reads data that is written by another transaction that has not yet committed.
Non-repeatable Read: Occurs when a transaction reads the same data twice and the values are different each time because another transaction has updated the data in between.
Phantom Read: Occurs when a transaction reads a set of rows based on a query condition and then repeats the query, but another transaction has inserted or deleted rows matching the condition in between, so the two result sets differ.
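The lost update problem listed above can be replayed deterministically. The following is a minimal sketch, not a real DBMS: the dictionary `db`, the function name `run_lost_update`, and the amounts are hypothetical and chosen only for illustration. Both transactions read the balance before either writes, so the second write silently discards the first.

```python
def run_lost_update(initial_balance: int) -> int:
    """Replay an interleaving in which T2 overwrites T1's update."""
    db = {"balance": initial_balance}  # hypothetical one-row "database"

    # Both transactions read the same starting value.
    t1_read = db["balance"]
    t2_read = db["balance"]

    # T1 adds 50 and writes its result back.
    db["balance"] = t1_read + 50

    # T2 adds 30 to its stale read and overwrites T1's write.
    db["balance"] = t2_read + 30

    return db["balance"]


final = run_lost_update(100)
print(final)  # 130; a serial execution of T1 then T2 would give 180
```

Any of the techniques described in the following sections would either block T2's read until T1 finishes or force one of the two transactions to restart.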
To avoid these problems, DBMSs use several techniques, which can be broadly grouped into lock-based protocols, timestamp ordering, optimistic concurrency control, and multiversion concurrency control. Lock-based protocols rely on two basic lock modes:
Shared Lock (S-lock): Allows a transaction to read a data item but prevents it from being updated by other transactions. Multiple transactions can hold a shared lock on the same data item simultaneously, but no transaction can acquire an exclusive lock while any shared lock is held.
Exclusive Lock (X-lock): Allows a transaction to both read and write a data item. No other transaction can acquire a shared or exclusive lock on the same data item when an exclusive lock is held.
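The compatibility rules for shared and exclusive locks can be sketched as a small lock table. This is an illustrative assumption rather than any particular DBMS's implementation: the class `LockTable` and its methods `try_lock` and `unlock` are hypothetical names, and a request that cannot be granted simply returns False instead of waiting.

```python
from collections import defaultdict


class LockTable:
    """Hypothetical in-memory lock table with S and X modes (no waiting)."""

    def __init__(self):
        # item -> {transaction id: "S" or "X"}
        self.held = defaultdict(dict)

    def try_lock(self, txn, item, mode):
        holders = self.held[item]
        others = [m for t, m in holders.items() if t != txn]
        if mode == "S":
            # A shared lock is compatible only with other shared locks.
            granted = all(m == "S" for m in others)
        else:
            # An exclusive lock is compatible with no other lock.
            granted = not others
        if granted and holders.get(txn) != "X":
            holders[txn] = mode  # records a new lock or an S-to-X upgrade
        return granted

    def unlock(self, txn, item):
        self.held[item].pop(txn, None)


table = LockTable()
print(table.try_lock("T1", "A", "S"))  # True
print(table.try_lock("T2", "A", "S"))  # True: shared locks coexist
print(table.try_lock("T3", "A", "X"))  # False: readers still hold A
table.unlock("T1", "A")
table.unlock("T2", "A")
print(table.try_lock("T3", "A", "X"))  # True: no other holders remain
print(table.try_lock("T1", "A", "S"))  # False: T3 holds an exclusive lock
```

A production lock manager would queue the refused requests and wake them when the conflicting locks are released; the sketch leaves that out to keep the compatibility check visible.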
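Two-Phase Locking (2PL): This protocol ensures serializability by requiring transactions to acquire all locks they need before releasing any locks. It has two phases:
Growing phase: The transaction acquires locks as it needs them and releases none.
Shrinking phase: The transaction releases locks and acquires no new ones.
Because a transaction cannot obtain any further locks once it has released one, every schedule that 2PL permits is conflict serializable. 2PL does not prevent deadlocks, however: two transactions can each wait for a lock the other holds, so the DBMS still needs deadlock detection or prevention.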
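Strict Two-Phase Locking (S2PL): A variant of 2PL where a transaction holds its exclusive locks until it commits or aborts (the rigorous variant holds all locks, shared and exclusive, until then). Because no other transaction can read or overwrite uncommitted writes, this avoids dirty reads and cascading aborts and ensures recoverability.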
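The rule that no lock may be acquired after the first release can be enforced with a single flag per transaction. The sketch below is a simplified illustration under stated assumptions: `TwoPhaseTxn` and `TwoPhaseViolation` are hypothetical names, and conflicts between transactions are ignored so that only the two-phase rule itself is shown.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction tries to lock after it has unlocked."""


class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.released_any = False  # becomes True at the first release

    def acquire(self, item):
        if self.released_any:
            raise TwoPhaseViolation(
                f"{self.name} cannot lock {item!r} after releasing a lock"
            )
        self.locks.add(item)

    def release(self, item):
        self.released_any = True
        self.locks.discard(item)

    def commit(self):
        # Strict behaviour: a transaction that never calls release()
        # keeps every lock until this point and drops them all at once.
        self.released_any = True
        self.locks.clear()


t1 = TwoPhaseTxn("T1")
t1.acquire("A")
t1.acquire("B")
t1.release("A")          # first release: no further acquires allowed
try:
    t1.acquire("C")
except TwoPhaseViolation as err:
    print("rejected:", err)
```

A transaction that only ever calls `acquire` and then `commit` follows the strict variant, since none of its locks is given up before the commit point.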
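Basic Timestamp Ordering: Each transaction receives a unique timestamp when it starts, and each data item records two timestamps, Read_TS and Write_TS, holding the largest timestamp of any transaction that has read it and written it, respectively. An operation that would violate timestamp order is rejected and its transaction is restarted.
Thomas' Write Rule: A refinement of basic timestamp ordering, where a transaction's write operation can be ignored, rather than causing a restart, if the data item has already been written by a later transaction with a higher timestamp.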
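The read and write checks of timestamp ordering, including Thomas' Write Rule, can be sketched in a few lines. This is a minimal illustration, assuming integer timestamps where a larger value means a later transaction; the names `Item`, `Abort`, `read`, `write`, and the `thomas` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


class Abort(Exception):
    """The operation arrived too late; the transaction must restart."""


@dataclass
class Item:
    value: Any = None
    read_ts: int = 0   # Read_TS: largest timestamp that has read the item
    write_ts: int = 0  # Write_TS: largest timestamp that has written it


def read(item: Item, ts: int) -> Any:
    if ts < item.write_ts:
        raise Abort("a later transaction already wrote this item")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item: Item, ts: int, value: Any, thomas: bool = False) -> bool:
    if ts < item.read_ts:
        raise Abort("a later transaction already read this item")
    if ts < item.write_ts:
        if thomas:
            return False  # Thomas' Write Rule: skip the obsolete write
        raise Abort("a later transaction already wrote this item")
    item.value = value
    item.write_ts = ts
    return True


x = Item()
write(x, ts=10, value="new")
applied = write(x, ts=5, value="old", thomas=True)
print(applied, x.value)  # False new
```

Without the `thomas` flag the second write would raise `Abort`; with it, the outdated value is dropped and the older transaction continues.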
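Optimistic Concurrency Control: Transactions run without acquiring locks and are validated at commit time; a transaction that conflicts with a concurrent one is rolled back and retried. This technique is effective when conflicts are rare, as it minimizes locking and allows for higher throughput.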
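One common way to work without locks when conflicts are rare is to check at commit time whether the data has changed since it was read. The sketch below assumes a per-key version counter as the validation scheme, which is only one of several possible approaches; `VersionedStore`, `read`, and `commit` are hypothetical names.

```python
class VersionedStore:
    """Hypothetical key-value store that validates writes by version."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        # Returns (value, version); a missing key has version 0.
        return self.data.get(key, (None, 0))

    def commit(self, key, new_value, expected_version):
        _, current_version = self.data.get(key, (None, 0))
        if current_version != expected_version:
            return False  # someone else committed first: caller retries
        self.data[key] = (new_value, current_version + 1)
        return True


store = VersionedStore()
store.commit("stock", 10, expected_version=0)

# Two transactions read the same version without locking.
value1, version1 = store.read("stock")
value2, version2 = store.read("stock")

print(store.commit("stock", value1 - 1, version1))  # True
print(store.commit("stock", value2 - 1, version2))  # False: stale version
```

The second transaction loses no data; it rereads the item and tries again, which is cheap as long as such collisions are infrequent.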
Transaction isolation levels define the extent to which one transaction's operations are isolated from the operations of other transactions. The SQL standard defines four isolation levels, each with different trade-offs in terms of concurrency and consistency:
Read Uncommitted: Transactions can read data that has been written by other transactions but not yet committed (dirty reads are allowed).
Read Committed: Transactions can only read committed data. Dirty reads are prevented, but non-repeatable reads and phantom reads may occur.
Repeatable Read: Transactions are guaranteed to see the same data each time they read it. Dirty reads and non-repeatable reads are prevented, but phantom reads can still occur.
Serializable: This is the highest level of isolation, where transactions are executed in such a way that the result is equivalent to executing them serially (one after the other). It prevents dirty reads, non-repeatable reads, and phantom reads, but it can result in lower concurrency.
Concurrency control is essential for managing simultaneous access to a database in a multi-user environment. It ensures that the database maintains its integrity and consistency even when multiple transactions are executed concurrently. Various techniques, such as lock-based concurrency control, timestamp ordering, optimistic concurrency control, and multiversion concurrency control, are used to address the challenges of concurrency, such as lost updates, dirty reads, and deadlocks. The choice of concurrency control mechanism and transaction isolation level depends on the specific requirements of the database system, balancing performance and consistency needs.