Collections of operations that form a single logical unit of work are called Transactions.
A transaction is a unit of program execution that accesses and possibly updates various data items. The transaction consists of all operations executed between the begin transaction and end transaction.
Properties of Transactions : ACID properties
Atomicity : If a transaction begins to execute but fails for whatever reason, any changes to the database that the transaction may have made must be undone. This “all-or-none” property is referred to as atomicity.
Consistency : Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.
Isolation : Database system must take special actions to ensure that transactions operate properly without interference from concurrently executing database statements.
Durability : If the system ensures correct execution of a transaction, this serves little purpose if the system subsequently crashes and, as a result, the system “forgets” about the transaction. Thus, a transaction’s actions must persist across crashes.
A Simple Transaction Model
We shall illustrate the transaction concept using a simple bank application consisting of several accounts and a set of transactions that access and update those accounts. Transactions access data using two operations:
read(X) : which transfers the data item X from the database to a variable, also called X, in a buffer in main memory belonging to the transaction that executed the read operation
write(X) : which transfers the value in the variable X in the main-memory buffer of the transaction that executed the write to the data item X in the database.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as
Ti: read(A); A := A − 50; write(A); read(B); B := B + 50; write(B)
Consistency : The consistency requirement here is that the sum of A and B be unchanged by the execution of the transaction.
Atomicity :
Suppose that, just before the execution of transaction Ti , the values of accounts A and B are $1000 and $2000, respectively. Now during the execution of transaction Ti, a failure occurs that prevents Ti from completing its execution.
The failure happened after the write(A) operation but before the write(B)operation. So the values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B is no longer preserved.
To prevent this inconsistency, we use atomicity property where the database system maintains logs and if the transaction fails then the recovery system of the database restores the old values.
Durability : The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution. The recovery system of the database is responsible for ensuring durability, in addition to ensuring atomicity.
Isolation :
The database is temporarily inconsistent while the transaction to transfer funds from A to B is executing, with the deducted total written to A (sender) and the increased total yet to be written to B(receiver).
If a second concurrently running transaction reads A and B at this intermediate point and computes A+B, it will observe an inconsistent value. Furthermore, if this second transaction then performs updates on A and B based on the inconsistent values that it read, the database may be left in an inconsistent state even after both transactions have completed
Ensuring the isolation property is the responsibility of a component of the database system called the concurrency-control system.
Although serial execution is desirable for preventing inconsistency by due to performance benefits we prefer parallel execution with safeguards.
A transaction must be in one of the following states:
Active : the initial state; the transaction stays in this state while it is executing.
Partially committed : after the final statement has been executed.
Failed : after the discovery that normal execution can no longer proceed.
Aborted : after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
Committed : after successful completion.
A transaction enters the failed state after the system determines that the transaction can no longer proceed with its normal execution (for example, because of hardware or logical errors). Such a transaction must be rolled back. Then, it enters the aborted state. At this point, the system has two options:
We restart the transaction if it was aborted because of hardware errors.
We kill the transaction if it was aborted due to internal logic problems as we would have to rewrite the code.
Any schedules that are executed concurrently should have the same effect as a schedule that could have occurred without any concurrent execution. That is, the schedule should, in some sense, be equivalent to a serial schedule. Such schedules are called serializable schedules.
In a schedule when two transactions are having operations on a single data item, We say that the operations are in conflict if they are by different transactions on the same data item, and at least one of these instructions is a write operation.
If a schedule S which has conflicting operations can be transformed into a schedule S' which doesn't have conflicting operations by a series of swaps of nonconflicting instructions, we say that S and S are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Determining conflict serializability of a schedule
Consider a schedule S. We construct a directed graph, called a precedence graph, from S whose set of vertices consists of all the transactions participating in the schedule. The set of edges consists of all edges Ti → Tj for which one of three conditions holds :
Ti executes write(Q) before Tj executes read(Q).
Ti executes read(Q) before Tj executes read(Q).
Ti executes write(Q) before Tj executes write(Q).
If the precedence graph for S has a cycle, then schedule S is not conflict serializable. If the graph contains no cycles, then the schedule S is conflict serializable.
If an edge Ti → Tj is present in the graph then in any serial schedule S' equivalent to S, Ti must appear before Tj.
Note : It is possible to have two schedules that produce the same outcome , but that are not conflict equivalent
Such schedules are known as "schedule equivalent".
View equivalence is a type of schedule equivalence. View serializability is not used in practice due to its high degree of computational complexity.
In the above schedule T7 reads a value from T6 and commits immediately. However if T6 fails and has to be rolledback we cannot undo T7.
T7 has read a value that is now redundant but we cannot undo this as T7 has already committed. This is a nonrecoverable schedule.
A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti , the commit operation of Ti appears before the commit operation of Tj.
Cascadeless Schedules
As seen above T9 reads a value written by T8 and T10 reads a value written by T9. So both T9 and T10 are dependent on T8.
Now when T8 is rolled back a significant amount of work goes into rolling back T9 and T10. This undesirable phenomenon where a single abort causes multiple dependent transactions to rollback is called "Cascading rollback".
A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti. The commit operation of Ti appears before read of Tj.
It is easy to verify that every cascadeless schedule is also recoverable.
Protocols required to ensure serializability may allow too little concurrency for certain applications. In these cases, weaker levels of consistency are used.
SQL provides such features for the benefit of long transactions whose results do not need to be precise. If these transactions were to execute in a serializable fashion, they could interfere with other transactions, causing the others’ execution to be delayed.
The isolation levels specified by the SQL standard are as follows:
Serializable : Ensures serializable execution. However some database systems implement this isolation level in a manner that may allow nonserializable executions.
Repeatable read : allows only committed data to be read and further requires that, between two reads of a data item by a transaction, no other transaction is allowed to update it.
Read committed : allows only committed data to be read, but does not require repeatable reads.
Read uncommitted : allows uncommitted data to be read. It is the lowest isolation level allowed by SQL.
All the isolation levels above additionally disallow dirty writes, that is, they disallow writes to a data item that has already been written by another transaction that has not yet committed or aborted.