Database Recovery
Database recovery is the process of restoring the database to a correct state following a failure. The failure may be the result of a system crash due to hardware or software errors, a media failure, such as a head crash, or a software error in the application, such as a logical error in the program that is accessing the database. It may also be the result of unintentional or intentional corruption or destruction of data. Whatever the underlying cause of the failure, the DBMS must be able to recover from the failure and restore the database to a consistent state. It is the responsibility of DBMS to ensure that the database is reliable and remains in a consistent state in the presence of failures. In general, backup and recovery refers to the various strategies and procedures involved in protecting your database against data loss and reconstructing the data such that no data is lost after failure. Thus, recovery scheme is an integral part of the database system that is responsible for the restoration of the database to a consistent state that existed prior to the occurrence of the failure.
Data Storage
The storage of data generally includes four different types of media with an increasing degree of reliability. These are: ♦
Main memory
♦
Magnetic disk
♦
Magnetic tape
♦
Optical disk.
Stable Storage
There is a Stable Storage in which information is never lost. Stable storage devices are the theoretically impossible to obtain. But, we must use some technique to design a storage system in which the chances of data loss are extremely low.
The most important information needed for whole recovery process must be stored in stable storage.
Implementation of Stable Storage ∀ ♦ RAID (Redundant Array of Independent Disk) is commonly used for implementation of stable storage. In case of RAID, information is replicated on several disks in form of array and each disk has independent failure modes. So, in this case failure of single disk does not result in loss of data. RAID system however cannot prevent data loss in case disasters like fires or flooding. • ∀ ♦ In order to prevent from disasters like fire or flooding, the system storage on archival backup of tapes sent to off site however, since tapes cannot be sent off site continuously and hence some data may be lost in disasters like fire or flooding. • ∀ ♦ In order to remove above problem, more secure system keep a copy of each block of a stable storage at a remote site writing it out over a computer network in addition to storing the block on a local disk.
Causes of failures ∀♦ ∀♦ ∀♦ ∀♦ ∀♦ ∀♦ ∀♦ ∀♦ ∀♦
System Crashes User Error Carelessness Sabotage (intentional corruption of data) Statement Failure Application software errors Network Failure Media Failure Natural Physical Disasters
Recovery from failures ∀♦ Actions taken during normal transaction processing to ensure enough information exists to allow recovery from failure. ∀♦ Actions taken following a failure to recover the database to a state that is known to be correct. •
Recovery and Atomicity of a Transaction In order to understand the concept of atomicity of a transactions, consider again a simplified banking transactions that transfer Rs 50 from account A to account B with initial values of A and B being Rs 1000 and Rs 2000 respectively. T1 Read(A,a) a=a-50 Write(A,a) Output(AX) Read(B,b) b=b+50 Write(B,b) Output(BX)
In this example we suppose that after each write operation output operation is performed or in other-words we can say that this is an example of Force-Output operation.
Suppose that a system crash occurs during the execution of transaction Ti after Output (AX) has been taken place but before output (BX) was executed, here AX and BX are the buffer block which contains the values of data item A and B. Since output (AX) operation performed successfully it means that data item A has value 950 in disk and output (BX) does not performed successfully so its value remains 2000. It means that now system enters into inconsistent state because there is a loss of Rs 50 after execution of transaction T1.
In order to recover for above problem there are two simple means: (i)
Re-execute T1
If transaction T1 is re-executed successfully then value of A is Rs 900 and B is Rs 2050. It results again incorrect information because value A is Rs 900 rather than Rs 950. (ii)
Do not Re-execute T1
The current state of system has A =950 and B=2000 which again an inconsistent state.
Thus, in either case database is left in an inconsistent state, so this simple recovery scheme does not work. The reason for this difficulty is that we have modified the database without having assurance that the transaction will be indeed commit or completed.
We have to perform either all or no database modification made by Ti. This idea gives the concept of Atomicity of Transaction. It means that either 100% or 0% of database modification should be performed by the transaction. If the transaction is not completed successfully then some modification performed by transaction stored in database and others not, then database becomes inconsistent. Thus in order to achieve the consistency of database transaction should have the property of Atomicity. To achieve our goal of atomicity, we must first output information regarding the modification to a stable storage without modifying the database. There are two ways to perform such outputs. These are: ♦
Log Based Recovery
♦
Shadow Paging
Log Based Recovery
It is most used structure for recording database modification. In log based recovery a log file is maintained for recovery purpose.
Log file is a sequence of log records. Log Record maintain a record of all the operations (update) of the database. There are several types of log record.
<Start> Log Record: Contain information about the start of each transaction. It has transaction identification. Transaction identifier is the unique identification of the transaction that starts.Representation <Ti , start>
Log Record: It describes a single database write and has the following fields: < Ti, Xj, V1,V2 > Here, Ti is transaction identifier, Xj is the data item, V1 is the old value of data item and V2 is the modified or new value of the data item Xj. For example < T0, A, 1000, 1100 > Here, a transaction T0 perform a successful write operation on data Item A whose old value is 1000 and after write operation A has value 1100.
Log Record
When a transaction Ti is successfully committed or completed a <Ti, commit> log record is stored in the log file.
Log Record
When a transaction Ti is aborted due to any reason, a <Ti, abort> log record is stored in the log file.
Whenever a transaction performs a write A, it is essential that the log record for that write be created before the database is modified. Once a log record exists, we have the ability to undo a transaction.
Example Suppose that a system crash occurs during the execution of transaction Ti after Output (AX) has been taken place but before output (BX) was executed, here AX and BX are the buffer block which contains the values of data item A and B. Since output (AX) operation performed successfully it means that data item A has value 950 in disk and output (BX) does not performed successfully so its value remains 2000. It means that now system enters into inconsistent state because there is a loss of Rs 50 after execution of transaction Ti. Then in order to recover the data log record play very important role. We just copy the old value of A i.e. 1000 from the log record to the database stored in disk, the operation of taking the old value of the data item from the log record is called Undo operation. Since log record plays a vital role for the recovery of the database so the log must reside in stable storage. Here, we assume that every log record is written to stable storage as soon as it is created.
There are two techniques for log-based recovery: ♦
Deferred Database Modification
♦
Immediate Database Modification
Deferred Database Modification It ensures transaction atomicity by recording all database modifications in the log, but deferring the execution of all write operations of a transaction until the transaction partially commits. A transaction is said to be partially committed once the final action of the transaction has been executed. When a transaction has performed all the actions, then the information in the log associated with the transaction is used in executing the deferred writes. In other words, at partial commits time logged updates are “replayed” into database item. It means that in this case during write operation the modified values of local variable are not copied into database items, but the corresponding new and old values is stored in the log record. But when the transaction successfully performed all the operations, then the information stored at log record is used to set the value of data items, but if the transaction fail to complete, then the value of data item retain their old values and hence in that case information on the log record is simply ignored.
The execution of transaction Ti proceeds as follows: < Ti, Start > written to the log file.
Before Ti starts its execution, a log record is
< Ti, A, V2> The write operation by Ti results in the writing of new records to the log. This record indicates the new value of A i.e. V2 after the write operation performed by Ti. < Ti, Commit > the log.
When Ti partially commits this record is written to
It is important to note that only the new value of the data item is required by the deferred modification technique because if the transaction fails before the commit then log record is simply ignored because data item already contain old values due to deferred write operation. But if the transaction commit the new value present in the log record are copied to the data base item. Thus log record contain only the new values of data item, not the old values of data item. This approach save the stable storage space and reduce the complexity of log record.
Example We reconsider our simplified banking system. Let T1 be a transaction that transfers Rs 100 from account A to account B.This transaction is defined as follows: T1 Read (A,a) a=a-100 write(A,a) Read(B,b) b=b+100 Write(B,b)
Let T2 be a transaction that withdraws Rs 200 from account C. This transaction can be defined as follows:
T2 Read (C,c) c=c-200 Write(C,c) Suppose that these transaction are executed serially, in the order T1 followed by T2 and the value of account A,B and C before the execution takes place were Rs 1000, Rs 2000 and Rs 3000 respectively.
Recovery procedure The recovery procedure of deferred database modification is based on Redo operation as explain below: Redo(Ti) It sets the value of all data items updated by transaction Ti to the new values from the log of records. After a failure has occurred the recovery subsystem consults the log to determine which transaction need to be redone. Transaction Ti needs to be redone if an only if the log contain both the record <Ti, start> and the record <Ti, commit>. Thus, if the system crashes after the transaction completes its execution, then the information in the log is used in restoring the system to a previous consistence state.
Conclusion Redo: <Ti, commit>
If the log contain both records <Ti, start> and
This transaction may be or may not be stored to disk physically. So use Redo operation to get the modified values from the log record.
Re-execute: In all other cases ignore the log record and re-execute the transaction.
Case a: Crash occurs just after write (B,b), then the log record is as shown in case (a) of above figure 11.7. In this case, since no commit record appear in the log , then it mean that no write operation performs on the database item. So, database items just retain their old values. The values of accounts A and B are Rs1000 and Rs2000 respectively. Thus, when the system comes back no redo actions need to be taken, the log records of the incomplete transaction T1 can be deleted from the log. Case b: Crash occurs just after write(C,c), then the log record state is shown in case (b) of above figure 11.7. Since, transaction T1 is completed successfully before the crash, when the system comes back then the new value of items must be copied from the log record to the data items. It means that redo (T1) must be performed to recover the system after the crash. The transaction T2 is not committed, so there is no need of redo(T2) operation. Transaction T2 is re-executed. As before the log record of the incomplete transaction T2 can be deleted for the log. Recovery: redo (T1) is performed since the record appear in the log which results A=1100, B=2100. T2 transaction must be re-executed; the value of C is 3000.
Case c: Crash occurs just after , then log record state is shown in case(c) of above figure 11.7. When the system comes back, two commit records are in the log, one from T1 and another from T2 transaction, the operation redo (T1) and redo (T2) must be performed. After these operations are executed the values of accounts A,B and C becomes Rs1100, Rs2100 and Rs2800 respectively. Recovery:
redo (T1) and redo (T2)
Result A=1100,B=2100 and C=2800 Case d: Crash occurs during recovery action. If the crash occurs during the recovery action, then some transaction may recover and other transactions may not recover. Thus, in order to remove above problems recovery actions are restarted from the beginning. Because, the result of a successful second attempt at redo is same as though redo had succeed the first time.
Immediate Database Modification The immediate database modification technique allows database modification to be output to the database while the transaction is still in the active state. The data modification written by active transactions are called “uncommitted modification”. If the system crash or transaction aborts, then the old value field of the log records is used to restore the modified data items to the value they had prior to the start of the transaction. This restoration is accomplished through the undo operation. In order to understand undo operations, let us consider the format of log record. <Ti, Xj, Vold, Vnew>
Here, Ti is transaction identifier, Xj is the data item, Vold is the old value of data item and Vnew is the modified or new value of the data item Xj.
Undo (Ti):
It restores the value of all data items updated by transaction T1 to the old values. Before a transaction T1 starts its execution the record is written to the log. During its execution, any write (x) operation by T1 is performed by writing of the apropriate new update record to the log. When T1 partially commits the record is written to the log. Example: Let us reconsider our simplified banking system. Let T1 be a transaction that transfers Rs100 from account A to account B and T2 be a transaction that withdraws Rs200 from account C. Suppose these transactions are executed serially as in previous example with initial values of A, B and C as Rs1000, Rs2000 and Rs3000 respectively.
After a failure has occurred, the recovery scheme consults the log record to determine which transaction needs to be undone. Transaction Ti needs to be undone if the log contains the record <Ti, start> but does not contain the record <Ti, commit>. This transaction is crashed during execution. Thus, transaction Ti needs to be undone. Redo (Ti): Sets the values of all data items updated by transaction Ti to the new values. These new values can be found in log record. After a failure has occurred log record is consulted to determine which transaction need to be redone.
Transaction Ti needs to be redone if the log contains both the record <Ti, start> and the record <Ti, commit>. This transaction is crashed just after partially committed. Thus transaction Ti needs to be redone.
Case b: Crash occurs just after write (C,c), when the system comes back two recovery actions needs to be taken, the operation undo (T2) must be performed , since the record appears in the log, but there is no record .The operation redo(T1) must be performed, since the log contains both the records and record Recovery:
redo (T1) and undo (T2)
A=1100,B=2100 and C=3000. T2 transaction must be re-executed. Case c: Crash occurs just after When the system comes back both transactions T1 and T2 need to be redone, since the records and appears in the log as do the records and Recovery:
redo (T1) and redo (T2)
A=1100, B=2100, C=2800.
Checkpoints To recover the database after some failure we must consult the log record to determine which transaction needs to be undone and redone. For this we need to search the entire log to determine this information. There are two major problems with this approach. These are: ♦
The search process is time-consuming.
♦ Most of the transactions need to redone have already written their updates into the database. Although redoing them will cause no harm, but it will make recovery process more time consuming. To reduce these types of overhead, we introduce Checkpoints. The system periodically performs Checkpoints.
Actions Performed During Checkpoints ♦ Output onto stable storage all log records currently residing in main memory. ♦
Output on the disk all modified buffer blocks.
♦
Output onto stable storage a log record .
The presence of a record makes recovery process more streamline. Consider a transaction Ti that committed prior to the Checkpoint, means that <Ti, Commit> must appear in the log before the record. Any database modifications made by Ti must have been written to the database either prior to the checkpoint or as part of checkpoint itself. Thus, at recovery time, there is no need to perform a redo operation on Ti. The checkpoint record gives a list of all transactions that were in progress at the time the checkpoint was taken. Thus, the checkpoints help the system to provide information at restart time which transaction to undo and which to redo.
•
A system failure has occurred at time tf.
•
The most recent checkpoint prior to time tf was taken at time tc.
•
Transactions of type T1 completed (successfully) prior to time tc.
• Transactions of type T2 started prior to time tc and completed (successfully) after time tc and before time tf • Transactions of type T3 also started prior to time tc but did not complete by time tf. • Transactions of type T4 started after time tc and completed (successfully) before time tf. • Finally, transactions of type T5 also started after time tc, but did not complete by time tf.
It should be clear that, in case of immediate modification technique those transactions that have <Ti, start> and <Ti, commit> must be redo and those transactions that have only <Ti, start> and no <Ti, commit> must be undo.
Thus, when the system is restarted in case of immediate database modification, transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone. Note, however that transactions of type T1 do not enter in the restart process at all, because their updates were forced to the database at time tc as part of the checkpoint process.
Log-Record Buffering As, we assumed earlier that every log record is output to stable storage at the time it is created. This assumption imposes a high overhead on system execution for the following reasons: •
Output to stable storage is performed in units of blocks. In most cases, a log record is much smaller than a block. Thus, the output of each log record translates to a much larger output at the physical level.
•
The cost of performing the output of a block to storage is sufficiently high that it is desirable to output multiple log records at once.
To do so, we write log records to a log buffer in main memory, where they stay temporarily until they are output to stable storage. Multiple log records can be gathered in the log buffer and output to stable storage in a single output operation. The order of log records in the stable storage must be exactly the same as the order in which they
Due to the use of log buffering a log record may reside in only main memory (volatile storage) for a considerable time before it is output to stable storage. Since such log records are lost if the system crashes, we must impose additional requirements on the recovery techniques to ensure transaction atomicity. •
Transaction Ti enters the commit state after the <Ti commit> log record has been output to stable storage.
•
Before the <Ti commit> log record can be output to stable storage, all log records pertaining to transaction Ti must have been output to stable storage.
•
Before a block of data in main memory can be output to the database (in nonvolatile storage), all log records pertaining to data in that block must have been output to stable storage.
The latter rule is called the write-ahead logging (WAL) rule.
The Write-Ahead Log Protocol Before writing a page to disk, every update log record that describes a change to this page must be forced to stable storage. This is accomplished by forcing all log records to stable storage before writing the page to disk. WAL is the fundamental rule that ensures that a record of every change to the database is available while attempting to recover from a crash. If a transaction makes a change and committed, the no-force approach means that some of these changes may not have been written to disk at the time of a subsequent crash. Without a record of these changes, there would be no way to ensure that the changes of a committed transaction survive crashes. Note that the definition of a committed transaction is effectively “a transaction whose log records, including a commit record, have all been written to stable storage”. When a transaction is committed, the log detail is forced to stable storage, even if a no-force approach is being used.
Example
The database is stored in nonvolatile storage (disk) and blocks of data are brought into main memory as needed. Since main memory is typically much smaller than the entire database, it may be necessary to overwrite a block B1 in main memory when another block B2 needs to be brought into memory. If B1 has been modified B1 must be output prior to the input of B2.
The rules for the output of log records limit the freedom of the system to output blocks of data. If the input of block B2 causes block B1 to be chosen for output, all log records pertaining to data in B1 must be output to stable storage before B1 is output.
Thus, the sequence of actions by the system would be as follows: • Output log records to stable storage until all log records pertaining to block B1 have been output. •
Output block B1 to disk.
•
Input block B2 from disk to main memory.
It is important that no writes to the block B1 be in progress while the preceding sequence of actions is carried out.
Failure with Loss of Nonvolatile Storage Until now, we have considered only the case where a failure results in the loss of information residing in volatile storage while the content of the nonvolatile storage remains intact. Although failures in which the content of nonvolatile storage is lost are rare we nevertheless need to be prepared to deal with this type of failure. In this section, we discuss only disk storage. The basic scheme is to dump the entire content of the database to stable storage periodically-say, once per day. For example we may dump the database to one or more magnetic tapes. If a failure occurs that results in the loss of physical database blocks, the most recent dump is used in restoring the database to a previous consistent state. Once this restoration has been accomplished, the system uses the log to bring the database system to the most recent consistent state.
More precisely, no transaction may be active during the dump procedure and a procedure similar to check point must take place:
3. Output all log records currently residing in main memory onto stable storage.
5. Output all buffer blocks onto the disk.
7. Copy the contents of the database to stable storage.
9. Output a log record onto the stable storage.
Steps 1,2 and 4 correspond to the three steps used for checkpoints.
To recover from the loss of nonvolatile storage, we restore the database to disk using the most recent dump. Then the log is consulted and all the transactions that have committed since the most recent dump occurred are redone. Notice that no undo operations need to be executed, because dump already contain old values of data items.
A dump of the database contents is also referred to as an archival dump, since we can archive the dumps and use them later to examine old states of the database. Dumps of a database and check pointing of buffers are similar.
Problems with simple dump operation: • The entire database must be copied to stable storage, resulting in considerable data transfer. • Since transaction processing is halted during the dump procedure, CPU cycles are wasted.
Fuzzy dump schemes have been developed, which allow transactions to be active while the dump is in progress.