DATABASE RECOVERY: Database recovery is the process of restoring a database to the correct state in the event of a failure. Database recovery is a service that is provided by the DBMS to ensure that the database is reliable and remains in consistent state incase of a failure. Recovery techniques are used with concurrency control mechanisms (Concurrency control is the process of managing simultaneous operations i.e. queries, updates, inserts, deletes, etc. on the database without having them interfere with one another.) A major responsibility of the database administrator is to maintain the up time of a database and to prepare for the possibility of hardware, software, network, process and system failure. In the event of a failure, the DBA should also be prepared to bring the database back to operation as quickly as possible and with little or no data loss. If properly planned, recovery will be a smooth operation, thereby protecting the users and the database. Recovery processes vary, depending on the type of failure that has occurred, the structures that have been affected and the type of recovery that is desired.
Why recovery is needed? Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) the transaction has no affect whatsoever on the database or on any other transactions. The DBMS must not permit partial completion of a transaction to be applied to the database, which happens if a transaction fails after executing some of the operations but before executing all of them. Types of failures: Failures are generally classified as transaction, system and media failures. There are several possible reasons for a transaction to fail in the middle of execution: 1. A computer failure (system crash): A hardware, software or network error occurs in the computer system during transaction execution. 2. A transaction or system error: Some operation in the transaction may cause it to fail, such as division by zero. It may also occur because of erroneous parameter values or because of a logical programming error. Also user may interrupt it during its execution. 3. Local errors or exception conditions detected by the transaction: During transaction execution, certain conditions may occur that necessitate cancellation
of the transaction. e.g. data for transaction not found e.g. insufficient bank balance. 4. Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later to provide for serializability. 5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. 6. Physical problems and catastrophes: This refers to an endless list of problems that includes power or air conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake and mounting of a wrong tape by the operator.
Recovery facilities: A DBMS should provide the following facilities to assist with the recovery: 1. A backup mechanism that makes periodic backup copies of the database. 2. Logging facilities that keep track of the current state of transactions and database changes. 3. A checkpoint facility that enables updates to the database that are in progress to be made permanent. 4. A recovery manager that allows the system to restore the database to a consistent state following a failure. 1. Backup mechanism: The DBMS should provide a mechanism to create backup copies of the database and the log files to be created at regular intervals without having to first stop the system. The backup copy of the database can be used to recover the database in the event that the database has been damaged or destroyed. A backup can be complete copy of the entire database or an incremented copy. An incremental backup consists only of modifications made since the last complete or incremental backup. The backups are usually stored on offline storage like magnetic tapes. 2. Logging: To keep track of the database transactions, the DBMS maintains special files called log files that contain information about all updates to the database. The log file contains information like transaction identifier, type of the log record (transaction start, insert, update, delete, commit, abort, etc.), identifier of the data item affected by the database action (insert, delete, update operations), beforeimage of the data item, after-image of the data item, log management information, checkpoint records, and so on. The log files are stored online so that
recovery can be faster. Archive copies of log files are also maintained offline and only the most recent copy stored online. Online copies basically provide recovery from minor failures. In case of major failures, all the offline log archives are used and an incremental recovery is done. 3. Checkpointing: The log file information is used to recover the database from a failure. One problem with this situation is that we may not know how far back in the log to search and we may end up redoing transactions that have been safely written to the database. To limit the amount of search and subsequent processing that we need to carry out on the log file, we use checkpointing. A checkpoint is a point of synchronization between the database and the transaction log file. All buffers are force written to secondary storage at the checkpoint. Checkpoints are also called syncpoints or savepoints. If the transactions are executed serially, when a failure occurs we check the log file to find the transaction that started before the last checkpoint. All the earlier transactions would have committed previously and would have been written to the database. Therefore, we need only redo the transaction that was active at the checkpoint. If a transaction is active at the time of failure, it must be undone. If the transactions are performed concurrently, we will have to redo all transactions that have committed since the checkpoint and undo all transactions that were active at the time of the failure. 4. Recovery techniques used by the manager: The extent of damage that has occurred to the database decides the recovery technique that will be used. e.g. if the database has been heavily damaged, then the last backup copy will have to be restored and the update operations performed on the database since the last backup have to be reapplied using the log file. If there is no physical damage, but some inconsistency, then it is enough to undo the changes that caused the inconsistency. In the case where the database is damaged, first the database is restored from the backup copy. Then the redo logs are applied to bring the database to the state before it crashed. In this state the database will contain the actions of the transaction that were completed and that were being executed at the time of the crash. There are many techniques to bring back a database to a consistent state. a. Deferred update: These techniques do not physically update the database on disk until after a transaction reaches its commit point; then the updates are
recorded in the database. Before reaching commit, all transaction updates are recorded in the buffers. b. Immediate update: In the immediate update technique, the database may be updated by some operations of a transaction before the transaction reaches its commit point. c. Shadow paging: It is a technique where transaction logs are not required. Two directories foe each database page are created during the life of a transaction-the current directory and the shadow directory. When the transaction starts, both the directories are the same. The shadow directory is never changed during the duration of the transaction and the current directory is updated when the transaction performs a write operation. All I/O operations use the current directory to locate the database pages on the disk. When a transaction commits, the shadow directory is discarded and the current directory becomes the database page directory.