IT DISASTER RECOVERY PLANNING
ISM ASSIGNMENT
SUBMITTED BY: TARNAV
INTRODUCTION TO DISASTER RECOVERY PLANNING
The world has virtually shrunk into a global village, and fixed business opening and closing times have given way to round-the-clock operations. Almost all organizations, whether commercial or governmental, rely on some form of technology to manage the various parts of their operations. A disruption to the availability of any of these resources, even for a few hours, can have serious consequences for their ability to function at normal capacity. For organizations that provide mission-critical services, such as power plants, telecommunications facilities, and national defense agencies, disruptions must be kept to a minimum or, if possible, avoided altogether.

The threats to an organization, whether from growing political uncertainty on a global scale, decreased stability of national power networks, or changing climate conditions and related severe weather, have seemingly been increasing over the past decade. New threats also continue to emerge, such as outbreaks of highly contagious diseases, digital blackmail and hacking, and new methods used by terrorists for wide-scale destruction. In addition, there are internal threats, whether damage caused accidentally through human error or deliberate damage to data by an employee.

How an organization responds to threats during and after a crisis will therefore determine whether it emerges on the other side intact or ceases operations entirely. This is where disaster recovery planning comes into play.
DISASTER RECOVERY: THE MEANING
Disaster recovery is the set of processes, policies and procedures for preparing for the recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster. Disaster recovery planning is a subset of a larger process known as BUSINESS CONTINUITY PLANNING and should include planning for resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure. A business continuity plan (BCP) includes planning for non-IT aspects such as key personnel, facilities, crisis communication and reputation protection, and should refer to the disaster recovery plan (DRP) for the recovery and continuity of IT infrastructure.

Business Continuity Planning (BCP) is an "umbrella" term used to describe the comprehensive process of planning for the recovery of business operations in the event of a business disruption. BCP encompasses planning for the recovery of business operations (Business Unit Continuity Plans), technology environments and data (Technology Continuity Plans) and overall operations (Corporate Business Continuity and Business Continuity Management Plans).
IMPORTANCE OF DISASTER RECOVERY PLANNING
As business-critical functions have come to depend more heavily on information technology, and as the economy has moved to around-the-clock operation, protecting an organization's data and IT infrastructure in the event of a disruption has become a more visible business priority in recent years. It is estimated that most large companies spend between 2% and 4% of their IT budget on disaster recovery planning, with the aim of avoiding larger losses should the business be unable to function due to loss of IT infrastructure and data. Of companies that had a major loss of business data:
➢ 43% never reopen,
➢ 51% close within two years, and
➢ only 6% survive long-term.
GENERAL STEPS TO FOLLOW WHILE CREATING BCP/DRP
1. Identify the scope and boundaries of the business continuity plan. This first step defines the scope of the BCP and gives an idea of the limitations and boundaries of the plan. It also includes audit and risk analysis reports for the institution's assets.
2. Conduct a business impact analysis (BIA). This is a review of current operations, with a focus on business processes and functions, to determine the effect that a business disruption would have on normal business operations. Impacts are measured in either quantitative or qualitative terms. This information is used to drive the recovery planning process, the potential recovery solutions and the amount of expenditure required to support the backup of certain business operations.
3. Sell the concept of BCP to upper management and obtain organizational and financial commitment.
4. Each department needs to understand its role in the plan and support its maintenance. In case of disaster, each department has to be prepared to act.
5. The BCP project team must implement the plan.
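The quantitative side of a business impact analysis can be illustrated with a small calculation. The following is only a minimal sketch: the process names, the hourly loss figures and the recovery costs are hypothetical, and a real BIA would also weigh qualitative impacts that a single number cannot capture.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    revenue_loss_per_hour: float  # hypothetical estimate, in currency units
    recovery_cost: float          # hypothetical one-off cost to restore the process


def outage_impact(process: BusinessProcess, outage_hours: float) -> float:
    """Estimated quantitative impact: lost revenue plus the cost of recovery."""
    return process.revenue_loss_per_hour * outage_hours + process.recovery_cost


def rank_by_impact(processes, outage_hours):
    """Return (name, impact) pairs, highest impact first."""
    impacts = [(p.name, outage_impact(p, outage_hours)) for p in processes]
    return sorted(impacts, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; none of these figures come from a real organization.
processes = [
    BusinessProcess("payroll", 500.0, 2000.0),
    BusinessProcess("order entry", 4000.0, 10000.0),
    BusinessProcess("internal wiki", 50.0, 500.0),
]

ranking = rank_by_impact(processes, outage_hours=8)
for name, impact in ranking:
    print(f"{name}: {impact:.0f}")
```

Ranking processes this way gives the planning team a first indication of where recovery expenditure is likely to be justified.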
THREE AREAS WHICH A DRP MUST ADDRESS
1. Prevention (pre-disaster): The pre-planning required to minimize the overall impact of a disaster on systems and resources, such as using mirrored servers for mission-critical systems, maintaining hot sites, and training disaster recovery personnel. This pre-planning also maximizes the ability of an organization to recover from a disaster.
2. Continuity (during a disaster): The process of maintaining core, mission-critical systems and resource "skeletons" (the bare minimum assets required to keep an organization in operational status) and/or initiating secondary hot sites during a disaster. Continuity measures prevent the whole organization from folding by preserving essential systems and resources.
3. Recovery (post-disaster): The steps required for the restoration of all systems and resources to full, normal operational status. Organizations can cut down on recovery time by subscribing to quick-ship programs (third-party service providers who can deliver pre-configured replacement systems to any location within a fixed timeframe).
THE OBJECTIVES OF DRP
➢ Provides a greater sense of security.
➢ Ensures a certain level of system and resource stability during a disaster.
➢ Minimizes system downtime and recovery time.
➢ Minimizes the risk of permanent loss of core assets or the entire organization.
➢ Minimizes confusion during a disaster.
➢ Minimizes the amount of decision-making during a high-stress time when emotions will be running high.
➢ Provides a platform on which to simulate various disaster recovery scenarios.
➢ Ensures the reliability of secondary systems such as hot sites and server mirrors.
➢ Minimizes potential economic loss.
➢ Decreases potential exposures.
➢ Reduces the probability of occurrence.
➢ Reduces disruptions to operations.
➢ Provides an orderly recovery.
➢ Minimizes insurance premiums.
➢ Reduces reliance on certain key individuals.
➢ Ensures the safety of personnel and customers.
➢ Minimizes legal liability.
THE CREATION OF PLAN:
1. Obtain Top Management Commitment
Top management must support and be involved in the development of the disaster recovery planning process. Management should be responsible for coordinating the plan and ensuring its continued viability. Adequate time and resources must be committed to the development of an effective plan. Resources should include both financial considerations and the effort of all personnel involved.

2. Establish a planning committee
A planning or steering committee should be appointed to oversee the development and implementation of the plan. Representatives from all functional areas of the organization should be included. The committee will define the scope and objectives of the plan.

3. Perform a risk assessment
The planning or steering committee should prepare a risk analysis and business impact analysis that covers a range of possible events, including natural, technical and man-made threats. Each functional area is to be analyzed to determine the consequences and impact of both likely events (such as a power failure) and catastrophic events (such as a tornado). Evaluate the safety of critical documents and vital records. Fire poses one of the greatest threats; intentional human destruction or sabotage, however, should also be considered. The plan must provide for the "worst case" situation: destruction of the main facilities. Impacts and consequences resulting from loss of information and services should be addressed. Cost-effective risk mitigation planning is also the committee's responsibility.

4. Establish priorities for core processes and functions of the business operation
The critical requirements of each area within the business should be carefully and thoroughly evaluated:
• Functional operations
• Key personnel
• Information and data
• Processing systems
• Customer service
• Documentation
• Vital records
• Policies and procedures
Processing and operations should be analyzed to determine the maximum amount of time that the department and organization can operate without each critical system. Critical needs are defined as the procedures and equipment required to continue operations should an area, the main facility, key resources, or any combination of these be destroyed or become unavailable.
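The idea of a maximum time that the organization can operate without each critical system can be recorded and queried in a simple structure. The sketch below is illustrative only: the system names and hour values are hypothetical, and in practice each department would supply its own figures.

```python
# Hypothetical figures: the longest each department says it can operate
# without a given system, in hours.
max_tolerable_downtime_hours = {
    "order processing": 4,
    "email": 24,
    "payroll": 72,
    "document archive": 168,
}


def recovery_order(tolerances):
    """Systems sorted so the one that can be missing for the shortest time comes first."""
    return sorted(tolerances, key=tolerances.get)


def breached(tolerances, outage_hours):
    """Systems whose tolerable downtime is shorter than the expected outage."""
    return [name for name in recovery_order(tolerances) if tolerances[name] < outage_hours]


order = recovery_order(max_tolerable_downtime_hours)
at_risk = breached(max_tolerable_downtime_hours, outage_hours=48)
print(order)
print(at_risk)
```

Sorting by tolerable downtime gives a first draft of the order in which systems should be restored, and the second function shows which systems would exceed their limit for a given expected outage.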
A method of determining the critical needs of a department is to document all the functions performed by each area. Once the primary functions have been identified, the operations and processes should be ranked in order of priority: Critical, Essential, or Administrative (supportive).

5. Determine Recovery Strategies
The most practical and cost-effective alternatives for processing in case of a disaster need to be researched and evaluated. Alternatives depend upon the evaluation of a given function, and may include:
• Relocation to a backup site (a "warm" site will already have suitable equipment and operating environment)
• Reciprocal agreements or vendor service level agreements
• Manual processing with specific follow-up "return to normal" restoration procedures
• Home or remote processing (when the facility is inaccessible but the computer equipment is fully operational)
Written agreements with vendors or other agencies for the specific recovery alternatives selected should be prepared. Be sure to consider:
• Cost of contingency arrangement
• Special security procedures
• Notification of systems changes
• Required hours of operation
• Specific hardware and other equipment required for processing
• Personnel requirements, including possible temporary staff to accelerate recovery
• Circumstances constituting an emergency
• Issues of availability and terms of use

6. Perform Data Collection
Recommended data gathering materials and documentation include:
• Backup position listing
• Critical telephone numbers (work, cell, home, pager)
• Communications inventory, including work and alternate email addresses
• Distribution log
• Records inventory
• Equipment inventory
• Forms inventory
• Insurance policies in effect
• Computer hardware/software inventory
• Office equipment inventory
• Master call list/communication plan
• Master vendor and external agency contact list
• Notification checklist
• Office supply inventory
• Off-site storage location inventory
• Software and data files backup/retention schedules
• Temporary location specifications, potential or existing backup sites
It is advantageous to develop standardized forms to facilitate the data gathering process.
7. Organize and document a written plan
An outline is very useful to guide the development of the detailed procedures. It:
• Helps to organize the detailed procedures
• Identifies all major steps before the writing begins
• Identifies redundant procedures that only need to be written once, and defines sub-procedures
The planning committee should review and approve the proposed plan. The plan should be thoroughly developed, including all detailed procedures to be used before, during and after a disaster. It may not be practical to develop detailed procedures until backup alternatives have been defined. Procedures should include methods for maintaining and updating the plan to reflect any significant internal, external or systems changes and, just as important, should allow for a regular review of the plan by key personnel within the organization.

The disaster recovery plan is best structured using a team approach. Specific roles and responsibilities should be assigned to the appropriate team for each functional area of the company. General team categories include administrative functions, facilities, logistics, user support, computer backup, restoration and other important areas in the organization. The Management Team is especially important because it coordinates and accomplishes the actual continuity-recovery process. The Damage Assessment Team should first assess the disaster; the team or the Continuity Coordinator then activates the recovery plan and contacts the other team leaders. The Management Team also documents the efforts and recovery processes during the event. Management Team members should sit on the Planning Committee to assist in final decisions and in setting priorities, policies and procedures.

8. Develop testing criteria and procedures
It is essential that the plan be thoroughly tested and evaluated on a regular basis (at least annually). Procedures to test the plan should be documented.
The tests will provide the organization with the assurance that all necessary steps are included in the plan. Other reasons for testing include:
• Determining the feasibility and compatibility of backup facilities and alternate processing methods
• Identifying areas in the plan that require clarification or modification
• Providing training to all staff and personnel
• Demonstrating the ability of the business to meet the anticipated recovery objective in time and degree
• Providing motivation for maintaining and updating the Business Continuity Recovery Program
9. Test the Plan
After the testing procedures have been completed, test the plan initially by conducting a structured walk-through test. The test will provide additional information regarding any further steps that may need to be included, procedures that are not effective and need to be changed, and other appropriate adjustments. It is recommended that initial testing of the plan be done in sections, and during off-peak business hours, to minimize disruptions to the overall operations of the business.

10. Approve the plan
Once the plan has been written and tested, it must be approved by top-level management. It is top management's ultimate responsibility to ensure that the business has a current, documented and tested plan. Additional responsibilities include:
• Reviewing and approving the plan at least annually, and documenting such reviews in writing
• Ensuring that the plan is compatible with that of the business
SCALING LEVELS OF DISASTER
It is sometimes advantageous to define levels of disaster for given scenarios, for which a standard set of response procedures can be written. The sub-procedures can then be referenced or called from within other top-level procedures using the designated level of disaster.

Level 0 - No interruption in operations.
Level 1 - Operations can be resumed within eight hours.
Level 2 - Operations can be resumed within 8-48 hours. Users may need to implement manual or alternate processing procedures.
Level 3 - Operations cannot be restored for over 48 hours. All functions and personnel are to be moved to an alternate site or sites. Users need to implement manual processing.

Alternate disaster scale:
Level 0 - The disaster can be handled by the personnel of the organization alone.
Level 1 - The disaster will require some outside intervention for recovery, such as police, fire, or other professional services.
Level 2 - The disaster will require assistance from multiple external organizations.
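The first, time-based scale above lends itself to a simple lookup that top-level procedures could call to decide which response applies. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the handling of the exact 8-hour and 48-hour boundaries is one reading of the level definitions rather than something the scale itself specifies.

```python
def disaster_level(expected_outage_hours: float) -> int:
    """Map an expected outage duration to a level on the time-based scale.

    Assumption: an outage of exactly 8 hours counts as Level 1 and an outage
    of exactly 48 hours counts as Level 2.
    """
    if expected_outage_hours < 0:
        raise ValueError("outage duration cannot be negative")
    if expected_outage_hours == 0:
        return 0
    if expected_outage_hours <= 8:
        return 1
    if expected_outage_hours <= 48:
        return 2
    return 3


# Response summaries paraphrased from the level descriptions.
RESPONSE = {
    0: "No interruption in operations.",
    1: "Resume operations within eight hours.",
    2: "Implement manual or alternate processing procedures if needed.",
    3: "Move functions and personnel to an alternate site; implement manual processing.",
}

for hours in (0, 6, 24, 72):
    level = disaster_level(hours)
    print(f"{hours} h outage -> Level {level}: {RESPONSE[level]}")
```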
DISASTER RECOVERY STRATEGIES
Prior to selecting a Disaster Recovery (DR) strategy, the DR planner should refer to the organization's business continuity plan, which should indicate the key metrics of Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for various business processes (such as the process to run payroll or generate an order). The RPO is the maximum acceptable amount of data loss, measured in time, and the RTO is the maximum acceptable time to restore a process after a disruption. The metrics specified for the business processes must then be mapped to the underlying IT systems and infrastructure that support those processes. Once the RTO and RPO metrics have been mapped to IT infrastructure, the DR planner can determine the most suitable recovery strategy for each system. An important note here, however, is that the business ultimately sets the IT budget, and therefore the RTO and RPO metrics need to fit within the available budget. While most business unit heads would like zero data loss and zero time loss, the cost associated with that level of protection may make the desired high availability solutions impractical.

The following is a list of the most common strategies for data protection.
➢ Backups made to tape and sent off-site at regular intervals (preferably daily).
➢ Backups made to disk on-site and automatically copied to off-site disk, or made directly to off-site disk.
➢ Replication of data to an off-site location, which overcomes the need to restore the data (only the systems then need to be restored or synced). This generally makes use of storage area network (SAN) technology.
➢ High availability systems which keep both the data and system replicated off-site, enabling continuous access to systems and data.
➢ Wide Area Network Optimization technology, which helps improve disaster recovery and reduces network response time. This type of technology is also intended to keep data flowing when part of the network is unavailable.

In many cases, an organization may elect to use an outsourced disaster recovery provider (such as SunGard Availability Systems or IBM BCRS) to provide a stand-by site and systems rather than using its own remote facilities.

In addition to preparing for the need to recover systems, organizations must also implement precautionary measures with the objective of preventing a disaster in the first place. These may include some of the following:
➢ Local mirrors of systems and/or data, and use of disk protection technology such as RAID.
➢ Surge protectors, to minimize the effect of power surges on delicate electronic equipment.
➢ Uninterruptible power supply (UPS) and/or backup generator, to keep systems going in the event of a power failure.
➢ Fire prevention: alarms, fire extinguishers.
➢ Anti-virus software and other security measures.
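The mapping from RPO and RTO requirements to a recovery strategy, subject to budget, can be sketched as a simple selection. Everything numeric here is an assumption made for illustration: the achievable RPO and RTO values and the relative cost ranks are hypothetical and would differ for every organization and product.

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    achievable_rpo_hours: float  # assumed worst-case data loss, in hours
    achievable_rto_hours: float  # assumed worst-case time to restore, in hours
    relative_cost: int           # 1 = cheapest; illustrative ranking only


# Hypothetical capabilities for the data protection strategies listed above.
STRATEGIES = [
    Strategy("tape backup sent off-site", 24, 72, 1),
    Strategy("disk backup copied off-site", 12, 24, 2),
    Strategy("off-site data replication", 1, 8, 3),
    Strategy("high availability (data and systems replicated)", 0, 0.25, 4),
]


def cheapest_strategy(required_rpo_hours: float,
                      required_rto_hours: float,
                      max_cost: int = 4) -> Optional[Strategy]:
    """Return the cheapest strategy meeting both objectives within budget, or None."""
    candidates = [
        s for s in STRATEGIES
        if s.achievable_rpo_hours <= required_rpo_hours
        and s.achievable_rto_hours <= required_rto_hours
        and s.relative_cost <= max_cost
    ]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


print(cheapest_strategy(24, 72))
print(cheapest_strategy(4, 12))
print(cheapest_strategy(0.5, 1, max_cost=3))
```

A result of None in the last call reflects the point made above: when the desired RPO and RTO cannot be met within the available budget, the objectives themselves have to be revisited.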
CONCLUSION
Continuity and recovery planning traditionally has information technology roots, but involves more than off-site storage or backup processing. Agencies need to develop written, comprehensive continuity recovery plans that address all the critical operations and functions of their business. The plan should include documented and tested procedures which, if followed, will ensure either the ongoing availability of critical resources and continuity of operations or their efficient and timely recovery. Since the probability of occurrence for any given event is highly uncertain, the plan is not dissimilar to liability insurance; it represents an ongoing investment in return for a certain level of protection from financial disaster. In fact, the plan is better protection, because insurance alone may not compensate for the incalculable loss of business during the interruption or the long-term losses due to damage to reputation. Effective documentation and procedures are extremely important in a continuity recovery plan. Considerable effort and time are necessary to develop a working plan. A well-organized plan requires relatively little maintenance and, with proper testing and training, provides the type of core stability that cannot be matched by external arrangements or contracts alone.