Continuity Management
- Premanand Lotlikar 9th September, 2007
Agenda • Introduction • Objective of Continuity Mgmt • Benefits • Relationship with other processes • Phases in Business Continuity Planning (BCP) • Disaster Recovery Planning (DRP) • Process Control • Key Performance Indicators • Cost • Possible Problems
Introduction • Disasters can strike anytime! • Organizations must plan for these disasters: – Natural- Earthquakes, storms, fires, floods, hurricanes, tornados, and tidal waves – System/technical- Outages, malicious code, worms, and hackers – Supply systems- Electrical power problems, equipment outages, utility problems, and water shortages – Human-made/political- Disgruntled employees, riots, vandalism, theft, crime, protesters, and political unrest
Introduction • Disasters are classified by how long services could be interrupted: – Minor- Operations are disrupted for several hours to less than a day. – Intermediate- Operations are disrupted for a day or longer. The organization might need a secondary site to continue operations. – Major- A true catastrophe in which the entire facility is destroyed. A long-term solution would require building a new facility.
Objective • Support overall Business Continuity Management by ensuring that all required IT infrastructure and services can be restored within the specified time limits after a disaster
Benefits • Systems can be recovered • Less availability time is lost • Interruptions to business activities are minimized
Relationship with other processes • SLM • Availability Mgmt • Configuration Mgmt • Capacity Mgmt • Change Mgmt
Phases in the BCP process • Project Management and Initiation • Business Impact Analysis • Recovery Strategy • Plan Design and Development • Testing, Maintenance, Awareness, and Training
Project Management and Initiation • Establish the need for the BCP • Perform a risk analysis to identify and document potential outages to critical systems • Results should be presented to management so they understand the potential risk
Project Management and Initiation • With management on board, you can start to develop a plan of action • This management plan should include: – Scope of the project – Appointment of a project planner – Determination of who will be on the team • Representatives from senior management, the legal staff, recovery team leaders, the information security department, various business units, networking, and physical security
– Finalize the project plan • finalize issues such as needed resources (personnel, financial), time schedules, budget estimates, and critical success factors
– Determine the data-collection method • Tools such as Strohl Systems BIA Professional and SunGard’s Paragon software can automate much of the BCP process • A learning curve is involved any time individuals are introduced to unfamiliar software
Business Impact Analysis (BIA) • Its role is to describe what impact a disaster would have on critical business functions • Example – DoS attacks that result in 2 hours of downtime of the company’s VoIP phone system will result in $28,000 in lost revenue – 8-hour outage to the web server might cost the company only $1,000 in lost revenue
• These types of numbers will help the organization determine what needs to be done to ensure the survival of the company
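To make the two example outages comparable, it can help to normalize each loss to a per-hour figure. The sketch below does only that; the function name `hourly_loss` is a hypothetical helper, and the dollar amounts and durations are the ones quoted in the example above.

```python
# Hypothetical helper: the slide gives only a total loss and an outage duration.
def hourly_loss(total_loss_usd: float, outage_hours: float) -> float:
    """Average revenue lost per hour of downtime."""
    if outage_hours <= 0:
        raise ValueError("outage_hours must be positive")
    return total_loss_usd / outage_hours


# Figures taken from the BIA example above: (total loss in USD, outage hours).
outages = {
    "VoIP phone system": (28_000, 2),
    "Web server": (1_000, 8),
}

for system, (loss, hours) in outages.items():
    print(f"{system}: ${hourly_loss(loss, hours):,.0f} per hour")
```

On these figures the phone system loses $14,000 per hour against $125 per hour for the web server, which is why it would rank higher for recovery.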
Business Impact Analysis • The impact or loss that an organization faces because of lost service or data can be felt in many ways • These are generally measured by one of the following: – Allowable business interruption • Maximum Tolerable Downtime (MTD) is the longest time that an organization can survive without a specific business function
– Financial and operational considerations – Regulatory requirements – Organizational reputation
Business Impact Analysis • The eight steps in the BIA process are as follows: – Select individuals to interview – Determine the methods to be used for gathering information – Develop a customized questionnaire to gather specific monetary and operational impact information – Analyze the compiled data – Determine the time-critical business processes and functions – Determine MTD for each process and function – Prioritize the critical business process or function based on its MTD – Document the findings and report your recommendations to management
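The prioritization step above can be sketched as a simple sort: the shorter a function's MTD, the sooner it must be restored. The class name, function names, and MTD values below are all hypothetical and chosen only for illustration; a real BIA would take them from the interviews and questionnaires.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime, in hours (illustrative values)


def prioritize(functions):
    """Order business functions so the shortest MTD comes first."""
    return sorted(functions, key=lambda f: f.mtd_hours)


# Hypothetical business functions and MTDs, not taken from the slides.
functions = [
    BusinessFunction("Payroll", 72),
    BusinessFunction("Order processing", 4),
    BusinessFunction("Email", 24),
]

ranked = prioritize(functions)
for rank, f in enumerate(ranked, start=1):
    print(f"{rank}. {f.name} (MTD {f.mtd_hours} h)")
```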
Recovery Strategy • Predefined actions that management has approved to be followed in case normal operations are interrupted • Following are recovery strategies for: – Data interruptions • Focus here is on recovering the data • Solutions to data interruptions include backups, offsite storage, and remote journaling
Recovery Strategy • Recovery strategies for: – Operational interruptions • Interruption is caused by the loss of some type of equipment • Solutions to this type of interruption include hot sites, redundant equipment, Redundant Array of Inexpensive Disks (RAID), and Backup Power Supplies (BPS)
– Facility and supply interruptions • Causes of these interruptions can include fire, loss of inventory, transportation problems, Heating, Ventilation, and Air Conditioning (HVAC) problems, and telecommunications failures
– Business interruptions • These interruptions can be caused by strikes or by the loss of personnel, critical equipment, supplies, or office space
Recovery Strategy • To evaluate the losses and determine the best recovery strategy, follow these steps: – Document all costs for each possible alternative. – Obtain cost estimates for any outside services that might be needed. – Develop written agreements with the chosen vendor for such services. – Evaluate what resumption strategies are possible in case there is a complete loss of the facility. – Document your findings and report your chosen recovery strategies to management for feedback and approval.
Plan Design and Development • The team prepares and documents a detailed plan for recovery of critical business systems • The plan should be a guide for implementation • The plan should also detail how the organization will interface with external groups, such as customers, shareholders, the media, the community, and regional and state emergency services groups • Final step of the phase is to combine this information into the BCP and interface it with the organization’s other emergency plans
Plan Design and Development • Plan should include information on both long-term and short-term goals and objectives: – Identify critical functions and priorities for restoration. – Identify support systems that are needed by critical functions. – Estimate potential disasters and calculate the minimum resources needed to recover from the catastrophe. – Select recovery strategies and determine what vital personnel, systems, and equipment will be needed to accomplish the recovery. – Determine who will manage the restoration and testing process. – Calculate what type of funding and fiscal management is needed to accomplish these goals.
Testing & Maintenance • Five different types of BCP testing: – Checklist • Performed by sending copies of the plan to different department managers and business unit managers for review
– Tabletop • Performed by having the members of the emergency management team and business unit managers meet in conference to discuss the plan • Primary advantage of the tabletop testing method is that it uncovers dependencies between different departments
– Walkthrough • This is an actual simulation of the real thing • Primary purpose of this test is to verify that members of the response team can perform the required duties
– Functional • Functional test is similar to a walkthrough but actually starts operations at the alternative site
– Full interruption • This test is the most detailed, time-consuming, and thorough • Mimics a real disaster, and all steps are performed to start up backup operations • Involves all the individuals who would be involved in a real emergency, including internal and external organizations
Awareness and Training • Goal of awareness and training is to make sure all employees know what to do in case of an emergency • Employees assigned to specific tasks should be trained to carry out needed procedures • Plan for cross-training of teams, if possible, so that team members are familiar with a variety of recovery roles and responsibilities • Number one priority of any BCP or DRP is to protect the safety of employees
Disaster Recovery Planning • BCP deals with what is needed to keep the organization running and what functions are most critical • DRP’s purpose is to get a damaged organization restarted so that critical business functions can resume • DR activities center on assessing the damage, restoring operations, and determining whether an alternate location will be needed until repairs can be made • These activities can be broadly grouped into salvage and recovery
Disaster Recovery Planning • Salvage – Restoring functionality to damaged systems, units, or the facility • A damage assessment to determine the extent of the damage • A salvage operation to recover any repairable equipment • Repair and cleaning to eliminate any damage to the facility and restore equipment to a fully functional state • Restoration of the facility so that it is fully operational, stocked, and ready for business
• Recovery – Focused on the responsibilities needed to get an alternate site up and running – This site will be used to stand in for the original site until operations can be restored there
• NOTE: Physical security is always of great importance after a disaster. Measures such as guards, temporary fencing, and barriers should be deployed to prevent looting and vandalism
Alternative Sites and H/W Backup • Reciprocal Agreement – Requires two organizations to pledge assistance to one another in case of disaster – Carried out by sharing space, computer facilities, and technology resources (cost effective) – Each party to the agreement must trust the other organization to come to its aid in case of disaster – There is also the issue of confidentiality because the damaged organization is placed in a vulnerable position and must trust the other party with confidential information – If the parties to the agreement are near each other, there is always the danger that disaster could strike both parties, thereby rendering the agreement useless
Cold, Warm, and Hot Sites • Cold site – This option can be used by businesses that can manage without IT services for some time – Basically an empty room with only rudimentary electrical, power, and computing capability
• Warm site – Somewhat of an improvement over a cold site – This facility has data equipment and cables, and is partially configured – It could be made operational in anywhere from a few hours to a few days
Cold, Warm, and Hot Sites • Hot site – This facility is ready to go – Fully configured and is equipped with the same system as the production network – Although it is capable of taking over operations at a moment’s notice, it is the most expensive option discussed
• Mobile site – Non-mainstream alternative to traditional recovery options – Typically consist of fully contained tractor-trailer rigs that come with all the needed facilities of a data center – They can be quickly moved to any needed site
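The trade-off between readiness and cost described above can be sketched as a small selection rule: pick the cheapest site type that can be operational within a business function's MTD. This is only an illustration under stated assumptions. The activation times and cost ranks below are hypothetical placeholders (the slides say only that a warm site takes "a few hours to a few days" and that a hot site is the most expensive), and the mobile site is left out because the slides give no timing for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteOption:
    name: str
    activation_hours: float  # ASSUMED time to become operational
    relative_cost: int       # ASSUMED cost rank, 1 = cheapest


def cheapest_site_within(mtd_hours: float, options) -> Optional[SiteOption]:
    """Return the lowest-cost option that is operational within the MTD."""
    candidates = [o for o in options if o.activation_hours <= mtd_hours]
    if not candidates:
        return None  # no listed option is fast enough
    return min(candidates, key=lambda o: o.relative_cost)


# Illustrative values only; real figures come from vendor quotes and testing.
options = [
    SiteOption("cold", activation_hours=168, relative_cost=1),
    SiteOption("warm", activation_hours=48, relative_cost=2),
    SiteOption("hot", activation_hours=1, relative_cost=3),
]

choice = cheapest_site_within(72, options)
print(choice.name if choice else "no suitable site")
```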
Multiple Data Centers • Each of these sites is capable of handling all operations if another fails • Gives the company fault tolerance by maintaining multiple redundant sites • If the redundant sites are geographically dispersed, the possibility of more than one being damaged is low • However, cost is a consideration
Other Alternatives • Database shadowing – A database shadowing system writes the data to two physical disks – Creates good redundancy by duplicating the database sets to mirrored servers
• Electronic vaulting – Makes a copy of backup data to a backup location – This is a batch-process operation that functions to keep a copy of all current records, transactions, or files at an offsite location
• Remote journaling – Similar to electronic vaulting, except that information is processed in parallel – By performing live data transfers, it allows the alternate site to be fully synchronized and ready to go at all times
Backup Types • Full Backup – A full backup backs up all files, regardless of whether they have been modified
• Incremental Backup – An incremental backup backs up only those files that have been modified since the previous backup of any sort – Restoration will require the last full backup plus every incremental backup made since it
• Differential Backup – A differential backup backs up all files that have been modified since the last full backup – Restoration will require the full backup and the last differential backup
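The restoration rules for the three backup types can be sketched as a function that works out which backups are needed to restore to the latest state. It follows the definitions above literally. The `Backup` class, its `day` field, and the function name `restore_set` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    day: int   # hypothetical sequence number; higher means more recent
    kind: str  # "full", "incremental", or "differential"


def restore_set(backups):
    """Return the backups needed, in restore order, to reach the latest state."""
    ordered = sorted(backups, key=lambda b: b.day)
    full_positions = [i for i, b in enumerate(ordered) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available")
    start = full_positions[-1]          # restoration starts at the last full backup
    full = ordered[start]
    chain = [full]
    for b in ordered[start + 1:]:
        if b.kind == "differential":
            # A differential holds every change since the last full backup,
            # so it replaces anything taken in between.
            chain = [full, b]
        else:
            # An incremental holds only changes since the previous backup,
            # so every one of them is needed.
            chain.append(b)
    return chain


incremental_week = [Backup(1, "full"), Backup(2, "incremental"), Backup(3, "incremental")]
differential_week = [Backup(1, "full"), Backup(2, "differential"), Backup(3, "differential")]

print([b.day for b in restore_set(incremental_week)])
print([b.day for b in restore_set(differential_week)])
```

The incremental schedule needs all three backups, while the differential schedule needs only the full backup and the latest differential.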
Process Control • Effective Configuration Mgmt process • Regular test of the recovery plan • Up-to-date and effective tools • Dedicated training for everyone involved in the process • Support and commitment throughout the organization
Key Performance Indicators • Number of identified shortcomings of the recovery plan • Revenue lost following a disaster • Cost of the process
Cost
Possible Problems • Resources • Commitment • Estimating the damages • Access to recovery facilities • Lack of awareness • The assumption that BCP is the IT department's responsibility!
Thank you!