Disaster Recovery Planning (DRP)

  • November 2019

DRP is the process of regaining access to the data, hardware and software necessary to resume critical business operations after a natural or human-induced disaster. A disaster recovery plan (DRP) should also include plans for coping with the unexpected or sudden loss of key personnel; that topic is not covered here, where the focus is data protection. DRP is part of a larger process known as business continuity planning (BCP).

What is the difference between DRP and BCP (1/2)

Disaster recovery is the process by which you resume business after a disruptive event. The event might be:

  • something huge, like an earthquake or the terrorist attacks on the World Trade Center
  • something small, like malfunctioning software caused by a computer virus.

Given the human tendency to look on the bright side, many business executives are prone to ignoring "disaster recovery" because disaster seems an unlikely event.

What is the difference between DRP and BCP (2/2)

  • "Business continuity planning" suggests a more comprehensive approach to making sure you can keep making money.
  • Often, the two terms are married under the acronym BC/DR.
  • At any rate, DR and/or BC determines how a company will keep functioning after a disruptive event until its normal facilities are restored.

What do these plans include (1/2) 

All BC/DR plans need to encompass:

  • how employees will communicate
  • where they will go
  • how they will keep doing their jobs.

The details can vary greatly, depending on the size and scope of a company and the way it does business.

What do these plans include (2/2) 

For example, the plan at one global manufacturing company would:

  • restore critical mainframes with vital data at a backup site within four to six days of a disruptive event
  • obtain a mobile PBX unit with 3,000 telephones within two days
  • recover the company's 1,000-plus LANs in order of business need
  • set up a temporary call center for 100 agents at a nearby training facility.

Events that necessitate disaster recovery

  • Natural disasters
  • Fire
  • Power failure
  • Terrorist attacks
  • Organized or deliberate disruptions
  • Theft
  • System and/or equipment failures
  • Human error
  • Computer viruses
  • Testing

Prevention against data loss (1/2) 

  • Send backups off-site at regular intervals.
      • Include software as well as all data, to facilitate recovery.
      • Create an insurance copy on microfilm or similar and store the records off-site.
  • Use a remote backup facility if possible to minimize data loss.
  • Storage Area Networks (SANs) over multiple sites make data immediately available without the need to recover or

Prevention against data loss (2/2)

  • Surge protectors: to minimize the effect of power surges on delicate electronic equipment
  • Uninterruptible Power Supply (UPS) and/or backup generator
  • Fire prevention: more alarms, accessible extinguishers
  • Anti-virus software and other security measures

Techniques and technology

  • Mirroring
      • Disk mirroring: redundant arrays of inexpensive disks, level 1 (RAID 1)
      • Server mirroring: web / FTP / email
  • RAID: RAID 0 to RAID 6 and combinations
  • On-site data storage
      • Backup: tape / optical disk
  • Off-site data storage (backup site)
      • Cold sites
      • Warm sites
      • Hot sites

Mirroring

Mirroring can occur locally or remotely.

  • Locally means that a server has a second hard drive that stores a copy of the data. The second drive is called a mirrored drive.
  • A remote mirror means that a remote server contains an exact duplicate of the data.
  • Data is written to the original drive when a write request is issued. Data is then copied to the mirrored drive, providing a mirror image of the primary drive.
  • If one of the hard drives fails, the data is still available on the other drive and is protected from loss.
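The write path described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `MirroredVolume`, its methods, and the in-memory lists standing in for drives are all hypothetical, and a real mirror works at the device or volume-manager level.

```python
class MirroredVolume:
    """Toy model of a two-drive mirror: every write lands on both drives."""

    def __init__(self, num_blocks: int) -> None:
        # Each "drive" is a list of blocks; None marks an unwritten block.
        self.primary = [None] * num_blocks
        self.mirror = [None] * num_blocks
        self.primary_failed = False

    def write(self, block: int, data: bytes) -> None:
        # Write to the original drive first, then copy to the mirrored drive.
        if not self.primary_failed:
            self.primary[block] = data
        self.mirror[block] = data

    def fail_primary(self) -> None:
        # Simulate losing the primary drive and everything stored on it.
        self.primary_failed = True
        self.primary = [None] * len(self.primary)

    def read(self, block: int):
        # Serve reads from the primary, falling back to the mirror on failure.
        source = self.mirror if self.primary_failed else self.primary
        return source[block]


volume = MirroredVolume(num_blocks=4)
volume.write(0, b"payroll")
volume.write(1, b"orders")
volume.fail_primary()
recovered = volume.read(1)  # still readable from the mirrored drive
```

After the simulated failure, reads are served from the mirrored drive, which is the protection the slide describes.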

Disk mirroring (RAID1)

  • The replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability, currency and accuracy.
  • A mirrored volume is a complete logical representation of separate volume copies.

Server mirroring

Mirror sites are most commonly used to provide multiple sources of the same information, and are of particular value as a way of providing reliable access to large downloads. Mirroring is a type of file synchronization.

Web server

  • To preserve a website or page, especially when it is closed or is about to be closed.
  • To counteract censorship and promote freedom of information.

Email server

  • To protect against loss of email information.

FTP server

  • To allow faster downloads for users at a specific geographical location.

Redundant arrays of inexpensive disks (RAID)

The organization distributes the data across multiple smaller disks, offering protection from a crash that could wipe out all data on a single, shared disk. Benefits of RAID include the following:

  • Increased storage capacity per logical disk volume
  • High data transfer or I/O rates that improve information throughput
  • Lower cost per megabyte of storage

RAID0

  • RAID Level 0 (also known as a stripe set or striped volume) splits data evenly across two or more disks (striped) with no parity information for redundancy.
  • It is important to note that RAID 0 provides zero data redundancy.
  • RAID 0 is normally used to increase performance.
  • A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to
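Striping can be illustrated with a short Python sketch. The function names `stripe` and `unstripe`, the round-robin placement, and the tiny block size are assumptions chosen for illustration; real controllers work with much larger stripe units.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin onto disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading the disks in round-robin order."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGHIJ", num_disks=2, block_size=2)
restored = unstripe(disks)
```

Because no block is stored twice and there is no parity, losing either disk in this sketch makes the original data unrecoverable, which is the "zero data redundancy" point above.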

RAID1

  • A RAID 1 creates an exact copy (or mirror) of a set of data on two or more disks.
  • This is useful when read performance or reliability is more important than data storage capacity.
  • Such an array can only be as big as the smallest member disk.
  • A classic RAID 1 mirrored pair contains two disks, which increases reliability.

RAID2

  • A RAID 2 stripes data at the bit (rather than block) level, and uses a Hamming code for error correction.
  • Extremely high data transfer rates are possible.
  • RAID 2 is the only standard RAID level which can automatically recover accurate data from single-bit corruption in data.
  • At the moment, there are no commercial implementations of RAID 2.
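To show how a Hamming code recovers from single-bit corruption, here is a minimal sketch using the textbook Hamming(7,4) code. The choice of the (7,4) variant and the function names are assumptions for illustration; the slides do not say which Hamming code a RAID 2 array would use.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> list[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error was detected
    if error_position:
        c[error_position - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
codeword = hamming74_encode(word)
corrupted = list(codeword)
corrupted[4] ^= 1  # simulate single-bit corruption on one "disk"
repaired = hamming74_decode(corrupted)
```

In a bit-striped layout each bit of the codeword would sit on a different disk, so one bad bit from one disk can be located and flipped back from the other six.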

RAID3

  • RAID Level 3 uses byte-level striping with a dedicated parity disk.
  • RAID 3 is very rare in practice.
  • One of the side effects of RAID 3 is that it generally cannot service multiple requests simultaneously.
  • This comes about because any single block of data will, by definition, be spread across all members of the set and will reside in the same location.

RAID4

  • RAID Level 4 uses block-level striping with a dedicated parity disk.
  • This allows each member of the set to act independently when only a single block is requested.
  • RAID 4 looks similar to RAID 3 except that it stripes at the block level, rather than the byte level.
  • In the example layout (blocks A1 and B1 on disk 0, block B2 on disk 1), a read request for block "A1" would be serviced by disk 0. A simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by disk 1.
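The A1/B1/B2 example can be made concrete with a small sketch. It assumes the layout the example implies, namely that block N of every stripe lives on disk N - 1; the helper names `data_disk` and `can_run_concurrently` are hypothetical.

```python
def data_disk(block_label: str) -> int:
    """Return the disk holding a data block labelled like 'A1' or 'B2'.

    Assumes block N of every stripe lives on disk N - 1, with parity
    kept on a separate dedicated disk.
    """
    return int(block_label[1:]) - 1


def can_run_concurrently(first: str, second: str) -> bool:
    """Two single-block reads can overlap only if they hit different disks."""
    return data_disk(first) != data_disk(second)


a1_and_b1 = can_run_concurrently("A1", "B1")  # both blocks sit on disk 0
a1_and_b2 = can_run_concurrently("A1", "B2")  # disk 0 and disk 1
```

The same reasoning applies to single-block reads on RAID 5, since it also stripes at the block level.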

RAID5

  • A RAID 5 uses block-level striping with parity data distributed across all member disks.
  • RAID 5 has achieved popularity due to its low cost of redundancy.
  • A minimum of 3 disks is generally required for a complete RAID 5 configuration.
  • In the example layout (blocks A1 and B1 on disk 0, block B2 on disk 1), a read request for block "A1" would be serviced by disk 0. A simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by disk 1.
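Parity in RAID 5 is commonly computed as the bytewise XOR of the data blocks in a stripe, which lets any one missing block be rebuilt from the rest. The sketch below assumes that XOR scheme; the rotation rule in `parity_disk` is just one possible placement and is an assumption, as are the function names.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Pick the disk holding parity for a stripe (one possible rotation)."""
    return (num_disks - 1 - stripe_index) % num_disks


data_blocks = [b"\x01\x02", b"\x10\x20"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the first data block, then rebuild it
# from the surviving data block and the parity block.
rebuilt = xor_blocks([data_blocks[1], parity])
```

Rotating the parity position from stripe to stripe is what distinguishes this from RAID 4, where every parity write goes to the same dedicated disk.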

RAID6

  • A RAID 6 extends RAID 5 by adding an additional parity block; it therefore uses block-level striping with two parity blocks distributed across all member disks.
  • This improves reliability.
  • Like RAID 5, the parity is distributed in stripes, with the parity blocks in a different place in each stripe.
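The capacity trade-offs of the levels above can be compared with a small calculator. It is a sketch that uses the commonly cited formulas (each disk contributes no more than the size of the smallest member; RAID 5 gives up one disk's worth of space to parity and RAID 6 gives up two). The function name and the minimum of four disks for RAID 6 are assumptions not stated in the slides.

```python
def usable_capacity(level: int, disk_sizes_gb: list[int]) -> int:
    """Return usable capacity in GB for a few standard RAID levels."""
    n = len(disk_sizes_gb)
    smallest = min(disk_sizes_gb)
    if level == 0:
        return n * smallest          # striping only, no redundancy
    if level == 1:
        return smallest              # every disk holds the same copy
    if level == 5:
        if n < 3:
            raise ValueError("RAID 5 generally needs at least 3 disks")
        return (n - 1) * smallest    # one disk's worth of parity
    if level == 6:
        if n < 4:
            raise ValueError("RAID 6 is assumed here to need at least 4 disks")
        return (n - 2) * smallest    # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


raid0_gb = usable_capacity(0, [500, 500, 250])
raid1_gb = usable_capacity(1, [500, 250])
raid5_gb = usable_capacity(5, [500, 500, 500, 500])
raid6_gb = usable_capacity(6, [500, 500, 500, 500])
```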

Nested RAID

Storage Model

Storage Area Network 

  • The Storage Networking Industry Association (SNIA) defines the SAN as a network whose primary purpose is the transfer of data between computer systems and storage elements.
  • A SAN consists of a communication infrastructure, which provides physical connections; and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secu

SAN's definition

  • Put in simple terms, a SAN is a specialized, high-speed network attaching servers and storage devices.
  • It is sometimes referred to as "the network behind the servers."
  • A SAN introduces the flexibility of networking to enable one server or many heterogeneous servers to share a common storage utility, which may comprise many storage devices, including disk, ta

SAN Components

SAN Connectivity

  • The connectivity of storage and server components, typically using Fibre Channel (FC).

SAN Storage

  • Tape / RAID / ESS (Enterprise Storage System) / JBOD (Just a Bunch of Disks) / SSA (Serial Storage Architecture)

SAN Server

  • Windows / Unix / Linux, etc.

Switched Fabric

  • A SAN uses an infrastructure specially designed to handle storage communications, called a fabric.
  • A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches.
  • Today, all major SAN equipment vendors also offer some form of Fibre Channel routing solution, and these bring substantial scalability benefits to the SAN archit

Fibre Channel protocol

Fibre Channel is a layered protocol. It consists of 5 layers, namely:

  • FC0: the physical layer, which includes cables, fiber optics, connectors, pinouts, etc.
  • FC1: the data link layer, which implements the 8b/10b encoding and decoding of signals.
  • FC2: the network layer, defined by the FC-FS (Framing and Signaling) standard, which forms the core of Fibre Channel and defines the main protocols.
  • FC3: the common services layer, a thin layer that could eventually implement functions like encryption or RAID.
  • FC4: the protocol mapping layer, in which other protocols, such as SCSI, are encapsulated into an information unit for delivery to FC2.

IP Storage Networking

FCIP (Fibre Channel over IP)

  • It is a method for allowing the transmission of Fibre Channel information to be tunneled through the IP network.

iFCP (Internet Fibre Channel Protocol)

  • It is a mechanism for transmitting data to and from Fibre Channel storage devices in a SAN, or on the Internet, using TCP/IP.

Internet SCSI (iSCSI)

  • It is a transport protocol that carries SCSI commands from an initiator to a target.

FCIP (Fibre Channel over IP)

  • FCIP encapsulates FC frames within TCP/IP, allowing islands of FC SANs to be interconnected over an IP-based network.
  • TCP/IP is used as the underlying transport to provide congestion control and in-order delivery of FC frames.
  • All classes of FC frames are treated the same, as datagrams.
  • End-station addressing, address resolution, message routing, and other elements of the FC

iFCP

  • iFCP is a gateway-to-gateway protocol for implementing a Fibre Channel fabric over TCP/IP.
  • Traffic between Fibre Channel devices is routed and switched by the TCP/IP network.
  • The iFCP layer maps Fibre Channel frames to a predetermined TCP connection for transport.
  • FC messaging and routing services are terminated at the gateways, so the fabrics are not merged with one another.

iSCSI

  • iSCSI is a SCSI transport protocol for mapping block-oriented storage data over TCP/IP networks.
  • The iSCSI protocol enables universal access to storage devices and Storage Area Networks (SANs) over standard TCP/IP networks.

Backup site

  • A backup site is a location where a business can easily relocate following a disaster, such as fire, flood, or terrorist threat.
  • This is an integral part of the disaster recovery plan of a business.
  • A backup site can be another location operated by the business, or contracted via a company that specializes in disaster recovery services.
  • In some cases, a business will have an

Cold Sites

  • A cold site is the most inexpensive type of backup site for a business to operate.
  • It provides office space in which to operate.
  • It does not include backed up copies of data and information from the original location of the business, nor does it include hardware already set up.
  • The lack of hardware contributes to the minimal startup costs of the cold site, but requires additional time following the disaster to have the operation running at a capacity close to

Warm Sites

A warm site is a location where the business can relocate after the disaster that is already stocked with computer hardware similar to that of the original site, but does not contain backed up copies of data and information.

Hot Sites

  • A hot site is a duplicate of the original site of the business, with full computer systems as well as near-complete backups of user data.
  • Ideally, a hot site will be up and running within a matter of hours.
  • This type of backup site is the most expensive to operate.
  • Hot sites are popular with stock exchanges and other financial institutions that may need to evacuate due to potential bomb threats and must resume normal operations as soon as po

How to choose

  • The choice of site type is mainly decided by a company's cost vs. benefit strategy.
  • Hot sites are traditionally more expensive than cold sites, since much of the equipment the company needs has already been purchased and thus the operational costs are higher.
  • However, if the same company loses a substantial amount of revenue for each day they are inactive then it may be wor

  • The advantage of a cold site is simple: cost. It requires far fewer resources to operate a cold site because no equipment has been bought prior to the disaster.
  • The downside with a cold site is the potential cost that must be incurred in order to make the cold site effective.
  • The costs of purchasing equipment on very short notice may be higher and the
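The cost vs. benefit comparison can be sketched as a rough expected-cost calculation. Everything here is a hypothetical illustration: the `SiteOption` fields, the simple formula (yearly site cost plus disaster probability times lost revenue during recovery), and all the figures are assumptions, not data from the slides.

```python
from dataclasses import dataclass


@dataclass
class SiteOption:
    name: str
    annual_cost: float     # yearly cost of keeping the site available
    recovery_days: float   # expected downtime if a disaster occurs


def expected_annual_cost(site: SiteOption, daily_revenue_loss: float,
                         disaster_probability: float) -> float:
    """Yearly site cost plus the probability-weighted revenue lost while down."""
    outage_loss = site.recovery_days * daily_revenue_loss
    return site.annual_cost + disaster_probability * outage_loss


def best_site(sites: list[SiteOption], daily_revenue_loss: float,
              disaster_probability: float) -> SiteOption:
    """Pick the option with the lowest expected yearly cost."""
    return min(sites, key=lambda s: expected_annual_cost(
        s, daily_revenue_loss, disaster_probability))


# Hypothetical figures, for illustration only.
options = [
    SiteOption("hot", annual_cost=500_000, recovery_days=0.5),
    SiteOption("cold", annual_cost=50_000, recovery_days=14),
]
high_stakes = best_site(options, daily_revenue_loss=1_000_000,
                        disaster_probability=0.05)
low_stakes = best_site(options, daily_revenue_loss=100_000,
                       disaster_probability=0.05)
```

With these made-up numbers the hot site wins when each day of downtime is very costly, and the cold site wins when it is not, which mirrors the reasoning on the slide.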

Discovery Planning steps (1/3) 

Assess business impact and risk.

  • This should include an assessment of the business unit's function and, preferably, a business impact analysis (BIA).
  • The purpose of the assessment is to determine the business unit's relative contribution to the larger organization (monetary and functional).
  • The greater the potential impact, the more money a company should spend to restore a system or process quickly.
  • For instance, a stock trading company may decide to pay for completely redundant IT systems that would allow it to immediately start processing tra

Discovery Planning steps (2/3) 

Develop a Disaster Recovery framework.

  • Data should be categorized by importance. Two measures of importance are used, RTO and RPO.
  • Recovery Time Objective (RTO) is the acceptable amount of time between the disaster and the post-disaster resumption of function (how long can we wait to restore data?).
  • Recovery Point Objective (RPO) is the
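A recovery exercise can be checked against both objectives with a few lines of Python. This sketch takes RPO in its usual sense, the maximum acceptable window of data loss measured back from the disaster to the last usable backup; the function name, its parameters, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup: datetime, disaster: datetime,
                     service_restored: datetime,
                     rpo: timedelta, rto: timedelta) -> dict:
    """Compare the actual data-loss window and downtime with RPO and RTO."""
    data_loss_window = disaster - last_backup   # work done since the last backup
    downtime = service_restored - disaster      # time until function resumed
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical timeline, for illustration only.
result = meets_objectives(
    last_backup=datetime(2019, 11, 1, 2, 0),
    disaster=datetime(2019, 11, 1, 14, 0),
    service_restored=datetime(2019, 11, 2, 8, 0),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=12),
)
```

In this made-up timeline 12 hours of data are lost, which is within the 24-hour RPO, but the 18 hours of downtime exceed the 12-hour RTO.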

Discovery Planning steps (3/3) 

Develop a recovery strategy and then a written Disaster Recovery Plan.

  • That written plan should address, at a minimum: response, recovery, and resumption of services, with detailed tasks.

Adjust information systems to make Disaster Recovery easier. 

This includes consolidating servers and data, perhaps with a Storage Area Network or other archival storage method.

Important factors (1/3)

Communication

  • Personnel: notify all key personnel of the problem and assign them tasks focused toward the recovery plan.
  • Customers: notifying clients about the problem minimizes panic.

Recall backups

  • If backup tapes are taken offsite, these need to be recalled.
  • If using remote backup services, a network connection to the remote backup location (or the Internet) will be re

Important factors (2/3)

Facilities

  • Larger companies should have backup hot sites or cold sites.
  • Mobile recovery facilities are also available from many suppliers.

Prepare your employees

  • During a disaster, employees are required to work longer, more stressful hours, and a support system should be in place to alleviate some of the stress.
  • Prepare them ahead of time to ensure that work runs smoothly.

Important factors (3/3)

Business information

  • Backups should be stored in a location completely separate from the company.

Testing the plan

  • Provisions, directions, and frequency for testing the plan should be stipulated.

Things to do in DRP (1/4)

Here are 10 absolute basics your plan should cover:

   1. Develop and practice a contingency plan that includes a succession plan for your CEO.
   2. Train backup employees to perform emergency tasks. The employees you count on to lead in an emergency will not always be available.
   3. Determine offsite crisis meeting places for top executives.

Things to do in DRP (2/4)

   4. Make sure that all employees, as well as executives, are involved in the exercises so that they get practice in responding to an emergency.
   5. Make exercises realistic enough to tap into employees' emotions so that you can see how they'll react when the situation gets stressful.
   6. Practice crisis communication with

Things to do in DRP (3/4)

   7. Invest in an alternate means of communication in case the phone networks go down.
   8. Form partnerships with local emergency response groups (firefighters, police) to establish a good working relationship. Let them become familiar with your company and site.

Things to do in DRP (4/4)

   9. Evaluate your company's performance during each test, and work toward constant improvement. Continuity exercises should reveal weaknesses.
   10. Test your continuity plan regularly to reveal and accommodate changes. Technology, personnel and facilities are in a constant state of flux at any compan

Top mistakes in disaster recovery (1/3)

1. Inadequate planning:

  • Have you identified all critical systems, and do you have detailed plans to recover them to the current day?
  • Everybody thinks they know what they have on their networks, but most people don't really know how many servers they have, how they're configured, or what applications reside on them: what services were running, and what version of software or operating systems they were using.

Top mistakes in disaster recovery (2/3)

2. Failure to bring the business into the planning and testing of your recovery efforts.

3. Failure to gain support from senior-level managers. The largest problems here are:

  • Not demonstrating the level of effort required for full recovery.
  • Not conducting a business impact analysis and addressing all gaps in your recovery model.

Top mistakes in disaster recovery (3/3)

  • Not building adequate recovery plans that outline your recovery time objective, critical systems and applications, vital documents needed by the business, and business functions, with plans for the operational activities to be continued after a disaster.
  • Not having proper funding that will allow for a minimum of semiannual testing.
