Redpaper Alex Osuna Roland Tretau
IBM System Storage N series MetroCluster

Introduction
The IBM® System Storage™ N series MetroCluster extends Clustered Failover capabilities from a primary to a remote site. It replicates data from the primary site to the remote site to ensure that the data there is completely up-to-date and available. If Site A goes down, MetroCluster allows you to rapidly resume operations at a remote site minutes after a disaster:
- Stretch MetroCluster provides a disaster recovery option at distances up to 500 meters between each N series system (Figure 1).
  - Available on N5000 and N7000 series.
- Fabric MetroCluster provides a disaster recovery option at distances up to 100 km using a Fibre Channel switched network.
  - Available on N5000 series.
In this IBM Redpaper publication, we discuss the following:
- MetroCluster features and benefits
- MetroCluster functionality
- Failure scenarios
© Copyright IBM Corp. 2006. All rights reserved.
ibm.com/redbooks
Figure 1 shows a MetroCluster configuration.
[Figure 1: Site A and Site B in Stretch and Fabric arrangements, with volumes X (Vol-X) and Y (Vol-Y) and their mirrors Xm (Mirrored Vol-X) and Ym (Mirrored Vol-Y).]
Figure 1 MetroCluster configurations
MetroCluster Benefits
The benefits of using MetroCluster are:
- MetroCluster is designed to be a simple-to-administer solution that extends failover capability from within a data center to a remote site.
- MetroCluster is also designed to provide replication of data from the primary site to a remote site, which helps to keep the data at the remote site current.
- The combination of failover and data replication aids in the recovery from disaster, helping prevent loss of data, in less time than otherwise possible.
- Extends Clustered Failover capabilities from a primary to a remote site.
- Replicates data from the primary site to the remote site to ensure that data there is completely up-to-date and available.
- If Site A goes down, MetroCluster allows you to rapidly resume operations at a remote site minutes after a disaster.

N series Business Continuity Solutions
The environment or business need determines the necessary configuration for you.
Protection levels
MetroCluster provides three protection levels:
- Within the data center
- Campus distances
- WAN distances
In the following section, we further discuss the protection levels.

Within the data center
For protection from power faults, node failure, and network connector failure, Clustered Failover (CFO) may be the right protection for you, as shown in Figure 2.

Campus distances
For protection from building, power, node, volume, switch, and network connector failures, the Stretch MetroCluster might be the right configuration for you. Because failover occurs to another building or installation on the campus, availability increases one step further.

WAN distances
If your business requires the highest level of availability, the Fabric MetroCluster configuration delivers the needed options. By making different synchronous versions of SnapMirror® available, the Fabric MetroCluster configuration protects against location, building, power, node, volume, switch, and network connector failures. Figure 2 shows the protection levels.
[Figure 2: protection options by distance from the primary data center (within the data center, campus distances, WAN distances):
- Clustered Failover (CFO): high system protection
- MetroCluster (Stretch): cost-effective zero RPO protection
- MetroCluster (Fabric): cost-effective zero RPO protection
- Async SnapMirror: most cost effective, with RPO from 10 min. to 1 day
- Sync SnapMirror: most robust zero RPO protection]
Figure 2 Protection levels
What is MetroCluster
MetroCluster software provides an enterprise solution for high availability over wide area networks (WAN). MetroCluster deployments of IBM N series storage systems are used for:
- Business continuance
- Disaster recovery
- Achieving recovery point and recovery time objectives
MetroCluster technology is an important component of enterprise data protection strategies. If a disaster occurs at a source site, businesses can continue to run and access data from a clustered node in a remote site. The primary goal of MetroCluster is to provide mission-critical applications with redundant storage services in case of site-specific disasters. It is designed to tolerate site-specific disasters with minimal interruption to mission-critical applications and zero data loss by synchronously mirroring data between two sites.
A MetroCluster system is made up of the following components and requires the following licenses:
- Multiple storage controllers and high availability (HA) configuration: Provides automatic failover capability between sites in case of hardware failures.
- SyncMirror®: Provides an up-to-date copy of data at the remote site. The data is ready for access after failover without administrator intervention.
- Cluster remote: Provides a mechanism for the administrator to declare a site disaster and initiate a site failover using a single command, for ease of use.
- FC switches: Provide storage system connectivity between sites that are more than 500 meters apart.
MetroCluster allows the active/active configuration to be spread across data centers up to 100 kilometers apart (see Figure 3). In the event of an outage at one data center, the second data center can assume all of the storage operations that were lost with the original data center.
SyncMirror is required as part of MetroCluster to ensure that an identical copy of the data exists in the second data center, should the original data center be lost:
- MetroCluster and SyncMirror extend active/active clustering across data centers that are up to 100 kilometers apart.
- Either dark fiber or Dense Wave Division Multiplexing (DWDM) between the switches is required.
- MetroCluster and SyncMirror provide the highest level of storage resiliency across a local region.
- Highest levels of regional storage resiliency ensure continuous data availability in a particular geography.
Figure 3 shows how MetroCluster allows the active/active configuration to be spread across data centers that are up to 100 kilometers apart.
[Figure 3: a major data center and a nearby office connected over LAN/SAN, each with an N series system and disks. MetroCluster is a unique, cost-effective synchronous replication solution for combined high availability and disaster recovery within a campus or metro area.
- Stretch MetroCluster provides campus DR protection and can stretch up to 500 m.
- Fabric MetroCluster provides metropolitan DR protection and can stretch up to 100 km with FC switches.]
Figure 3 MetroCluster
The benefits of MetroCluster: Synchronous Mirroring with SyncMirror
Synchronous Mirroring for MetroCluster mirrors the Write Anywhere File Layout (WAFL®) volumes (aggregates). Both copies, or plexes, are updated synchronously on writes, which ensures consistency. The design of IBM N series and MetroCluster provides data availability even in the event of multiple disk failures. Read performance is optimized by performing application reads from both plexes (Figure 4). Performance is directly affected by the distance between plexes. An additional performance benefit of synchronous mirroring is low write overhead and latency. Mirroring is done closer to the hardware, and there is zero impact on the secondary controller.
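The write and read behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class names Plex and MirroredAggregate are hypothetical, the plexes are in-memory dictionaries, and the alternating read policy is an assumption chosen for illustration. It is not how Data ONTAP implements SyncMirror.

```python
from itertools import cycle


class Plex:
    """One physical copy of the data (hypothetical in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class MirroredAggregate:
    """Toy model of an aggregate mirrored across two plexes."""

    def __init__(self, local, remote):
        self.plexes = [local, remote]
        # Assumption: reads simply alternate between the two plexes.
        self._next_reader = cycle(self.plexes)

    def write(self, block_id, data):
        # Synchronous mirroring: update every plex before acknowledging.
        for plex in self.plexes:
            plex.blocks[block_id] = data
        return True  # acknowledged only after both copies are updated

    def read(self, block_id):
        # Either plex holds a current copy, so either one can serve the read.
        plex = next(self._next_reader)
        return plex.name, plex.blocks[block_id]


aggr = MirroredAggregate(Plex("plex0"), Plex("plex1"))
aggr.write(7, b"payload")
print(aggr.read(7))  # served from plex0
print(aggr.read(7))  # served from plex1
```

Because the write returns only after both copies are updated, either plex can serve any later read, which is the property that makes the remote copy usable after a site failure.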
Figure 4 illustrates a MetroCluster with mirrored plexes.
[Figure 4: two FC-attached sets of RAID groups that form Plex0 and Plex1 of a mirrored aggregate.]
Figure 4 MetroCluster with mirrored plexes
MetroCluster: SyncMirror configuration
Use the sysconfig -r command (Figure 5) to see the volume/plex/RAID group relationship in Data ONTAP®.
Volume spiel (online, normal, mirrored) (zoned checksums)      <- mirrored volume
  Plex /spiel/plex0 (online, normal, active)
    RAID group /spiel/plex0/rg0
      RAID Disk  Device  HA SHELF BAY  CHAN  Used (MB/blks)   Phys (MB/blks)
      ---------  ------  ------------  ----  --------------   --------------
      parity     3.2     3  0     2    FC:A  8579/17570816    8683/17783112
      data       3.3     3  0     3    FC:A  8579/17570816    8683/17783112
  Plex /spiel/plex2 (online, normal, active)
    RAID group /spiel/plex2/rg0
      RAID Disk  Device  HA SHELF BAY  CHAN  Used (MB/blks)   Phys (MB/blks)
      ---------  ------  ------------  ----  --------------   --------------
      parity     6.8     6  1     0    FC:A  16979/34774016   17560/35964296
      data       6.9     6  1     1    FC:A  8579/17570816    17560/35964296
Figure 5 Volume relationships
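To check the volume and plex relationship from a script, the Plex lines of sysconfig -r output can be picked out with a regular expression. The following is a minimal sketch, assuming the output keeps the "Plex <path> (<states>)" layout visible in Figure 5. The sample text, the function names, and the rule that two online plexes mean a mirrored volume are illustrative assumptions, and real output can differ between Data ONTAP releases.

```python
import re

# Sample modeled on the Volume and Plex lines shown in Figure 5.
SAMPLE = """\
Volume spiel (online, normal, mirrored) (zoned checksums)
  Plex /spiel/plex0 (online, normal, active)
    RAID group /spiel/plex0/rg0
  Plex /spiel/plex2 (online, normal, active)
    RAID group /spiel/plex2/rg0
"""

# Matches lines such as: Plex /spiel/plex0 (online, normal, active)
PLEX_RE = re.compile(r"^\s*Plex\s+(\S+)\s+\(([^)]*)\)")


def list_plexes(text):
    """Return (plex path, [state flags]) for each Plex line in the text."""
    plexes = []
    for line in text.splitlines():
        match = PLEX_RE.match(line)
        if match:
            states = [s.strip() for s in match.group(2).split(",")]
            plexes.append((match.group(1), states))
    return plexes


def is_mirrored(text):
    """Assumption: a volume counts as mirrored when two plexes are online."""
    online = [path for path, states in list_plexes(text) if "online" in states]
    return len(online) >= 2


print(list_plexes(SAMPLE))
print(is_mirrored(SAMPLE))
```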
Use aggr mirror to start mirroring the plexes, as we show in Example 1.
Example 1 aggr mirror syntax
aggr mirror [-n] [-f] [-v ] [-d ... ] - add a mirror to aggregate or traditional volume
MetroCluster: Cluster_Remote License
The Cluster_Remote License provides features that enable the administrator to declare a site disaster and initiate a site failover using a single command:
- Enables the cf forcetakeover -d command.
  - Initiates a takeover of the local partner even in the absence of a quorum of partner mailbox disks.
- Provides the ability for an administrator to declare a site-specific disaster and have one node take over its partner's identity without a quorum of disks.
- Root volumes of both filers MUST be synchronously mirrored.
- Only synchronously mirrored aggregates are available during a site disaster.
- Requires administrator intervention as a safety precaution against a "split brain" scenario (cf forcetakeover -d).
Important: Site-specific disasters are not the same as a normal cluster failover!
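The reason for requiring an administrator decision can be sketched as a small decision function. This is an illustrative model only, not Data ONTAP logic: the function name and parameters are hypothetical, and treating "quorum" as a strict majority of the partner mailbox disks is an assumption made for the example.

```python
def takeover_decision(mailbox_disks_visible, mailbox_disks_total,
                      forced_by_admin=False):
    """Model of when a node takes over its partner (illustrative only)."""
    # Assumption: quorum means a strict majority of partner mailbox disks.
    has_quorum = mailbox_disks_visible * 2 > mailbox_disks_total
    if has_quorum:
        return "normal takeover"
    if forced_by_admin:
        # Corresponds to the administrator running: cf forcetakeover -d
        return "forced takeover (site disaster declared)"
    # Without quorum the node cannot tell a dead partner from a broken
    # link, so it waits rather than risk a split-brain scenario.
    return "no takeover: administrator decision required"


print(takeover_decision(2, 2))
print(takeover_decision(0, 2))
print(takeover_decision(0, 2, forced_by_admin=True))
```

The middle case is the one the license addresses: the surviving node cannot see its partner's disks, so only a person who knows the remote site is really down should force the takeover.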
MetroCluster components
MetroCluster is an integrated solution made up of the following components:
- Clustered Failover: Provides the high-availability failover capability between sites. The administrator controls the failover decision depending on the impact of the disaster.
- SyncMirror: Provides an up-to-date copy of data at the remote site. You can access the data after failover, without administrator intervention.
- Cluster_Remote: Provides a mechanism for an administrator to declare a site disaster and initiate a site failover using a single command, for ease of use.
- FC switches: Provide filer connectivity between sites that are more than 500 meters apart, which enables sites to be located at a safe distance from each other.
MetroCluster configurations
Table 1 lists the MetroCluster configurations.
Table 1 MetroCluster configurations
Attribute                      Short distance (<500 m)            Long distance (500 m to 100 km)
Supported platform             N5000 and N7000 series             N5000 series
Switch mode operation          N/A                                Fabric
Number of dedicated switches   N/A                                2 per site (4 per MetroCluster pair)
Supported drive types          SATA or FC                         FC
Shelves per loop               6                                  2
Maximum number of shelves      Same as N5XXX A20 and N7XXX A20    24 (12 each side)
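The distance limits in Table 1 can be turned into a simple selection helper. The sketch below is an illustration under stated assumptions: the function name and the dictionary keys are hypothetical, and the boundary handling (exactly 500 m counts as long distance) follows the Table 1 column headings.

```python
STRETCH_LIMIT_M = 500       # Table 1: short distance is below 500 meters
FABRIC_LIMIT_M = 100_000    # Table 1: long distance is 500 meters to 100 km


def choose_configuration(distance_m):
    """Return the Table 1 column that covers a site separation, or None."""
    if distance_m < 0:
        raise ValueError("distance must not be negative")
    if distance_m < STRETCH_LIMIT_M:
        return {
            "configuration": "Stretch MetroCluster",
            "dedicated_switches_per_site": None,  # N/A in Table 1
            "shelves_per_loop": 6,
        }
    if distance_m <= FABRIC_LIMIT_M:
        return {
            "configuration": "Fabric MetroCluster",
            "dedicated_switches_per_site": 2,
            "shelves_per_loop": 2,
        }
    return None  # beyond the distances covered by Table 1


for meters in (120, 500, 42_000, 150_000):
    print(meters, choose_configuration(meters))
```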
Table 2 Switch distances
Switch: IBM 2005-B16 (Brocade 200E)
  Distance      SFP
  Up to 500m    Standard SFPs
  <=10km        Vendor: Finisar, Vendor #: FTLF-1319P1xCL
  <=10km        Vendor: Finisar, Vendor #: FTLF-1419P1BCL
  <=50km        Vendor: Finisar, Vendor #: FTLF-1419P1BCL
  <=80km        Vendor: Finisar, Vendor #: FTLF-1519P1BCL
Switch: IBM 2005-H16 (Brocade 3850)
Switch: IBM 2005-H08 (Brocade 3250)
Switch: IBM 2109-F16 (Brocade 3800)
Switch: IBM 3534-F08 (Brocade 3200)
Table 3 shows the supported N series Gateway configurations.
Table 3 Supported Gateway configurations
Gateway models N5500 and N5200:
- Data ONTAP versions: 7.2.3
  Switches: Brocade 48000, 7500, 5000, 4900, 4100, 200E with switch FW 5.2.1b
  Storage: HDS TagmaStore FW 50-09-06, 50-08-06, 50-07-64; HDS Lightning 9980V/9970V FW 21-14-33; HDS Lightning 9960V/9910 FW 01-19-99; IBM DS4700, DS4800 FW 6.23.05.00
- Switches: N/A (Stretch MetroCluster)
- Switches: Cisco MDS 9506, 9509 with switch FW 3.0(2)
  Storage: HDS TagmaStore FW 50-09-06, 50-08-06, 50-07-64; HDS Lightning 9980V/9970V FW 21-14-33; HDS Lightning 9960V/9910 FW 01-19-99
Gateway models N7800, N7600, N5600, and N5300:
- Switches: McData 6140 FW v9.02.01
- Switches: Brocade 48000, 7500, 5000, 4900, 4100, 200E with switch FW 5.2.1b
  Storage: HDS TagmaStore FW 50-09-06, 50-08-06, 50-07-64; HDS Lightning 9980V/9970V FW 21-14-33; HDS Lightning 9960V/9910 FW 01-19-99; IBM DS4700, DS4800 FW 6.23.05.00
- Switches: N/A (Stretch MetroCluster)
Stretch MetroCluster and Fabric MetroCluster
The IBM N series supports two configurations: Stretch and Fabric. We cover both configurations in this section.

Stretch
Stretch MetroCluster provides a disaster recovery option at distances up to 500 meters between each N series system. It requires only the MetroCluster license and the SyncMirror license. To connect the N series nodes over a distance greater than 30 m, you need to convert the IB copper to fiber using feature 1042 - Copper-Fiber Converter. This adapter has Multi-fiber Push-On (MPO) connectors and needs an MTP plug (NVRAM5).
What is MTP/MPO? Optical links specify the LC duplex connector for 1X links, the MTP/MPO for 4X links, and dual MTP/MPO for 12X links. All of these optical connectors are accepted as ad hoc industry standards for other optical data communication protocols as well.
Stretch is available on N5000 and N7000 series.

Fabric
Fabric MetroCluster provides a disaster recovery option at distances up to 100 km using a Fibre Channel switched network. Available on N5000 and N7000 series.
Fabric MetroCluster configuration details
First, make sure that you have the right Storage Initiator HBAs (see Table 4).
Table 4 Fabric MetroCluster Storage Initiator HBAs
Columns: Platform (N5200, N5500 | N5300, N5600 | N7000); Min Data ONTAP Version
Feature Code 1006: Yes, No; 7.1h2
Feature Code 1014: No, Yes; 7.2.3
Feature Code 1029: No, Yes; 7.2.3
Onboard ports 0a, 0b, 0c, and 0d can be used.
The heartbeat and NVRAM use a separate dual-port MetroCluster HBA. This HBA is dedicated and is not used for the I/O traffic of SyncMirror (see Table 5).
Table 5 Fabric MetroCluster Interconnect HBAs
Columns: Platform (N5200, N5500 | N5300, N5600 | N7600, N7800); Min Data ONTAP
Feature Code 1018: Yes, No; 7.1.H12
Feature Code 1032: No, Yes; 7.2.3
Additional requirements for Fabric MetroCluster support
- Must use Clustered models
- No-Charge SyncMirror License on each storage controller
- EXN2000 or EXN4000 storage only (EXN1000
Fabric MetroCluster HBAs
In this section, we discuss feature 1042 - Copper-Fiber Converter. To connect the filer heads of a stretched cluster over a distance greater than 30 m, you need to convert the IB copper to fiber using feature 1042 - Copper-Fiber Converter. This adapter has MPO connectors and needs an MTP plug (NVRAM5).
Table 6 lists the Fabric MetroCluster Storage Initiator HBAs.
Table 6 Fabric MetroCluster Storage Initiator HBAs
Platform                N5200, N5500   N5300, N5600   N7600, N7800   Min Data ONTAP
Feature code 1006 (a)   Yes (b)        No             No             7.1h2
a. Fabric MetroCluster supports only feature code 1006; Stretch MetroCluster supports all Storage Initiator HBAs.
b. Can use onboard ports 0a, 0b, 0c, 0d.
Table 7 shows the Fabric MetroCluster Interconnect HBAs.
Table 7 Fabric MetroCluster Interconnect HBAs
Platform            N5200, N5500   N5300, N5600   N7600, N7800   Min Data ONTAP
Feature Code 1018   Yes            No             No             7.1.H12
Stretch MetroCluster: Campus Distance (<500m)
In a stretch MetroCluster, the controllers and expansion shelves are attached to Fibre Channel switches, and the switches have GBICs to communicate across the WAN to one another. SyncMirror is built into MetroCluster in that every write is written to two separate expansion units, in two separate aggregate groups.
The advantage of MetroCluster is that it can take a high availability solution and stretch it outside of the frame. Think about taking a disk subsystem, such as the IBM DS4800, separating the controllers miles apart, and maintaining two separate disk groups to ensure failover, instead of building two completely separate DS4800 systems and synchronously mirroring between the two.
The benefits of Stretch MetroCluster, Campus Distance are:
- Stretch MetroCluster provides a disaster recovery option at distances of up to 500 meters between each IBM N series system. See Figure 6.
- Stretch MetroCluster requires only the MetroCluster license and the SyncMirror license.
- Stretch MetroCluster is available on N5000 and N7000 series.
Figure 6 shows the MetroCluster at less than 500 meters.
Figure 6 MetroCluster <500M
You can use SATA disks in either a mirrored configuration or a non-mirrored configuration (with one of the nodes only) on a Stretch MetroCluster.

Fabric MetroCluster: Metro Distance (>500m)
The benefits of MetroCluster, Metro Distance are:
- Fabric MetroCluster provides a disaster recovery option at distances up to 100 km using a Fibre Channel switched network (Figure 7).
- Fabric MetroCluster uses switches for longer distance disaster recovery solutions.
Figure 7 shows an example of the Fabric MetroCluster.
Figure 7 Fabric MetroCluster
Using DWDM switches
Dense Wave Division Multiplexing (DWDM) is a method of multiplexing multiple channels of fiber optic-based protocols, such as ESCON®, Fibre Channel, FICON®, and Gbit Ethernet, onto a physical cable by assigning a different wavelength of light (that is, a color) to each channel and then fanning the channels back out at the receiving end. The major players in the enterprise class DWDM marketplace are Nortel Networks, Cisco (ONS 15540), and Lucent. Dense Wave Division Multiplexors are data link Layer 2 tools. Thus, the typical DWDM machine does not perform any switching, routing, or protocol conversion.
Figure 8 depicts a Fabric MetroCluster installation using DWDM.
Figure 8 DWDM environment
Fabric attach MetroCluster Switch Matrix
Table 8 shows the required firmware and Data ONTAP levels for SAN switches.
Table 8 MetroCluster Switch Matrix
Data ONTAP level Switch Model IBM 2105-H08/Brocade 3250
7.1h2
7.2 or above Minimum Firmware level 5.2.0a
IBM 2105-H16/Brocade 3850 IBM 3534-F08/Brocade 3200
3.3.1b
IBM 2109-F16/Brocade 3800 IBM 2005-B16/Brocade 200E
5.1.0
Cable selection
Use Table 9 to determine whether the desired cable length is within the supported maximum specification. We provide this table as guidance for determining distances; however, the best way to choose a cable type is to test the cable length in the environment, because there are many variables, such as types of cables, panels, and so forth. For example, a client needs to run an FC cable over approximately 260 m (850 ft.) and wants to run it at 2 Gbps. How do we determine the maximum distance that can be allowed for cabling between nodes? Table 9 is a cable selection chart that can help you determine whether the cable length you want is within the supported maximum specification.
Table 9 Cable selection chart
Rate               Cable  Fiber core  Mode    Wavelength  Maximum       Attenuation  Maximum channel   Splice     Connector pair
                   type   type                (nm)        distance (m)  (dB/km)      attenuation (dB)  loss (dB)  loss (dB)
1 Gbps             OM2    50/125 um   Multi   850         550           3.00         3.25              0.3        0.75
1 Gbps             OM3    50/125 um   Multi   850         550           3.00         3.50              0.3        0.75
1 Gbps             OS1    9/125 um    Single  1310        2000          0.40         7.80              0.3        0.75
2 Gbps             OM2    50/125 um   Multi   850         550 (a)       3.00         2.62              0.3        0.75
2 Gbps             OM3    50/125 um   Multi   850         550           3.00         3.25              0.3        0.75
2 Gbps             OS1    9/125 um    Single  1310        2000          0.40         7.80              0.3        0.75
4 Gbps             OM2    50/125 um   Multi   850         150           3.00         2.06              0.3        0.75
4 Gbps             OM3    50/125 um   Multi   850         150           3.00         3.00              0.3        0.75
4 Gbps             OS1    9/125 um    Single  1310        500           0.40         7.80              0.3        0.75
IB 1X 250 MB/sec   OM2    50/125 um   Multi   850         250           3.50         2.38              0.3        0.75
IB 1X 250 MB/sec   OM3    50/125 um   Multi   850         500           3.50         3.25              0.3        0.75
a. According to the maximum channel attenuation (2.62 dB) and an attenuation of 3.00 dB/km, the maximum distance for this cable type is 406 m, so be careful with longer distances (up to 550 m) with OM2 cable at 2 Gbps.
Table 9 summarizes data that is related to optical cabling for data communications, which is available in documents that are published by various standards organizations. We focus on data that is relevant to fiber deployments that are supported on IBM N series systems. To determine whether the desired cable run length is within the supported maximum specification:
1. Determine the needed transfer rate based on the type of shelf you use.
2. Find out what fiber type is installed for the system.
3. Determine the number of connectors in the path between the nodes.
Refer to Table 9 and locate the maximum supported operating distance, compare this value to the distance for the actual intended application, and verify that the actual distance does not exceed the supported maximum.
Consider the 260 m distance for the above client. Let us assume:
- Desired transfer rate: 2 Gbps
- Fiber core type: 50/125 OM2 multimode cabling
- Number of connector pairs: two
Table 9 shows that 260 m is within the maximum operating distance for OM2 cable at 1 Gbps and 2 Gbps (550 m), but not at 4 Gbps (150 m). This also assumes that the fiber connection is point-to-point with only source and destination connections and no patch panels or splices.
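The attenuation columns of Table 9 can also be used for a rough loss check of a planned run. The sketch below assumes a simple additive loss model (fiber attenuation plus a fixed loss per connector pair and per splice), which is a common engineering approximation and not a method stated in this paper; the function names are hypothetical. It does not attempt to reproduce the 406 m figure in the Table 9 footnote, whose derivation the table does not spell out.

```python
def link_loss_db(length_m, attenuation_db_per_km, connector_pairs,
                 splices=0, connector_pair_loss_db=0.75, splice_loss_db=0.3):
    """Estimate end-to-end loss with an additive model (an assumption)."""
    fiber_loss = attenuation_db_per_km * length_m / 1000.0
    return (fiber_loss
            + connector_pairs * connector_pair_loss_db
            + splices * splice_loss_db)


def within_budget(loss_db, max_channel_attenuation_db):
    """True when the estimated loss fits the channel attenuation limit."""
    return loss_db <= max_channel_attenuation_db


# Client example: 260 m of OM2 at 2 Gbps, two connector pairs, no splices.
# Table 9 values: 3.00 dB/km attenuation, 2.62 dB maximum channel attenuation.
loss = link_loss_db(260, 3.00, connector_pairs=2)
print(f"estimated loss: {loss:.2f} dB")   # 0.78 dB fiber + 1.50 dB connectors
print("fits 2 Gbps OM2 budget:", within_budget(loss, 2.62))
```

Under these assumptions the 260 m run uses about 2.28 dB of the 2.62 dB budget, so adding a patch panel or a splice could push it over the limit. This supports the advice to test the cable in the actual environment.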
MetroCluster best practices
The following list contains MetroCluster best practices:
- Read the Data ONTAP 7.2 Active/Active Configuration Guide.
- Read Chapter 3: "Installing a MetroCluster" in the Data ONTAP 7.2 Active/Active Configuration Guide.
- Ensure correct cabling.
- Ensure correct switch configuration, and apply 16-port and fabric licenses.
- Test the installation and failover scenarios.
- Train the staff.
N series resiliency
Figure 9 gives you an idea of what events the N series can protect against with MetroCluster.
Event                                          Does the event       Is data still available on the
                                               trigger a failover?  affected volume after the event?
Single Disk Failure                            No                   Yes
Double Disk Failure (same RAID group)          No                   Yes, with no CFO necessary
Triple Disk Failure (same RAID group)          No                   Yes, with no CFO necessary
Single HBA (initiator) failure, Loop A         No                   Yes, with no CFO necessary
Single HBA (initiator) failure, Loop B         No                   Yes, with no CFO necessary
Single HBA initiator failure (Loop A + B)      No                   Yes, with no CFO necessary
LRC Failure (Loop A)                           No                   Yes, with no CFO necessary
LRC Failure (Loop B)                           No                   Yes
ESH Failure (Loop A)                           No                   Yes, with no CFO necessary
ESH Failure (Loop B)                           No                   Yes
Shelf (backplane) Failure                      No                   Yes, with no CFO necessary
Shelf, Single Power Failure                    No                   Yes
Shelf, Dual Power Failure                      No                   Yes, with no CFO necessary
Head, Single Power Failure                     No                   Yes
Head, Dual Power Failure                       Yes                  Yes, if CFO is successful
Cluster Interconnect Failure (1 port)          No                   Yes
Cluster Interconnect Failure (both ports)      No                   Yes
Ethernet Interface Failure (primary, no VIF)   No                   No
Ethernet Interface Failure (primary, VIF)      No                   Yes
Ethernet Interface Failure (secondary, VIF)    No                   Yes
Ethernet Interface Failure (VIF, all ports)    No                   No
Tape Interface Failure                         No                   Yes
Heat +/- operating temperatures                Yes                  Yes, if partner is within operating temp...
Fan Failures (disk shelves or filer head)      No                   Yes
Reboot/Panic                                   No*                  Maybe (depends on cause of panic)
Figure 9 N series resiliency
Failure scenarios
In this section, we give examples of some possible failures and the associated recovery that is available when you use MetroCluster.

Host failure
In this scenario, Host 1 is lost; however, access to data continues uninterrupted. Host 2 continues accessing data, as shown in Figure 10.
Figure 10 Host failure
Node failure
In this scenario, environmental factors destroyed IBM N series N1. IBM N series N2 takes over access to its disks, as shown in Figure 11. The fabric switches provide the connectivity for the IBM N series and the hosts to continue data access uninterrupted.
Figure 11 is an example of Node failure.
[Figure 11: Host 1 and N1 (DC1 Primary, DC2 Mirror) in Data Center #1; Host 2 and N2 (DC2 Primary, DC1 Mirror) in Data Center #2. Recovery is automatic: N2 can access the disks in both data centers.]
Figure 11 Node failure
Node and expansion unit failure
This scenario is likely to happen when a catastrophic event affects a rack that holds an IBM N series and its expansion units. Both the IBM N series and the expansion units become unavailable, as shown in Figure 12. To continue access, you must perform the cluster failover with the cf forcetakeover -d command (shown as cfo -d in Figure 12). Data access is restored because DC1 Mirror was synchronized with DC1 Primary. Through the connectivity that the fabric switches provide, all hosts can once again access the required data.
Figure 12 depicts a node and expansion unit failure.
[Figure 12: Host 1 and N1 (DC1 Primary, DC2 Mirror) in Data Center #1; Host 2 and N2 (DC2 Primary, DC1 Mirror) in Data Center #2. Dual failure recovery: one-step failover with the cfo -d command.]
Figure 12 Node and expansion unit failure
MetroCluster interconnect failure
In this scenario, the fabric switch interconnects failed (Figure 13). Although this is not a critical failure, it must be resolved promptly in case a more critical failure follows. During this period, data access is uninterrupted to all hosts. Mirroring and failover are disabled, which reduces data protection. After the interconnect failure is resolved, the mirrors are resynchronized.
Figure 13 illustrates the MetroCluster interconnect failure, where the fabric switch interconnects failed.
[Figure 13: Host 1 and N1 (DC1 Primary, DC2 Mirror) in Data Center #1; Host 2 and N2 (DC2 Primary, DC1 Mirror) in Data Center #2. Recovery: no failover, and mirroring is disabled. Both filer heads continue to run, serving their LUNs/volumes. Resynchronization happens automatically after the interconnect is reestablished.]
Figure 13 Interconnect failure
MetroCluster site failure
In this scenario, a site disaster occurred, and all switches, storage systems, and hosts are lost (Figure 14). To continue data access, a cluster failover must be initiated with the cf forcetakeover -d command (abbreviated as cfo -d). Both primaries now exist at Data Center 2, and hosting of Host 1 is also done at Data Center 2.
Note: If the site failure is staggered in nature and the interconnect fails before the rest of the site is destroyed, there is a chance of data loss, because processing continued after the interconnect failed. Typically, site failures occur pervasively and at the same time.
Figure 14 depicts the MetroCluster site failure, where a site disaster occurs, and all switches, storage systems, and hosts are lost.
Figure 14 MetroCluster site failure
MetroCluster site recovery
After the hosts, switches, and storage systems are recovered at Data Center 1, a recovery can be performed. Run the cf giveback command to resume normal operations, as shown in Figure 15. Mirrors are resynchronized, and primaries and mirrors return to their previous roles.
Figure 15 illustrates a MetroCluster site recovery.
[Figure 15: MetroCluster site recovery. Data Center #1 holds Host 1 and N1; Data Center #2 holds Host 2 (with Host 1 emulation) and N2. Site recovery is a one-step return to normal with cf giveback. DC1 Primary reverts back to DC1 after the rebuild.]
Figure 15 MetroCluster site recovery
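The failure scenarios and their recovery actions can be summarized as a lookup table, for example as the basis for a runbook. The sketch below is illustrative: the dictionary layout and function name are hypothetical, and the takeover command is written as cf forcetakeover -d, the form named in the Cluster_Remote License section, where the scenario text and figures write cfo -d.

```python
FORCED_TAKEOVER = "cf forcetakeover -d"
GIVEBACK = "cf giveback"

# Scenario -> (what happens, command the administrator runs, if any)
SCENARIOS = {
    "host failure": ("data access continues through the other host", None),
    "node failure": ("surviving node takes over automatically", None),
    "node and expansion unit failure": ("manual failover", FORCED_TAKEOVER),
    "interconnect failure": (
        "no failover; mirrors resynchronize after repair", None),
    "site failure": ("manual failover", FORCED_TAKEOVER),
    "site recovery": ("return to normal operations", GIVEBACK),
}


def recovery_step(scenario):
    """Describe the recovery action for one of the scenarios above."""
    outcome, command = SCENARIOS[scenario.lower()]
    if command is None:
        return f"{scenario}: {outcome} (no administrator command)"
    return f"{scenario}: {outcome}; administrator runs '{command}'"


for name in SCENARIOS:
    print(recovery_step(name))
```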
The team that wrote this Redpaper This Redpaper was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), Poughkeepsie Center. Alex Osuna is a Project Leader at the International Technical Support Organization, San Jose Center. He has worked in the IT industry for 26 years, including 25 years with IBM. He has extensive experience in software and hardware storage in service, support, planning, early ship programs, performance analysis, education, and published flashes. Alex has also provided presales and postsales support. Roland Tretau is an IBM Certified IT Specialist and Consulting IT Specialist who provides technical leadership for IBM System Storage N series NAS solutions throughout Europe. He has a solid background in project management, consulting, operating systems, storage solutions, enterprise search technologies, and data management. Roland led software development, large-scale IT solution and architectures designs in customers’ mission critical environments. He holds a Master's degree in Electrical Engineering with an emphasis in telecommunications. He is a Red Hat Certified Engineer (RHCE) and a Microsoft® Certified Systems Engineer (MCSE). He also holds a Masters Certificate in Project Management from The George Washington University School of Business and Public Management.
Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. 
Send us your comments in one of the following ways:
- Use the online Contact us review redbooks form found at: ibm.com/redbooks
- Send your comments in an email to: [email protected]
- Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. HYTD Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400 U.S.A.

© Copyright International Business Machines Corporation 2006. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Redbooks (logo), ESCON®, FICON®, IBM®, Redpaper, System Storage™
The following terms are trademarks of other companies: WAFL, SyncMirror, SnapMirror, Data ONTAP, and the Network Appliance logo are trademarks or registered trademarks of Network Appliance, Inc. in the U.S. and other countries. Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.