INTERNAL USE ONLY

AMENDMENT NO: 001
SUBJECT: Windows Server 2003 Cluster Installation & DR Instructions
ISSUE DATE: 04/03/06
PREVIOUS DATE ISSUED:
DEPARTMENT: Windows Service Team
Objective

This document provides detailed instructions for installing a Microsoft Windows Server 2003 two-node cluster attached to shared disk. It also explains how to manually fail over a node and how to recover from a failed node or Quorum. The information in this document applies to:
• Microsoft Clustered Server
• Windows Server 2003 Enterprise Edition
• TSM V5.3.0.3
Prerequisites

An understanding of the Auto Install process and EMC/shared disk. Knowledge of Virtual Center if creating a cluster using virtual hardware.
Overview

This document takes you through the steps needed to install and configure a Windows Server 2003 cluster on physical or virtual hardware. It also includes instructions for creating a cluster file share and for manually failing over an active node to a non-active node in order to complete maintenance tasks such as applying patches or installing new software or hardware. In the event of a single-node, multiple-node, or Quorum drive (cluster database) crash, this document also provides instructions to restore those components to a server with identical hardware. In the event of a Quorum failure, a short cluster outage will occur during the restore. Administrator rights are required for all of the above tasks.
Responsibilities

Windows Service Team
Table of Contents

Before You Start
Checklist for Cluster Server Installation
    Cluster User Account
    Server Hardware Requirements
        Single Site
        Multi-Site
        RFID
    Network Requirements
        Single Site
        Multi-Site
        RFID
    EMC / Shared Disk Requirements
        Single Site
        Multi-Site
        RFID
Hardware Setup for VMware Clusters
    Create & Install Virtual Servers
    Add Components to New Servers
        Add NIC #2
Installation Overview
Configuring the Heartbeat Network Adapter
Setting up Shared Disks
    Configuring Shared Disks
    Assigning Drive Letters
Configuring the First Node
Configuring the Second Node
Configuring the Cluster Groups and Verifying Installation
    Test failover to verify that the cluster is working properly
    Verify cluster network communications
Configuring a Cluster File Share
Failing Over an Active Cluster Node
Windows Server 2003 Cluster Restore Instructions
    Before You Begin
Single Node Restore Instructions
    Restoring the Server from Backup
        Windows Service Team
        Storage Management Team
        Windows Service Team
Multiple Node Restore Instructions
    Restoring the Server from Backup
        Windows Service Team
        Storage Management Team
        Windows Service Team
Quorum Drive Restore Instructions
    Restoring the Cluster Database (Quorum Drive)
        Windows Service Team
Amendment Listing
Before You Start

Before you can begin installing and configuring your Windows Server 2003 cluster you will need to have two Windows 2003 Enterprise Edition servers built and connected to the network. The servers must have teamed NICs so that a single network failure will not cause the cluster to fail over. You will also need HBAs installed in each server, and you will need to request EMC disk to be configured. The EMC disk needs to be presented to both servers. Besides the EMC disk you need for the application running on the cluster, you will need an additional "Quorum" EMC disk configured. This disk needs to be a minimum of 500 MB. The Quorum disk holds the cluster configuration files and should not be used to store anything else. Besides the NetBIOS name and IP address for each of the servers (nodes) in the cluster, the cluster itself needs a NetBIOS name and IP address. To make the installation less confusing,
rename the network connection on both servers to be used for the heartbeat network (the non-teamed NIC) to Heartbeat.

A domain user account is also needed for the cluster. You need to request this account from Computer Security by submitting the Global – Non-User ID Request form in Outlook. Make sure that this account has logon rights to only the two servers in the cluster and that the account password does not expire. Please note that this account cannot be requested until the two servers are actually built. The password must be a secure one (minimum of 14 characters, etc.) and it needs to be changed every 180 days. This account is to be used for the cluster service only; no application should be using this ID. The owner should be listed as Bryan Miller (B27021S).

Note: Multi-site clusters installed with an odd number (ex. USTCCA001) should have their "A" node at TCC. Multi-site clusters installed with an even number (ex. USTCCA002) should have their "A" node at TCC-West.
Checklist for Cluster Server Installation

Cluster User Account
• Request a domain user account from Computer Security after the two cluster servers are built.
• The account should only have logon rights to the two servers in the cluster.
• The password for this account should never expire.
• The password MUST be a secure one (minimum of 14 characters, etc.) and must be changed every 180 days.
• The account should be used for the cluster service ONLY; no applications on the cluster should use the account.
• Bryan Miller (B27021S) should be the owner.
• It MUST be entered into the Windows Service team fire fight account list.
• After you receive the account from Computer Security, please verify that the account is configured as stated above. (A quick command-line check is shown after this list.)

NOTE: A cluster user account is required for any/all of the three scenarios listed below (single site, multi-site, RFID).
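As a quick check of the account settings, the relevant fields can be reviewed from a command prompt on any domain member. This is a minimal sketch; the account name svc_ustcca001 is a placeholder, not an actual ID:

    :: Review the cluster service account as reported by the domain controller
    :: (check the "Account active", "Password expires" and "Workstations allowed" fields)
    net user svc_ustcca001 /domain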
Server Hardware Requirements

Single Site
• 2 Windows 2003 Enterprise Edition servers
• 3 Ethernet ports per server (2 for a teamed NIC and 1 for the heartbeat network)
• 2 single-port Emulex HBAs (or one dual-port HBA) per server

Multi-Site
• 2 Windows 2003 Enterprise Edition servers
• 3 Ethernet ports per server (2 for a teamed NIC and 1 for the heartbeat network)
• 2 single-port Emulex HBAs (or one dual-port HBA) per server
RFID
• 2 ProLiant DL380 G4 servers with 4 3.4 GHz CPUs and 3.5 GB RAM, installed with Windows 2003 Server Enterprise Edition
• HP NC7771 NIC installed in slot 1
• 642 Smart Array controllers installed in slots 2 and 3 (part of the HP StorageWorks Modular Smart Array 500 G2 High Availability Kit)
Network Requirements

Single Site
• 1 Cluster IP and NetBIOS name
• 1 IP and NetBIOS name per server (for the teamed NIC)
• 1 IP per server for the heartbeat network; for clusters running at TCC you need to request the heartbeat IP from Network Operations. Servers not running at TCC use 192.168.1.249 for node A and 192.168.1.250 for node B.
• 1 crossover patch cord (servers at TCC don't need this)
• 2 network ports on the production network per server
Multi-Site
• 1 Cluster IP and NetBIOS name. Request an IP address from Network Operations using the IP request form and select "TCC/TCC West Spanned Vlan – 165.28.94" or "TCC/TCC West Spanned Vlan – 165.28.111" from the "Site" drop-down list.
• 1 IP and NetBIOS name per server (for the teamed NIC). Request an IP address from NetOps using the IP request form and select "TCC/TCC West Spanned Vlan – 165.28.94" or "TCC/TCC West Spanned Vlan – 165.28.111" from the "Site" drop-down list. Note: All three IPs for the cluster must be on the same Vlan.
• 1 IP per server for the heartbeat network. Request a heartbeat IP from Network Operations using the IP request form and select "USTC- UNIX Heartbeat 172.12.123" from the "Site" drop-down list.
• 2 network ports on the production network per server.
RFID
• 1 Cluster IP and NetBIOS name
• 1 IP and NetBIOS name per server (for the teamed NIC)
• 1 IP per server for the heartbeat network (use 192.168.1.249 for node A and 192.168.1.250 for node B)
• 1 crossover patch cord
• 2 network ports on the production network
EMC / Shared Disk Requirements

Single Site
• 1 510 MB Quorum disk minimum (must be a minimum of 500 MB after formatting)
• Whatever disk configuration you need for the application you are running
• The above disk needs to be seen by both servers.
Multi-Site
• 1 510 MB Quorum disk minimum (must be a minimum of 500 MB after formatting)
• Whatever disk configuration you need for the application you are running
• The above disk needs to be seen by both servers.
• Note: When requesting disk, inform the DSM team that this cluster is split between TCC and TCC West.
RFID
• 1 HP StorageWorks Modular Smart Array 500 G2
• 1 HP StorageWorks Modular Smart Array 500 G2 High Availability Kit
• 6 HP 72 GB 10K Ultra320 SCSI HDDs
• Configure 2 logical disks:
  o 1 510 MB disk for the Quorum, and the remainder of the disk as R: for SQL data.
Hardware Setup for VMware Clusters

If you don't have the physical hardware to create a cluster, you can use VMware to create a cluster using virtual servers and virtual shared disks. However, when doing this, there are several steps to complete in order to properly replicate a true hardware cluster.
Create & Install Virtual Servers

1. Open Virtual Center (Start/All Programs/VMware/VMware Virtual Center). If Virtual Center is not installed on your desktop, please install it from \\usnca001\share\ESMSoftware\VMWare\Virtual Center\1.3.0-16701
2. Right click on the VM host where the guest servers will be created. Select New Virtual Machine. NOTE: Both servers to be used in the cluster should be located on the same ESX host.
3. Click Next at the window that appears.
4. Select Typical from the two options and then click Next.
5. Select the VM group where the new server will be located and click Next.
6. Select Microsoft Windows as the OS and Windows Server 2003 Enterprise Edition as the version.
7. Enter the name of the new server (must be lower case) and choose the datastore location for the c:\ drive. DO NOT choose vol_local. Click Next.
8. Choose vlan_04 as the NIC and vlance as the adapter type. Be sure to check connect at power on as well. Click Next.
9. Select the disk size for the c:\ drive of the new server. Click Next.
10. Click Finish at the next window. The new virtual server should show up in the correct VM group in about 10 seconds. Repeat this task for the second server (node) which will be included in the cluster. After both servers are created in Virtual Center, continue with the install of the new servers using Script Builder and the auto-install process. Choose to install just the c:\ drive from Script Builder and be sure to specify that it is an ESX guest.
11. Once the new servers are built, power them both off using Virtual Center. NOTE: Both servers to be used in the cluster should be located on the same ESX host.
Add Components to New Servers

Add NIC #2
12. After both servers are off, right click on the first server (node A) and click Properties.
13. Click Add at the window below. Click Next when the next window appears.
14. Highlight Ethernet Adapter and then click Next.
15. Choose heartbeat 1 for the NIC and be sure connect at power on is checked. Click Finish.
16. Verify that Adapter Type for both NIC1 & NIC2 is set to vmxnet. Click OK.
17. Repeat steps 12-16 for Node B.
Add Shared Disk
18. Click Add again. Click Next when the next window appears.
19. This time, choose Hard Disk from the list and then click Next.
20. Select Create New Virtual Disk, then click Next.
21. Select the size and location of the disk you are creating, then click Next. This disk should be located on the same datastore as the first disk you created when building the server if at all possible. Also, this disk should be the quorum disk, so it needs to be at least 500 MB after formatting.
22. Set this new disk to SCSI 1:0, then click Finish.
23. Repeat steps 18-22 for any other shared disks needed. Be sure to attach them to the next available opening on SCSI Controller 1, not 0 (ex. SCSI 1:1).
24. Once all shared disks are created, click OK to exit the virtual machine properties. 25. Re-open the virtual machine properties again and verify that the SCSI Bus Sharing for SCSI Controller 1 is set to Virtual and that all shared disks were created successfully. Click OK.
26. For Node B, repeat steps 18 & 19. Then select Use An Existing Virtual Disk, then click Next.
27. Select the correct datastore where you created the shared disk on Node A, then click Browse.
28. Choose the first shared disk file that was created with Node A and select Open. You may need to review the properties for Node A to be sure you are selecting the same disk file. Click Next at the next window that appears.
29. Be sure to add this new shared disk to the same SCSI controller and port that it is using on Node A.
30. Repeat this process for the remaining shared disks that were created on Node A. In the end, you want both nodes to have the exact same configuration. See below for an example of two servers using the same shared drives on the same host.
Once you have finished setting up both nodes, you may power on Node A, but only Node A at this time. Follow the steps below to set up the heartbeat NIC, the shared disks, and then the first node in the cluster. Once you have done all of those steps, you may power on Node B and continue with adding it to the cluster.
Installation Overview

During the installation process, some nodes will be shut down and some nodes will be rebooted. These steps are necessary to guarantee that the data on disks that are attached to the shared storage bus is not lost or corrupted. This can happen when multiple nodes try to simultaneously write to the same disk that is not yet protected by the cluster software. Use Table 1 below to determine which nodes and storage devices should be powered on during each step.

Table 1. Power Sequencing for Cluster Installation
Step                           Node A   Node B   Storage   Comments
Setting Up Networks            On       On       Off       Disconnect Fiber from the HBAs on both nodes.
Setting up Shared Disks        On       Off      On        Shut down both nodes. Connect Fiber to the HBAs on both nodes, then power on node A.
Verifying Disk Configuration   Off      On       On        Shut down node A, power on node B.
Configuring the First Node     On       Off      On        Shut down both nodes; power on node A.
Configuring the Second Node    On       On       On        Power on node B after node A was successfully configured.
Post-installation              On       On       On        At this point all nodes should be on.
Configuring the Heartbeat Network Adapter

To avoid possible data corruption, and as a Microsoft best practice, make sure that at this point the fiber to the Emulex HBA is disconnected on both servers. Perform the following steps on both nodes of the cluster. At this point the crossover patch cord should be connected.

1. Right-click My Network Places and then click Properties.
2. Right-click the Heartbeat icon.
3. Click Properties, and then click Configure.
4. Click Advanced.
5. Set the Speed & Duplex to 100Mb Full and click OK (Gig Full at TCC).
6. Click Transmission Control Protocol/Internet Protocol (TCP/IP).
7. Click Properties.
8. Click the radio button for Use the following IP address and type in the address supplied by Network Operations, or, if the cluster is not running at TCC, use the following address: 192.168.1.249 for node A and 192.168.1.250 for node B.
9. Type in a subnet mask of 255.255.255.0. (Leave the default gateway blank.)
10. Click the Advanced button and select the WINS tab. Select Disable NetBIOS over TCP/IP. Click Yes at the "This connection has an empty primary WINS address" prompt. Click OK, and OK to exit. Do these steps for the heartbeat network adapter only.
11. Repeat steps 1-10 on node B.

The window should now look like this. (An equivalent command-line sketch for the IP settings follows.)
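For reference, the same TCP/IP settings can be applied from a command prompt. This is a minimal sketch and assumes the connection has already been renamed to Heartbeat; the addresses shown are the non-TCC defaults from step 8:

    :: Node A heartbeat address (use 192.168.1.250 on node B); no default gateway is set
    netsh interface ip set address name="Heartbeat" source=static addr=192.168.1.249 mask=255.255.255.0

    :: Verify the adapter settings and heartbeat connectivity to the other node
    ipconfig /all
    ping 192.168.1.250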
Setting up Shared Disks

Power off node B. Make sure that the fiber cable from the SAN is connected to the HBA on node A.
Configuring Shared Disks
1. Right click My Computer, click Manage, and click Storage.
2. Double-click Disk Management. (Cancel the Write Signature and Upgrade Disk wizard.)
3. Verify that all shared disks are formatted as NTFS and are designated as Basic. If you connect a new drive, the Write Signature and Upgrade Disk Wizard starts automatically. If this happens, click Next to go through the wizard. The wizard sets the disk to dynamic. To reset the disk to Basic, right-click Disk # (where # specifies the disk you are working with) and click Revert to Basic Disk. NOTE: For SAN-connected devices the Storage team must write the signatures with the Diskpar command to set the Sectors Per Track to the proper offset.
4. Right-click unallocated disk space.
5. Click Create Partition… The Create Partition Wizard begins.
6. Click Next twice.
7. Enter the desired partition size in MB and click Next.
8. Accept the default drive letter assignment by clicking Next.
9. Click Next to format and create the partition.
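As an alternative to the Disk Management wizard, the Quorum partition can be created and formatted from the command line. This is a minimal sketch only; it assumes the Quorum LUN appears as disk 1, so verify the disk number with list disk first:

    :: Run diskpart and enter the following at the DISKPART> prompt
    :: (the Quorum LUN is assumed here to be disk 1)
    diskpart
        list disk
        select disk 1
        create partition primary size=510
        assign letter=Q
        exit

    :: Back at the cmd prompt, format the new partition as NTFS
    format Q: /fs:ntfs /v:QUORUM /q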
Assigning Drive Letters

After the disks and partitions have been configured, drive letters must be assigned to each partition on each clustered disk. The Quorum disk (this disk holds the cluster information and will usually be 1 GB or less in size) should be assigned drive letter Q:.
1. Right-click the partition and select Change Drive Letter and Path.
2. Click Change and select a new drive letter, click OK and then Yes.
3. Right-click the partition and select Properties. Assign the disk label using the cluster name (example: XXTCCA004-D) and click OK.
4. Repeat steps 1 through 3 for each shared disk.
5. When finished, the Computer Management window should look like the figure above. Now close the Computer Management window.
6. Reboot the server.
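The labels from step 3 can also be set from a command prompt. This is a minimal sketch; it assumes the Quorum partition has already been assigned Q: and a data partition R:, with XXTCCA004 standing in for your cluster name:

    :: Label the shared partitions to match the cluster naming convention
    label Q: XXTCCA004-Q
    label R: XXTCCA004-R

    :: Confirm the letters and labels before rebooting
    vol Q:
    vol R: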
Configuring the First Node

Note: During the configuration of the Cluster service on node A, node B must be powered off.

1. Log on to node A with your admin account.
2. Click Start, All Programs, Administrative Tools, Cluster Administrator.
3. At the "Open Connection to Cluster" screen select "Create New Cluster" from the drop-down menu. Then click OK.
4. At the "Welcome to the New Server Cluster Wizard" screen click Next.
5. At the "Cluster Name and Domain" screen select the domain the cluster will be in from the drop-down menu and enter the cluster name in the Cluster name field. Then click Next.
6. At the “Select Computer” screen verify that the Computer name is that of node A. Then click Next.
7. The "Analyzing Configuration" screen will appear and verify the server configuration. If the setup is acceptable, the Next button will be available. Click Next to continue. If not, check the log and troubleshoot.
8. Enter the IP address for the Cluster then click Next.
9. At the “Cluster Service Account” screen enter the cluster server account name and password and verify that the domain is correct then click Next.
10. At the “Proposed Cluster Configuration” screen verify that all is correct. Take a moment to click on the Quorum button and also verify that the Q:\ drive is selected to be used as the Quorum disk. Click Next once configuration is verified.
11. At the “Creating Cluster” screen check for errors, if none click Next.
12. At the “Completing the New Server Cluster Wizard” screen click Finish.
Configuring the Second Node

1. Power up node B and wait for it to fully load Windows.
2. From node A click Start, All Programs, Administrative Tools, Cluster Administrator.
3. In Cluster Administrator, right click on the cluster, then go to New and click Node.
4. At the "Welcome to the Add Nodes Wizard" screen click Next.
5. At the "Select Computers" screen enter the server name for node B, then click Add, then Next.
6. The "Analyzing Configuration" screen will appear and verify the server configuration. If the setup is acceptable, the Next button will be available. Click Next to continue. If not, check the log and troubleshoot.
7. At the “Cluster Service Account” screen enter the cluster user account password then click Next.
8. At the “Proposed Cluster Configuration” screen, verify the configuration and if it is correct click Next.
9. At the “Adding Node to Cluster” screen click Next.
10. At the "Completing the Add Nodes Wizard" screen click Finish.
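At this point both nodes should be members of the cluster. As a quick sanity check, the state can be confirmed from a command prompt with the built-in cluster.exe utility; this is a minimal sketch, with USTCCA001 used as a placeholder cluster name:

    :: Both nodes should report a status of Up
    cluster /cluster:USTCCA001 node /status

    :: All groups and resources should report Online
    cluster /cluster:USTCCA001 group /status
    cluster /cluster:USTCCA001 resource /status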
Configuring the Cluster Groups and Verifying Installation

The cluster groups need to be renamed. Rename the default group Cluster Group to the cluster name (if the name of the cluster is USTCCA001, the group Cluster Group will be renamed to USTCCA001), and rename each Disk Group # to the cluster name with the application being installed on the cluster appended; i.e., SQL1 would be USTCCA001SQL1 and SAP1 would be USTCCA001SAP1, etc. Naming convention standards can be found at http://esm.kcc.com/serverfacts.aspx
1. Click Start, click Administrative Tools, and click Cluster Administrator.
2. Right click on the Cluster Group, click Rename, and enter the cluster name (e.g. USTCCA001).
3. Right click on the Disk Group #, click Rename, and enter the cluster name with the application you are installing on the cluster appended, e.g. USTCCA001SQL1.
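The renames can also be done with cluster.exe; a minimal sketch, assuming the cluster is USTCCA001 and the disk group will host SQL (adjust the names to your environment):

    :: Rename the default groups to match the naming convention
    cluster /cluster:USTCCA001 group "Cluster Group" /rename:USTCCA001
    cluster /cluster:USTCCA001 group "Disk Group 1" /rename:USTCCA001SQL1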
Test failover to verify that the cluster is working properly
1. To test failover, right-click the Cluster Group and select Move Group. The group and all its resources will be moved to node B. After a short period of time the resources will be brought online on node B. If you watch the screen, you will see this shift.
2. Move the Cluster Group back to node A by repeating step 1.
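The same test can be run from a command prompt; a minimal sketch, assuming the renamed group USTCCA001 and node names ustcca001a/ustcca001b (substitute your own):

    :: Move the group to node B, check the owner, then move it back to node A
    cluster /cluster:USTCCA001 group USTCCA001 /moveto:ustcca001b
    cluster /cluster:USTCCA001 group USTCCA001 /status
    cluster /cluster:USTCCA001 group USTCCA001 /moveto:ustcca001a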
Verify cluster network communications
1. You'll need to verify the cluster network communications settings as a final step in the cluster configuration. Click Start, then Administrative Tools, then Cluster Administrator.
2. Right click on the cluster name, then choose Properties.
3. Click on the Network Priority tab from the window that appears.
4. Verify that the Heartbeat network is listed first and the Team network is listed second.
5. Select the Heartbeat network from the list and click on Properties. Verify that the Heartbeat network is set to Internal Cluster Communications Only (private network) and then click OK.
6. Select the Team network from the list and click on Properties. Verify that the Team network is set to All Communications (mixed network) and then click OK.
7. Click OK to exit the Network Priority screen and return to the Cluster Administrator GUI.

Congratulations, you have successfully installed and configured a Microsoft Server Cluster. Don't forget to add the Cluster account to the ESM firefight database. Additionally, please update Asset Management with all the nodes, including the cluster name itself (ex. ustcca008, ustcca008a, ustcca008b).

Firefight – http://ws.kcc.com/AccountMgmt/
Asset Mgmt – http://ws.kcc.com/Asset/

Example of an Asset Mgmt entry with the cluster name and two nodes listed separately.
When entering the cluster name into Asset Mgmt, the following fields should be filled in as follows:
• Serial # – Name of the cluster node (ex. USTCCA008)
• Description – Something that designates that this is the cluster node and not a physical server (ex. Cluster node for servers USTCCA008A and USTCCA008B)
• Asset Tag – Name of the cluster node (ex. USTCCA008)
• Build Type – Cluster Node Name
• Image Version – Cluster Node Name
• Hardware Description – _Cluster Name
Configuring a Cluster File Share

Summary
In order to create a cluster file share you need to create the file share using the Cluster Administrator GUI or the Cluster command-line utility. The following steps will take you through the process of creating a file share resource using the Cluster Administrator GUI. (A command-line sketch follows the GUI steps.)
Configuring a Cluster File Share
1. Log on to one of the cluster nodes with your administrator account.
2. Start the Cluster Administrator GUI: click Start\Administrative Tools\Cluster Administrator. Note: if this is the first time you are running this utility you will be prompted to enter the cluster name. Enter the cluster name and click Open.
3. Under Groups, right click on the cluster group which contains the drive you need to create the share on, then go to New\Resource.
4. The New Resource window will pop up. Fill in the following fields:
   Name – enter the name of the share
   Description – enter the path to the folder you are sharing
   Resource Type – from the drop-down list select File Share
   (Note: the folder to be shared must already exist; if not, please create it at this point.)
   Click Next.
5. In the Possible Owners window, make sure that all nodes of the cluster are listed in the possible owners box, then click Next.
6. In the Dependencies window, highlight the disk resource that the folder is on in the resource window, then click Add, then Next.
7. In the File Share Parameters window fill in the following fields:
   Share name – enter the name of the share
   Path – enter the path to the folder you are sharing
   Then click Finish and then OK.
8. The newly created share will be offline and you will need to bring it online. Right click on the share you just created, then click on Bring Online.
9. To set security on the share, right click on the share resource, select Properties, click on the Parameters tab, then click on the Permissions button and add the security rights you need.
10. You can verify that the new share is online by accessing it from your desktop: click Start/Run and then type \\\<share name>
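As mentioned in the summary, the same file share resource can be created with the Cluster command-line utility instead of the GUI. This is a hedged sketch only; the cluster name USTCCA001, group USTCCA001SQL1, disk resource "Disk R:", folder R:\Data, and share name Data are all placeholders for your environment:

    :: Create the File Share resource in the group that owns the disk
    cluster /cluster:USTCCA001 resource "Data Share" /create /group:"USTCCA001SQL1" /type:"File Share"

    :: Set the share's private properties (folder path and share name)
    cluster /cluster:USTCCA001 resource "Data Share" /priv path="R:\Data"
    cluster /cluster:USTCCA001 resource "Data Share" /priv sharename="Data"

    :: Make the share depend on its physical disk resource, then bring it online
    cluster /cluster:USTCCA001 resource "Data Share" /adddep:"Disk R:"
    cluster /cluster:USTCCA001 resource "Data Share" /online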
Failing Over an Active Cluster Node

The following instructions explain how to manually fail over a cluster node using the Cluster Administrator GUI.

1. Log on to one of the cluster nodes with your administrator account.
2. Start the Cluster Administrator GUI: click Start\All Programs\Administrative Tools\Cluster Administrator. Note: if this is the first time you are running this utility you will be prompted to enter the cluster name. Enter the cluster name and click OK.
3. Under Groups click on the cluster group name (XXTCCA004 in the screen shot example below). The right pane will display information about all of the resources in the group. The Owner column will list the node that is currently running the cluster. Make note of the owner as well as the state of the resources. In most cases, all of the resources will be online and should be online again after you failover the cluster.
4. To failover the active node, under Groups right click on XXTCCA004 then click on Move Group. If more than two nodes exist as part of the cluster you can choose a specific node or choose Best Possible. If Best Possible is chosen, the group will move to one of the available nodes based on preferred ownership designation for that specific resource group.
5. You will notice that the state of the resources will go to offline pending, then to offline, then to online pending, and finally to online. The owner will also switch at this time, and this whole process will take about 10 to 15 seconds to complete. After it is completed, all the resources that were online before will be online again and the owner will have changed.
Repeat steps 3-5 for all resource groups listed under the Groups heading.
Windows Server 2003 Cluster Restore Instructions

Before You Begin
Before the process is started, you will need to gather some information about the server you are restoring:
1. Server name and IP information (i.e. IP address, default gateway, subnet mask)
2. The service pack that was installed prior to the crash
3. Cluster name
4. Cluster account and password
5. Blank 1.44MB floppy diskette
6. Compare hardware – document any differences
Single Node Restore Instructions

The instructions below will detail how to restore a single node of a failed Windows Server 2003 cluster.
Restoring the Server from Backup

Windows Service Team
1. These instructions assume you are restoring the server to identical hardware.
2. Create an install script from the following location (http://esm.kcc.com/ScriptBuilder/Win2000Server/Default.aspx?OS=WS03) using the same server name and IP address. Select Disaster Recovery as the "Application type" and select the same service pack version the crashed server had installed. Make sure you select the C: drive only.
3. Make sure that the fiber cable to the SAN is disconnected. For VMWare clusters, make sure the shared disks are removed from the server properties in Virtual Center before installing the DR build.
4. Install Windows 2003 using the Auto Install Floppy Diskette http://esm.kcc.com/ScriptBuilder/AutoInstallDocs.aspx
5. Log on to the server with the following credentials: user name administrator, password Admin1. These are case sensitive. Note: After the DR auto-install completes, the server will probably auto-logon the first time with the administrator account.
6. Make sure the network card is set to communicate at 100/Full (if the switch can handle 100/Full). In other words, make sure the network card is configured the same as the switch port. Do not leave it set to Auto/Auto.
7. Browse to c:\windows\system32\ and copy HAL.DLL, NTKRNLPA.EXE, and NTOSKRNL.EXE to c:\temp.
8. Perform a search for the three files mentioned above. Take note of the paths to all instances found. Disregard those paths leading to *.cab files. You will have to copy these files back to those locations later. (A command-line sketch for steps 7 and 8 follows this list.)
9. Install the TSM client on the server by mapping a drive to \\\smspkgd$ and then running \N040000e\Script\DR.bat.
10. Enable remote desktop connections on the server if not already enabled (right click on the server name icon on the desktop, Properties, Remote tab, Enable Remote Desktop on this computer, OK).
11. Install the correct service pack that was on the server before it crashed, most likely SP1. Reboot the server.
12. Log back into the server. Check the three files from step 7; if newer files exist after the service pack is installed, copy them to c:\temp and overwrite the previously copied files.
13. Contact the Storage Management team to restore the required information.
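As referenced in step 8, the file copy and search in steps 7 and 8 can also be done from a command prompt; a minimal sketch (c:\temp is created if it does not already exist):

    :: Preserve the kernel files before the System State restore so they can be copied back later
    if not exist c:\temp mkdir c:\temp
    copy /y c:\windows\system32\hal.dll c:\temp\
    copy /y c:\windows\system32\ntkrnlpa.exe c:\temp\
    copy /y c:\windows\system32\ntoskrnl.exe c:\temp\

    :: Locate every other copy of the three files (ignore any paths inside *.cab files)
    dir /s /b c:\hal.dll
    dir /s /b c:\ntkrnlpa.exe
    dir /s /b c:\ntoskrnl.exe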
Storage Management Team
1. On the desktop, double-click the "TSM Backup Client" icon.
2. Click the Restore button.
3. Click on the "+" next to "File Level". You will see: \\<ServerName>\c$ (C:)
4. Select the folders needed to complete the restore.
5. Click the Options button.
6. From "Action for Files That Already Exist", select "Replace" from the drop-down box. Click the check box next to "Replace file even if read-only/locked". (You may still be prompted to make a decision. Always select Replace.)
7. Click OK.
8. Click Restore.
9. Verify that the radio button for "Original location" is selected. Click Restore. If a message asking to restart the machine pops up, always select No.
10. Click OK.
11. When prompted to reboot the server, click NO.
12. Close the TSM Backup Client.
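For reference, an equivalent restore can be run with the TSM command-line client (dsmc) if it is installed; this is a hedged sketch only, with c:\foldername standing in for whichever folders the restore actually requires:

    :: Restore a folder tree to its original location, replacing existing files without prompting
    dsmc restore "c:\foldername\*" -subdir=yes -replace=all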
Windows Service Team
1) Now run NT Backup to restore the system state.
   a) Browse to c:\sstate\ and double-click systemstate.bkf.
   b) Ignore the wizard that pops up and instead choose Advanced Mode.
   c) An NT Backup window should appear. Go to the Tools menu and select Options.
   d) Click on the Restore tab and select "Always replace the file on my computer". Click OK.
   e) Click on the Restore tab and highlight File. Right-click and choose Catalog file…
   f) Enter the path to the catalog file (c:\sstate\systemstate.bkf) and hit OK.
   g) Click on the "+" sign next to File.
   h) Click OK on the popup window if it appears.
   i) Click on the "+" sign next to the System State.bkf created… entry that has the most recent date/time.
   j) Click OK on the popup window if it appears.
   k) Make sure the path still points to c:\sstate\systemstate.bkf and click OK.
   l) Check the box next to System State.
   m) Make sure the drop-down menu "Restore files to:" is set to Original Location. Click Start Restore.
   n) A warning message stating "Restoring System State will always overwrite current System State…" appears. Click OK.
   o) A Confirm Restore message box will appear. Click OK.
   p) Click OK on the popup window if it appears.
   q) An Enter Backup File Name window may appear. Verify the path to be c:\sstate\systemstate.bkf and click OK.
   r) When the restore completes, click Close.
   s) You'll be prompted to reboot. Click "NO".
   t) Close the NT Backup window.
2) Browse to c:\temp\ and copy HAL.DLL, NTKRNLPA.EXE, and NTOSKRNL.EXE back to the locations you found them in earlier (should be c:\windows\system32)
3) Reboot the server.
4) On startup, press F8 and then select "Safe Mode with Networking" from the menu.
5) Log on with a KCUS "Administrative" account.
6) Continue to install any new hardware that is detected.
7) Open Device Manager. Under "Display Adapters" delete any/all adapters defined. Under "System Devices", if there is an "!" next to "Compaq ….. System Management Controller", delete the device. Close Device Manager and reboot.
8) Log on with a KCUS "Administrative" account.
9) Map T: to \\\SMSPKGD$
10) Reapply the service pack version of the "crashed server", then restart.
11) Reboot and repeat steps 9 and 10.
12) Apply the Compaq Support Pack (N040037C\Shell37Cs.bat), then restart.
13) Log on with an "Administrative" account.
14) Verify that the server is functioning.
15) Reconnect the SAN fiber cable to the HBA. (Physical server only, not VMware.)
16) Reboot the server.
17) If the server is VMware, add the shared disks back to the server properties through Virtual Center while the server is off. (This step is only for virtual servers.)
18) Log on with a KCUS "Administrative" account.
19) Verify that the cluster node is activated and functional. Items to check to be sure the rebuild is successful:
• Verify local disk names
• Verify IP address settings
• Verify remote desktop connections
• Ping the cluster name (virtual node)
• Schedule an outage to test failing cluster groups between nodes (primary node to secondary node and back again)
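Several of the checks in step 19 can be run quickly from a command prompt on the restored node; a minimal sketch, with USTCCA008 used as a placeholder cluster name:

    :: Confirm IP settings and that the cluster (virtual) name answers
    ipconfig /all
    ping USTCCA008

    :: Confirm the Cluster service is running and the nodes and groups look healthy
    sc query clussvc
    cluster /cluster:USTCCA008 node /status
    cluster /cluster:USTCCA008 group /status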
Multiple Node Restore Instructions

The instructions below will detail how to restore both nodes of a failed Windows Server 2003 cluster.
Restoring the Server from Backup

Windows Service Team
1. These instructions assume you are restoring the server to identical hardware.
2. Create an install script from the following location (http://esm.kcc.com/ScriptBuilder/Win2000Server/Default.aspx?OS=WS03) using the same server name and IP address. Select Disaster Recovery as the "Application type" and select the same service pack version the crashed server had installed. Make sure you select the C: drive only.
3. Make sure that the fiber cable to the SAN is disconnected. For VMWare clusters, make sure the shared disks are removed from the server properties in Virtual Center before installing the DR build.
4. Install Windows 2003 using the Auto Install Floppy Diskette http://esm.kcc.com/ScriptBuilder/AutoInstallDocs.aspx
5. Log on to the server with the following credentials: user name administrator, password Admin1. These are case sensitive. Note: After the DR auto-install completes, the server will probably auto-logon the first time with the administrator account.
6. Make sure the network card is set to communicate at 100/Full (if the switch can handle 100/Full). In other words, make sure the network card is configured the same as the switch port. Do not leave it set to Auto/Auto.
7. Browse to c:\windows\system32\ and copy HAL.DLL, NTKRNLPA.EXE, and NTOSKRNL.EXE to c:\temp.
8. Perform a search for the three files mentioned above. Take note of the paths to all instances found. Disregard those paths leading to *.cab files. You will have to copy these files back to those locations later.
9. Install the TSM client on the server by mapping a drive to \\\smspkgd$ and then running \N040000e\Script\DR.bat.
10. Enable remote desktop connections on the server if not already enabled (right click on the server name icon on the desktop, Properties, Remote tab, Enable Remote Desktop on this computer, OK).
11. Install the correct service pack that was on the server before it crashed, most likely SP1. Reboot the server.
12. Log back into the server. Check the three files from step 7; if newer files exist after the service pack is installed, copy them to c:\temp and overwrite the previously copied files.
13. Contact the Storage Management team to restore the required information.
Storage Management Team
1. On the desktop, double-click the "TSM Backup Client" icon.
2. Click the Restore button.
3. Click on the "+" next to "File Level". You will see: \\<ServerName>\c$ (C:)
4. Select the folders needed to complete the restore.
5. Click the Options button.
6. From "Action for Files That Already Exist", select "Replace" from the drop-down box. Click the check box next to "Replace file even if read-only/locked". (You may still be prompted to make a decision. Always select Replace.)
7. Click OK.
8. Click Restore.
9. Verify that the radio button for "Original location" is selected. Click Restore. If a message asking to restart the machine pops up, always select No.
10. Click OK.
11. When prompted to reboot the server, click NO.
12. Close the TSM Backup Client.
Windows Service Team
1. Now run NT Backup to restore the system state.
   a) Browse to c:\sstate\ and double-click systemstate.bkf.
   b) Ignore the wizard that pops up and instead choose Advanced Mode.
   c) An NT Backup window should appear. Go to the Tools menu and select Options.
   d) Click on the Restore tab and select "Always replace the file on my computer". Click OK.
   e) Click on the Restore tab and highlight File. Right-click and choose Catalog file…
   f) Enter the path to the catalog file (c:\sstate\systemstate.bkf) and hit OK.
   g) Click on the "+" sign next to File.
   h) Click OK on the popup window if it appears.
   i) Click on the "+" sign next to the System State.bkf created… entry that has the most recent date/time.
   j) Click OK on the popup window if it appears.
   k) Make sure the path still points to c:\sstate\systemstate.bkf and click OK.
   l) Check the box next to System State.
   m) Make sure the drop-down menu "Restore files to:" is set to Original Location. Click Start Restore.
   n) A warning message stating "Restoring System State will always overwrite current System State…" appears. Click OK.
   o) A Confirm Restore message box will appear. Click OK.
   p) Click OK on the popup window if it appears.
   q) An Enter Backup File Name window may appear. Verify the path to be c:\sstate\systemstate.bkf and click OK.
   r) When the restore completes, click Close.
   s) You'll be prompted to reboot. Click "NO".
   t) Close the NT Backup window.
2. Browse to c:\temp\ and copy HAL.DLL, NTKRNLPA.EXE, and NTOSKRNL.EXE back to the locations you found them in earlier (should be c:\windows\system32)
3. Reboot the server.
4. On startup, press F8 and then select "Safe Mode with Networking" from the menu.
5. Log on with a KCUS "Administrative" account.
6. Continue to install any new hardware that is detected.
7. Open Device Manager. Under "Display Adapters" delete any/all adapters defined. Under "System Devices", if there is an "!" next to "Compaq ….. System Management Controller", delete the device. Close Device Manager and reboot.
8. Log on with a KCUS "Administrative" account.
9. Map T: to \\\SMSPKGD$
10. Reapply the service pack version of the "crashed server", then restart.
11. Reboot and repeat steps 9 and 10.
12. Apply the Compaq Support Pack (N040037C\Shell37Cs.bat), then restart.
13. Log on with an "Administrative" account.
14. Verify that the server is functioning.
15. Reconnect the SAN fiber cable to the HBA. (Physical server only, not VMware.)
16. Reboot the server.
17. If the server is VMware, add the shared disks back to the server properties through Virtual Center while the server is off. (This step is only for virtual servers.)
18. Log on with a KCUS "Administrative" account.
19. Verify that the cluster node is activated and functional. Items to check to be sure the rebuild is successful:
• Verify local disk names
• Verify IP address settings
• Verify remote desktop connections
• Ping the cluster name (virtual node)
• Verify shared disks
• Launch Cluster Administrator to verify the cluster and associated groups are operational
20. Restore the second node by the same process as the first node, using all the steps above.
21. Verify that the second cluster node is active and functional.
22. Schedule an outage to test cluster failover by moving cluster groups from the primary to the secondary node and then back again.
Quorum Drive Restore Instructions

The instructions below will detail how to restore a failed/corrupted cluster database (Quorum drive) for all nodes in a Windows Server 2003 cluster.
Restoring the Cluster Database (Quorum Drive)

Windows Service Team
1. Log on to the primary node and reconfigure the Quorum drive using Disk Manager (right click on the computer name icon on the desktop, Manage, Disk Management).
2. Delete and recreate the Q:\ drive. Format it as NTFS. Log off the server.
3. Log on to the primary node and open Backup (Start, All Programs, Accessories, System Tools, Backup).
4. Ignore the Backup/Restore wizard and click on "Advanced Mode" instead.
5. Click on the "Restore and Manage Media" tab.
6. Double-click on the SystemState.bkf file in the right-hand pane with the most recent date/time stamp.
7. Now put a check next to "System State" in the expanded view to the left.
8. Be sure "Restore Files to:" is set to Original Location.
9. Click "Start Restore", then "OK" at the pop-up window.
10. On the Confirm Restore window, click "Advanced".
11. Put a check next to "Restore the Cluster Registry to the quorum disk…" and then click "OK". Click "Yes" at the next pop-up window, then "OK" again.
12. If asked to verify the location of the SystemState.bkf file, browse to c:\sstate and locate the file; the restore should then begin. This process will stop the cluster service on the primary node and will restore the cluster configuration for that node.
13. When the restore has completed, select YES to reboot the server. This will restart the Cluster service on the primary node and will then stop the Cluster service on all other nodes in the cluster. Backup will then copy the restored cluster database information from the restarted primary node to the Quorum disk (Q:\) and to all other nodes in the cluster.
14. Once the primary node has restarted, log on and verify that the Cluster service is running. Also, try to ping the cluster name to be sure it is running properly.
15. Now go to each of the other nodes and start the cluster service manually. Verify that the cluster service is running after starting it on each node (a command-line sketch for this check follows below).
16. Schedule an outage to test failing cluster groups between nodes (primary node to secondary node and back again).
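For steps 14 and 15, the service start and health checks can be done from a command prompt on each node; a minimal sketch, with USTCCA001 as a placeholder cluster name:

    :: Start the Cluster service manually on the node, then confirm it is running
    net start clussvc
    sc query clussvc

    :: Confirm every node reports Up and that the cluster (virtual) name answers
    cluster /cluster:USTCCA001 node /status
    ping USTCCA001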
Amendment Listing

AMENDMENT NO.   DATE OF ISSUE   SUMMARY OF AMENDMENT   RECOMMENDED BY
001             04/03/06        First issue            Bryan Miller
This procedure has been prepared to comply with the requirements of Kimberly-Clark and the Corporate Financial Instructions of Kimberly-Clark Corporation. It is important that no deviation from the procedure occurs without prior reference to: Windows Services