Oracle RAC Administration - Controlling the Cluster with CRS Commands
Tarry Singh
Brief intro
Starting with this article, we begin a discussion on administering our RAC database. Administration can mean many things; think of everything you do as an Oracle DBA: backup, restore, export, import, tuning, benchmarking and even some PL/SQL work (it is not just for programmers, you know). Administering your RAC does, however, bring more challenges. You need solid tools in place (like SoRAC, for instance); the new SQL Developer can be your friend as well. TOAD is a tool commonly used by DBAs and developers for a wide range of tasks. And let's not forget good old SQL*Plus. In Part 5 of our "RACing ahead with Oracle on VMware" series, we took a brief look at the SRVCTL utility. In this article, we will look at the CRS commands and the CRSCTL utility, treating the Oracle documentation as our sole guide.
Introduction to CRS
Let's take a quick look at what CRS is and what it does. Let's also check out the command-line parameters and what, exactly, they do.

crs_getperm: Checks the permissions that are associated with a resource.
Syntax:
crs_getperm resource_name [-u user|-g group]
Alternatively, you can just type crs_getperm resource_name to get all the associated permissions.

crs_profile: Creates, validates, deletes, and updates an Oracle Clusterware profile. You can also use crs_profile to generate a template script. In a profile, we decide how the application will be managed and monitored by the cluster. You must register an application with the crs_register command before the other Oracle Clusterware commands are available to manage it. Once you have created an application profile and registered the application with Oracle Clusterware, you can use commands such as crs_stat, crs_start, crs_stop, crs_relocate, and crs_unregister on the application. This comes in very handy when troubleshooting your resources, as they sometimes go awry. Moreover, an error message saying "Human intervention required" really does need a human to type in the necessary commands to check the application's condition.
Syntax:
To create a profile:
crs_profile -create resource_name -t application [-a action_script] [-B executable_pathname] [-dir directory] [-d description] [-p placement_policy] [-h hosting_nodes] [-r required_resources] [-l optional_resources] [-o option,[...]] [attribute_flag attribute_value] [...] [-f] [-q]
To create an application profile from a template:
crs_profile -create resource_name -I template_file [-f] [-q]
To validate a profile:
crs_profile -validate resource_name [-q]
To list one or more application profiles:
crs_profile -print [resource_name [...]] [-q]
To create a template from an existing profile:
crs_profile -template resource_name [-O template_file] [-q]
To update a profile:
crs_profile -update resource_name [option [...]] [-q]
To delete an application profile and its associated action program:
crs_profile -delete resource_name [-q]

crs_register: Registers configuration information for an application with the OCR. The command registers one or more applications, each specified with a resource_name parameter. You need to register an application before you can start it, stop it or perform any other operation on it. Note that you must have write access to the target location, and the CRS daemon must be running. If any of the fields are missing, the profile is merged with the default template. Ownership and permissions are decided at the time of resource registration; naturally, the user who registered the application is its owner. Both crs_profile and crs_register require read and write permissions. Afterwards, we can use the crs_stat command to check whether the applications are registered.
Syntax: You can use the crs_register command to register and update applications.
crs_register resource_name [-dir directory_path] [...] [-u] [-f] [-q]

crs_relocate: Relocates an application resource from one node to another. Again, the application to be relocated must be registered first. When you run crs_relocate, Oracle Clusterware moves the resource by stopping it on the source node and then starting it on the destination node. If that does not work, you can always run crs_stop -f on the resource and restart it with crs_start to return it to the ONLINE state before you attempt to relocate it again. All the actions are echoed on the command line. You can, however, also monitor them using the Event Manager (EVM).
Syntax:
crs_relocate resource_name [...] [-c cluster_node] [-f] [-q]
crs_relocate resource_name [-c cluster_node] [-q]
crs_relocate [USR_attribute_name=value] [...] resource_name [-c cluster_node] [-q]
crs_relocate -s source_node [-c cluster_node] [-q]
crs_setperm: Sets and modifies the permissions associated with a resource. It is comparable to our good old chmod command on UNIX/Linux systems.
Syntax:
crs_setperm resource_name -u aclstring [-q]
crs_setperm resource_name -x aclstring [-q]
crs_setperm resource_name -o user_name [-q]
crs_setperm resource_name -g group_name [-q]
crs_stat: Lists the status of an application profile; I often use it to check the status of my resources. Several times, I have had to start some of the resources individually to get my RAC running on my (rather lightweight) VMware setup. You need read and execute permissions on a resource to query it, except with the -g option, which anyone can use to determine whether a resource exists. Resources can be either ONLINE or OFFLINE, as shown in the STATE attribute. If the resource is online and the cluster node fails, the Clusterware tries to restart the application on another node. It goes without saying that these resources must be ONLINE unless you have a specific reason to force them to stay OFFLINE. A resource can also be offline because its failure count is higher than the failure threshold, in which case the TARGET is changed to OFFLINE. You can, however, bring it online again with the crs_start command. I personally like the -v (verbose status) option, since I like to see what is happening when I am administering an application, and it always comes in handy when troubleshooting. RESTART_COUNT tells us how many times an application resource was restarted on a node. FAILURE_COUNT tells us how many times a resource has failed within the FAILURE_INTERVAL set in the profile when we created it. Once the count exceeds the FAILURE_THRESHOLD parameter, the resource is stopped. FAILOVER_STATUS is shown in verbose mode only when an application has to wait before being relocated after a cluster node failure, because of the profile's FAILOVER_DELAY attribute. The FAILOVER_STATUS field shows on which node the application failed and how long Clusterware waits for that node to restart before restarting the application on another node.
Syntax:
crs_stat [resource_name [...]] [-v] [-l] [-q] [-c cluster_node]
crs_stat [resource_name [...]] -t [-v] [-q] [-c cluster_node]
crs_stat -p [resource_name [...]] [-q]
crs_stat [-a] resource_name -g
crs_stat [-a] resource_name -r [-c cluster_node]
crs_stat -f [resource_name [...]] [-q] [-c cluster_node]
crs_stat -ls resource_name
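When there are many resources, comparing TARGET and STATE by eye gets tedious. The following is a minimal Python sketch, not an Oracle tool, that parses the plain crs_stat output and lists the resources that need a closer look. It assumes crs_stat prints one KEY=value field per line with a blank line between resources; the sample records are taken from the crs_stat listing later in this article, and the function names are made up for illustration.

```python
# Sample records in the NAME=/TYPE=/TARGET=/STATE= layout printed by crs_stat.
# Assumption: one field per line, blank line between resources.
SAMPLE = """\
NAME=ora.fokerac.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora02rh4

NAME=ora.fokerac.fokeserv.fokerac2.srv
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.vmora01rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vmora01rh4
"""


def parse_crs_stat(text):
    """Turn NAME=/TYPE=/TARGET=/STATE= records into a list of dicts."""
    resources = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # blank separator between records
        key, value = line.split("=", 1)
        if key == "NAME":
            # A NAME field starts a new resource record.
            resources.append({"NAME": value, "HOST": None})
        elif resources:
            if key == "STATE" and " on " in value:
                # "ONLINE on vmora02rh4" -> state plus hosting node
                value, host = value.split(" on ", 1)
                resources[-1]["HOST"] = host
            resources[-1][key] = value
    return resources


def needs_attention(resources):
    """Names whose STATE is UNKNOWN or differs from TARGET."""
    return [r["NAME"] for r in resources
            if r.get("STATE") == "UNKNOWN" or r.get("STATE") != r.get("TARGET")]


if __name__ == "__main__":
    for name in needs_attention(parse_crs_stat(SAMPLE)):
        print(name)
```

In practice you would feed it the captured output of crs_stat instead of the SAMPLE string. A resource that is deliberately OFFLINE (TARGET and STATE both OFFLINE) is not flagged.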
crs_start: Sets all the applications, or the specified applications, ONLINE and attempts to start the specified registered applications or application resources. You should have full admin privileges for this.
Syntax:
crs_start resource_name [...] [-c cluster_node] [-q] [-f]
crs_start -all [-q]
crs_start [USR_attribute_name=value] [...] resource_name [-c node_name] [-q]

crs_stop: Stops a resource on the specified node. Checking crs_stat first always helps to verify which application or resource you want to stop.
Syntax:
crs_stop resource_name [...] [-f] [-q]
crs_stop -c cluster_node [...] [-q]
crs_stop -all [-q]
crs_stop [USR_attribute_name=value] [...] resource_name [-q] -c cluster_node [...]

crs_unregister: Removes the registration information of Oracle Clusterware resources.
Syntax:
crs_unregister resource_name [...] [-q]
Correcting our RAC problem
We have seen the CRS commands and their syntax; now let's look at the practical side. I was having some trouble running my RAC on RHEL4 (I think the problem arose primarily because I gave my virtual machines too little memory). When I noticed that I was getting alerts from my SoRAC tool, I looked at the status of my RAC.

[oracle@vmora01rh4 ~]$ cd /u01/app/oracle/product/10.2.0/crs/bin
[oracle@vmora01rh4 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    ONLINE    ONLINE    vmora02rh4
ora....c1.inst application    ONLINE    ONLINE    vmora01rh4
ora....c2.inst application    OFFLINE   UNKNOWN   vmora02rh4
ora....serv.cs application    ONLINE    ONLINE    vmora02rh4
ora....ac1.srv application    ONLINE    ONLINE    vmora01rh4
ora....ac2.srv application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    vmora01rh4
ora....H4.lsnr application    ONLINE    ONLINE    vmora01rh4
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora01rh4
ora....rh4.ons application    ONLINE    UNKNOWN   vmora01rh4
ora....rh4.vip application    ONLINE    ONLINE    vmora01rh4
ora....SM2.asm application    OFFLINE   UNKNOWN   vmora02rh4
ora....H4.lsnr application    OFFLINE   UNKNOWN   vmora02rh4
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora02rh4
ora....rh4.ons application    OFFLINE   UNKNOWN   vmora02rh4
ora....rh4.vip application    ONLINE    ONLINE    vmora02rh4

As you can see above, some of the applications are UNKNOWN or OFFLINE, neither of which is good for my RAC. The plain crs_stat command gives you the full names of the applications, which you will need if you have to shut some of them down manually in order to stop the whole cluster and restart it.

[oracle@vmora01rh4 bin]$ crs_stat
NAME=ora.fokerac.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora02rh4

NAME=ora.fokerac.fokerac1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora01rh4

NAME=ora.fokerac.fokerac2.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.fokerac.fokeserv.cs
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora02rh4

NAME=ora.fokerac.fokeserv.fokerac1.srv
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora01rh4

NAME=ora.fokerac.fokeserv.fokerac2.srv
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.vmora01rh4.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora01rh4

NAME=ora.vmora01rh4.LISTENER_VMORA01RH4.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora01rh4

NAME=ora.vmora01rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vmora01rh4

NAME=ora.vmora01rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vmora01rh4

NAME=ora.vmora01rh4.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora01rh4
NAME=ora.vmora02rh4.ASM2.asm
TYPE=application
TARGET=OFFLINE
STATE=UNKNOWN on vmora02rh4

NAME=ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=UNKNOWN on vmora02rh4

NAME=ora.vmora02rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vmora02rh4

NAME=ora.vmora02rh4.ons
TYPE=application
TARGET=OFFLINE
STATE=UNKNOWN on vmora02rh4

NAME=ora.vmora02rh4.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vmora02rh4

I could also have attempted to stop them all using crs_stop -all, but it normally throws enough errors to force you to do it manually, one by one.

[oracle@vmora01rh4 bin]$ crs_stop -all
Attempting to stop `ora.vmora01rh4.ons` on member `vmora01rh4`
Attempting to stop `ora.vmora02rh4.ons` on member `vmora02rh4`
`ora.vmora02rh4.ons` on member `vmora02rh4` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Stop of `ora.vmora01rh4.ons` on member `vmora01rh4` succeeded.
Attempting to stop `ora.vmora01rh4.ASM1.asm` on member `vmora01rh4`
Attempting to stop `ora.fokerac.fokerac2.inst` on member `vmora02rh4`
`ora.fokerac.fokerac2.inst` on member `vmora02rh4` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Attempting to stop `ora.vmora02rh4.ASM2.asm` on member `vmora02rh4`
`ora.vmora02rh4.ASM2.asm` on member `vmora02rh4` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Attempting to stop `ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr` on member `vmora02rh4`
`ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr` on member `vmora02rh4` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Attempting to stop `ora.fokerac.fokerac2.inst` on member `vmora02rh4`
`ora.fokerac.fokerac2.inst` on member `vmora02rh4` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Attempting to stop `ora.vmora02rh4.ASM2.asm` on member `vmora02rh4`
`ora.vmora02rh4.ASM2.asm` on member `vmora02rh4` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
Attempting to stop `ora.vmora02rh4.vip` on member `vmora02rh4`
Stop of `ora.vmora02rh4.vip` on member `vmora02rh4` succeeded.
Stop of `ora.vmora01rh4.ASM1.asm` on member `vmora01rh4` succeeded.
Attempting to stop `ora.vmora01rh4.LISTENER_VMORA01RH4.lsnr` on member `vmora01rh4`
Stop of `ora.vmora01rh4.LISTENER_VMORA01RH4.lsnr` on member `vmora01rh4` succeeded.
Attempting to stop `ora.vmora01rh4.vip` on member `vmora01rh4`
Stop of `ora.vmora01rh4.vip` on member `vmora01rh4` succeeded.
CRS-0216: Could not stop resource 'ora.vmora02rh4.ASM2.asm'.
CRS-0216: Could not stop resource 'ora.vmora02rh4.ons'.
CRS-0216: Could not stop resource 'ora.vmora02rh4.vip'.

For that very reason we will go ahead and do it our way. First, we need to stop our instances.
[oracle@vmora01rh4 bin]$ srvctl stop instance -d fokerac -i fokerac1
[oracle@vmora01rh4 bin]$ srvctl stop instance -d fokerac -i fokerac2

Check our status:

[oracle@vmora01rh4 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    OFFLINE   OFFLINE
ora....c1.inst application    OFFLINE   OFFLINE
ora....c2.inst application    OFFLINE   OFFLINE
ora....serv.cs application    ONLINE    UNKNOWN   vmora02rh4
ora....ac1.srv application    OFFLINE   OFFLINE
ora....ac2.srv application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora01rh4
ora....rh4.ons application    OFFLINE   OFFLINE
ora....rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   UNKNOWN   vmora02rh4
ora....H4.lsnr application    OFFLINE   UNKNOWN   vmora02rh4
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora02rh4
ora....rh4.ons application    OFFLINE   UNKNOWN   vmora02rh4
ora....rh4.vip application    OFFLINE   OFFLINE
Stop the service:

[oracle@vmora01rh4 bin]$ srvctl stop service -d fokerac -s fokeserv

Check status again:

[oracle@vmora01rh4 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    OFFLINE   OFFLINE
ora....c1.inst application    OFFLINE   OFFLINE
ora....c2.inst application    OFFLINE   OFFLINE
ora....serv.cs application    OFFLINE   OFFLINE
ora....ac1.srv application    OFFLINE   OFFLINE
ora....ac2.srv application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora01rh4
ora....rh4.ons application    OFFLINE   OFFLINE
ora....rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   UNKNOWN   vmora02rh4
ora....H4.lsnr application    OFFLINE   UNKNOWN   vmora02rh4
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora02rh4
ora....rh4.ons application    OFFLINE   UNKNOWN   vmora02rh4
ora....rh4.vip application    OFFLINE   OFFLINE

OK, so we need to stop those remaining applications now.

[oracle@vmora01rh4 bin]$ crs_stop ora.vmora01rh4.gsd
Attempting to stop `ora.vmora01rh4.gsd` on member `vmora01rh4`
Stop of `ora.vmora01rh4.gsd` on member `vmora01rh4` succeeded.
[oracle@vmora01rh4 bin]$ crs_stop ora.vmora02rh4.ASM2.asm
Attempting to stop `ora.vmora02rh4.ASM2.asm` on member `vmora02rh4`
Stop of `ora.vmora02rh4.ASM2.asm` on member `vmora02rh4` succeeded.
[oracle@vmora01rh4 bin]$ crs_stop ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr
Attempting to stop `ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr` on member `vmora02rh4`
Stop of `ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr` on member `vmora02rh4` succeeded.
[oracle@vmora01rh4 bin]$ crs_stop ora.vmora02rh4.gsd
Attempting to stop `ora.vmora02rh4.gsd` on member `vmora02rh4`
Stop of `ora.vmora02rh4.gsd` on member `vmora02rh4` succeeded.
[oracle@vmora01rh4 bin]$ crs_stop ora.vmora02rh4.ons
Attempting to stop `ora.vmora02rh4.ons` on member `vmora02rh4`
Stop of `ora.vmora02rh4.ons` on member `vmora02rh4` succeeded.

Check status:

[oracle@vmora01rh4 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    OFFLINE   OFFLINE
ora....c1.inst application    OFFLINE   OFFLINE
ora....c2.inst application    OFFLINE   OFFLINE
ora....serv.cs application    OFFLINE   OFFLINE
ora....ac1.srv application    OFFLINE   OFFLINE
ora....ac2.srv application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora....rh4.gsd application    OFFLINE   OFFLINE
ora....rh4.ons application    OFFLINE   OFFLINE
ora....rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora....rh4.gsd application    OFFLINE   OFFLINE
ora....rh4.ons application    OFFLINE   OFFLINE
ora....rh4.vip application    OFFLINE   OFFLINE

OK, all set; now let's bring them all online.

[oracle@vmora01rh4 bin]$ crs_start -all
Attempting to start `ora.vmora02rh4.vip` on member `vmora02rh4`
Attempting to start `ora.vmora01rh4.vip` on member `vmora01rh4`
Start of `ora.vmora02rh4.vip` on member `vmora02rh4` succeeded.
Start of `ora.vmora01rh4.vip` on member `vmora01rh4` succeeded.
Attempting to start `ora.vmora01rh4.ASM1.asm` on member `vmora01rh4`
Attempting to start `ora.vmora02rh4.ASM2.asm` on member `vmora02rh4`
Start of `ora.vmora02rh4.ASM2.asm` on member `vmora02rh4` succeeded.
Attempting to start `ora.fokerac.fokerac2.inst` on member `vmora02rh4`
Start of `ora.vmora01rh4.ASM1.asm` on member `vmora01rh4` succeeded.
Attempting to start `ora.fokerac.fokerac1.inst` on member `vmora01rh4`
Start of `ora.fokerac.fokerac2.inst` on member `vmora02rh4` succeeded.
Attempting to start `ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr` on member `vmora02rh4`
Start of `ora.fokerac.fokerac1.inst` on member `vmora01rh4` succeeded.
Attempting to start `ora.vmora01rh4.LISTENER_VMORA01RH4.lsnr` on member `vmora01rh4`
Start of `ora.vmora02rh4.LISTENER_VMORA02RH4.lsnr` on member `vmora02rh4` succeeded.
Start of `ora.vmora01rh4.LISTENER_VMORA01RH4.lsnr` on member `vmora01rh4` succeeded.
CRS-1002: Resource 'ora.vmora02rh4.ons' is already running on member 'vmora02rh4'
CRS-1002: Resource 'ora.vmora01rh4.ons' is already running on member 'vmora01rh4'
Attempting to start `ora.fokerac.fokeserv.fokerac1.srv` on member `vmora01rh4`
Attempting to start `ora.vmora01rh4.gsd` on member `vmora01rh4`
Attempting to start `ora.fokerac.db` on member `vmora01rh4`
Attempting to start `ora.fokerac.fokeserv.fokerac2.srv` on member `vmora02rh4`
Attempting to start `ora.fokerac.fokeserv.cs` on member `vmora02rh4`
Attempting to start `ora.vmora02rh4.gsd` on member `vmora02rh4`
Start of `ora.fokerac.fokeserv.fokerac2.srv` on member `vmora02rh4` succeeded.
Start of `ora.fokerac.fokeserv.cs` on member `vmora02rh4` succeeded.
Start of `ora.fokerac.db` on member `vmora01rh4` succeeded.
Start of `ora.vmora02rh4.gsd` on member `vmora02rh4` succeeded.
Start of `ora.vmora01rh4.gsd` on member `vmora01rh4` succeeded.
Start of `ora.fokerac.fokeserv.fokerac1.srv` on member `vmora01rh4` succeeded.
CRS-0223: Resource 'ora.vmora01rh4.ons' has placement error.
CRS-0223: Resource 'ora.vmora02rh4.ons' has placement error.

Don't worry about the two CRS-0223 errors; the ONS resources simply did not report back to us in the sequence in which the Clusterware started them.
[oracle@vmora01rh4 bin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    ONLINE    ONLINE    vmora01rh4
ora....c1.inst application    ONLINE    ONLINE    vmora01rh4
ora....c2.inst application    ONLINE    ONLINE    vmora02rh4
ora....serv.cs application    ONLINE    ONLINE    vmora02rh4
ora....ac1.srv application    ONLINE    ONLINE    vmora01rh4
ora....ac2.srv application    ONLINE    ONLINE    vmora02rh4
ora....SM1.asm application    ONLINE    ONLINE    vmora01rh4
ora....H4.lsnr application    ONLINE    ONLINE    vmora01rh4
ora....rh4.gsd application    ONLINE    ONLINE    vmora01rh4
ora....rh4.ons application    ONLINE    ONLINE    vmora01rh4
ora....rh4.vip application    ONLINE    ONLINE    vmora01rh4
ora....SM2.asm application    ONLINE    ONLINE    vmora02rh4
ora....H4.lsnr application    ONLINE    ONLINE    vmora02rh4
ora....rh4.gsd application    ONLINE    ONLINE    vmora02rh4
ora....rh4.ons application    ONLINE    ONLINE    vmora02rh4
ora....rh4.vip application    ONLINE    ONLINE    vmora02rh4
[oracle@vmora01rh4 bin]$
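After a restart like this one, it is handy to confirm mechanically that nothing is left OFFLINE or UNKNOWN. Below is a small Python sketch that reads a crs_stat -t listing and reports every row that is not ONLINE. It assumes whitespace-separated columns in the order Name, Type, Target, State, Host, with Host empty for resources that are not running; the function names are hypothetical and the sample rows are modelled on the listings above.

```python
# A few rows in the crs_stat -t layout (Host is blank when nothing hosts the resource).
SAMPLE_TABLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.fokerac.db application    ONLINE    ONLINE    vmora01rh4
ora....ac2.srv application    ONLINE    OFFLINE
ora....rh4.gsd application    ONLINE    UNKNOWN   vmora02rh4
"""


def parse_crs_stat_table(text):
    """Parse crs_stat -t rows into dicts, skipping the header and rule lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # The dashed rule is a single token; the header starts with "Name".
        if len(parts) < 4 or parts[0] == "Name":
            continue
        name, rtype, target, state = parts[:4]
        host = parts[4] if len(parts) > 4 else None
        rows.append({"name": name, "type": rtype, "target": target,
                     "state": state, "host": host})
    return rows


def not_online(rows):
    """Return (name, state) pairs for every resource whose State is not ONLINE."""
    return [(r["name"], r["state"]) for r in rows if r["state"] != "ONLINE"]


if __name__ == "__main__":
    for name, state in not_online(parse_crs_stat_table(SAMPLE_TABLE)):
        print(f"{name}: {state}")
```

Because the -t listing truncates resource names (two different resources can both appear as ora....rh4.gsd), a script that has to act on a resource is better off working from the plain crs_stat output, which carries the full names.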
Administering the Clusterware Components

Brief intro
Database administration can be a very daunting task. Will that change with all the cool technologies like virtualization or market developments like offshoring? NO WAY! You will still need a DBA, or a typical "IT Versatilist" if you will, to do the necessary tasks. Sure, things might get easier as Oracle Enterprise Manager becomes more robust, and administration tools like Spotlight on RAC will get even more advanced. However, the fact remains that you must still be able to troubleshoot your RAC. The need for these skills will not go away. OK, so in the future (as I see it happening) maybe your RAC will float in a "Secure Virtual Vault" somewhere on the web, but someone will still need to watch it. On the other hand, it could be an Extended RAC (RAC nodes geographically separated), thus providing real High Availability. Administration is administration; you will need it no matter what. So here, we will take a good look at the administration aspects of the Oracle Clusterware components.
What is Clusterware composed of?
Oracle Clusterware is composed primarily of two components: the voting disk and the OCR (Oracle Cluster Registry). The voting disk is simply a file that holds and manages node membership information, and the OCR is a file that holds the cluster and RAC configuration. Let's take a quick look at administering the voting disks and the OCR.
Administering voting disks
Backup and Recovery: First, let's look at backing up the voting disks by running the following command:
dd if=voting_disk_name of=backup_file_name
This operation needs to be performed on all voting disks. Here, if (input file) is the source file (replace voting_disk_name with your voting disk) and of (output file) is the destination backup file, which will hold the full contents of the voting disk. You can do a lot of other things with dd, such as splitting a file using bs (block size) or converting case; I did just that yesterday to restore my Linux-hacked iPAQ to WinCE. Type dd --help for more information. Running the command with the file names reversed recovers your voting disk file(s):
dd if=backup_file_name of=voting_disk_name
In Windows environments, you can use the ocopy command, or use the crsctl commands to copy and administer the files. Also, note that if you have multiple voting disks, which is not unusual, you can use the crsctl command to add and delete voting disks. For instance:
crsctl delete css votedisk path
This deletes the voting disk at path, which is the complete path of the file's location. To add a new or backup file, do the following:
crsctl add css votedisk path
This way you can add or remove voting disks in your RAC either statically or dynamically. Note, however, that the -force option is meant for modifying the voting disk configuration while your cluster is down:
crsctl add css votedisk path -force
With the cluster down, you don't end up interfering with the other Clusterware daemons. Using -force on an active configuration may corrupt it.
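Since the dd backup has to be repeated for every voting disk, and the restore is the same command with if and of swapped, it is easy to script. Here is a minimal Python sketch that only builds the command lines; it does not run them. The voting disk path and backup directory in the usage example are hypothetical placeholders, and the ".bak" suffix is just a naming choice made for this illustration.

```python
import posixpath


def backup_pairs(voting_disks, backup_dir):
    """Pair each voting disk with a backup file named after it.

    Note: disks with the same file name in different directories would
    collide in backup_dir; rename them first if that applies to you.
    """
    return [(disk, posixpath.join(backup_dir, posixpath.basename(disk) + ".bak"))
            for disk in voting_disks]


def dd_command(src, dst):
    """Build the dd argument list that copies src to dst."""
    return ["dd", f"if={src}", f"of={dst}"]


def backup_commands(voting_disks, backup_dir):
    """One dd command per voting disk: disk -> backup file."""
    return [dd_command(disk, bak) for disk, bak in backup_pairs(voting_disks, backup_dir)]


def restore_commands(voting_disks, backup_dir):
    """The same pairs with source and destination reversed: backup file -> disk."""
    return [dd_command(bak, disk) for disk, bak in backup_pairs(voting_disks, backup_dir)]


if __name__ == "__main__":
    # Hypothetical paths; substitute your own voting disks and backup location.
    for cmd in backup_commands(["/u02/oradata/votedisk1"], "/backup/vote"):
        print(" ".join(cmd))
```

Keeping the commands as argument lists makes it straightforward to hand them to subprocess.run later, once you are happy with what would be executed.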
Administering Oracle Cluster Registry
The OCR contains information pertaining to instance-to-node mapping, the node list and the resource profiles for customized applications in your Clusterware. Let's take a look at some of the administrative tasks.

Adding, Replacing, Managing & Removing OCR
You can't have more than two OCRs. You can add an OCR either after an upgrade or after completing the RAC installation. If you already mirror the OCR, then you do not need to add an OCR location; Oracle does that automatically. If your OCR is on the network, do create a target file before performing any tasks! In addition, you must be logged in as the root user to run the ocrconfig tool. So if you created a single OCR, add an OCR location by doing the following:
ocrconfig -replace ocr destination_file or disk
To add a mirror file, do the following:
ocrconfig -replace ocrmirror destination_file or disk
Replacing an OCR works the same way. Do, however, check that the OCR copy you are not replacing is online, that the Clusterware is running on the node where you perform the replacement, and, if the target is on a CFS or on the network, that the target file exists. To replace the OCR, do the following:
ocrconfig -replace ocr destination_file or disk
and to replace the OCR mirror:
ocrconfig -replace ocrmirror destination_file or disk
Repairing the OCR configuration comes in handy if a node was shut down while you were adding, replacing or removing an OCR. Running ocrconfig -repair on such a node brings its OCR configuration back in line with the rest of the cluster. More specifically, you can do this (to repair your OCR mirror):
ocrconfig -repair ocrmirror device_name
This command repairs the OCR configuration locally, on the node where you run it. Also, note that you cannot do this while the Clusterware daemons are running on that node.
To remove an OCR, you need to have at least one other OCR online. You may want to do this to reduce overhead or for storage reasons, such as stopping a mirror to move it to a SAN, RAID, etc. Carry out the following steps:
• Check that at least one OCR is online
• Remove the OCR or the OCR mirror:
ocrconfig -replace ocr
OR
ocrconfig -replace ocrmirror
Using the -backuploc option allows you to save or move the OCR backup files to a safe location. Type ocrconfig -help to see all the options.
Conclusion: Oracle Clusterware has an HA (High Availability) framework that provides a robust infrastructure to manage any application. The Oracle Clusterware daemons make sure that all applications start up during system startup. Failed applications are restarted automatically to maintain the HA aspect of the RAC cluster. It is possible to configure all of the administrative aspects of the RAC cluster (such as monitoring frequencies, startup and shutdown). This article examined the two most important components of Oracle Clusterware. In our next article, we will cover the backup and recovery administration of the OCR using OCR backup files.
Brief intro
I have a Google alert running for Oracle RAC and, besides pulling in the regular vendor offerings, I receive a lot of alerts on how often Oracle RAC is being adopted in the enterprise. I mentioned commoditization in my previous article, but there are plenty of pressures and several odd and painful reminders that RAC needs a good administrator; even then, things can go wrong, and they can go very badly wrong. RAC is no longer exclusive to huge data centers; it is being deployed in SMB environments as well. Since administering the database requires in-house expertise, it is becoming increasingly important that we practice the installation and administration of our Oracle RAC on our VMware Server and/or ESX test bed. So, let's pick up where we left off in our previous article on Clusterware administration.
Administering the OCR Using OCR Backup Files
We will take a quick look at two methods for copying the Oracle Cluster Registry (OCR) and recovering it. Oracle Clusterware automatically creates an OCR backup every four hours and always retains the last three backup copies. The CRSD process that creates these backups also creates and retains an OCR backup for each full day and, at the end of each week, a complete backup for the week. So there is a robust backup taking place in the background. And you guessed it right: you cannot alter the backup frequencies. All of this is meant to protect you, the DBA, but you should still copy these generated backup files at least once daily to a device other than the one where the primary OCR resides. The files are located in CRS_home/cdata/my_cluster, where my_cluster is the name of your cluster.
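The daily copy to a different device is a natural candidate for a small scheduled job. The following Python sketch copies every generated backup file from the backup directory to a destination directory. The "*.ocr" file pattern, the function names and the directory names in the demo are assumptions made for illustration; check what your own CRS_home/cdata directory actually contains before relying on the pattern.

```python
import shutil
import tempfile
from pathlib import Path


def copy_ocr_backups(backup_dir, dest_dir, pattern="*.ocr"):
    """Copy every file matching pattern from backup_dir to dest_dir.

    Returns the sorted list of file names that were copied.
    The "*.ocr" pattern is an assumption about how the backups are named.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(backup_dir).glob(pattern)):
        if src.is_file():
            shutil.copy2(src, dest / src.name)  # copy2 keeps the timestamps
            copied.append(src.name)
    return copied


def demo():
    """Exercise the copy against throwaway directories with placeholder files."""
    with tempfile.TemporaryDirectory() as tmp:
        backup_dir = Path(tmp) / "cdata" / "my_cluster"
        backup_dir.mkdir(parents=True)
        for name in ("backup00.ocr", "day.ocr", "notes.txt"):
            (backup_dir / name).write_bytes(b"placeholder")
        return copy_ocr_backups(backup_dir, Path(tmp) / "offsite")


if __name__ == "__main__":
    print(demo())
```

Scheduled once a day (from cron, for instance) with the real backup directory and a destination on separate storage, this covers the "at least once daily" advice; the demo only touches temporary directories.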
Restoring the OCR from generated OCR Backups
Given that most of us run our test Oracle RAC on limited hardware, on a VMware Server or ESX Server, it is no surprise to see applications failing. Always try to restart the application first. To verify whether the OCR itself has failed, run ocrcheck. The next step is to fix the problem.
On Unix/Linux Systems
Let's do the following to restore our OCR on Unix/Linux systems.
• To show the backups, type the command ocrconfig -showbackup
• Check the contents of a backup by running ocrdump -backupfile my_file
• Go to the CRS bin directory and stop CRS on all nodes, for example with crsctl stop crs run as root
• Perform the restore: ocrconfig -restore my_file
• Restart CRS on all nodes, for example with crsctl start crs
• Check the OCR's integrity. We have discussed and seen the CVU (Cluster Verification Utility) play a crucial role during installation in our RAC on VMware series. Get a verbose output for all of the nodes by doing this: cluvfy comp ocr -n all -verbose
On Windows Systems
• Do the same as above: check the OCR backups using the ocrconfig -showbackup command, and verify the contents of the backup using ocrdump -backupfile my_file, where my_file is your backup file.
• Disable the OCR clients on all nodes by stopping the following services from the Service Control Panel: OracleClusterVolumeService, OracleCSService, OracleCRService, and OracleEVMService.
• Restore the OCR backup file with the ocrconfig -restore my_file command. Always check to see if the OCR devices exist!
• Start all of the services. Restart all of the nodes to bring the cluster alive.
• To check the integrity, do the following with the CVU: cluvfy comp ocr -n all -verbose
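The order of the restore steps matters: CRS must be down on every node before the restore, and the integrity check comes last. The Unix/Linux procedure above can be sketched in Python as an ordered plan of (node, command) pairs that you can review before running anything. This is only a planning aid; the function name is made up, and the default stop and start commands (crsctl stop crs and crsctl start crs) are an assumption that you should replace with whatever your installation uses.

```python
def restore_plan(nodes, backup_file,
                 stop_cmd="crsctl stop crs", start_cmd="crsctl start crs"):
    """Return the OCR restore steps as an ordered list of (node, command) pairs.

    stop_cmd and start_cmd are assumptions; pass your own if they differ.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    first = nodes[0]
    plan = [
        (first, "ocrconfig -showbackup"),                # list the available backups
        (first, f"ocrdump -backupfile {backup_file}"),   # inspect the chosen backup
    ]
    plan += [(node, stop_cmd) for node in nodes]         # stop CRS everywhere first
    plan.append((first, f"ocrconfig -restore {backup_file}"))
    plan += [(node, start_cmd) for node in nodes]        # then bring CRS back up
    plan.append((first, "cluvfy comp ocr -n all -verbose"))  # verify OCR integrity
    return plan


if __name__ == "__main__":
    for node, command in restore_plan(["vmora01rh4", "vmora02rh4"], "my_file"):
        print(f"{node}: {command}")
```

Printing the plan for the two nodes used in this series gives eight steps, with the single ocrconfig -restore sitting between the stops and the starts.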
Overriding the OCR (Oracle Cluster Registry) Data Loss Protection Mechanism
Oracle Clusterware is robustly built and allows for minimal error. An overwrite can throw your RAC out of balance. If your OCR cannot access its mirrored files and for some reason is not able to verify the location of the OCR files (it could be anything: a temporary bottleneck in your SAN virtual disks or in the local shared disks that you chose for your OCR, in any case some temporary glitch), then the OCR prevents further modification to the available OCR. The data protection mechanism prohibits the Clusterware from starting on the node where you have your OCR; Oracle reports an error in your Enterprise Manager and in the Clusterware alert log files. If the problem persists on just one node (all that information is displayed neatly in your Enterprise Manager and in the Clusterware log files, with error messages like CLSD-1009 or CLSD-1011), try to restart the node(s). If that does not work and you cannot repair the OCR, then you are left with no other option except overriding the protection mechanism. Do not use it as your first resort! Oracle CRS is robust enough to check and poll the files appropriately. Be warned that data loss may occur: if you override the mechanism with ocrconfig -overwrite, any OCR updates made after the last update that successfully reached the OCR you are now using will be lost.
How to Override:
• Check and compare the error message output with the Windows registry or with ocr.loc on Unix/Linux. If they don't match, try to repair using ocrconfig -repair.
• Use the OCRDUMP command (we will look more into OCRDUMP later in our Administration series) to dump all information regarding the OCR configuration and check whether it contains the latest updates.
• If you can't resolve the CLSD error messages, run ocrconfig -overwrite to bring the node back to life.
Conclusion: We have taken a quick look at the Clusterware's administration. We also took a look at the override possibilities to force a restore of the OCR files when the OCR's built-in protection mechanism prohibits the automatic restore of the same. Future articles will go into more hands-on training. I have read the Oracle manual several times and have quoted it often. I advise you to go through the manual more than once. There is different documentation on RAC, even books, but nothing comes close to the Oracle Documentation, and Oracle has just made that freely available! So go ahead, download the PDF books, get the free VMware Server (or a trial of ESX 3.0, now called Virtual Infrastructure 3), ask your boss for an old server and go do some magic with VMware!
Administering the Clusterware and ASM Storage Brief intro Oracle's shopping spree has resulted in a 30% increase in earnings (Siebel was the hero, by the way). Oracle is setting the stage, aggressively moving into the BI world. Business Objects will eventually be the next in line. I even predict that educational software, or LMS (Learning Management System) as it is called in educational institutions, will follow after that. I muse on that possible strategic intent in my blogpost. So what would it all mean? Well, one thing I'm pretty sure of: a lot more Oracle database deployments. RAC and GRID computing are fast becoming inseparable terms, and Grid or Utility Computing is where we are heading. OK, let's now get into our administration and take a look at administering our OCR.
Administering the OCR Using OCR Exports It's one thing to have the automated backups that Oracle Clusterware creates of the OCR, but you should also export the OCR contents both before and after any change that you make to the configuration. Changes can vary from task to task, such as adding nodes, deleting nodes from your RAC environment, modifying the Clusterware resources, creating a database etc. These exports are made with the ocrconfig -export command, which writes the contents of our OCR to a file. If a change leaves you with issues that you cannot resolve, you can load the exported configuration back with the -import flag. Let's take a look at various configurations, starting with UNIX configurations.
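To make the "export before and after every change" habit stick, I find it helps to wrap the export in a tiny script. Here is a minimal sketch; the EXPORT_DIR variable, the DRY_RUN switch and the file naming scheme are my own assumptions and not anything Oracle prescribes. Only ocrconfig -export itself comes from the documentation.

```shell
#!/bin/sh
# Sketch only: take a labelled OCR export before and after a change.
# Assumptions: EXPORT_DIR and the file naming scheme are my own invention;
# ocrconfig itself normally has to be run as root.
EXPORT_DIR="${EXPORT_DIR:-/tmp}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = really run it

# Build a file name such as /tmp/ocr_before_addnode_20061013.exp
ocr_export_file() {
    phase="$1"; label="$2"; stamp="${3:-$(date +%Y%m%d%H%M%S)}"
    echo "${EXPORT_DIR}/ocr_${phase}_${label}_${stamp}.exp"
}

# Export the OCR contents to a labelled file (or just show the command).
ocr_export() {
    target="$(ocr_export_file "$@")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "ocrconfig -export $target"
    else
        ocrconfig -export "$target"
    fi
}
```

You would call ocr_export before addnode, make your change, and then call ocr_export after addnode. With DRY_RUN left at 1 the script only prints the command, which is a safe way to try it out.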
Importing Oracle Cluster Registry Content on UNIX-Based Systems Note: Most of the configuration changes that touch the OCR contents also create files and database objects, and not all of these are restored when you restore the OCR. So do not perform an OCR restore just to revert to an older configuration after a change has failed: the OCR contents may then no longer match the state of the rest of your system and, it goes without saying, your RAC will be out of sync. OK, now let's take a look at the import procedure for our OCR on UNIX-based systems:
• Identify the OCR export file that you want to import, that is, the file you created earlier with the following command: ocrconfig -export myfile
• Stop Oracle Clusterware on all nodes by going to the bin directory of your Clusterware home and running, as root, for example: crsctl stop crs
• Now carry out the import with the following command, where myfile is the file from which you want to import your OCR configuration data: ocrconfig -import myfile
• Restart Oracle Clusterware on all nodes.
• As an additional check, you should (as I've mentioned before) always run the Cluster Verification Utility (CVU) to verify the integrity of the OCR: cluvfy comp ocr -n all [-verbose]. The -n all argument makes CVU retrieve the list of all the nodes in the cluster and check each of them.
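Since the order of the steps matters, I like to print the whole import sequence and read it through before touching anything. The sketch below does just that and runs nothing. It assumes the 10g Release 2 commands crsctl stop crs and crsctl start crs for stopping and starting the stack (as root, on every node); the node names and the export file are whatever you pass in, and the function name is invented for this example.

```shell
#!/bin/sh
# Sketch only: print the OCR import steps in order, for review.
# Assumptions: "crsctl stop crs" / "crsctl start crs" stop and start the
# stack (run as root on every node); nothing is executed here.
# Usage: ocr_import_plan <export_file> <node> [<node> ...]
ocr_import_plan() {
    if [ $# -lt 2 ]; then
        echo "usage: ocr_import_plan export_file node [node ...]" >&2
        return 1
    fi
    export_file="$1"; shift
    for node in "$@"; do
        echo "on $node: crsctl stop crs"
    done
    # The import itself is done once, from any one node (here: the first).
    echo "on $1: ocrconfig -import $export_file"
    for node in "$@"; do
        echo "on $node: crsctl start crs"
    done
    echo "on $1: cluvfy comp ocr -n all -verbose"
}
```

For my four-node lab, ocr_import_plan /tmp/myfile vm1rh4 vm2rh4 vm3rh4 vm4rh4 prints ten lines that double as a checklist.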
Importing Oracle Cluster Registry Content on Windows-Based Systems Use the following procedure to import the OCR on Windows:
• Identify the OCR export file that you want to import by running the ocrconfig -showbackup command.
• Stop the OCR clients on all nodes in your RAC cluster. On Windows, you can use the Service Control Panel to stop the following services: OracleClusterVolumeService, OracleCMService, OracleEVMService, OracleCSService, and OracleCRService.
• Now import the export file from any one of the nodes using the command ocrconfig -import myfile.
• Restart all of the services on all of the nodes that you just modified.
• Run the Cluster Verification Utility (CVU) to check the integrity of all of the affected nodes. The -n all flag covers every node in the cluster, and -verbose gives you detailed output for each of them: cluvfy comp ocr -n all [-verbose]
Administering Storage on RAC We all know by now that the storage of our RAC resides on shared disks. The datafiles we created (ASM1,ASM2,ASM3) all reside in the ASM (Automatic Storage Management) disk group. They can, however, also reside on raw devices (we won't get into that). If you follow the OFA, then you will create adequate log files (two at least) that also reside on the shared storage. We also had our mounted disk, specially made for the spasmfile, or SPFILE. There is also an option to use and store client-side parameter files (PFILEs). We followed Oracle's advice and stuck to the SPFILE and ASM. There may be situations or cluster scenarios where that might not be possible. Do refer to the appropriate Admin Guide for further reading. Now let's take a quick look at administering our ASM instance with the srvctl command line tool, our powerful SRVCTL utility.
Administering ASM Instances with SRVCTL in RAC Use the following command to add configuration information to an existing ASM instance:
srvctl add asm -n mynode_name -i myasm_instance_name -o myoracle_home
If, however, you choose not to add the -i option, then the changes are propagated throughout the entire ASM instance pool. To remove an ASM instance, use the following syntax:
srvctl remove asm -n mynode_name [-i myasm_instance_name]
In order to enable an ASM instance, use the following syntax:
srvctl enable asm -n mynode_name [-i myasm_instance_name]
In order to disable an ASM instance, use the following syntax:
srvctl disable asm -n mynode_name [-i myasm_instance_name]
Note that you can also use the SRVCTL utility to start, stop, and get the status of an ASM instance. See the examples below. To start an ASM instance, do the following:
srvctl start asm -n mynode_name [-i myasm_instance_name] [-o start_options] [-c connect_str | -q]
To stop an ASM instance, use the following syntax:
srvctl stop asm -n mynode_name [-i myasm_instance_name] [-o stop_options] [-c connect_str | -q]
To list the configuration of an ASM instance, do the following:
srvctl config asm -n mynode_name
To get the status of an ASM instance, use the following syntax:
srvctl status asm -n mynode_name
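All of these srvctl ... asm verbs follow the same pattern, so a thin wrapper saves typing and typos. What follows is only a sketch: the asm_ctl function and the DRY_RUN switch are hypothetical helpers of mine, and the node and instance names in the usage note (vm1rh4, +ASM1) are examples from my lab rather than anything you must use.

```shell
#!/bin/sh
# Sketch only: a thin wrapper around the srvctl ... asm verbs listed above.
# Assumptions: asm_ctl and DRY_RUN are hypothetical helpers; DRY_RUN=1
# prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Usage: asm_ctl <start|stop|status|config|enable|disable> <node> [instance]
asm_ctl() {
    action="$1"; node="$2"; instance="$3"
    case "$action" in
        start|stop|status|config|enable|disable) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    cmd="srvctl $action asm -n $node"
    # status and config take only the node name in the syntax shown above
    if [ "$action" != "status" ] && [ "$action" != "config" ]; then
        if [ -n "$instance" ]; then
            cmd="$cmd -i $instance"
        fi
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

With DRY_RUN=1, asm_ctl start vm1rh4 +ASM1 simply prints the srvctl command it would run; set DRY_RUN=0 once you trust it.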
Conclusion: In this article, we took a quick look at the administration of our Clusterware and a brief look at our storage administration. In our next article, we will cover RAC database administration and cluster databases administration.
Administering the Database Instances and Cluster Databases Brief intro Oracle has finally released its Oracle 10G Release 2 database for Solaris 10. I have been waiting for it for quite some time and I hope that they were able to fix the clusterware, which had some issues. Anyway, I will test it on my ESX box, which has three ready-baked Solaris 10 U1 operating systems.
OK moving ahead with our administration series, we will look at administering our database instances and cluster databases.
Tools needed for administering our instances For every administration task, you need a steady and reliable set of tools. I have seen many tools (and I mean third party tools) but when it comes to taming the Oracle database, you should have a look at the tools discussed (briefly) below. It really doesn't matter if you are administering a multi-node RAC farm or a single Oracle instance:
• Using SQL*PLUS
• Using Oracle Enterprise Manager
• Using SRVCTL (our powerful utility)
Managing Oracle Real Application Clusters with SQL*PLUS We can all tell tales about the famed SQL*Plus tool, but do check into it in the manual labeled Utilities. It's a great utility and is readily available with the installed Oracle base. Try to connect using the proper parameters, such as naming the instance, because by default the SQL*Plus prompt does not identify the current instance. So you would typically do something like this if you need to connect to a different instance in SQL*Plus:
CONNECT username/password@my_service_name
If you want to change the SQL*Plus prompt so that it automatically includes the name of the current instance, type the following command in SQL*Plus:
SET SQLPROMPT '_CONNECT_IDENTIFIER> '
This command replaces the default SQL string before the > symbol with the user variable _CONNECT_IDENTIFIER, which should ideally display the current instance name for as long as you are logged into your instance. We have also seen Tom Kyte's script, which has done that automagically since versions 8.x. To make the setting permanent, all you have to do is enter the same line into your glogin.sql; this file can be found in your SQL*Plus directory:
SET SQLPROMPT '_CONNECT_IDENTIFIER> '
You can also add any other needed text or even a SQL*Plus user variable between the single quotes in the command. We all know that we can have several SQL*Plus sessions to the same instance. To perform any important tasks, you can log on as SYSOPER or SYSDBA. Administrative tasks such as instance shutdown and startup can be carried out with these admin privileges.
Managing Oracle Real Application Clusters with Enterprise Manager This web-based tool has come of age and offers enterprise-class administration and RAC control to manage a single Oracle RAC database. Enterprise Manager is the central tool that focuses on all possible aspects of our RAC service, providing a central point of control for the Oracle environment through a graphical user interface (GUI). Oracle Enterprise Manager can be used to start, stop and monitor databases, RAC instances and the associated Listeners. It also lets you schedule jobs, set alert thresholds, alter schemas, manage storage and much more. We will dedicate more time to the Enterprise Manager later, since it is a very useful tool, even for Grid Control, meaning when administering a GRID. In addition, Enterprise Manager can perform tasks on the fly on multiple RAC databases.
Managing Oracle Real Application Clusters with SRVCTL In a previous article, we did some work with the SRVCTL tool, which we touted as a powerful utility, and it is powerful, for many reasons. SRVCTL is the source to all of the feeds that go into the Enterprise Manager. The configurations, for instance, are all picked up from a list that SRVCTL generates when discovering and monitoring the nodes in our RAC. These configurations are all stored in our Oracle Cluster registry (OCR) file. In addition, Enterprise Manager uses SQL*PLUS to stop and start the instances. Of course, the Enterprise Manager uses many more tools (which we as DBAs and Developers use day to day) than just the above-mentioned tools, to arm you with a robust GUI. So now it's time to move on to the more robust, enhanced and feature rich tools found in Enterprise Manager. As you can see, these tools are all independent and yet interdependent.
An Example: Starting/Stopping using various tools We can use all the tools to perform the start/stop tasks, whether it is Enterprise Manager, SQL*Plus or SRVCTL. Assuming that we are using the SPFILE, which is advisable anyway, let's do the following:
• Starting Up and Shutting Down with Enterprise Manager
• Starting Up and Shutting Down with SQL*Plus
• Starting Up and Shutting Down with SRVCTL
Make sure that your Clusterware is running. You will not be able to start your RAC (after you have brought it down for this exercise) if your CRS stack is not running properly; all of its processes must be up. The procedure for shutting down RAC instances is pretty much the same as for a single-instance Oracle database. There are, however, some important points to note. Refer to the Admin manual for more information on shutting down the databases. The key differences are:
• Bringing down one of the nodes will not affect the availability of the RAC service as a whole. Such operations are very common during maintenance and upgrade tasks, while the remaining RAC instances and the core business application continue to run despite all the shutdowns and restarts in the back-end.
• In order to shut down a RAC database or the RAC service completely, you have to perform the shutdown on every single node whose instance has the database in mounted or open state.
• You do not have to worry about instance recovery after a NORMAL or IMMEDIATE shutdown, just as in your single-instance Oracle database. Recovery is needed, however, when you issue the SHUTDOWN ABORT command or when the instance or the applications terminated abnormally. The good part is that an instance that is still up will perform the instance recovery for the instances that were brought down abnormally. And if none are up, the first instance to start again automatically detects and recovers the ailing ones.
• SHUTDOWN TRANSACTIONAL LOCAL brings down the local instance after all of its active transactions have been committed or rolled back. It does this in addition to what the SHUTDOWN IMMEDIATE command does. Transactions on other nodes do not interfere.
Conclusion : In our next article, we will continue to perform the shut down and start up task using SQL*PLUS, Oracle Enterprise Manager and SRVCTL.
Starting and Stopping the Database Instances and RAC Databases Brief intro Talking about syntax and theory is nice. It is good to know the facts and it is fine to know what syntax to use where. I can also assume that you enjoy my articles, but I may be wrong. That is exactly what a polite critique reminded me of. (Thanks Solomon!) Doing a walk-through of syntax and theory is good, but if it's not complemented with a demonstration, it will slip out of the mind just about as fast as it came. So I took the advice and will from now on spend more time demonstrating the syntax (on a running RAC) and going through all of the details. In our administration series we will attempt to break our Oracle RAC and try to mend it again. My next article will catch up with a real-time demo of the commands and syntax.
Starting and Stopping Instances and RAC databases It's easy to start up and shut down the instances from the Enterprise Manager console, SQL*Plus or the SRVCTL utility. The minor difference here is that both the Enterprise Manager and SRVCTL have options to start up and shut down all instances of your RAC in a single step! Also, note that certain operations can be carried out in either NOMOUNT or MOUNT state while other operations require the database to be OPEN. We will as usual use the SPFILE and go about the following:
• Starting Up and Shutting Down with Enterprise Manager
• Starting Up and Shutting Down with SQL*Plus
• Starting Up and Shutting Down with SRVCTL
Obviously, you will need your RAC to be operational. The difference between a RAC and a regular database is that you can easily shut down a node (even pull out the plug if you are impatient) and still have your services remain operational. That is utility computing; and indeed RAC is a utility appliance.
Starting Up and Shutting Down with Enterprise Manager Doing this is pretty easy. You can either shut all of the instances down or bring down the individual nodes. Do the following to start up or shut down a cluster database instance:
• Go to the link http://myipaddress:5500/em; this will be your RAC database homepage.
• Click on Startup or Shutdown for the particular instance. When the node is down, you can log back on with either your SYSDBA or SYSOPER privileges.
To bring down or bring up your RAC, meaning all instances that the Enterprise Manager is aware of, you will do the following:
• Again, go to your RAC homepage (http://myipaddress:5500/em).
• Click Startup/Shutdown. The Specify Credentials page appears. This time, provide the username and password for your RAC. Here you can restart your RAC cluster effectively. You must be a member of the OSDBA group.
Starting Up and Shutting Down with SQL*Plus When you need to start or stop a single node with SQL*Plus while still connected to the local node, make sure that the environment variables are OK. Normally with our RAC setup we need not worry about that, as the environment variables are reloaded with every node reboot.
• Start SQL*Plus on the command line and connect as a user with SYSDBA or SYSOPER privileges.
• Issue the following commands to start and mount your database:
CONNECT / AS SYSDBA
STARTUP MOUNT
NOTE: You can start or stop multiple instances from a single SQL*Plus session using Oracle Net Services. (This is similar to opening a PuTTY console and doing ssh anothermachinename to jump across machines via one console; the example below will typically be run from a PuTTY or ssh console on one machine, jumping across nodes from there.) You must, however, use an alias in your connection string. This is a common practice, which I prefer to follow on my local node as well, since I travel so much across consoles that sometimes I forget where I am working. Therefore, to shut down nodes 3 and 4, say vm3rh4 and vm4rh4, you will do this:
CONNECT /@vm3rh4 AS SYSDBA
SHUTDOWN
Then connect to the second node from the same SQL*Plus session:
CONNECT /@vm4rh4 AS SYSDBA
SHUTDOWN
Also note that there is no single SQL*Plus command that will shut down all of the nodes. You can, however, write a shell script that will do so. Maybe we will try to cook one up and see it in upcoming articles. Or better yet, just stick to the Enterprise Manager for such activities.
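To give you a taste of what such a shell script could look like, here is a minimal sketch. It assumes that each argument is a Net Services alias that resolves to exactly one instance and that CONNECT /@alias AS SYSDBA already works for you as shown above. The function names and the DRY_RUN switch are mine, and I default to SHUTDOWN IMMEDIATE (pass another mode as MODE if you prefer) so that the script does not sit waiting for sessions to disconnect.

```shell
#!/bin/sh
# Sketch only: shut down several instances from one console.
# Assumptions: every argument is a Net Services alias for one instance,
# and "CONNECT /@alias AS SYSDBA" already works in your environment.
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the SQL*Plus script, 0 = run it
MODE="${MODE:-IMMEDIATE}"    # shutdown mode appended to SHUTDOWN

# Emit the SQL*Plus commands for one alias.
shutdown_script() {
    printf 'CONNECT /@%s AS SYSDBA\nSHUTDOWN %s\nEXIT\n' "$1" "$MODE"
}

# Walk over all aliases, one SQL*Plus session per instance.
shutdown_all() {
    for alias_name in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            shutdown_script "$alias_name"
        else
            shutdown_script "$alias_name" | sqlplus -S /nolog
        fi
    done
}
```

Running shutdown_all vm3rh4 vm4rh4 with DRY_RUN=1 prints the commands for both nodes so that you can eyeball them first.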
Starting Up and Shutting Down with SRVCTL We have covered SRVCTL before, so we'll do a quick syntax check here. To start an instance:
srvctl start instance -d mydb -i "myinstance_list" [-o start_options] [-c connect_str | -q]
To stop, do the following:
srvctl stop instance -d mydb -i "myinstance_list" [-o stop_options] [-c connect_str | -q]
To start and stop the entire RAC cluster database, meaning all of the instances, you will do the following from your SRVCTL in the command line:
srvctl start database -d mydb [-o start_options] [-c connect_str | -q]
srvctl stop database -d mydb [-o stop_options] [-c connect_str | -q]
There are several options and we will look at all of them in upcoming articles in RAC administration.
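As a small illustration of the instance list, here is a sketch that builds the srvctl commands for either a handful of instances or the whole database. The helper names are hypothetical, the database and instance names in the usage note (brianic, brianic3, brianic4) come from my lab, and the functions only print the command so that you can check it before running it yourself.

```shell
#!/bin/sh
# Sketch only: build srvctl start/stop commands for a database or for a
# comma-separated list of its instances. Nothing is executed.
# Assumption: the helper names below are made up for this example.

# Join all arguments with commas: a b c -> a,b,c
join_by_comma() {
    list=""
    for item in "$@"; do
        if [ -z "$list" ]; then list="$item"; else list="$list,$item"; fi
    done
    echo "$list"
}

# Usage: instance_cmd <start|stop> <db> <option> <instance> [<instance> ...]
instance_cmd() {
    if [ $# -lt 4 ]; then
        echo "usage: instance_cmd action db option instance..." >&2
        return 1
    fi
    action="$1"; db="$2"; option="$3"; shift 3
    echo "srvctl $action instance -d $db -i \"$(join_by_comma "$@")\" -o $option"
}

# Usage: database_cmd <start|stop> <db> <option>
database_cmd() {
    echo "srvctl $1 database -d $2 -o $3"
}
```

So instance_cmd stop brianic immediate brianic3 brianic4 gives you the command to take down two of the four instances, while database_cmd stop brianic immediate covers all of them in one go.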
Time Synchronization: A common problem in a typical VMware RAC Cluster You must have noticed time synchronization problems during the install process. Your install will fail miserably if there are time sync problems, and even during listener creation you might get “time in the future” errors. What I encountered during preparation of my virtual machines was that in a dual or even a quad vCPU virtual machine the clock runs wild (mostly too fast). I will not get into the details of time drift, cpuspeed, etc., as there is enough advice from other corners to drive you nuts. However, configuring the ESX Server's ntp file helped me to keep time on an ESX 2.5.x host with 4 dual-vCPU virtual machines running RHEL 4.2 and to have my RAC run successfully.
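Before you start an install it is worth measuring how far apart the node clocks really are. The sketch below is one rough way to do that, under my own assumptions: passwordless ssh between the nodes and a date +%s that prints epoch seconds on every node. Because the readings are taken one after another, a second or so of the result is just ssh latency; the function names are invented for this example.

```shell
#!/bin/sh
# Sketch only: measure how far apart the clocks of the RAC nodes are.
# Assumptions: passwordless ssh to every node and a "date +%s" that prints
# epoch seconds there; readings are sequential, so ssh latency is included.

# Given epoch-second readings, print the spread (max - min) in seconds.
clock_spread() {
    if [ $# -eq 0 ]; then echo 0; return 0; fi
    min="$1"; max="$1"
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min="$t"; fi
        if [ "$t" -gt "$max" ]; then max="$t"; fi
    done
    echo $(($max - $min))
}

# Collect one reading per node over ssh and report the spread.
node_clock_spread() {
    readings=""
    for node in "$@"; do
        readings="$readings $(ssh "$node" date +%s)"
    done
    # Word splitting of $readings is intended here.
    clock_spread $readings
}
```

On my lab I would run node_clock_spread vm1rh4 vm2rh4 vm3rh4 vm4rh4; anything beyond a few seconds tells me to sort out NTP before going any further.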
Conclusion: In our next article, we will continue by showing more realistic examples of what we have discussed in our previous articles. Upcoming articles will try to take a closer look at the Enterprise Manager Console.
Hands on syntax check
Brief intro As promised, in this article we will check the syntax on our console. These commands are carried out against a working 4-node Oracle 10g R2 cluster. The OS is RHEL 4.2/CentOS 4.2 running on virtual machines on an ESX Server. Each server has 1200MB RAM and a dual CPU, or better said, 2 vCPUs (virtual CPUs). In upcoming articles, we will discuss a 6-node, 4 vCPU, 1600MB RHEL 4.2 Oracle RAC, and will also keep in mind our plans for Solaris 10 with an Oracle 10gR2 cluster. I'm pretty confident we will also do 10.1 Oracle RAC on Mac OS X, once VMware has support for the Macs on the ESX or VMware Servers.
Administering Clusterware Let's take a look at the details of the commands that we mentioned in the Administration of our Clusterware articles.
Part 3: Administering the Clusterware Components
Part 4: Administering the Clusterware: Components
General OCRCONFIG commands
[oracle@vm1rh4 ~]$ ocrconfig
Name:
    ocrconfig - Configuration tool for Oracle Cluster Registry.
Synopsis:
    ocrconfig [option]
    option:
        -export [-s online]          - Export cluster register contents to a file
        -import                      - Import cluster registry contents from a file
        -upgrade [<user> []]         - Upgrade cluster registry from previous version
        -downgrade [-version ]       - Downgrade cluster registry to the specified version
        -backuploc                   - Configure periodic backup location
        -showbackup                  - Show backup information
        -restore                     - Restore from physical backup
        -replace ocr|ocrmirror []    - Add/replace/remove a OCR device/file
        -overwrite                   - Overwrite OCR configuration on disk
        -repair ocr|ocrmirror        - Repair local OCR configuration
        -help                        - Print out this help information
Note:
    A log file will be created in $ORACLE_HOME/log//client/ocrconfig_.log. Please ensure you have file creation privileges in the above directory before running this tool.
Showing backups
[oracle@vm1rh4 ~]$
[oracle@vm1rh4 ~]$ ocrconfig -showbackup
vm1rh4 2006/10/13 14:04:07 /u01/app/oracle/oracle/product/10.2.0/crs/cdata/crs
vm1rh4 2006/10/13 09:52:00 /u01/app/oracle/oracle/product/10.2.0/crs/cdata/crs
vm1rh4 2006/10/13 05:37:24 /u01/app/oracle/oracle/product/10.2.0/crs/cdata/crs
vm1rh4 2006/10/12 04:14:46 /u01/app/oracle/oracle/product/10.2.0/crs/cdata/crs
vm1rh4 2006/10/01 00:38:29 /u01/app/oracle/oracle/product/10.2.0/crs/cdata/crs
[oracle@vm1rh4 ~]$
What is OCRDUMP anyways
[oracle@vm1rh4 ~]$ ocrdump -help
Name:
    ocrdump - Dump contents of Oracle Cluster Registry to a file.
Synopsis:
    ocrdump [|-stdout] [-backupfile ] [-keyname ] [-xml] [-noheader]
Description:
    Default filename is OCRDUMPFILE. Examples are:
    prompt> ocrdump
    writes cluster registry contents to OCRDUMPFILE in the current directory
    prompt> ocrdump MYFILE
    writes cluster registry contents to MYFILE in the current directory
    prompt> ocrdump -stdout -keyname SYSTEM
    writes the subtree of SYSTEM in the cluster registry to stdout
    prompt> ocrdump -stdout -xml
    writes cluster registry contents to stdout in xml format
Notes:
    The header information will be retrieved based on best effort basis. A log file will be created in ORACLE_HOME/log//client/ocrdump_.log. Make sure you have file creation privileges in the above directory before running this tool.
Checking what the OCRDUMP did
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2006-10-13 16:38:55.377: [ OCRDUMP][3086993088]ocrdump starts...
2006-10-13 16:39:04.063: [ OCRDUMP][3086993088]Failed to open key handle for key name [SYSTEM.evm.debug] [PROC-5: User does not have permission to perform a cluster registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]]
2006-10-13 16:39:05.253: [ OCRDUMP][3086993088]Failed to open key handle for key name [SYSTEM.crs.versions] [PROC-5: User does not have permission to perform a cluster registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]]
2006-10-13 16:39:16.853: [ OCRDUMP][3086993088]Failed to open key handle for key name [CRS.CUR] [PROC-5: User does not have permission to perform a cluster registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]]
2006-10-13 16:39:16.908: [ OCRDUMP][3086993088]Failed to open key handle for key name [CRS.HIS] [PROC-5: User does not have permission to perform a cluster registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]]
2006-10-13 16:39:16.938: [ OCRDUMP][3086993088]Failed to open key handle for key name [CRS.SEC] [PROC-5: User does not have permission to perform a cluster registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]]
2006-10-13 16:39:16.939: [ OCRDUMP][3086993088]Exiting [status=success???]...
Well, not exactly successful, so we log in as root and do it again (do check with your Oracle support before you give your oracle user the file creation permissions in production!).
[root@vm1rh4 oracle]# ocrdump -backupfile my_file
PROT-302: Failed to initialize ocrdump
[root@vm1rh4 oracle]# ocrdump
PROT-303: Dump file already exists [OCRDUMPFILE]
Viewing the contents of the OCRDUMPFILE The contents of the OCRDUMP can be quite large, especially when dealing with 4 nodes. Clearly, you can see that permissions are crucial. At the database and instance level the oracle user exercises full control, whereas at the CRS level we see root as the one with the permissions. The details are visible in this small excerpt of the OCRDUMPFILE.
###################################################################
10/13/2006 16:38:55
ocrdump

[SYSTEM]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.interfaces]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_CREATE_SUB_KEY, OTHER_PERMISSION : PROCR_READ, USER_NAME : oracle, GROUP_NAME : dba}

[SYSTEM.css.clustername]
ORATEXT : crs
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.misscount]
UB4 (10) : 360
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.diskfile]
ORATEXT : /u03/oradata/votingdisk/CSSFile
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.diskfile1]
ORATEXT : /u03/oradata/votingdisk/CSSFile_mirror1
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.diskfile2]
ORATEXT : /u03/oradata/votingdisk/CSSFile_mirror2
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.configured_node_map]
BYTESTREAM (16) : 1e
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[SYSTEM.css.node_names]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[DATABASE.DATABASES.brianic.CONFIG_VERSION]
ORATEXT : 10.2.0.0.0
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_WRITE, OTHER_PERMISSION : PROCR_READ, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic1]-Our First Node!
ORATEXT : brianic1
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic1.NODE]
ORATEXT : vm1rh4
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic1.ENABLED]
ORATEXT : true
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic1.ENVIRONMENT]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}
. . .
[DATABASE.DATABASES.brianic.INSTANCE.brianic4]-Our Last Node!
ORATEXT : brianic4
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic4.NODE]
ORATEXT : vm4rh4
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic4.ENABLED]
ORATEXT : true
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

[DATABASE.DATABASES.brianic.INSTANCE.brianic4.ENVIRONMENT]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_ALL_ACCESS, USER_NAME : oracle, GROUP_NAME : dba}

For a full copy of my dumpfile, see my blog. Please note that I have put it up for my own use and will get into the nitty-gritty of every detail of it when I have time. If you have the time, go ahead and explore it!
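The PROT-303 error we ran into earlier simply means that OCRDUMPFILE was already sitting in the current directory. One way around it, sketched below, is never to reuse a dump file name. The DUMP_DIR variable, the DRY_RUN switch and the naming scheme are my own assumptions; the only thing taken from the help text is that ocrdump MYFILE writes to MYFILE in the current directory, which is why the sketch changes into the dump directory first.

```shell
#!/bin/sh
# Sketch only: avoid PROT-303 by never reusing a dump file name.
# Assumptions: DUMP_DIR, DRY_RUN and the naming scheme are mine; run the
# real thing as root so that the protected keys can be read.
DUMP_DIR="${DUMP_DIR:-.}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = really run it

# Pick the first name of the form ocrdump_<stamp>[_N].txt that is still free.
next_dump_file() {
    stamp="${1:-$(date +%Y%m%d)}"
    candidate="${DUMP_DIR}/ocrdump_${stamp}.txt"
    n=1
    while [ -e "$candidate" ]; do
        candidate="${DUMP_DIR}/ocrdump_${stamp}_${n}.txt"
        n=$(($n + 1))
    done
    echo "$candidate"
}

# Dump the OCR into a fresh file inside DUMP_DIR.
dump_ocr() {
    target="$(next_dump_file "$1")"
    name="$(basename "$target")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd $DUMP_DIR && ocrdump $name"
    else
        (cd "$DUMP_DIR" && ocrdump "$name")
    fi
}
```

Calling dump_ocr twice on the same day gives you ocrdump_<date>.txt and then ocrdump_<date>_1.txt, so you also keep the older dump around for comparison.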
Conclusion: In upcoming articles, we will run more commands on the command line and will attempt to mend our broken RAC with recovery commands. I will make full backup copies of my development environment with the ESX Consolidated Backup tool, and then we can safely do the dirty job in our well-protected and fenced lab environment.
Hands on administration Brief intro As a continuing part of the hands-on articles, we will take a deeper look at such things as errors in the installation (which are actually not errors), ESX host tuning for time synchronization (without which the whole RHEL RAC installation means nothing) and some SRVCTL commands.
Errors come and errors go Let's take a look at a couple of screen shots and some of the typical Cluster Ready Services (CRS) errors and tricks to bring your RAC services and applications online. This is a typical placement error that I get on every installation. I think it may have to do with the time issue; we will come to that. If you are using an ESX server to test/develop your RAC, then the information to test and fix your time synchronization issues will certainly come in very handy.
It really doesn't mean a thing. I do run into this every time I do my virtual machine restart.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    UNKNOWN   vm1rh4
ora....H4.lsnr application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.gsd application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.ons application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.vip application    ONLINE    ONLINE    vm1rh4
ora....SM2.asm application    ONLINE    UNKNOWN   vm2rh4
ora....H4.lsnr application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.gsd application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.ons application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.vip application    ONLINE    ONLINE    vm2rh4
ora....SM3.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm3rh4.gsd application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.ons application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.vip application    ONLINE    UNKNOWN   vm1rh4
ora....SM4.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm4rh4.gsd application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.ons application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.vip application    ONLINE    UNKNOWN   vm4rh4
This is really a VM issue. I am using 1.2 GB of vMEM, 2 vCPUs, and decent SCSI virtual disks (VMFS-2 file system in VMDK format). Restarting the services does not go as expected. Let's see what happens if you try to do a crs_stop -all and then a crs_start -all.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop -all
Attempting to stop `ora.vm4rh4.gsd` on member `vm4rh4`
Attempting to stop `ora.vm4rh4.ons` on member `vm4rh4`
Stop of `ora.vm4rh4.gsd` on member `vm4rh4` succeeded.
Stop of `ora.vm4rh4.ons` on member `vm4rh4` succeeded.
Attempting to stop `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4`
Attempting to stop `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4`
Stop of `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4` succeeded.
Attempting to stop `ora.vm2rh4.ASM2.asm` on member `vm2rh4`
Stop of `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4` succeeded.
Attempting to stop `ora.vm1rh4.ASM1.asm` on member `vm1rh4`
Stop of `ora.vm2rh4.ASM2.asm` on member `vm2rh4` succeeded.
Attempting to stop `ora.vm2rh4.vip` on member `vm2rh4`
Stop of `ora.vm2rh4.vip` on member `vm2rh4` succeeded.
Stop of `ora.vm1rh4.ASM1.asm` on member `vm1rh4` succeeded.
Attempting to stop `ora.vm1rh4.vip` on member `vm1rh4`
Stop of `ora.vm1rh4.vip` on member `vm1rh4` succeeded.
As you can see from crs_stat -t, it just does not stop all of the resources cleanly; several are left in the UNKNOWN state.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
-----------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm1rh4.gsd application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.ons application    ONLINE    UNKNOWN   vm1rh4
ora.vm1rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm2rh4.gsd application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.ons application    ONLINE    UNKNOWN   vm2rh4
ora.vm2rh4.vip application    OFFLINE   OFFLINE
ora....SM3.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm3rh4.gsd application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.ons application    ONLINE    UNKNOWN   vm3rh4
ora.vm3rh4.vip application    ONLINE    UNKNOWN   vm1rh4
ora....SM4.asm application    ONLINE    OFFLINE
ora....H4.lsnr application    ONLINE    OFFLINE
ora.vm4rh4.gsd application    OFFLINE   OFFLINE
ora.vm4rh4.ons application    OFFLINE   OFFLINE
ora.vm4rh4.vip application    ONLINE    UNKNOWN   vm4rh4

This is bizarre, of course. I will test it on ESX 3.0 with more capacity and see if the problem vanishes, but for now the task is to start all of the services one by one, which is not easy when all you have are truncated names such as "ora....H4.lsnr". Running crs_stat without any options gives you the full names of the services.

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat
In my case, it gave the following, and now you have the full names.

NAME=ora.brianic.brianic1.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.brianic.brianic2.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.brianic.brianic3.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.brianic.brianic4.inst
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.brianic.db
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.brianic.fokeserv.brianic1.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm1rh4

NAME=ora.brianic.fokeserv.brianic2.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm2rh4

NAME=ora.brianic.fokeserv.brianic3.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm3rh4

NAME=ora.brianic.fokeserv.brianic4.srv
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4

NAME=ora.brianic.fokeserv.cs
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4

NAME=ora.vm1rh4.ASM1.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm1rh4.LISTENER_VM1RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm1rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm1rh4

NAME=ora.vm1rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm1rh4

NAME=ora.vm1rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm2rh4.ASM2.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm2rh4.LISTENER_VM2RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm2rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm2rh4

NAME=ora.vm2rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm2rh4

NAME=ora.vm2rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm3rh4.ASM3.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm3rh4.LISTENER_VM3RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm3rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm3rh4

NAME=ora.vm3rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm3rh4

NAME=ora.vm3rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm4rh4.ASM4.asm
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm4rh4.LISTENER_VM4RH4.lsnr
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.vm4rh4.gsd
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4

NAME=ora.vm4rh4.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on vm4rh4

NAME=ora.vm4rh4.vip
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

So I go ahead and stop them all first...

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm1rh4.gsd
Attempting to stop `ora.vm1rh4.gsd` on member `vm1rh4`
Stop of `ora.vm1rh4.gsd` on member `vm1rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm2rh4.gsd
Attempting to stop `ora.vm2rh4.gsd` on member `vm2rh4`
Stop of `ora.vm2rh4.gsd` on member `vm2rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm3rh4.gsd
Attempting to stop `ora.vm3rh4.gsd` on member `vm3rh4`
Stop of `ora.vm3rh4.gsd` on member `vm3rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm3rh4.ons
Attempting to stop `ora.vm3rh4.ons` on member `vm3rh4`
Stop of `ora.vm3rh4.ons` on member `vm3rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm2rh4.ons
Attempting to stop `ora.vm2rh4.ons` on member `vm2rh4`
Stop of `ora.vm2rh4.ons` on member `vm2rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm1rh4.ons
Attempting to stop `ora.vm1rh4.ons` on member `vm1rh4`
Stop of `ora.vm1rh4.ons` on member `vm1rh4` succeeded.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stop ora.vm3rh4.vip
Attempting to stop `ora.vm3rh4.vip` on member `vm1rh4`
Stop of `ora.vm3rh4.vip` on member `vm1rh4` succeeded.
CRS-1016: Resources depending on 'ora.vm3rh4.vip' are running

Check the status:

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
-----------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm1rh4.gsd application    OFFLINE   OFFLINE
ora.vm1rh4.ons application    OFFLINE   OFFLINE
ora.vm1rh4.vip application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm2rh4.gsd application    OFFLINE   OFFLINE
ora.vm2rh4.ons application    OFFLINE   OFFLINE
ora.vm2rh4.vip application    OFFLINE   OFFLINE
ora....SM3.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm3rh4.gsd application    OFFLINE   OFFLINE
ora.vm3rh4.ons application    OFFLINE   OFFLINE
ora.vm3rh4.vip application    OFFLINE   OFFLINE
ora....SM4.asm application    OFFLINE   OFFLINE
ora....H4.lsnr application    OFFLINE   OFFLINE
ora.vm4rh4.gsd application    OFFLINE   OFFLINE
ora.vm4rh4.ons application    OFFLINE   OFFLINE
ora.vm4rh4.vip application    OFFLINE   OFFLINE

Then (fortunately) I just need the -all option to start all of the services.
[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_start -all
Attempting to start `ora.vm1rh4.vip` on member `vm1rh4`
Attempting to start `ora.vm2rh4.vip` on member `vm2rh4`
Attempting to start `ora.vm3rh4.vip` on member `vm3rh4`
Attempting to start `ora.vm4rh4.vip` on member `vm4rh4`
Start of `ora.vm2rh4.vip` on member `vm2rh4` succeeded.
Attempting to start `ora.vm2rh4.ASM2.asm` on member `vm2rh4`
Start of `ora.vm4rh4.vip` on member `vm4rh4` succeeded.
Attempting to start `ora.vm4rh4.ASM4.asm` on member `vm4rh4`
Start of `ora.vm3rh4.vip` on member `vm3rh4` succeeded.
Start of `ora.vm1rh4.vip` on member `vm1rh4` succeeded.
Attempting to start `ora.vm1rh4.ASM1.asm` on member `vm1rh4`
Attempting to start `ora.vm3rh4.ASM3.asm` on member `vm3rh4`
Start of `ora.vm2rh4.ASM2.asm` on member `vm2rh4` succeeded.
Attempting to start `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4`
Start of `ora.vm2rh4.LISTENER_VM2RH4.lsnr` on member `vm2rh4` succeeded.
Start of `ora.vm1rh4.ASM1.asm` on member `vm1rh4` succeeded.
Attempting to start `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4`
Start of `ora.vm1rh4.LISTENER_VM1RH4.lsnr` on member `vm1rh4` succeeded.
Start of `ora.vm3rh4.ASM3.asm` on member `vm3rh4` succeeded.
Attempting to start `ora.vm3rh4.LISTENER_VM3RH4.lsnr` on member `vm3rh4`
Start of `ora.vm4rh4.ASM4.asm` on member `vm4rh4` succeeded.
Start of `ora.vm3rh4.LISTENER_VM3RH4.lsnr` on member `vm3rh4` succeeded.
Attempting to start `ora.vm4rh4.LISTENER_VM4RH4.lsnr` on member `vm4rh4`
Start of `ora.vm4rh4.LISTENER_VM4RH4.lsnr` on member `vm4rh4` succeeded.
CRS-1002: Resource 'ora.vm1rh4.ons' is already running on member 'vm1rh4'
CRS-1002: Resource 'ora.vm2rh4.ons' is already running on member 'vm2rh4'
Attempting to start `ora.vm1rh4.gsd` on member `vm1rh4`
CRS-1002: Resource 'ora.vm3rh4.ons' is already running on member 'vm3rh4'
CRS-1002: Resource 'ora.vm4rh4.ons' is already running on member 'vm4rh4'

These errors don't mean anything: the services were already running by the time the command tried to start them, which is what triggers the CRS-1002 messages. As you can see now, everything is online:

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
-----------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    vm1rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.gsd application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.ons application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.vip application    ONLINE    ONLINE    vm1rh4
ora....SM2.asm application    ONLINE    ONLINE    vm2rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.gsd application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.ons application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.vip application    ONLINE    ONLINE    vm2rh4
ora....SM3.asm application    ONLINE    ONLINE    vm3rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.gsd application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.ons application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.vip application    ONLINE    ONLINE    vm3rh4
ora....SM4.asm application    ONLINE    ONLINE    vm4rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.gsd application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.ons application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.vip application    ONLINE    ONLINE    vm4rh4
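Picking the stuck resources out of the long crs_stat listing by hand is tedious on a four-node cluster. As a minimal sketch, the full names of all resources in a given state could be filtered as shown here. The function name crs_names_in_state is hypothetical, and the sketch assumes that crs_stat prints each NAME= and STATE= attribute at the start of its own line.

```shell
#!/bin/sh
# Hypothetical helper: read long-form `crs_stat` output on stdin and print
# the full NAME of every resource whose STATE begins with the given word.
# Assumes NAME= and STATE= each start a line, one attribute per line.
crs_names_in_state() {
  awk -v want="$1" '
    /^NAME=/  { name = substr($0, 6) }          # remember the current resource
    /^STATE=/ { split(substr($0, 7), s, " ")    # "UNKNOWN on vm1rh4" -> "UNKNOWN"
                if (s[1] == want) print name }
  '
}

# Example use on a cluster node (not run here):
#   $ORA_CRS_HOME/bin/crs_stat | crs_names_in_state UNKNOWN
```

Each name it prints can then be passed to crs_stop exactly as in the manual session above.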
After having successfully completed the 4-node installation, we print out all of our CRS services:

[oracle@vm1rh4 ~]$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
-----------------------------------------------------------
ora....c1.inst application    ONLINE    ONLINE    vm1rh4
ora....c2.inst application    ONLINE    ONLINE    vm2rh4
ora....c3.inst application    ONLINE    ONLINE    vm3rh4
ora....c4.inst application    ONLINE    ONLINE    vm4rh4
ora.brianic.db application    ONLINE    ONLINE    vm1rh4
ora....ic1.srv application    ONLINE    ONLINE    vm1rh4
ora....ic2.srv application    ONLINE    ONLINE    vm2rh4
ora....ic3.srv application    ONLINE    ONLINE    vm3rh4
ora....ic4.srv application    ONLINE    ONLINE    vm4rh4
ora....serv.cs application    ONLINE    ONLINE    vm4rh4
ora....SM1.asm application    ONLINE    ONLINE    vm1rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.gsd application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.ons application    ONLINE    ONLINE    vm1rh4
ora.vm1rh4.vip application    ONLINE    ONLINE    vm1rh4
ora....SM2.asm application    ONLINE    ONLINE    vm2rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.gsd application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.ons application    ONLINE    ONLINE    vm2rh4
ora.vm2rh4.vip application    ONLINE    ONLINE    vm2rh4
ora....SM3.asm application    ONLINE    ONLINE    vm3rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.gsd application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.ons application    ONLINE    ONLINE    vm3rh4
ora.vm3rh4.vip application    ONLINE    ONLINE    vm3rh4
ora....SM4.asm application    ONLINE    ONLINE    vm4rh4
ora....H4.lsnr application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.gsd application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.ons application    ONLINE    ONLINE    vm4rh4
ora.vm4rh4.vip application    ONLINE    ONLINE    vm4rh4
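With thirty rows to scan, it is easy to miss one that is not ONLINE. The following is a rough sketch of a filter for the tabular crs_stat -t output. The function name crs_not_online is hypothetical, and the sketch assumes that every resource row starts with "ora" and carries Target and State in the third and fourth columns, as in the listings in this article.

```shell
#!/bin/sh
# Hypothetical helper: read `crs_stat -t` output on stdin and print every
# resource row whose Target or State column is not ONLINE.
# Exit status is 0 when every row is ONLINE/ONLINE, 1 otherwise.
crs_not_online() {
  awk '
    $1 ~ /^ora/ && ($3 != "ONLINE" || $4 != "ONLINE") { print; bad = 1 }
    END { exit bad + 0 }
  '
}

# Example use on a cluster node (not run here):
#   $ORA_CRS_HOME/bin/crs_stat -t | crs_not_online && echo "all resources online"
```

Note that a resource deliberately left with an OFFLINE target is reported too, so the check only suits a cluster where everything is expected to be up.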
As you can see, this is a very real challenge in the VMware environment. RAC on VMware is not yet ripe to be deployed in production, not only because of the problems we encountered here, but also because of time synchronization issues with guest OSs like RHEL! Let's take a look at what I did to resolve these issues.
Fixing the Time Synchronization issue on VMware ESX Server host for RHEL/CentOS 4.2
The first major steps are these:
• Editing the following files for ESX 2.x Servers
  o /etc/ntp.conf
  o /etc/ntp/step-tickers
• For ESX Server 3.0 only, run the following command. This opens the appropriate ports and enables the NTP daemon to talk with the external server.
  [root@esxhost]# esxcfg-firewall --enableService ntpClient
• Restarting your ntp daemon: service ntpd restart
• Disabling the VMware tools in guests
• Installing the ntp daemon as a service: chkconfig --level 345 ntpd on
• Setting your local hardware clock from the NTP-synchronized system time: hwclock --systohc
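The command-line steps in this list can be collected into one small script. This is only a sketch: the DRY_RUN variable, the run helper, the sync_time_steps function and its "esx3" argument are all hypothetical names introduced here, and the file edits and the VMware tools step are left out because they are not single commands. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch of the command-line steps from the list, in the same order.
# DRY_RUN and run() are hypothetical helpers, not part of ESX or RHEL.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

sync_time_steps() {
  # ESX Server 3.0 only: open the firewall for the NTP client.
  if [ "$1" = "esx3" ]; then
    run esxcfg-firewall --enableService ntpClient
  fi
  run service ntpd restart            # restart the NTP daemon
  run chkconfig --level 345 ntpd on   # start ntpd in runlevels 3, 4 and 5
  run hwclock --systohc               # copy the system time to the hardware clock
}

# Example: print the commands for an ESX 3.0 host.
#   sync_time_steps esx3
```

Setting DRY_RUN=0 and running as root would execute the commands for real; reviewing the printed list first seems the safer habit.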
Editing ntp.conf
After making backups of the ntp.conf files on your ESX hosts, they should look like this:
restrict default kod nomodify notrap
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
driftfile /etc/ntp/drift
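The backup-then-edit step could be scripted along these lines. It is a sketch under assumptions: the function name write_ntp_conf, its optional path argument and the ".bak" suffix are made up for illustration, while the file contents are the ones given in this article.

```shell
#!/bin/sh
# Hypothetical helper: back up an existing ntp.conf, then write the
# configuration from this article. The path argument and the ".bak"
# suffix are assumptions for illustration.
write_ntp_conf() {
  conf="${1:-/etc/ntp.conf}"
  # Keep a copy of the current file, preserving mode and timestamps.
  if [ -f "$conf" ]; then
    cp -p "$conf" "$conf.bak" || return 1
  fi
  cat > "$conf" <<'EOF'
restrict default kod nomodify notrap
server 0.pool.ntp.org
server 1.pool.ntp.org
server 2.pool.ntp.org
driftfile /etc/ntp/drift
EOF
}

# Example: try it on a scratch copy before touching /etc/ntp.conf.
#   write_ntp_conf /tmp/ntp.conf
```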
Editing step-tickers
Here the listed servers should be your known NTP servers. Your step-tickers file then looks like this:
0.pool.ntp.org
1.pool.ntp.org
2.pool.ntp.org
pool.ntp.org
Finally, check your work by running ntpq -p to get a detailed real-time view of the NTP activity. And you are done.
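In the ntpq -p listing, the row flagged with a leading "*" is the server that ntpd has currently selected as its system peer. A small filter can pull that name out, which may be handy in a monitoring script. The function name ntp_selected_peer is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ntpq -p` output on stdin and print the remote
# that ntpd has selected as its system peer (the row flagged with "*").
# Exits non-zero when no peer has been selected yet.
ntp_selected_peer() {
  awk '
    /^\*/ { print substr($1, 2); found = 1 }   # drop the leading "*"
    END   { exit !found }
  '
}

# Example use (not run here):
#   ntpq -p | ntp_selected_peer || echo "ntpd has not selected a peer yet"
```

Right after a restart, ntpd usually needs a few polling intervals before it selects a peer, so an empty result at that point is not necessarily a problem.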
Conclusion
In the next article, we will continue by administering our ASM, creating a new service, and trying to disable and enable a particular instance in order to perform, say, OS patching or any other regular maintenance.