Oracle 10g RAC load-balancing configuration
This is a discussion on Oracle 10g RAC load-balancing configuration within the Database Discussions forums in Database and Unix Discussions category; Hi, I am new to RAC administration. I have been trying to configure load-balancing across two nodes in a cluster. 192.168.2.5, 192.168.2.6 are public IP addresses of the nodes and 192.168.2.9, 192.168.2.10 are the VIP addresses. My configuration files look like this:
-------------------- SQLNET.ORA --------------------------TRACE_LEVEL_CLIENT = OFF names.directory_path = (TNSNAMES) names.default_domain = world name.default_zone = world break_poll_skip=2000000000 ------------------------------------------------------------------------------------------------ TNSNAMES.ORA -------------------------TPCC.world = (DESCRIPTION = (load_balance = on) (failover = on) (ADDRESS_LIST = (FAILOVER = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = tpcc) ) ) TPCC1.world = (DESCRIPTION = (load_balance = on) (failover = on) (ADDRESS_LIST = (FAILOVER = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = tpcc) ) ) TPCC2.world = (DESCRIPTION = (load_balance = on) (failover = on) (ADDRESS_LIST = (FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = tpcc) ) ) EXTPROC_CONNECTION_DATA = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC)) (CONNECT_DATA = (SID=PLSExtProc)(PRESENTATION=RO))) LISTENERS_TPCC = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)) ) ------------------------------------------------------------------------------------------------ LISTENER.ORA -------------------------LISTENER_TPCC1 = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.5)(PORT = 1521)) ) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC)) ) ) LISTENER_TPCC2 = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.6)(PORT = 1521)) ) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC)) ) ) ---------------------------------------------------------------------------INIT.ORA --------------------------BACKGROUND_DUMP_DEST = /home/oracle/app/oracle/OraHome_1/rdbms/log USER_DUMP_DEST = /home/oracle/app/oracle/OraHome_1/rdbms/log control_files = ?/dbs/tpcc_disks/control_001
parallel_max_servers = 48 DB_BLOCK_SIZE = 2048 recovery_parallelism = 20 compatible = 10.1.0.2 db_name = tpcc db_files = 200 db_file_multiblock_read_count = 16 pga_aggregate_target = 800m dml_locks = 500 log_checkpoint_interval = 100000000 log_checkpoints_to_alert = TRUE open_cursors = 400 processes = 200 sessions = 600 transactions = 400 transactions_per_rollback_segment = 1 cursor_sharing = force cursor_space_for_time = TRUE timed_statistics = TRUE disk_asynch_io = TRUE db_writer_processes = 5 hpux_sched_noage = 178 db_cache_size = 2700m java_pool_size = 0 large_pool_size = 300m shared_pool_size = 1000m log_buffer = 1024000 db_recycle_cache_size = 16m db_cache_advice = ON plsql_optimize_level = 2 undo_management = AUTO #####cluster parameter ####################### cluster_database = true cluster_database_instances = 2 tpcc1.thread = 1 tpcc1.instance_name = tpcc1 tpcc1.instance_number = 1 tpcc1.undo_tablespace = UNDOTBS1 tpcc2.thread = 2 tpcc2.instance_name = tpcc2 tpcc2.instance_number = 2 tpcc2.undo_tablespace = UNDOTBS2 tpcc1.local_listener = 'LISTENER_TPCC1' tpcc2.local_listener = 'LISTENER_TPCC2' remote_listener = 'LISTENERS_TPCC' ############################################# -------------------------------------------------------------------
On both the nodes I have started the listeners LISTENER_TPCC1 and LISTENER_TPCC2 respectively and their status looks similar to this: ------------------------------------------------------------------> lsnrctl start LISTENER_TPCC1 LSNRCTL for HPUX: Version 10.1.0.2.0 - Production on 25-APR-2006 20:16:12 Copyright (c) 1991, 2004, Oracle. All rights reserved. Starting /home/oracle/app/oracle/OraHome_1/bin/tnslsnr: please wait... TNSLSNR for HPUX: Version 10.1.0.2.0 - Production System parameter file is /home/oracle/app/oracle/OraHome_1/bench/bench1000/benchrun/network/listener .ora Log messages written to /home/oracle/app/oracle/OraHome_1/network/log/listener_tpcc1.log Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168. 2.9)(PORT=1521))) Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168. 2.5)(PORT=1521))) Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)) ) Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168. 2.9)(PORT=1521))) STATUS of the LISTENER -----------------------Alias LISTENER_TPCC1 Version TNSLSNR for HPUX: Version 10.1.0.2.0 Production Start Date 25-APR-2006 20:16:12 Uptime 0 days 0 hr. 0 min. 0 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /home/oracle/app/oracle/OraHome_1/bench/bench1000/benchrun/network/listener .ora Listener Log File /home/oracle/app/oracle/OraHome_1/network/log/listener_tpcc1.log Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168. 2.9)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168. 2.5)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)) ) The listener supports no services The command completed successfully -------------------------------------------------------------------
Everytime I try to start the database instances I get the following error: SQL> startup pfile=init.ora ORA-00119: invalid specification for system parameter REMOTE_LISTENER ORA-00132: syntax error or unresolved network name 'LISTENERS_TPCC' ORA-01078: failure in processing system parameters
Can somebody please help me out in pointing out what am I doing wrong? Thanks a lot, Sudhanshu SOLUTION Thanks for your time guys. I tried quite a few things in the pfile single quotes, without quotes, double quotes - finally I just took the entry from the tnsnames.ora and pasted it there in the pfile itself. So the entries in the new pfile look like this: tpcc1.local_listener = '(ADDRESS = (PROTOCOL=TCP)(HOST=192.168.2.9)(PORT=1521))' tpcc2.local_listener = '(ADDRESS = (PROTOCOL=TCP)(HOST=192.168.2.10)(PORT=1521))' remote_listener = '(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9) (PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)))' After this the database came up without problems. Still connection load balancing was not working. So I figured it had to do something with tnsnames.ora. I rewrote the tnsnames.ora file as: TPCC.WORLD = (DESCRIPTION = (load_balance = on) (failover = on) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = tpcc)) ) TPCC1.WORLD = (DESCRIPTION = (load_balance = on) (failover = on)
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9)(PORT = 1521)) (CONNECT_DATA = (SERVICE_NAME = tpcc) (INSTANCENAME = tpcc1) (SERVER = DEDICATED) ) ) TPCC2.WORLD = (DESCRIPTION = (load_balance = on) (failover = on) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.10)(PORT = 1521)) (CONNECT_DATA = (SERVICE_NAME = tpcc) (INSTANCENAME = tpcc2) (SERVER = DEDICATED) ) --------------------------------------------------------------------------Restarting the listeners the new status was like this: ---------------------------------------------------------------------------------------------------------------------> lsnrctl services LISTENER_TPCC1 LSNRCTL for HPUX: Version 10.1.0.2.0 - Production on 27-APR-2006 11:54:26 Copyright (c) 1991, 2004, Oracle. All rights reserved. Connecting to (DESCRIPTION=(address=(protocol=ipc)(key=extproc)) ) Services Summary... Service "plsextproc" has 1 instance(s). Instance "plsextproc", status UNKNOWN, has 1 handler(s) for this service... Handler(s): "DEDICATED" established:0 refused:0 LOCAL SERVER Service "tpcc" has 2 instance(s). Instance "tpcc1", status READY, has 2 handler(s) for this service... Handler(s): "DEDICATED" established:0 refused:0 state:ready REMOTE SERVER (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.9)(PORT=152 1)) "DEDICATED" established:0 refused:0 state:ready LOCAL SERVER Instance "tpcc2", status READY, has 1 handler(s) for this service... Handler(s): "DEDICATED" established:0 refused:0 state:ready REMOTE SERVER (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.10)(PORT=15 21))
The command completed successfully -------------------------------------------------------------------------------------------After checking several connections using: sqlplus "tpcc-at-tpcc/tpcc" I could see the connections were being load balanced properly between the two nodes. Thanks a lot for your time guys! Seems the problem had to do with entries in tnsnames.ora and pfile. - Sudhanshu
RAC Networking, Failover, and Load Balancing Setting up failover, load balancing, TAF and FCF
Objectives At the end of this module the student will understand the following tasks and concepts. •
•
•
Understand how Oracle Net interacts with RAC. Understand how to configure tnsnames for RAC with Transparent Application Failover (TAF) Understand how to configure JDBC connections for RAC
•
Understand how to configure Fast Connection Failover as an alternative to RAC
Overview • • • • • • •
•
The Listener Configuration Files Oracle Net and RAC Connect-Time Failover TAF Failover Load Balancing JDBC Configuration for Failover and Load Balancing Fast Connection Failover
Listener Client Node 1 Node 2
Database
Listener • •
•
•
Resides on the server Listens for incoming client connection requests Manages the traffic to the server; when a client requests a network session with a server, the listener actually receives the request and brokers the client request If the client's information matches the listener's information, then the listener grants a connection to the server.
listener.ora
•
•
Configuration file for the listener process. Identifies: o The listener name o Protocol addresses that it will accept connection requests on o Services for which it is listening Typically resides in the %ORACLE_HOME%\network\adm in directory on Windows NT.
tnsnames.ora •
Configuration file that determines how you connect to the database. o Connect to specific instance o Load Balancing o Connect Time Failover o TAF Client Failover
•
Must be present on server and “thick” clients (ODBC and JDBC “thick” OCI driver)
Oracle Net and RAC •
•
•
Oracle Net understands RAC Is used to set up the cluster behavior Supports multiple modes o o o
Connect-Time Failover Load Balancing TAF
Connect-Time Failover
•
• • •
One service o Tries one node o then the other node Used for failover Balance is done manually In the event of a failure o Transaction is lost o Session is lost o Reconnection occurs automatically
Connect-Time Failover • • • • • • • • • • •
TESTA1 = (DESCRIPTION= (ADDRESS_LIST= (LOAD_BALANCE=OFF) (FAILOVER=ON) (ADDRESS=(PROTOCOL=TCP)(HOST=oradb1)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=oradb2)(PORT=1521))) (CONNECT_DATA= (SERVICE_NAME=TESTA) ) )
• • • •
TESTA2 = (DESCRIPTION= (ADDRESS_LIST= (LOAD_BALANCE=OFF)
• • • • • • •
(FAILOVER=ON) (ADDRESS=(PROTOCOL=TCP)(HOST=oradb2)(PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=oradb1)(PORT=1521))) (CONNECT_DATA= (SERVICE_NAME=TESTA) ) )
Failover - Backup •
• • •
Pairs of services o The first service defines the second as its backup o The second service defines the first service as its backup Used for RAC failover Balance is done manually In the event of a failure o Transaction is lost o Session is maintained o Queries are restarted o Reconnection occurs automatically
Failover – Backup • • • • • • • • • • • • • • •
TESTA1 = (DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb1)(PORT = 1523)) (CONNECT_DATA = (SERVICE_NAME = TESTA) (INSTANCE_NAME = TESTA1) (FAILOVER_MODE = (BACKUP = TESTA2) (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 120) (DELAY = 5))))
• • • • • • • • • • • • • • •
TESTA2 = (DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb2)(PORT = 1523)) (CONNECT_DATA = (SERVICE_NAME = TESTA) (INSTANCE_NAME = TESTA2) (FAILOVER_MODE = (BACKUP = TESTA1) (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 120) (DELAY = 5))))
Failover - Preconnect
•
Pairs of services o
o
o
• • •
The first service defines the second as its backup The second service defines the first service as its backup Both services preconnect to their backup when the primary connections are made
Used for RAC failover Balance is done manually In the event of a failure o o o o
Transaction is lost Session is maintained Queries are restarted Connection is quickly switched over to the preconnected backup
Failover - Preconnect • • • • • • • • • • • •
TESTA1 = (DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb1-vip)(PORT = 1523)) (CONNECT_DATA = (SERVICE_NAME = TESTA) (INSTANCE_NAME = TESTA1) (FAILOVER_MODE = (BACKUP = TESTA2) (TYPE = SELECT)
• • •
• • • • • • • • • • • • • • •
(METHOD = PRECONNECT) (RETRIES = 120) (DELAY = 5))))
TESTA2 = (DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb2-vip)(PORT = 1523)) (CONNECT_DATA = (SERVICE_NAME = TESTA) (INSTANCE_NAME = TESTA2) (FAILOVER_MODE = (BACKUP = TESTA1) (TYPE = SELECT) (METHOD = PRECONNECT) (RETRIES = 120) (DELAY = 5))))
Load Balancing •
• • • •
One service o Oracle Net determines which node you use Tries to balance the load Only active nodes are used Balance is somewhat random In the event of a failure o Transaction is lost
o
Connection is lost
Load Balancing • • • • • •
•
• • • • • •
TESTA = (DESCRIPTION = (ADDRESS_LIST = (LOAD_BALANCE = on) (FAILOVER = off) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb1-vip)(PORT = 1523)) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb2-vip)(PORT = 1523)) ) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = TESTA) ) )
Transparent Application Failover •
• •
One service o Multiple sessions are created Only active nodes are used In the event of a failure o available node is used o No loss of connection o Transaction is lost o Query is restarted
Transparent Application Failover • • • • • • •
TESTA = (DESCRIPTION = (ENABLE = BROKEN) (ADDRESS_LIST = (LOAD_BALANCE = on) (FAILOVER = on) (ADDRESS = (PROTOCOL = TCP)(HOST = oradb1vip)(PORT = 1523))
• • • • • • • • • • • • •
(ADDRESS = (PROTOCOL = TCP)(HOST = oradb2vip)(PORT = 1523)) ) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = TESTA) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 120) (DELAY = 5) ) ) )
Characteristics of TAF •
•
TAF protects: o Client/server connection o User session state o OCI programs o Java thick drivers (OCI drivers) o Sql*plus o Active cursors (select statements) that have begun to return results Not failed over:
o o o
o o
o
Active update transactions PL/SQL procedure states PL/SQL server-side package variables alter session statements Server side program variables Uncommitted transactions
Network Timeouts and TAF •
•
When a “planned” instance shutdown occurs, failover occurs quickly o Shutdown instance with srvctl or OEM o Failover takes a few seconds When an “unplanned” instance failure occurs, failover timing is
•
sensitive to network timeout parameters o Failover cannot occur until the network connection is released o By default, it may take up to eight minutes for the network connection to be released and for failover to occur Server-side and client-side operating system level TCP/IP timeout parameters may be shortened to speed up failover o Failover can occur as quickly as one minute o Can increase risk of disrupted network connections and unnecessary failover events
Example Windows TCP/IP Timeout Parameters for TAF •
The following steps may be performed on Windows database servers, mid-tier servers, and clients to speed up failover o Open the registry hive under HKEY_LOCAL_MACHINE : SYSTEM CurrentControlSet : Services : Tcpip : Parameters o Add KeepAliveTime : REG_DWORD to the registry Note: The correct case is required for KeepAlive o Set the value to a reasonable timeout. Eg: (120000 = 2 minutes, 60000 = 1 minute)
JDBC Connections and TAF •
•
The Oracle JDBC thick driver supports TAF o tnsnames,ora file must be defined on the server-side o JDBC Data Source file must be defined on the client-side (or mid-tier) o TAF failover and load balancing are supported The Oracle JDBC thin driver does not fully support TAF o tnsnames,ora file must be defined on the server-side o JDBC Data Source file must be defined on the client-side (or mid-tier) o Load balancing is supported, but not TAF failover
JDBC Thick Driver Example • • • • • • • • • • • • • • • • • • • • • • • •
JDBC Thin Driver Example
o o
o o o o o
o o o o o
o
o
o o o o o o
Fast Application Notification •
ONS and Fast Application Notification o
o
o
The Oracle Notification Service provides a framework for passing event messages about failover between nodes ONS may notify both RAC nodes and client or mid-tier nodes Fast Application Notification provides the basis for Fast Connection Failover
Fast Connection Failover
•
Fast Connection Failover provides a way to control failover response o Effective for Oracle 10g Application Server JDBC connections o More mid-tier applications may be supported in the future o Works with JDBC Thick and JDBC Thin drivers o Provides an alternative to TAF o Avoids network timeout sensitivity o Fast and efficient
Pre-requisites for Fast Connection Failover •
The JDBC Implicit Connection Cache is enabled.
•
•
•
•
The application uses service names to connect to the database, not service ids. The database should be at least release 10.1.0.2 (10.1.0.3 is preferred). ONS is configured and running on the node where JDBC is running. The JVM in which the JDBC instance is running must have oracle.ons.oraclehome set to point to the ORACLE_HOME where the ONS files were installed.
Setting Up the Implicit Connection Cache for FCF
•
•
• • • •
•
• • •
• •
•
•
The following JDBC Data Source file sets up the Implicit Connection Cache for FCF on the Application Tier: OracleDataSource ods = new OracleDataSource() ... ods.setUser(“Scott”); ods.setPassword(“tiger”); ods.setConnectionCachingEnabled(Tru e); ods.setFastConnectionFailoverEnabled (True); ods.setConnectionCacheName(“MyCache”); ods.setConnectionCacheProperties(cp); ods.setURL("jdbc:oracle:thin:@(DESCRIPTIO N= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=VIP1) (PORT=1521)) (ADDRESS=(PROTOCOL=TCP)(HOST=VIP2) (PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=service _name)))");
ONS on the Applications Tier •
The Oracle Notification Service must be installed and running on the Applications Tier nodes o
o
•
The ONS client is available with the 10.1.0.3 client CD For the 10.1.0.2 ONS client, contact Oracle Support
Run “onsctl ping” on each application node to determine that ONS is running
ONS on the Applications Tier •
Edit the file %ORACLE_HOME%\opmn\conf\o ns.config on each mid-tier node: o localport=6100 # This is the port ONS is writing to on this node o remoteport=6200 # This is the port ONS is listening on this node o loglevel=3 o # This is the list of hosts and ports ONS is posting to. o # Include RAC and client nodes. o nodes=RAC1.mycompany.co m:6200,RAC2.mycompany.co m:6200,
o
o
RAC3.mycompany.com:6200, APPS1.mycompany.com:6200 , APPS2.mycompany.com:6200
ONS on the Applications Tier •
When starting the application: o
o
Specify the system property – Doracle.ons.iraclehome=
Include the ons.jar file in the application CLASSPATH
Modifying Server-Side ONS for FCF •
Use the %ORA_CRS_HOME%\bin\racgons command to modify the serverside ONS configuration to be aware of the mid-tier nodes: o racgons add_config APPS1.mycompany.com:6200 APPS2.mycompany.com:6200 o This adds the host and port names to the OCR
Differences between FCF and TAF
•
Application-Level Connection Retries. o
o
Fast Connection Failover supports application-level connection retries. This gives the application control of responding to connection failovers. TAF supports connection retries only at the OCI/Net layer.
Differences between FCF and TAF
•
Integration with the Connection Cache o
o
o
Fast Connection Failover is well-integrated with the Implicit Connection Cache, which allows the connection cache manager to manage the cache for HA. Failed connections are automatically invalidated in the cache. TAF works at the network level on a per-connection basis, which means that the connection cache cannot be notified of failures.
Differences between FCF and TAF •
Event-Based o
o
Fast Connection Failover is based on the RAC event mechanism. This means that Fast Connection Failover is efficient and detects failures quickly for both active and inactive connections. TAF is based on the network call mechanism.
Differences between FCF and TAF •
Load-Balancing Support
o
Fast Connection Failover supports UP event load balancing of connections and runtime work request distribution across active RAC instances.
Review •
•
•
•
What are the two key configuration files for Oracle Net? What key phrase in the tnsnames.ora file influences load balance behaviour? What key phrase in the tnsnames.ora file influences failover behaviour? Which Oracle Net configurations allow users to maintain connections through a failover event?
•
•
Which JDBC drivers are supported under TAF and FCF? What is the name of the utility to add mid-tier nodes to the server-side ONS configuration?
Summary • •
•
• • •
•
Listener process Configuration files o listener.ora o tnsnames.ora Failover o Basic o Preconnect Load Balancing TAF JDBC Thick and Thin drivers and TAF Setting up Fast Connection Failover
The listener is a process that resides on the server whose responsibility is to listen for incoming client connection requests and manage the traffic to the server. The listener brokers the client request, handing off the request to the server. Every time a client (or server acting as a client) requests a network session with a server, a listener receives the actual request. If the client's information matches the listener's information, then the listener grants a connection to the server.
Oracle databases do not require identification of the database service in listener.ora because of service registration. However, static service configuration is required for an Oracle databases if you plan to use a listener.
The tnsnames.ora (or the equivalent) must be used if you want to use Transparent Application Failover. There must be a copy located on each RAC server. In addition, key components of the server-side tnsnames.ora file must be present in a client-side tnsnames.ora file. However, for JDBC connections, the tnsnames.ora content is often contained in a long connection string in the JDBC configuration file on the client. Note that the JDBC driver for TAF must be the Oracle supported “thick” OCI driver.
Connect-Time Failover By setting up multiple network paths, OCI will try each of the paths in the connection list. In this manner you can set up multiple nodes in the cluster, but at the same time, specifying which node is the primary connection. This can be done by putting the following entry into the tnsnames.ora file. prod1.us.company.com = (DESCRIPTION= (ADDRESS_LIST= (LOAD_BALANCE=OFF) (FAILOVER=ON) (ADDRESS=(PROTOCOL=tcp)(HOST=prod1)(PORT=1521)) (ADDRESS=(PROTOCOL=tcp)(HOST=prod2)(PORT=1521))) (CONNECT_DATA= (SERVICE_NAME=prod1.us.company.com)))
prod2.us.company.com = (DESCRIPTION= (ADDRESS_LIST= (LOAD_BALANCE=OFF)
(FAILOVER=ON) (ADDRESS=(PROTOCOL=tcp)(HOST=prod2)(PORT=1521)) (ADDRESS=(PROTOCOL=tcp)(HOST=prod1)(PORT=1521))) (CONNECT_DATA= (SERVICE_NAME=prod2.us.company.com)))
In this example, there are two different service names. The first uses prod1 as the primary and prod2 as the failover, the second uses prod2 as the primary and prod1 as the failover. This allows you to do your own manual load balancing, including failover to the designated node. The connection is re-established during failover, but session state is lost for all transactions and queries. This type of failover pre-dates RAC, and can even be used with some Oracle 8i Net connections. TAF failover is preferred with RAC.
Failover – Backup Method This failover method uses only one address per service name. However, each service explicitly identifies another service as its BACKUP service. If the primary service fails, the connection will be switched to the backup service. This can be done by putting the following entry into the tnsnames.ora file. prod1.us.company.com = (DESCRIPTION = (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL=tcp)(HOST=prod1)(PORT=1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod.us.company.com) (INSTANCE_NAME = prod1.us.company.com) (FAILOVER_MODE = (BACKUP = prod2.us.company.com) (TYPE = select) (METHOD = basic)))) prod2.us.company.com = (DESCRIPTION = (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL=tcp)(HOST=prod1)(PORT=1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod.us.company.com) (INSTANCE_NAME = prod2.us.company.com) (FAILOVER_MODE = (BACKUP = prod1.us.company.com) (TYPE = select) (METHOD = basic))))
In this example, there are two different service names. The first uses prod1 as the primary and prod2 as the failover, the second uses prod2 as the primary and prod1 as the failover. Note the extended syntax under “FAILOVER_MODE=” as compared to connect-time failover. This syntax is RAC specific, and allows the session state to be maintained through failover for queries (transactions are rolled back). The RETRIES and DELAY parameters allow the connection to be retried if the first attempt is not successful.
Note that the IP addresses or host names used in the tnsnames.ora file should be Virtual IP addresses or host names for Oracle 10g RAC.
Failover - Preconnect Note that setting the “METHOD = preconnect” allows the backup connection to be preallocated. This can aid failover speed, at the cost of twice as many concurrent connections. prod1.us.company.com = (DESCRIPTION = (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL=tcp)(HOST=prod1)(PORT=1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod.us.company.com) (INSTANCE_NAME = prod1.us.company.com) (FAILOVER_MODE = (BACKUP = prod2.us.company.com) (TYPE = select) (METHOD = preconnect)))) prod2.us.company.com = (DESCRIPTION = (LOAD_BALANCE = off) (FAILOVER = on) (ADDRESS = (PROTOCOL=tcp)(HOST=prod1)(PORT=1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod.us.company.com) (INSTANCE_NAME = prod2.us.company.com) (FAILOVER_MODE = (BACKUP = prod1.us.company.com) (TYPE = select) (METHOD = preconnect))))
Note that the IP addresses or host names used in the tnsnames.ora file should be Virtual IP addresses or host names for Oracle 10g RAC.
Load Balancing Load balancing is similar to Connect-time Failover in that one service is defined with multiple addresses. In this case, automatic load balancing is turned on while failover is turned off. This can be done by putting the following entry into the tnsnames.ora file. prod.us.company.com = (DESCRIPTION= (ADDRESS_LIST= (LOAD_BALANCE=ON) (ADDRESS=(PROTOCOL=tcp)(HOST=prod1)(PORT=1521)) (ADDRESS=(PROTOCOL=tcp)(HOST=prod2)(PORT=1521))) (CONNECT_DATA= (SERVICE_NAME=prod.us.company.com))) Here either prod1 or prod2 will be used. The system used is determined at connection time. Load balancing is commonly used with RAC, both in the server-side tnsnames.ora file and the client-side tnsnames.ora file.
Note that the IP addresses or host names used in the tnsnames.ora file should be Virtual IP addresses or host names for Oracle 10g RAC.
Transparent Application Failover (TAF) The TAF mechanism provides some fault tolerance in the event of a node failure. It does this by establishing a connection to multiple nodes in the cluster. TAF will automatically resend some transactions, and will not result in the loss of connection to the user in the event of a failure. TAF is configured by specifying both FAILOVER and LOAD_BALANCE as shown here: prod.us.company.com= (DESCRIPTION= (LOAD_BALANCE=on) (FAILOVER=on) (ADDRESS= (PROTOCOL=tcp)
(HOST=prod1) (PORT=1521)) (ADDRESS= (PROTOCOL=tcp) (HOST=prod2) (PORT=1521)) (CONNECT_DATA= (SERVICE_NAME=prod.us.company.com) (FAILOVER_MODE= (TYPE=select) (METHOD=basic)))) The TYPE parameter accepts the following options: session - In this mode a new session is created on the failover server in the event of a failure. select - Cursor processing is maintained. This incurs significant overhead on the client. none (default) - No failover is done. The METHOD parameter accepts the following options: basic - establishes the connection to the failover server when it is needed. preconnect - establishes the connection to the failover server at initial connection time.
Note that the IP addresses or host names used in the tnsnames.ora file should be Virtual IP addresses or host names for Oracle 10g RAC.
What TAF Restores TAF automatically restores some or all of the following elements associated with active database connections. Other elements, however, may need to be embedded in the application code to enable TAF to recover the connection. Client-Server Database Connections TAF automatically reestablishes the connection using the same connect string or an alternate connect string that you specify when configuring failover. Users' Database Sessions TAF automatically logs a user in with the same user ID as was used prior to failure. If multiple users were using the connection, then TAF automatically logs them in as they attempt to process database commands. Unfortunately, TAF cannot automatically restore other session properties. These properties can, however, be restored by invoking a callback function. Executed Commands If a command was completely executed upon connection failure, and it changed the state of the database, TAF does not resend the command. If TAF reconnects in response to a command that may have changed the database, TAF issues an error message to the application. Open Cursors Used for Fetching TAF allows applications that began fetching rows from a cursor before failover to continue fetching rows after failover. This is
called "select" failover. It is accomplished by re-executing a SELECT statement using the same snapshot, discarding those rows already fetched and retrieving those rows that were not fetched initially. TAF verifies that the discarded rows are those that were returned initially, or it returns an error message Active Transactions Any active transactions are rolled back at the time of failure because TAF cannot preserve active transactions after failover. The application instead receives an error message until a ROLLBACK is submitted. Server-side Program Variables Server-side program variables, such as PL/SQL package states, are lost during failures; TAF cannot recover them. They can be initialized by making a call from the failover callback.
Note the “failover=on”, “failover_mode=”, and “rac_enabled=‘true’” sections.
Note the “LOAD_BALANCE=on” is set, but failover is not set (TAF failover is not supported with the thin driver).
The command “%ORACLE_HOME%\opmn\bin\onsctl start” can be used to start ONS, and the command “%ORACLE_HOME%\opmn\bin\onsctl stop” can be used to stop ONS.
Do not modify the server-side ons.config file to add the mid-tier hosts and ports. This does not modify the OCR. Use the racgons command instead.
Note: The application should not try to roll back the transaction; the transaction was already rolled back in the database by the time the application received the exception. You should not attempt to run both TAF and FCF at the same time.
Note: The application should not try to roll back the transaction; the transaction was already rolled back in the database by the time the application received the exception. You should not attempt to run both TAF and FCF at the same time.
Note: The application should not try to roll back the transaction; the transaction was already rolled back in the database by the time the application received the exception. You should not attempt to run both TAF and FCF at the same time.
Note: The application should not try to roll back the transaction; the transaction was already rolled back in the database by the time the application received the exception. You should not attempt to run both TAF and FCF at the same time.
Oracle RAC Administration - Part 15: Connection Load Balancing and FAN Tarry Singh, [email protected]
Brief intro In our last article, we looked at the service architecture. We will continue our quest to understand the workload characteristics and the load balancing connections in our Virtualized RAC.
Introduction to Workload Management Connection Load Balancing With Oracle Notification Services (ONS), we are able to have our RAC balance our client connections across the nodes. The CLB (connection load balancing) can be handled on the server-side or on the client-side. The client-side takes care of the connection sharing across the Listeners, while server-side load balancing manages and handles the connection requests to the most willingly available node whose information is in turn provided to it by the LBA (Load Balancing Advisory). This in turn is analyzing the goals that may have been set for the LBA. You can use a long goal or a short goal for this connection load balancing. Let’s take a look at them both quickly. Long Goals: Use long goals for applications that need long uninterrupted connections. It is the default connection load balancing goal. To see how to implement them, lets take a simple syntax. Using the DBMS_SERVICE and CLB_GOAL_LONG packages, you can define the connection load balancing for long connections as follows and here FOKESERV is our service:
EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'FOKESERV' , clb_goal => DBMS_SERVICE.CLB_GOAL_LONG); Short Goals: Think of quick and fast transactions like ordering or placing a secure purchase like that on Amazon or any other place. You would prefer to fill the order up fast and if, for any reason the process is delayed, the connection is restarted, ensuring that the transaction is performed in a short-lived connection. Checking the syntax, here CARTSERV is the service in question:
EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'CARTSERV' , CLB_GOAL => DBMS_SERVICE.CLB_GOAL_SHORT); Note: When you install your RAC with the DBCA (Database Configuration Assistant), server-side load balancing is enabled by default. Also check out the sample script in tnsnames.ora file. The
services created through the DBCA will have the Connection Load Balancing default setting as CLB_GOAL=CLB_GOAL_LONG. The ONS maintains client connections until the client decides to close the connection or if a node fails or any other form of outage prevents the continuity of that connection. However, if you configure TAF (transparent application failover) then the connections are moved to another healthy instance. For a typical SARG (Select Arguments), TAF can restart the query. Alternately, from the client side, if you were accessing information on the web, the search would simply carry on without you noticing the failover of the connection to another node. However, if you were in the middle of a transaction (INSERT, UPDATE, or DELETE) then the application must rollback the failed transaction and submit it again. Any other session's customizations must be re-executed. However, in a normal processing scenario the sessions don’t move irrespective of the workload changes, newer sessions are just redirected to other nodes. With the services, the deployment of TAF only becomes easier. By defining a TAF policy for a service, all connections using this service will have TAF automatically enabled for them. No clientside intervention is required, meaning the server-side TAF setting will override the TAF setting at the client level. In order to define the TAF policy you will need the DBMS_SERVICE procedure:
EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'fokeserv.domain.com' , aq_ha_notifications => TRUE , failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC , failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT , failover_retries => 180 , failover_delay => 5 , clb_goal => DBMS_SERVICE.CLB_GOAL_LONG); Client-side load balancing is defined in your client connection with the parameter LOAD_BALANCE=ON (the default is ON). Upon setting this parameter to ON, Oracle will pick out an address from the address list at random and connect to that node, thereby balancing the client connections across the cluster. Run the command lsnrctl services occasionally to see what services a listener supports. FAN (Fast Application Notification) As the Oracle RAC manual states: FAN is a notification mechanism that RAC uses to notify other processes about configuration and service level information such as includes service status changes, such as UP or DOWN events. Applications can respond to FAN events and take immediate action. FAN UP and DOWN events can apply to instances, services, and nodes. RAC publishes the FAN events the minute any changes are made to the cluster. So, instead of waiting for the application to check on individual nodes to detect an anomaly, the applications are notified by FAN events and are able to react immediately. FAN also publishes load balancing advisory (LBA) events. Applications are in a position to take full advantage of the LBA FAN events to facilitate a smooth transition of connections to healthier nodes in a cluster. One can take advantage of FAN is the following ways:
•
•
•
When using integrated Oracle Client, the applications can use FAN with no programming whatsoever. Oracle 120g JDBC, ODP.NET and OCI would be considered as the components of the integrated clients. Programmatic changes in ONS API make it possible for applications to still subscribe to the FAN events and can execute the event handling actions appropriately. FAN can be implemented with server-side callouts on your database tier.
For instance, a typical DOWN event will prevent any further disruption of service by cleanly terminating the sessions on that failed node and notifying the user. Moreover, a typical UP event can address and allocate extra resources for new incoming requests/ connections. There are however several additional benefits with the server-side callouts, mainly you can utilize FAN in order to: • • •
•
Logging Paging/SMS the DBA and/or to open trouble tickets when the resources fail to (re)start Change resource plans or to shut down services when the number of available instances decreases, thus preventing further load on the cluster and keeping the RAC running until another healthy node is added to the cluster. Automate the fail service back to PREFERRED instances when required.
Conclusion In this article, we looked at the CLB and FAN. In future articles, we will continue to discuss the services architecture and discuss FAN in detail and take a look at the LBA.
Oracle RAC Administration - Part 16: Balancing act between Server and Client Brief intro We will take a little detour from our regular path of following the manual, some would say, to more practical path. One topic that most intrigues both an Oracle DBA and their clients (in their own way) is connectivity. DBAs are sometimes puzzled with the load balancing and failover part of connectivity. Where is it, on the Client-side OR the Server-side? Is tnsnames.ora still useful if I deploy the Easy Connect Net Naming method? Let's brush up on some basics.
Easy Connect Naming Methods Easy Connect Naming methods were introduced in Oracle 10g. The benefits were simple. The need to do a service name lookup in the tnsnames.ora files for TCP/IP environment was eliminated, a directory naming system is no longer needed and clients are connected automatically. Connecting is also fairly simple: Syntax:
CONNECT username/password@[//]host[:port][/service_name]
So our typical examples could look like:
CONNECT oe/oe@localhost/nickserv CONNECT scott/[email protected]/examples
Connect-Time Failover From the clients end, when your connection fails at one node or service, you can then do a look up from your tnsnames.ora file and go on seeking a connection with the other available node. Take this example of our 4-node VMware ESX 3.x Oracle Linux Servers:
FOKERAC = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST nick01.wolga.com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST com)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST com)(PORT = 1521)) (CONNECT_DATA = (SERVICE_NAME = fokerac) ) )
= = nick02.wolga. = brian01.wolga. = brian02.wolga.
Here the first address in the list is tried at the client’s end. Should the connection to nick01.wolga.nl fail, then the next address, nick02.wolga.nl, will be tried. This phenomenon is called connection-time failover. You could very well have a 32-node RAC cluster monitoring the galactic system at NASA and thus have all those nodes typed in your tnsnames.ora file. Moreover, these entries do not necessarily have to be part of the RAC cluster. So it is possible that you are using Streams, Log Shipping or Advanced Replication to maintain your HA (High Availability) model. These technologies facilitate continued processing of the database by such a HA (High Availability) model in a non-RAC environment. In a RAC environment we know (and expect) the data to be the same across all nodes since there is only one database.
TAF (Transparent Application Failover) Transparent Application Failover actually refers to a failover that occurs when a node or instance is unavailable due to an outage or other reason that prohibits a connection to be established on that node. This can be set to on with the following parameter FAILOVER. Setting it to ON will activate the TAF. It is turned on by default unless you set it to OFF to disable it. Now, when you turn it on you have two types of connections available by the means of the FAILOVER_MODE parameter. The type can be session, which is default or select. When the type is SESSION, if the instance fails, then the user is automatically connected to the next available node without the user’s manual intervention. The SQL statements need to be carried out again on the next node. However, when you set the TYPE to SELECT, then if you are connected and are in the middle of your query, then your query will be restarted after you have been failed over to the next available node. Take this example of our tnsnames.ora file, (go to the section beginning with CONNECT_DATA):
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5) ) ) This is also demonstrated in this image from my article on Quest’s soRAC tool. This is more intelligent in the sense that should the instance, (where the user has fired a query), go down then the session will remember the rows in its Global Buffer Cache, restart the query and pick up from where the user left off when they were disconnected from the first node. Compare it to a scenario where the manager is running the query through a client such as Crystal Reports or Oracle Reports. Running the monthly query to get the data from his Server Monitoring software, let's say NetIQ, to get the statistics of all the servers the DBA needs to draw a report on monitoring the CPU, memory or disk usage. Now if he ran a query that had to fetch 5000 rows and he gets cut off at 3000 rows, he still has to reissue (not manually though) start from the beginning but will only see the results when he crosses the 3001 mark. Then upon prompt, the results will be shown. However, this does not apply to the DML (Data Manipulation Language) such as insert, update, delete. The user is met with an error if node failure happens in the middle of the processing and all the (uncommitted) transactions are rolled back. The session eventually fails over without the user having to connect, however, he does have to issue the query again.
Preconnect Method Consider our tnsnames.ora file again. Here in our older example, where the METHOD was BASIC, there is some additional time loss in reconnecting the lost connection to the immediate available node. However, when you set the METHOD to PRECONNECT, a session is connected to all available nodes in the list. The advantage is that failover is a lot quicker than the BASIC method but the downside is that the other instances will continue to claim necessary resources such a system memory to keep those connections alive.
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = fokerac.wolga.com) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = PRECONNECT) (RETRIES = 180) (DELAY = 5) ) )
Conclusion As a DBA, these little things are very crucial in deciding what you intend to do with the RAC. Is it a DSS or will it be used for a heavy OLTP environment. On a DSS, you can easily choose a PRECONNECT method as you expect to have less connections with long durations (batch processing, etc.) where as a typical OLTP will perform better if the cluster is optimally configured with a BASIC method.