Best Practices for 10g RAC
December 2005
Roy Rossebo, RAC Pack, Oracle Corporation
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
Session Objectives
• Best Practices based on latest field experiences
• Avoid some common issues
• Get ready for the Grid: beyond 2-node clusters
• Assumes that basic RAC architecture is known
1. Define Service Level Objectives
• Cannot be successful without objectives
• Realistic and measurable objectives
  – Driven by and linked to business objectives
  – Existing service levels typically provide the baseline
  – Maximum planned/unplanned downtime
  – Performance requirements: number of users, response times, jobs/transactions per time period
  – Consolidation, cost savings
Realistic Expectations
• If your application scales transparently on an SMP system, it is realistic to expect it to scale well on RAC, without having to make any changes to the application code.
• RAC eliminates the database instance, and the node itself, as a single point of failure, and ensures database integrity in the case of such failures.
2. Keep It Simple
• Use proven configurations
• Certified components
  – Metalink certification matrix
  – Vendor certifications
• Minimize the number of components
  – Is another cluster manager required anymore?
  – Are volume managers needed anymore?
• Partner with vendors
• Don't be too 'creative'
3. Get Ready for the Grid
• 2-node clusters give few or no Grid benefits
  – 30+ customers with 6+ nodes in a cluster
  – 4+ node clusters are the norm for new systems
• Large clusters = large hardware savings
• Provisioning, capacity on demand
• RAC performance scales better than SMP for large numbers of CPUs
• Grid Control: RAC system management very similar to single instance
4. Use Oracle10g Release 2
• OCR and Voting Disk mirroring
• Simpler install process for large clusters
• Significant performance improvements
• Framework for workload management in large clusters: Services, FAN and LBA
• Clusterware High Availability API
• Etc.
Before Oracle10g Release 2
[Diagram: reading blocks of an object (e.g. a table) from disk into the buffer cache requires a GCS message to the master node for those data blocks.]

Object Affinity (Oracle10g Release 2)
[Diagram: with object affinity, NO messages are sent to the remote node when reading the object's data blocks into the cache.]
5. Implement Workload Management
• Workload management for clusters is more than just load balancing
  – Effective utilization of cluster resources
  – Minimize synchronization cost in the cluster
  – Monitor and optimize response times
  – Reduce network delays during failover
• Take advantage of Services, FAN and LBA
Connect to Services
[Diagram: database FINPROD with instances FINPROD1–FINPROD4.]
• Service GL – Preferred: 1,2,3,4
• Service ERP – Preferred: 1,2; Available: 3,4
• Service CRM – Preferred: 3,4; Available: 1,2
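A service layout like the one in the diagram is typically defined with srvctl. The following is a sketch only, using the database and instance names from the diagram (FINPROD, FINPROD1–FINPROD4) and 10g srvctl options (-r preferred, -a available):

```shell
# ERP prefers instances 1,2 and can run on 3,4 if they fail
srvctl add service -d FINPROD -s ERP -r FINPROD1,FINPROD2 -a FINPROD3,FINPROD4

# CRM prefers instances 3,4 and can fail over to 1,2
srvctl add service -d FINPROD -s CRM -r FINPROD3,FINPROD4 -a FINPROD1,FINPROD2

# GL runs on all four instances
srvctl add service -d FINPROD -s GL -r FINPROD1,FINPROD2,FINPROD3,FINPROD4

# Start a service and check where it is running
srvctl start service -d FINPROD -s ERP
srvctl status service -d FINPROD -s ERP
```

Clients then connect with SERVICE_NAME=ERP (etc.) in the connect descriptor rather than an instance-specific SID, so connections follow the service, not a fixed instance.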
Initial State
[Diagram: a connection pool with connections C1–C6 spread across Instances 1 and 2, which run the service across RAC.]

Instance Join
[Diagram: Instance 3 joins the cluster, but connections C1–C6 still point only at Instances 1 and 2.]

Instance Join – Undesired Cases
Option 1: Nothing happens
[Diagram: Instance 3 joins but receives no connections from the pool.]
Option 2: Add new connections randomly
[Diagram: new connections C7–C9 are created on arbitrary instances, leaving the pool unbalanced.]

Instance Join – Desired Result
[Diagram: connections C1–C9 are rebalanced evenly across Instances 1, 2 and 3.]

Node Leaves
[Diagram: a node leaves the cluster and its connections C1–C9 are redistributed across the surviving Instances 1 and 2.]
Runtime Connection Load Balancing
[Diagram: a CRM client requests a connection from the connection cache. FAN load advisories steer the choice across three instances: the instance reporting "CRM is bored" receives 60% of new work, "CRM is busy" receives 30%, and "CRM is very busy" receives only 10%.]
6. Configure the Interconnect Correctly
• Use UDP over Gigabit Ethernet for the interconnect!
• OS bonding for failover & load balancing
  – NIC bonding, multiple switches
• Set UDP buffers to the maximum. Check:

  $ /sbin/sysctl net.core.rmem_max net.core.wmem_max \
        net.core.rmem_default net.core.wmem_default
  net.core.rmem_max = 262144
  net.core.wmem_max = 262144
  net.core.rmem_default = 262144
  net.core.wmem_default = 262144

• Use switch(es) – a crossover cable is not supported
• Eliminate any transmission problems
• Ensure LMS has sufficient CPU resources
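To bring the UDP buffer limits up to the values shown in the check above, something like the following can be used on Linux (a sketch; the exact values should come from the platform-specific installation guide for your release):

```shell
# Raise kernel socket buffer sizes for the current boot (run as root)
/sbin/sysctl -w net.core.rmem_max=262144
/sbin/sysctl -w net.core.wmem_max=262144
/sbin/sysctl -w net.core.rmem_default=262144
/sbin/sysctl -w net.core.wmem_default=262144

# Persist the settings across reboots
cat >> /etc/sysctl.conf <<'EOF'
net.core.rmem_max = 262144
net.core.wmem_max = 262144
net.core.rmem_default = 262144
net.core.wmem_default = 262144
EOF
```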
Correct network used?

SQL> select * from gv$cluster_interconnects;

   INST_ID NAME IP_ADDRESS    IS_PUBLIC SOURCE
   ------- ---- ------------- --------- -------------------------
         1 eth2 138.2.238.74  NO        Oracle Cluster Repository
         2 eth2 138.2.238.75  NO        Oracle Cluster Repository
SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
SQL> oradebug tracefile_name
...
SSKGXPT 0xcc1c78c flags SSKGXPT_READPENDING
info for network 0
  socket no 7 IP 138.2.238.74 UDP 33594
  sflags SSKGXPT_UP
Correct network used?

$ oifcfg getif
eth0 138.2.236.114 global public
eth2 138.2.238.74 global cluster_interconnect

$ ocrdump
$ view OCRDUMPFILE
...
[SYSTEM.css.node_numbers.node2.privatename]
ORATEXT : stnsp014-rac
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
            OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}
Interconnect
• Jumbo Frames (NIC and switch) – test properly first!

$ /sbin/ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:0E:0C:09:6C:81
      inet6 addr: fe80::20e:cff:fe09:6c81/64 Scope:Link
      UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
...
$ ping -s 8972 -M do stnsp013-rac
From stnsp014-rac (138.2.238.74) icmp_seq=0 Frag needed and DF set (mtu = 1500)
• Full duplex

$ /sbin/mii-tool -v eth0
eth0: 100 Mbit, full duplex, link ok
  product info:  vendor 00:50:43, model 2 rev 3
  basic mode:    100 Mbit, full duplex
  basic status:  link ok
  capabilities:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:   100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
7. Use Cluster Verification -pre dbcfg Utility
Configures RAC DB
-pre dbinst Installs RAC
-pre crsinst Installs CRS
-post crsinst
-pre cfs Sets up OCFS ( OPT )
-post cfs User sets up the Hardware, network & storage
-post hwos
List of Components

$ ./cluvfy comp -list

Valid components are:
  nodereach : checks reachability between nodes
  nodecon   : checks node connectivity
  cfs       : checks CFS integrity
  ssa       : checks shared storage accessibility
  space     : checks space availability
  sys       : checks minimum system requirements
  clu       : checks cluster integrity
  clumgr    : checks cluster manager integrity
  ocr       : checks OCR integrity
  crs       : checks CRS integrity
  nodeapp   : checks node applications existence
  admprv    : checks administrative privileges
  peer      : compares properties with peers
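Beyond listing the components, cluvfy is invoked either per stage or per component. A couple of representative invocations (node names n1,n2 are placeholders for your own hostnames):

```shell
# Verify prerequisites on all nodes before installing CRS
./cluvfy stage -pre crsinst -n n1,n2 -verbose

# Check public/private connectivity between the cluster nodes
./cluvfy comp nodecon -n n1,n2 -verbose

# Check shared storage accessibility across the nodes
./cluvfy comp ssa -n n1,n2
```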
CVU Locations
• Oracle DVD
  – clusterware/cluvfy/runcluvfy.sh
  – clusterware/rpm/cvuqdisk-1.0.1-1.rpm
• CRS Home
  – $ORA_CRS_HOME/bin/cluvfy
  – $ORA_CRS_HOME/cv/rpm/cvuqdisk-1.0.1-1.rpm
• Oracle Home
  – $ORACLE_HOME/bin/cluvfy
Deployment of cluvfy
• Install CVU only on the local node; the tool deploys itself on remote nodes during execution, as required.
  – User installs CVU on the local node
  – User issues a verification command for multiple nodes
  – Tool copies the required bits to the remote nodes
  – Tool executes verification tasks on all nodes and generates a report
8. Test, Test, Test…
• Why?
  – Verify that the system infrastructure meets the SLOs
  – Verify correct install / configuration
  – Build skills
• How?
  – Test plan
  – Separate test cluster
  – Realistic configuration, matching production
  – Realistic workload: the best insurance, but can be difficult / expensive
  – Functional, performance and destructive testing
  – Include normal and exception operational procedures
9. Monitor Performance
• Establish a performance baseline
• AWR / Statspack
• ADDM
• Active Session History
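AWR takes snapshots automatically (hourly by default); a manual snapshot before and after a test run brackets the workload cleanly. A sketch using the documented DBMS_WORKLOAD_REPOSITORY package (interval and retention values are examples, in minutes):

```shell
sqlplus / as sysdba <<'EOF'
-- Take a snapshot now (e.g. at the start and end of a test run)
exec dbms_workload_repository.create_snapshot;

-- Optionally tighten snapshot interval (30 min) and retention (7 days)
exec dbms_workload_repository.modify_snapshot_settings(interval => 30, retention => 7*24*60);

-- Reports are then generated interactively with:
-- @?/rdbms/admin/awrrpt.sql
EOF
```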
ADDM (excerpt)

FINDING 5: 7.1% impact (5377 seconds)
-------------------------------------
Read and write contention on database blocks was consuming significant
database time in the cluster.

   RECOMMENDATION 1: Schema, 2.7% benefit (2074 seconds)
      ACTION: Consider partitioning the INDEX "FHUS_NEW.PK_OBJECT_STORE"
         with object id 101859 in a manner that will evenly distribute
         concurrent DML across multiple partitions.
         RELEVANT OBJECT: database object with id 101859
      RATIONALE: The INSERT statement with SQL_ID "gf99f4tkwcvt9" was
         significantly affected by "global cache buffer busy".
         RELEVANT OBJECT: SQL statement with SQL_ID gf99f4tkwcvt9
         insert into OBJECT_STORE (VERSION, TYPE_ID, CREATE_DATE,
         UPDATE_DATE, STATE_ID, ENCODING_TYPE, COMPRESSION_TYPE,
         STORED_DATA_SIZE, ORIGINAL_DATA_SIZE, FORWARD_ID, OBJECT_ID)
         values (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11)
      RATIONALE: The INSERT statement with SQL_ID "cfpgqrsxcfhg1" was
         significantly affected by "global cache buffer busy".
         RELEVANT OBJECT: SQL statement with SQL_ID cfpgqrsxcfhg1
         insert into OBJECT_REFERENCE (OBJECT_ID, REFERENCE_ID)
         values (:1, :2)

Identifies SQL and objects with serializing contention.
@?/rdbms/admin/addmrpt.sql
Active Session History Report
@?/rdbms/admin/ashrpt.sql
10. Prepare Diagnostics Procedures, Just in Case…
• Capture the maximum possible diagnostics before restarting the system, etc.
  – Tradeoff with recovery time
• Oracle side
  – Hanganalyze
  – Systemstate dumps
  – gv$ views
  – Statspack / AWR
• OS setup, e.g. Linux
  – Netdump utility (Metalink note 226057.1)
  – Alt-SysRq keys utility (Metalink note 228203.1)
  – Serial console (Metalink note 228204.1)
  – Performance / load data
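A typical hang-diagnosis sequence from SQL*Plus, captured before anyone restarts an instance, looks roughly like the sketch below. The -g option requests cluster-wide dumps; the dump levels shown (hanganalyze 3, systemstate 266) are common examples from support notes, not prescriptive values:

```shell
sqlplus / as sysdba <<'EOF'
oradebug setmypid
oradebug unlimit
-- Cluster-wide hang analysis; repeat after a minute to see if chains change
oradebug -g all hanganalyze 3
-- Cluster-wide systemstate dump (can be large and slow with big SGAs)
oradebug -g all dump systemstate 266
oradebug tracefile_name
EOF
```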
And Don't Forget…
• Synchronize time across the cluster – NTP
• Use ASM for database storage
• Use Grid Control
• Cache sequences
• Use ASSM for all tablespaces
• Linux hugetlb for large memory
• Etc.
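On the "cache sequences" point: a small sequence cache forces frequent row-cache synchronization between instances under insert-heavy workloads. A sketch (the sequence name app.order_seq is hypothetical; the cache size is an example):

```shell
sqlplus / as sysdba <<'EOF'
-- A large CACHE plus NOORDER lets each instance hand out values from its
-- own cached range, avoiding cross-instance contention on the sequence.
alter sequence app.order_seq cache 1000 noorder;
EOF
```

Note that with NOORDER the values are no longer strictly ordered across instances; keep ORDER only where that ordering is a hard requirement.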
Summary
1. Define Service Level Objectives
2. Keep It Simple
3. Get Ready for the Grid
4. Use Oracle10g Release 2
5. Implement Workload Management
6. Configure the Interconnect Correctly
7. Use Cluster Verification Utility
8. Test, Test, Test…
9. Monitor Performance
10. Prepare Diagnostics Procedures, just in case…
Q & A