Best Practices for 10g RAC
December 2005

Roy Rossebo
RAC Pack, Oracle Corporation

“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”

Session Objectives

  • Best practices based on the latest field experiences
  • Avoid some common issues
  • Get ready for the Grid: beyond 2-node clusters
  • Assumes that basic RAC architecture is known

1. Define Service Level Objectives

  • Cannot be successful without objectives
  • Realistic and measurable objectives
      – Driven by and linked to business objectives
      – Existing service levels typically provide the baseline
      – Max planned/unplanned downtime
      – Performance requirements: number of users, response times, jobs/transactions per time period
      – Consolidation, cost saving

Realistic Expectations

  • If your application scales transparently on SMP, it is realistic to expect it to scale well on RAC without any changes to the application code.
  • RAC eliminates the database instance, and the node itself, as a single point of failure, and ensures database integrity in the case of such failures.

2. Keep It Simple

  • Use proven configurations
  • Certified components
      – Metalink certification matrix
      – Vendor certifications
  • Minimize the number of components
      – Is a separate cluster manager still required?
      – Are volume managers still needed?
  • Partner with vendors
  • Don't be too 'creative'

3. Get Ready for the Grid

  • 2-node clusters give few or no Grid benefits
      – 30+ customers with 6+ nodes in a cluster
      – 4+ node clusters are the norm for new systems
  • Large clusters = large HW savings
  • Provisioning, capacity on demand
  • RAC performance scales better than SMP for large numbers of CPUs
  • Grid Control: RAC system management is very similar to single instance

4. Use Oracle10g Release 2

  • OCR and Voting Disk mirroring
  • Simpler install process for large clusters
  • Significant performance improvements
  • Framework for workload management in large clusters: Services, FAN and LBA
  • Clusterware High Availability API
  • Etc.

Before Oracle10g Release 2: when an instance reads blocks of an object (e.g. a table) from disk into its buffer cache, GCS messages are sent to the block master on a remote node.

With Object Affinity: when the object is mastered at the instance that uses it, no messages are sent to remote nodes when reading its blocks into the cache.

5. Implement Workload Management

  • Workload management for clusters is more than just load balancing
      – Effective utilization of cluster resources
      – Minimize synchronization cost in the cluster
      – Monitor and optimize response times
      – Reduce network delays during failover
  • Take advantage of Services, FAN and LBA

Connect to Services

Example: database FINPROD with instances FINPROD1, FINPROD2, FINPROD3 and FINPROD4:

  • Service GL: Preferred 1,2,3,4
  • Service ERP: Preferred 1,2; Available 3,4
  • Service CRM: Preferred 3,4; Available 1,2
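A service layout like this can be created with srvctl; a minimal sketch, assuming the FINPROD database and its four instances already exist (in 10g srvctl syntax, -r lists preferred instances and -a lists available instances):

```shell
# Sketch: define the three services from the slide on database FINPROD.
srvctl add service -d FINPROD -s GL  -r FINPROD1,FINPROD2,FINPROD3,FINPROD4
srvctl add service -d FINPROD -s ERP -r FINPROD1,FINPROD2 -a FINPROD3,FINPROD4
srvctl add service -d FINPROD -s CRM -r FINPROD3,FINPROD4 -a FINPROD1,FINPROD2

# Start and check the services.
srvctl start service -d FINPROD -s GL
srvctl status service -d FINPROD
```

With this definition, CRM normally runs on instances 3 and 4 and fails over to 1 or 2 only when a preferred instance is down.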

Instance Join and Node Leave (connection pool behavior for a service across RAC)

  • Initial state: connections C1–C6 in the connection pool are spread across instances 1 and 2.
  • Instance join: instance 3 joins the cluster and starts offering the service.
      – Undesired case 1: nothing happens; existing connections stay on instances 1 and 2, and instance 3 sits idle.
      – Undesired case 2: new connections (C7–C9) are added randomly across all three instances, leaving the load unbalanced.
      – Desired result: the connection pool rebalances so that connections C1–C9 are spread evenly across instances 1, 2 and 3.
  • Node leaves: when an instance goes down, its connections are redistributed across the surviving instances.

Runtime Connection Load Balancing (service CRM): client connection requests arrive at the connection cache, which distributes work according to the load advisories posted by each instance, e.g. 60% to the instance reporting "CRM is bored", 30% to the instance reporting "CRM is busy", and 10% to the instance reporting "CRM is very busy".

6. Configure the Interconnect Correctly

  • Use UDP over Gigabit Ethernet for the interconnect!
  • OS bonding for failover and load balancing
      – NIC bonding, multiple switches
  • Set UDP buffers to the maximum. Check:

      $ /sbin/sysctl net.core.rmem_max net.core.wmem_max \
          net.core.rmem_default net.core.wmem_default
      net.core.rmem_max = 262144
      net.core.wmem_max = 262144
      net.core.rmem_default = 262144
      net.core.wmem_default = 262144

  • Use switch(es); a crossover cable is not supported
  • Eliminate any transmission problems
  • Ensure LMS has sufficient CPU resources
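If the limits need raising, they can be set at runtime and persisted; a sketch for Linux, using the 262144-byte values from the check above as example targets (root required):

```shell
# Sketch: apply the UDP buffer limits immediately (values are examples).
/sbin/sysctl -w net.core.rmem_max=262144
/sbin/sysctl -w net.core.wmem_max=262144
/sbin/sysctl -w net.core.rmem_default=262144
/sbin/sysctl -w net.core.wmem_default=262144

# To persist across reboots, add the same settings to /etc/sysctl.conf:
#   net.core.rmem_max = 262144
#   net.core.wmem_max = 262144
#   net.core.rmem_default = 262144
#   net.core.wmem_default = 262144
```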

Correct network used?

SQL> select * from gv$cluster_interconnects;

   INST_ID NAME IP_ADDRESS    IS_PUBLIC SOURCE
   ------- ---- ------------- --------- -------------------------
         1 eth2 138.2.238.74  NO        Oracle Cluster Repository
         2 eth2 138.2.238.75  NO        Oracle Cluster Repository

SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
SQL> oradebug tracefile_name
...
SSKGXPT 0xcc1c78c flags SSKGXPT_READPENDING socket no 7 IP 138.2.238.74 sflags SSKGXPT_UP
info for network 0 UDP 33594

Correct network used?

$ oifcfg getif
eth0  138.2.236.114  global  public
eth2  138.2.238.74   global  cluster_interconnect

$ ocrdump
$ view OCRDUMPFILE
...
[SYSTEM.css.node_numbers.node2.privatename]
ORATEXT : stnsp014-rac
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}
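If oifcfg shows the wrong interface classified as the interconnect, it can be corrected with oifcfg setif; a hedged sketch, with the subnet (138.2.238.0) inferred from the eth2 addresses shown above:

```shell
# Sketch: register eth2's subnet as the cluster interconnect, cluster-wide.
oifcfg setif -global eth2/138.2.238.0:cluster_interconnect

# Verify the change.
oifcfg getif
```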

Interconnect

  • Jumbo frames (NIC and switch): test properly first!

      $ /sbin/ifconfig eth1
      eth1  Link encap:Ethernet  HWaddr 00:0E:0C:09:6C:81
            inet6 addr: fe80::20e:cff:fe09:6c81/64 Scope:Link
            UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
      ...
      $ ping -s 8972 -M do stnsp013-rac
      From stnsp014-rac (138.2.238.74) icmp_seq=0 Frag needed and DF set (mtu = 1500)
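The payload size of 8972 in the ping test is not arbitrary: it is the 9000-byte jumbo MTU minus the 20-byte IP header and the 8-byte ICMP header. A quick check of the arithmetic:

```shell
# Largest ICMP payload that fits in one 9000-byte jumbo frame unfragmented.
mtu=9000
ip_header=20
icmp_header=8
payload=$((mtu - ip_header - icmp_header))
echo "$payload"   # 8972
```

If that ping reports "Frag needed and DF set", jumbo frames are not active end to end (NIC or switch still at MTU 1500).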

  • Full duplex

      $ /sbin/mii-tool -v eth0
      eth0: 100 Mbit, full duplex, link ok
        product info:  vendor 00:50:43, model 2 rev 3
        basic mode:    100 Mbit, full duplex
        basic status:  link ok
        capabilities:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
        advertising:   100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

7. Use Cluster Verification Utility

Stage checks follow the install flow:

  • -post hwos: after the user sets up the hardware, network and storage
  • -pre cfs / -post cfs: around setting up OCFS (optional)
  • -pre crsinst / -post crsinst: around installing CRS
  • -pre dbinst: before installing RAC
  • -pre dbcfg: before configuring the RAC database

List of Components

$ ./cluvfy comp -list

Valid components are:
  nodereach : checks reachability between nodes
  nodecon   : checks node connectivity
  cfs       : checks CFS integrity
  ssa       : checks shared storage accessibility
  space     : checks space availability
  sys       : checks minimum system requirements
  clu       : checks cluster integrity
  clumgr    : checks cluster manager integrity
  ocr       : checks OCR integrity
  crs       : checks CRS integrity
  nodeapp   : checks node applications existence
  admprv    : checks administrative privileges
  peer      : compares properties with peers
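Typical runs combine a stage check with individual component checks; a sketch, where node1 and node2 are placeholder hostnames:

```shell
# Sketch: pre-CRS-install stage check from the DVD copy of the tool,
# then individual component checks from an installed home.
./runcluvfy.sh stage -pre crsinst -n node1,node2 -verbose

cluvfy comp nodecon -n node1,node2 -verbose
cluvfy comp ssa -n node1,node2
```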

CVU Locations

  • Oracle DVD
      – clusterware/cluvfy/runcluvfy.sh
      – clusterware/rpm/cvuqdisk-1.0.1-1.rpm
  • CRS Home
      – $ORA_CRS_HOME/bin/cluvfy
      – $ORA_CRS_HOME/cv/rpm/cvuqdisk-1.0.1-1.rpm
  • Oracle Home
      – $ORACLE_HOME/bin/cluvfy

Deployment of cluvfy

  • Install only on the local node; the tool deploys itself on remote nodes during execution, as required.
  • The user installs CVU on the local node and issues a verification command for multiple nodes.
  • The tool copies the required bits to the remote nodes, executes the verification tasks on all nodes, and generates a report.

8. Test, Test, Test…

  • Why?
      – Verify that the system infrastructure meets the SLOs
      – Verify correct install / configuration
      – Build skills
  • How?
      – Test plan
      – Separate test cluster
      – Realistic configuration, matching production
      – Realistic workload: the best insurance, but can be difficult / expensive
      – Functional, performance and destructive testing
      – Include normal and exception operational procedures

9. Monitor Performance

  • Establish a performance baseline
  • AWR / Statspack
  • ADDM
  • Active Session History

ADDM (excerpt)

FINDING 5: 7.1% impact (5377 seconds)
-------------------------------------
Read and write contention on database blocks was consuming significant
database time in the cluster.

   RECOMMENDATION 1: Schema, 2.7% benefit (2074 seconds)
      ACTION: Consider partitioning the INDEX "FHUS_NEW.PK_OBJECT_STORE"
         with object id 101859 in a manner that will evenly distribute
         concurrent DML across multiple partitions.
         RELEVANT OBJECT: database object with id 101859
      RATIONALE: The INSERT statement with SQL_ID "gf99f4tkwcvt9" was
         significantly affected by "global cache buffer busy".
         RELEVANT OBJECT: SQL statement with SQL_ID gf99f4tkwcvt9
         insert into OBJECT_STORE (VERSION, TYPE_ID, CREATE_DATE,
         UPDATE_DATE, STATE_ID, ENCODING_TYPE, COMPRESSION_TYPE,
         STORED_DATA_SIZE, ORIGINAL_DATA_SIZE, FORWARD_ID, OBJECT_ID)
         values (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11)
      RATIONALE: The INSERT statement with SQL_ID "cfpgqrsxcfhg1" was
         significantly affected by "global cache buffer busy".
         RELEVANT OBJECT: SQL statement with SQL_ID cfpgqrsxcfhg1
         insert into OBJECT_REFERENCE (OBJECT_ID, REFERENCE_ID)
         values (:1, :2)

ADDM identifies SQL and objects with serializing contention.

Run: @?/rdbms/admin/addmrpt.sql

Active Session History Report

@?/rdbms/admin/ashrpt.sql

10. Prepare Diagnostics Procedures, Just in Case…

  • Capture the maximum possible diagnostics before restarting the system, etc.
      – Tradeoff with recovery time
  • Oracle side
      – Hanganalyze
      – Systemstate dumps
      – gv$ views
      – Statspack / AWR
  • OS setup, e.g. Linux
      – Netdump utility (Metalink note 226057.1)
      – Alt SysRq keys utility (Metalink note 228203.1)
      – Serial console (Metalink note 228204.1)
      – Performance / load data
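On the Oracle side, hanganalyze and systemstate dumps can be captured from a SQL*Plus session with oradebug before restarting anything; a sketch (the dump levels shown are common choices, not values prescribed by the slides):

```shell
# Sketch: capture hang/state diagnostics from a sysdba session.
sqlplus -s "/ as sysdba" <<'EOF'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump systemstate 266
oradebug tracefile_name
EOF
```

The tracefile_name command reports where the dumps were written, so they can be collected before any restart.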

And Don't Forget…

  • Synchronize time across the cluster (NTP)
  • Use ASM for database storage
  • Use Grid Control
  • Cache sequences
  • Use ASSM for all tablespaces
  • Linux hugetlb for large memory
  • Etc.
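"Cache sequences" means giving busy sequences a large cache (and NOORDER) so each instance allocates a range of values locally instead of contending across the cluster on every nextval; a sketch, where app.order_seq is a hypothetical sequence name:

```shell
# Sketch: app.order_seq is a hypothetical sequence; 1000 is an example size.
sqlplus -s "/ as sysdba" <<'EOF'
ALTER SEQUENCE app.order_seq CACHE 1000 NOORDER;
EOF
```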

Summary

 1. Define Service Level Objectives
 2. Keep It Simple
 3. Get Ready for the Grid
 4. Use Oracle10g Release 2
 5. Implement Workload Management
 6. Configure the Interconnect Correctly
 7. Use Cluster Verification Utility
 8. Test, Test, Test…
 9. Monitor Performance
10. Prepare Diagnostics Procedures, Just in Case…

Q & A

