Larson - Resource Mapping

  • Uploaded by: rockerabc123
  • 0
  • 0
  • May 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Larson - Resource Mapping as PDF for free.

More details

  • Words: 1,829
  • Pages: 38
Resource Mapping A Wait Time Based Methodology for Database Performance Analysis Prepared for Hotsos Symposium, 2005 Presented by Matt Larson Chief Technology Officer Confio Software

Presentation Agenda       2

Introduction Conventional Tuning vs. Wait-based Tuning Foundation: Resource Mapping Methodology 5 Key Steps of Applying RMM RMM Advantages Conclusion

Who am I?

 

Former DBA consultant specializing in Oracle performance tuning Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed 2 Edition, Oracle8 Server Unleashed) Co-author of two other database related books CTO and founder of Oracle performance software company nd

  3

Problems with Conventional Tuning Tools: Like the Drunk Under the Streetlight

4

Conventional Tuning

    

5

Art, not a science Ratio-based (cache hit ratios, etc.) Sometimes fruitless It’s “tuned” (I guess?) Different tuning/investigation process for each DBA/DBA Team/Company

Problems with Conventional Tuning Tools

 

Optimize systems, not business results Conventional tools: • V$ Views: limited visibility & granularity • Statspack: averages across entire database • Explain Plan: deemphasizes how non-object resources affect performance



Incorrect Data hides real results • System-wide averages • Event counters • Incomplete visibility

6

What Problems are you Trying to Solve? • I spend the whole week monitoring and optimizing Oracle configurations, but I have no demonstrable results to show for it - why? • Will more hardware make my application run faster? By how much? • Will the new application run efficiently on the production server? • Why does one application keep impacting my SLA compliance? • If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be? 7

You know you are working on the wrong thing when…



After spending an agonizing week tuning Oracle buffers to minimize I/O operations, management typically rewards you with: • • • •

8

A. B. C. D.

An all expense paid vacation A free lunch A stale donut Reward? Nobody even noticed!

You know you have a visibility problem when…



You measure database performance based on: • • • •

9

A. B. C. D.

Increasing trends in user response time Increasing system down time Increasing help desk calls Increasing decibel levels from irate users

Your role is sub-essential to the business of your organization when…



Your role in the rollout of a new customer facing application results in: • • • •

10

A. Keys to drive the CEO’s Porsche B. Keys to use the executive restroom C. A mop to use in the executive restroom D. Your office has been moved to the restroom

You know you are accustomed to measuring the wrong thing when…



You measure the commute time to work based on: • • • •

11

A. B. C. D.

The time it takes to get there Counting the times your wheels rotate Monitoring your tachometer The number of speeding tickets

Wait-based Performance Tuning   

 12

Emerging best-practice for database tuning Proponents include leading consultants, trainers and authors Oracle is starting to build wait-based tuning tools into the database particularly in 10g Tune by determining where processing time is spent

Oracle 10g - Moving towards waitbased  

Adding wait-based columns to existing views New wait-based views

Example: v$session_wait_history • • •

13

Provides the last 10 wait events for a session Session ID, Username, Event, Wait_Time, etc. Used to provide wait_time for only a few events

DBA Success Stories using RMM 





14

DBA solves a “Cold Case”. Problem unresolved for 1 year with traditional tools; Solution identified in 10 minutes during hands-on training DBA ends “Crit Sit” 2 week situation ends quickly after identification of Library Cache pin wait and load locks. Metalink identifies Oracle bug, patch successfully applied DBA saves $700K. 90% CPU capacity initiates expansion from 12 to 24 CPU server. DBA identifies parallel queries across 16 parallel threads as source of bottleneck. CPU eliminated as constraint, no new server required.

RMM: Confio’s Underlying Methodology 

Resource Mapping Methodology: Wait-Event Analysis General approachbest practice



Resource Mapping Methodology Rigorous, complete requirement s

DBFlash Packaged product implementatio n

Three Key Principles of RMM 1. SQL View: All statistics at SQL statement level 2. Time View: Measure Time, not number of times a resource is utilized 3. Full View: Separately measure every resource to isolate source of problems

15

Confio’s Resource Mapping Methodology • The principles of RMM can be illustrated by using the analogy that data processing is like an assembly line. Data goes in one end, is subject to a series of changes, and comes out the other end as a finished product • The assembly line (or SQL Statement) must be observed at the lowest level where a unit of work is being performed (SQL View Principle) • Measurements are made with regard to time instead of counting how often an event occurs (Time View Principle) • All resources system-wide must be monitored to get a full view of Counters potential bottlenecks i.e. no blind spots (Full ViewCounters Principle) CPU 74%

16

Blind Spot

CPU 38%

Reads 1789327

Reads 4955

145 seconds

8726 seconds

Time

Follow a unit of work through every operation

Time

Blind Spot

Track SQL Time, Not System Counters • Watching Counters leads to wrong conclusions: Time is more relevant • Total System Counters hide information: Need breakdown to individual SQLs Total System 80K Reads Counter

SQL 1 SQL 2 SQL 3 Resources 17

30 Minutes 5R 15M 25 R 5M 50 Reads

I/O

5K Packets

4M

200 Minutes

5M

Network

125 Attempts 216K Writes

6M 50 A 10 M 35 A 100 Minutes 50 A

Locks

4M

200 Minutes

5M

Redo

RMM-compliant Performance Tools 

Oracle Tracing • RMM compliant when wait events are traced • Shows SQL level statistics (SQLView), all events (FullView) and events by time (TimeView) • Text-based, short-term technical reporting • Primarily used for reactive tuning



Confio DBFlash for Oracle • • • •

18

RMM compliant 24/7 proactive monitoring Graphical, long-term trend reporting RMM-based Alerting

Applying RMM for Business Results Five Step Process focusing on what matters 1. Identify

19

2. Allocate

3. Quantify

4. Prioritize

5. Assign

Step 1: Identify



Identify SQL Statements having largest impact • (SQL View and Time View principles)

  20

Longest wait times = most significant “pain points” for customers Conversely, low cache hit ratios or high latch usage may not impose high wait times for users (so why fix

SQL statements prioritized by Total Wait Time

Step 2: Allocate

 

Allocate impact to real customers (internal or external) Allocate wait time to Program, Session, Machine • SQL View principle makes this connection

 21

Understanding database customer and application

Programs Prioritized by Total Wait Time

Step 3: Quantify   

How much is save in time/money if fixed? Enabled by Full View and Time View principles Soft dollar savings • Data entry clerks • DBA time spent in problem resolution



Hard dollar savings • Reduce hardware upgrades • Meet SLA’s avoiding penality • Ensure business isn’t lost due to poor performing or unavailable system

Quantifiable benefit of Tuning a specific statement 22

Step 4: Prioritize   

23

If last step properly executed, this step is fairly straight forward Allow’s DBA to cut through the clutter of potential new projects, investigations, and trials. Better justification for priorities. (e.g. We aren’t working on your problem since this other has a higher demonstrable business impact)

Step 5: Assign 

Assign the right people to the problem • Log_buffer waits • Network issues • Same query 10,000/hour

 

24

Enabled by Full View principle Avoid finger-pointing by accurately assigning quickly

Resource Mapping Methodology RMM

Wait Based Tuning

25

Network, Storage, Application, Web, etc.

Silo Monitoring Business Management

LIMITED VIEW

IT Management

LIMITED VIEW

Web Team

Sitescope

Custom App Team Network Team Database/OS Teams Storage/OS Teams

Software Layers Web Server

Often No Commercial Tools Custom Biz Logic

HP Openview Wait-based tuning EMC Control Center

Network Database Server Storage Box

Each team uses their own tool to partially monitor their non-Oracle layers. No view across layers. Management has no clear view. 26

The Solution - Integrated Vision Business Management

RMM across the stack

IT Management Web Team

Web Server

Custom App Team

Custom Biz Logic

Network Team

Network

Database/OS Teams

Database Server

Storage/OS Teams

Storage Box

All teams see a complete picture of all layers and dependencies. Enables more efficient “Umbrella” solution.

27

RMM Achieved Business Benefits

28

RMM Does:

Business Benefit:

35% reduction in database capacity requirement

Reduce capital investment Avoid unnecessary additions Recovers un-used capacity

Standardizes “expert” analysis ability across entire DBA team performance Quantifies

Reduce training & consulting costs

impact

Focus tuning efforts on biggest business impacts

Identifies problem Root Cause and resolution

Assign human resources and responsibility

Anticipates + resolves performance bottlenecks

Maintain SLA and end user performance

Example 1: Problem Observed 

Critical situation: Secure Service Center application performance unsatisfactory • Response time between 2400 and 9000 seconds • Very high network traffic (3x—4x normal), indicating time-outs and user refreshes • “CritSit” declared: major effort to resolve problem

29

Observations using Resource Mapping Methods  

1: Identify accumulated Waits 2: Identify specific resources used

Lib cache pin wait

Lib cache load lock 30

Notice scale: > 8000 secs

Results

Library cache pin nearly unobservable

Library cache load lock no longer observable

31

Notice scale: < 1400 secs max

Results  

Response time improvement from 8000 seconds (worst case) to 900 seconds Variance improvement: • Before: response time 2400 - 8000 sec • After: response time 800 - 900 sec

32

Example 2: Performance Drain – Identify the Source     

33

Slow response reported DBA and database focus of delays Database problem? No – SQL*Net Message identified as source of delay 2nd highest wait event

RMM Drill Down identifies source of problem     

Single application generates all SQL*Net Messages App on same server as Oracle! Answer: Misconfiguration – TCP/IP used within server Change to IPC, eliminate NIC traffic and 30% of wait time

Solution requires knowing: Which SQL, What Wait Time, Which Resource 34

Example 3: Scattered Reads  

Situation: LINS06 database - Hourly profile identifies high wait anomaly 3-10x higher than other periods – requires investigation

wait time 42,000 seconds 10:00-11:00

35

Drill Down to Key RMM Parameters

Db file scattered reads

Db file scattered reads 36

Notice scale: > 6000 secs

Conclusion  

Look for what has an impact Resource Mapping is more that Wait Time – Analysis must include: • SQL level granularity • Full Resource granularity

 

37

Isolating the SQL and Resource allows you to find and fix the Root Cause DBAs can have an impact and be heroes!

Thank you for coming Matt Larson Contact Information • [email protected] • 303-938-8282 ext. 110 • Company website www.confio.com

38

Related Documents


More Documents from ""