IBM Informix Dynamic Server Performance Tuning
IBM Data Management Solutions Education Services
Course #: L2-403 Version 6 09-2002 000-9272 November 5, 2002
Trademarks IBM and the IBM logo are registered trademarks of International Business Machines Corporation. The following are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both: AnswersOnLine; C-ISAM; Client SDK; Cloudscape; DataBlade; Dynamic Scalable Architecture; Dynamic Server; Dynamic Server.2000; Dynamic Server with Advanced Decision Support Option; Dynamic Server with Extended Parallel Option; Dynamic Server with Universal Data Option; Dynamic Server with Web Integration Option; Dynamic Server, Workgroup Edition; Foundation.2000; Illustra; Informix; Informix 4GL; Informix Extended Parallel Server; Informix Internet Foundation.2000; Informix Red Brick Decision Server; J/Foundation; MaxConnect; ON-Bar; OnLine Dynamic Server; Red Brick and Design; Red Brick Data Mine; Red Brick Decision Server; Red Brick Mine Builder; Red Brick Decisionscape; Red Brick Ready; Red Brick Systems; Rely on Red Brick; UniData; UniData & Design; Universal Data Warehouse Blueprint; Universal Database Components; Universal Web Connect; UniVerse; Virtual Table Interface; Visionary; Web Integration Suite
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java, JDBC, and all Java-based trademarks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.
The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. The original repository material for this course has been certified as being Year 2000 compliant. © Copyright International Business Machines Corporation 2001, 2002. All rights reserved. This document may not be reproduced in whole or in part without the prior written permission of IBM. Note to U.S. Government Users—Documentation related to restricted rights—Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Objectives
At the end of this course, you will be able to:
- Use Informix Dynamic Server monitoring utilities
- Identify optimal fragmentation strategies
- Identify strategies for improving data loads and indexing
- Manage memory and CPU system resources
- Tune an OLTP environment
- Tune checkpoints
Prerequisites
To maximize the benefits of this course, we require that you have met the following prerequisites:
- Managing and Optimizing Informix Dynamic Server Databases
- Informix Dynamic Server System Administration (Unix) or equivalent
- Experience using Informix Dynamic Server
Acknowledgments
Course Developer: Kelvin Quimby
Technical Review Team: Mike Lowe, Shirley Brost, Rickard Linck, Mark Scranton, Gene Rebman
Course Production Editor: Susan Dykman
Further Information To find out more about IBM education solutions and resources, please visit the IBM Education website at http://www-3.ibm.com/software/info/education. Additional information about IBM Data Management education and certification can be found at http://www-3.ibm.com/software/data/education.html. To obtain further information regarding IBM Informix training, please visit the IBM Informix Education Services website at http://www-3.ibm.com/software/data/informix/education.
Comments or Suggestions Thank you for attending this training class. We strive to build the best possible courses, and we value your feedback. Help us to develop even better material by sending comments, suggestions and compliments to
[email protected].
Table of Contents Module 1
An Introduction To Performance Tuning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2 Performance Tuning Begins with a Problem . . . . . . . . . . . . 1-3 Set Specific and Quantifiable Goals . . . . . . . . . . . . . . . . . . 1-4 Rules of Performance Tuning . . . . . . . . . . . . . . . . . . . . . . . 1-5 Factors Impacting Performance . . . . . . . . . . . . . . . . . . . . . 1-7 Optimizing Disk I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-9 Minimize Disk Reads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10 Minimize Disk Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11 Maximize Data Transferred in a Single I/O . . . . . . . . . . . . 1-12 Factors Affecting Disk Performance . . . . . . . . . . . . . . . . . 1-13 Disk Transfer Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14 10 to 1 Performance Gains in I/O . . . . . . . . . . . . . . . . . . . 1-15 Light Scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-16 Eliminate Disk Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . 1-18 Parallel I/O Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19 Optimizing Device I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20 Pfread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-21 Measuring I/O Throughput . . . . . . . . . . . . . . . . . . . . . . . . . 1-22 pfread.sh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-24 Disk Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25 RAID Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-27 CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-28 Optimizing CPU Utilization by CPUVPs . . . . . . . . . . . . . . 1-29 Multiprocessor Configuration . . . . . . . . . . . . . . . . . . . . . . . 
1-31 Shared Memory Components . . . . . . . . . . . . . . . . . . . . . . 1-33 System Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-34 OLTP Shared Memory Tuning . . . . . . . . . . . . . . . . . . . . . . 1-36 DSS Shared Memory Tuning . . . . . . . . . . . . . . . . . . . . . . . 1-37 Tuning SHMVIRTSIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-38 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-40 Application/Database Design and Implementation . . . . . . 1-41
Module 2
Monitoring Utilities Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 Informix Monitoring Utilities . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 onstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4 SMI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 oncheck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 ipcs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 Cleaning Up after an Engine Crash . . . . . . . . . . . . . . . . . . . 2-8 Unix Monitoring Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 System Activity Reporter (sar) . . . . . . . . . . . . . . . . . . . . . . 2-10 Performance History via SAR . . . . . . . . . . . . . . . . . . . . . . 2-11 time and timex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 Process Status (ps) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13 The iostat Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14 Virtual Memory Statistics (vmstat) . . . . . . . . . . . . . . . . . . . 2-16
Module 3
Optimizing Load Performance Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 Data Loading Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 Wisconsin Benchmark Specification . . . . . . . . . . . . . . . . . . 3-4 Loading Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5 SQL LOAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 Running SQL LOAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 dbload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8 Running dbload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-9 ESQL/C INSERT CURSOR . . . . . . . . . . . . . . . . . . . . . . . . 3-10 ESQL/C Code Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11 High Performance Loader . . . . . . . . . . . . . . . . . . . . . . . . . 3-12 HPL: The Data Load Process . . . . . . . . . . . . . . . . . . . . . . 3-13 HPL: Component Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14 HPL: Deluxe Load Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15 HPL: Express Load Mode . . . . . . . . . . . . . . . . . . . . . . . . . 3-16 PLCONFIG Tuning with the HPL . . . . . . . . . . . . . . . . . . . . 3-17 HPL Demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19 Indexes, Constraints, Logging and Extent Sizes . . . . . . . . 3-24 Turning Logging Off at the Table Level . . . . . . . . . . . . . . . 3-25 Violations and Diagnostics Tables . . . . . . . . . . . . . . . . . . . 3-26 Fragmentation Strategies Loading Data . . . . . . . . . . . . . . 3-28 Optimizing Loads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-29
Optimizing Loads Continued . . . . . . . . . . . . . . . . . . . . . . . 3-30 Other Tips for Loading Data . . . . . . . . . . . . . . . . . . . . . . . 3-31 Load Performance Discussion . . . . . . . . . . . . . . . . . . . . . . 3-34
Module 4
Indexing Performance Issues Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 Benefits and Costs of Indexing . . . . . . . . . . . . . . . . . . . . . . 4-3 B+ Tree Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4 The Leaf Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5 B+ Tree Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6 FILLFACTOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8 Pre - 9.x Tablespace Architecture . . . . . . . . . . . . . . . . . . . . 4-9 9.x: Separate Tablespaces for Data and Indexes . . . . . . . 4-10 Explicit Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 Implicit Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12 Use Explicit Indexes for Constraints . . . . . . . . . . . . . . . . . 4-13 Clustered Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14 Monitoring Clustered Indexes . . . . . . . . . . . . . . . . . . . . . . 4-15 Parallel Index Builds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18 Index Builds Are From the Bottom Up . . . . . . . . . . . . . . . . 4-19 Calculating Parallel Index Build Threads . . . . . . . . . . . . . . 4-20 Environment Variable Settings . . . . . . . . . . . . . . . . . . . . . 4-21 Configuration Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 4-22 Using Onmode for Index Builds . . . . . . . . . . . . . . . . . . . . . 4-23 How Did Index Creation Times Change? . . . . . . . . . . . . . 4-26
Module 5
Optimizing Update Statistics Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2 The Query Optimizer and UPDATE STATISTICS . . . . . . . . 5-3 Important! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 Data Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6 More Accurate Data Distributions . . . . . . . . . . . . . . . . . . . .5-7 Confidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8 Update Statistics Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9 Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10 New Performance Enhancements . . . . . . . . . . . . . . . . . . . 5-12 Improved Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13 Tuning Memory Utilization . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
Information Reported to DBA . . . . . . . . . . . . . . . . . . . . . . 5-15 Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16 Tuning Parameters for Updating Statistics . . . . . . . . . . . . 5-18 What Is the Impact On Performance? . . . . . . . . . . . . . . . . 5-23
Module 6
OLTP Performance Tuning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 The OLTP vs. DSS Spectrum . . . . . . . . . . . . . . . . . . . . . . . 6-3 OLTP Performance Tuning . . . . . . . . . . . . . . . . . . . . . . . . .6-4 Optimizing Read Caching . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5 Write Cache Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6 How Many Buffers? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-7 LRU Queues (LRUs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 Managing LRU Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9 Monitoring LRU Queues . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10 Monitoring LRU Writes . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11 Checkpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12 Fuzzy Checkpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13 Full (Sync) Checkpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14 Checkpoint Duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15 Disk and Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16 Balancing I/O Across Disks . . . . . . . . . . . . . . . . . . . . . . . . 6-17 Monitor IBM IDS I/O Activity . . . . . . . . . . . . . . . . . . . . . . . 6-18 Tuning LRU Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19 AIO vs. Kernel Asynchronous I/O . . . . . . . . . . . . . . . . . . . 6-20 Monitoring Disk & Buffer Activity . . . . . . . . . . . . . . . . . . . . 6-22 Configuring AIOVPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23 Tuning the Log Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24 Read-Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-26 Monitoring Read-Aheads, onstat -p . . . . . . . . . . . . . . . . . . 6-28
Module 7
Fragmentation Strategies Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2 What is Fragmentation? . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 Advantages of Fragmentation . . . . . . . . . . . . . . . . . . . . . . .7-4 Basic Fragmentation Issues . . . . . . . . . . . . . . . . . . . . . . . . 7-5 Distribution Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6 Round Robin Fragmentation Guidelines . . . . . . . . . . . . . . .7-7 Expression Fragmentation Guidelines . . . . . . . . . . . . . . . . . 7-8
Fragmentation and DSS . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9 DSS Fragmentation Matrix . . . . . . . . . . . . . . . . . . . . . . . . 7-10 DSS Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11 DSS Fragmentation Example . . . . . . . . . . . . . . . . . . . . . . 7-12 DSS Fragmentation Example, Continued . . . . . . . . . . . . . 7-13 Fragmentation and OLTP . . . . . . . . . . . . . . . . . . . . . . . . . 7-14 OLTP Fragmentation Matrix . . . . . . . . . . . . . . . . . . . . . . . 7-15 OLTP Query Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16 OLTP Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17 Attached Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18 Detached Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19 Fragmented Index Scenarios . . . . . . . . . . . . . . . . . . . . . . 7-20 Attached Index on a Non-fragmented Table . . . . . . . . . . . 7-21 Attached Index on a Fragmented Table . . . . . . . . . . . . . . 7-22 Attached Index on a Fragmented Table - Example 1 . . . . 7-23 Attached Index on a Fragmented Table - Example 2 . . . . 7-24 Detached Index on a Non-fragmented Table . . . . . . . . . . 7-25 Detached Fragmented Index, Non-fragmented Table . . . . 7-26 Detached Index on a Fragmented Table . . . . . . . . . . . . . . 7-27 Detached Fragmented Index on a Fragmented Table . . . . 7-28 Detached Index on a Fragmented Table - Example . . . . . 7-29 Planning Expression Based Fragmentation . . . . . . . . . . . 7-30 Fragment Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31 Improving Data Availability . . . . . . . . . . . . . . . . . . . . . . . . 7-32 Avoid Deletes with ALTER FRAGMENT . . . . . . . . . . . . . . 7-33
Module 8
Managing Query Resources Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2 Parallel Database Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3 How to Turn on PDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4 PDQ and Parallel Sorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6 PDQ Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-9 Allocating Shared Memory for PDQ . . . . . . . . . . . . . . . . . 8-10 PDQ Configuration Parameters . . . . . . . . . . . . . . . . . . . . . 8-11 Changing Parameters Dynamically . . . . . . . . . . . . . . . . . . 8-12 Light Scan Buffer Size and PDQ . . . . . . . . . . . . . . . . . . . . 8-13 The Memory Grant Manager (MGM) . . . . . . . . . . . . . . . . . 8-14 How Memory is Granted . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15 Scan Threads and Secondary Threads . . . . . . . . . . . . . . . 8-16
Monitoring MGM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-17 onstat -g mgm continued . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18 onstat -g mgm (continued) . . . . . . . . . . . . . . . . . . . . . . . . . 8-20 Concurrent Query Environments . . . . . . . . . . . . . . . . . . . . 8-21 Factors to Consider in Tuning PDQ . . . . . . . . . . . . . . . . . . 8-22
Module 9
Tuning Client Server Communication Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2 Efficient Client Server Communication . . . . . . . . . . . . . . . . 9-3 SQL Statement Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4 Using Cursors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5 FET_BUF_SIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-6 OTPOFC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-7 IFX_DEFERRED_PREPARE . . . . . . . . . . . . . . . . . . . . . . . 9-8 OPTMSG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-9 Client Server Communication . . . . . . . . . . . . . . . . . . . . . . 9-10 Poll Threads and Connection Requests . . . . . . . . . . . . . . 9-11 Poll Threads and Listen Threads . . . . . . . . . . . . . . . . . . . . 9-12 Monitoring Listen Threads . . . . . . . . . . . . . . . . . . . . . . . . . 9-13 Listen Thread and sqlexec Thread . . . . . . . . . . . . . . . . . . 9-14 Poll and sqlexec Threads . . . . . . . . . . . . . . . . . . . . . . . . . 9-15 Shared Memory Connections . . . . . . . . . . . . . . . . . . . . . . 9-16 In-Line Poll Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17 Tune Poll Threads For OLTP Applications . . . . . . . . . . . . 9-18 Adding Multiple Listen Threads . . . . . . . . . . . . . . . . . . . . . 9-19 Remote Client Server Challenges . . . . . . . . . . . . . . . . . . . 9-21 Monitoring Network Capacity . . . . . . . . . . . . . . . . . . . . . . . 9-22 Monitoring Network Activity for User/Threads . . . . . . . . . . 9-23 onstat -g ntt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-24 Checking For IBM IDS Reject Entries . . . . . . . . . . . . . . . . 9-25 Network Configuration Options . . . . . . . . . . . . . . . . . . . . . 9-26 Multiple Network Card Configuration: Method 1 . . . . . . . . 
9-27 Multiple Network Card Configuration: Method 2 . . . . . . . . 9-28 Tuning Network Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-29 Distributed Database Applications . . . . . . . . . . . . . . . . . . . 9-32
Module 10
Managing Large Object Space Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2 Information Explosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3 Large Object Storage Options . . . . . . . . . . . . . . . . . . . . . . 10-4 Host File System Example . . . . . . . . . . . . . . . . . . . . . . . . 10-5 Simple Large Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6 Blobspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7 Blob Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8 Blobspace Storage Sizing . . . . . . . . . . . . . . . . . . . . . . . . . 10-9 Blobspace Page Sizing . . . . . . . . . . . . . . . . . . . . . . . . . .10-10 Storing TEXT and BYTE Data . . . . . . . . . . . . . . . . . . . . .10-11 Blobpage Storage Statistics . . . . . . . . . . . . . . . . . . . . . . 10-12 Using oncheck -pT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13 Smart Large Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14 Smart Large Object Processing . . . . . . . . . . . . . . . . . . . . 10-15 Sbspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16 Advantages of Sbspaces . . . . . . . . . . . . . . . . . . . . . . . . . 10-17 Sbspace Extent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-18 Using Sbspace: Datablades . . . . . . . . . . . . . . . . . . . . . . 10-19 Sbspace Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20 Sbspace Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21 Allocating Additional Chunks to Sbspace . . . . . . . . . . . . 10-22 Sbspace Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23 Monitoring Sbspaces: oncheck . . . . . . . . . . . . . . . . . . . . 10-25 onstat -g smb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-27
Appendix A
IBM IDS Utilities The Oninit Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2 The Onmode Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3 Creating Dbspaces with Onspaces . . . . . . . . . . . . . . . . . . A-4 Adding and Dropping Chunks . . . . . . . . . . . . . . . . . . . . . . A-6 The Onparams Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7 The Onstat Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8 The System Monitoring Interface . . . . . . . . . . . . . . . . . . . A-10 The Oncheck Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-11
Appendix B
Informix Server Administrator
Module 1 An Introduction To Performance Tuning
An Introduction To Performance Tuning 07-2002 © 2002 International Business Machines Corporation
Objectives
At the end of this module, you will be able to:
- Understand the performance tuning process
- Identify the five primary factors in RDBMS performance
Performance Tuning Begins with a Problem
- Database or application design issues: How should we design our application/database for the best performance?
- User complaint(s): This transaction is taking too long.
- System capacity: Can the current system handle a 50% increase in the size of our database?
Performance tuning must begin with an issue or a problem to be solved. The problem can be a complaint from a user, a decision about the application or database design, or a capacity planning issue.
Set Specific and Quantifiable Goals
Performance tuning ends with attainment of specific and quantifiable goals:
- “Application supports 600 concurrent users with 98% of transactions completing in less than 1 second.”
- “Market trending and analysis system supports 15 concurrent users and 98% of queries complete in less than 5 minutes.”
Once the problem is identified, you must set goals for resolving the problem. The success of performance tuning can only be measured in terms of clear and concise goals.
Rules of Performance Tuning

[Figure: the performance tuning cycle: 1) Perform Test → 2) Analyze Statistics → 3) Make One Change → 4) Document the Change, repeating until performance is satisfactory.]
Performance tuning must be done in a controlled and predictable environment. Ideally, performance tuning requires exclusive access to all components being tuned: hardware, software, and network. It also requires replication of the environment being tuned. For example, if your goal is to measure performance and tune an application for 600 concurrent users, you need emulation software to replicate 600 concurrent users. Finally, performance tuning is an iterative process. The following steps offer a simple performance tuning model:

1. Perform a quantifiable, measurable test.
2. Analyze the statistics collected.
   - Capture onstat statistics or SMI data to measure Dynamic Server’s performance.
   - Capture output from sqexplain.out to determine the efficiency of the SQL and indexes.
   - Capture Unix statistics such as sar, vmstat, and iostat to measure hardware and operating system performance.
   - Time the execution of processes.
3. Make one change.
   - You may determine that you need to change a configuration parameter, modify the application, redistribute data, or increase one of the hardware resources. The most critical factor in successful tuning is that only one change is made, and that the results of that change are analyzed before any other changes are made.
4. Document that change.
   - This is critical. Without documentation you will never know what the previous value of a configuration parameter was, or even which configuration parameter had been changed. You also would not know what the effect of the change was.

Return to step one, and continue this process until the performance goal is met.
Factors Impacting Performance
The five primary factors impacting RDBMS performance are:
- Disk I/O
- CPU
- Memory
- Network
- Application
The five primary factors impacting RDBMS performance are disk, CPU, memory, network, and application(s). Most of these factors will be discussed in detail throughout this course.
- Disk I/O: The number and speed of disks available is probably the single most important resource affecting the performance of the DBMS. After all, the primary job of a database server is to store and retrieve data, and the database server is limited by the physical capacity of the hardware and I/O subsystem available.
- CPU: In tuning for maximum performance, once disk performance has been maximized, CPU utilization can be evaluated. IBM IDS provides the flexibility, through its multi-process architecture offering both horizontal and vertical parallelization of tasks, multi-threading, and thread migration, to fully exploit the processing capability of multiple-CPU machines.
- Memory: Memory, like any resource, becomes a performance issue when there is not enough available. In this course, you will monitor IBM IDS memory utilization for various tasks and discuss the impact of insufficient memory.
- Network: Network capacity can have a significant impact on performance in client server and distributed environments. In this course, you will investigate how the Dynamic Server uses networks and how to optimize database communications across networks.
- Application: Although outside the scope of this course, poor application design and/or implementation can dramatically reduce overall database throughput and system response time. Optimal tuning of disk, CPU, memory, and network cannot overcome an improperly designed or implemented application system.
Optimizing Disk I/O
To optimize disk I/O:
- Minimize the number of disk reads and writes
- Maximize the data transferred in a single I/O
- Eliminate disk bottlenecks with intelligent distribution of data
- Maximize disk throughput by determining the optimal number of read and write threads per disk
The retrieval and storage of data are primary tasks of an RDBMS. Since disk I/O is the slowest component of computing, optimizing disk throughput will often deliver the most significant improvements in performance.
Minimize Disk Reads
- Efficient read caching
- Fragment elimination
- Filtering unneeded columns and rows through SQL
The IBM Informix Dynamic Server architecture uses data caching to minimize disk reads and writes. A well-tuned system should have a read cache percentage of 95% or better. Read caching is most significantly affected by the size of the data buffer cache in the resident portion of shared memory. You may configure the size of the data buffer cache by modifying the BUFFERS parameter in the ONCONFIG file. Another obvious way to minimize disk reads is to read less data. While this may seem simplistic, it is a powerful performance benefit delivered by intelligent partitioning and IDS’s ability to eliminate unneeded table fragments when executing queries.
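The read cache percentage can be derived from the cumulative bufreads (reads satisfied from the buffer pool) and dskreads (reads that went to disk) counters reported by onstat -p. A minimal shell sketch of the calculation, using invented sample counter values rather than output from a live server:

```shell
# Read-cache percentage, as computed from onstat -p style counters:
#   %cached = 100 * (bufreads - dskreads) / bufreads
# The counter values below are made-up sample figures for illustration.
bufreads=1500000
dskreads=60000
cached=$(awk -v b="$bufreads" -v d="$dskreads" \
    'BEGIN { printf "%.2f", 100 * (b - d) / b }')
echo "read cache rate: ${cached}%"   # below 95% suggests raising BUFFERS
```

On a live instance you would substitute the counters from onstat -p; run onstat -z first to zero the profile counters so the rate reflects only the workload under test.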
Minimize Disk Writes
Increase write caching and minimize physical writes by tuning:
- CKPTINTVL
- LRUS
- LRU_MAX_DIRTY
- LRU_MIN_DIRTY
- PHYSFILE
- CLEANERS
To minimize physical writes, a well-tuned OLTP system should target a write cache percentage of 85% or better. Write caching is tuned by modifying the frequency and duration of checkpoints. The frequency of checkpoints is determined by the CKPTINTVL configuration parameter and the size of the physical log. The checkpoint duration may be affected by modifying these configuration parameters: LRUS, LRU_MIN_DIRTY, and LRU_MAX_DIRTY. Forcing most physical writes to occur during checkpoints will provide the best write cache rates, but may decrease productivity in very active OLTP systems because the Dynamic Server will block applications from entering critical sections (activities that modify data) for the duration of a checkpoint. Setting CLEANERS too low can cause performance to suffer whenever a checkpoint occurs, because page cleaners must flush all modified pages to disk during checkpoints. If you do not configure a sufficient number of page cleaners, checkpoints take longer, causing overall performance to suffer. In DSS systems, it may be impossible to achieve a write cache rate of 85% because very little write activity occurs.
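The write cache rate is derived the same way from the bufwrits and dskwrits counters in onstat -p. A minimal sketch, again with invented sample figures:

```shell
# Write-cache percentage, as computed from onstat -p style counters:
#   %cached = 100 * (bufwrits - dskwrits) / bufwrits
# The counter values below are made-up sample figures for illustration.
bufwrits=400000
dskwrits=52000
wcache=$(awk -v b="$bufwrits" -v d="$dskwrits" \
    'BEGIN { printf "%.2f", 100 * (b - d) / b }')
echo "write cache rate: ${wcache}%"   # OLTP target is 85% or better
```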
Maximize Data Transferred in a Single I/O
Read or write as much data as possible in a single I/O operation.
- Exploit the read ahead capability for sequential scans
- Exploit light scans for reading large tables in DSS environments
- Use light appends during loads
- Use clustered indexes to enable sequential scans when reading many rows in sorted order
In addition to its data caching capabilities, the Informix Dynamic Server provides several features that enable you to maximize the amount of data read or written in a single I/O operation. One of these features is read ahead, which is discussed later in the course.
Light Scans and Light Appends In DSS environments, when a sequential scan of a table that is larger than the data buffer pool is performed under appropriate conditions, the server will perform a sequential read of up to 32 pages. This large read is called a light scan. Similar to light scans, light appends allow up to 32 index or data pages to be written to disk in a single physical write operation.
Clustered Indexes Clustered indexes also help you to maximize the amount of data read in a single operation. If the index is clustered, the data will be stored in the order of the index. The benefit of clustering is that it eliminates the need for reading index nodes, and the need for sorting the data.
Factors Affecting Disk Performance
I/O time = overhead time + seek time + latency + data transfer time
- Command overhead time: the time to process the I/O command
- Seek time: the time it takes to move to a cylinder
- Latency: the time it takes for the data on the drive to arrive under the disk head
- Data transfer time: the time to transfer a fixed amount of data to or from disk
The I/O time or data access time is a function of the time to transfer the data to or from the disk:
- Command overhead time: generally fixed for a specific platform.
- Seek time: the time required for the disk head to move, or seek, to the appropriate cylinder. For a given disk capacity, seek time is shorter for smaller diameter disk drives. The fastest seek occurs when moving from one track to an adjacent track; the slowest is a full stroke between the outer and inner tracks.
- Latency: the time the disk head waits for the disk to rotate until the sector being read reaches the head. The average latency is the time required for one-half rotation: 5.6 ms at 5400 rpm, 4.2 ms at 7200 rpm, and so on.
- Data transfer time: (number of KB transferred) / (data transfer rate in KB per second).
Example: with overhead time = 0.5 ms, seek time = 9.5 ms, average rotational latency for a 5400 rpm drive = 5.6 ms, and data transfer time = 0.3 ms for 4 KB of data, the total is 0.5 + 9.5 + 5.6 + 0.3 = 15.9 ms.
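The worked example above can be expressed as a small helper; the figures are the hypothetical values from the example, and awk supplies the floating-point arithmetic that plain sh lacks.

```shell
# Sum the four components of disk I/O time, all given in milliseconds.
io_time() {
    # usage: io_time overhead_ms seek_ms latency_ms transfer_ms
    awk -v o="$1" -v s="$2" -v l="$3" -v t="$4" \
        'BEGIN { printf "%.1f ms\n", o + s + l + t }'
}

io_time 0.5 9.5 5.6 0.3    # total service time for one 4 KB read
```

Run as shown, the helper reports 15.9 ms, matching the example.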
Disk Transfer Rates
The rate at which data is read and written to disk depends upon:
- number of sectors per track
- rotational speed of the drive
- number of disk surfaces in a drive (or tracks per cylinder)
- the time to switch between heads and cylinders

Theoretical maximum disk transfer rate = (sectors per track x 0.5 KB) / revolution time
The actual rate at which data is read or written to a disk drive depends upon a number of factors.
- Number of sectors per track
  - For a given size disk platter with a given number of tracks, squeezing more sectors onto a track means more data.
  - On a modern disk drive, there are more sectors on the outer tracks, with the number of sectors decreasing as you pass from one "zone" of sectors (called 'notches') to the next, toward the center of the disk. The innermost of the twenty or so zones on a drive has the fewest sectors per track (approximately one half of the number in the outermost track).
- Rotational speed of the drive: moving from 5400 rpm in the mid 1990s to 15000 rpm more recently means an almost 200% increase in transfer rate based on revolution time alone.
- Number of disk surfaces in a drive (or tracks per cylinder).
- The time to switch between heads and cylinders:
  - Head switch time is the average time the drive takes to switch between two of the heads when reading or writing data.
  - Cylinder switch time is the average time the drive takes to move the heads to the next cylinder.
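The theoretical-maximum formula above can be sketched as a shell function. The sector count and spindle speed in the calls are illustrative values, not measurements from any particular drive.

```shell
# Theoretical maximum transfer rate = (sectors per track * 0.5 KB) / revolution time,
# where revolution time in seconds is 60 / rpm.
max_transfer_rate() {
    # usage: max_transfer_rate sectors_per_track rpm
    awk -v spt="$1" -v rpm="$2" \
        'BEGIN { rev = 60 / rpm; printf "%.0f KB/sec\n", (spt * 0.5) / rev }'
}

max_transfer_rate 400 7200    # outer zone of a hypothetical 7200 rpm drive
max_transfer_rate 200 7200    # innermost zone, with half the sectors per track
```

The two calls illustrate why sequential throughput falls as the heads move toward the center of the platter: halving the sectors per track halves the rate.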
10 to 1 Performance Gains in I/O

(2 KB / (0.01 + (2 KB / (4000 KB/sec)))) = 190 KB/sec
(60 KB / (0.01 + (60 KB / (4000 KB/sec)))) = 2400 KB/sec

Transfer rate = 4000 KB/sec
Avg seek time = 0.01 sec

(Figure: top view of a disk platter showing the head, one track, an 8 KB sector, and where the data starts and ends)
Consider a disk with a maximum transfer rate of 4000 KB per second and an average seek time of 10 milliseconds. Normal page reads on a 2 KB machine would enable you to read only 190 KB per second: (2 KB / (.010 + (2 KB / (4000 KB / sec)))). Sequential scans of 30 pages would enable you to read 2400 KB per second: (60 KB / (.010 + (60 KB / (4000 KB / sec)))). Comparing 190 KB to 2400 KB, you see that a large read, such as those triggered by light scans and read ahead operations, can provide better than a 10 to 1 improvement in read performance.
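The two calculations above can be parameterized in one function; it assumes the same hypothetical drive (4000 KB/sec peak transfer, 10 ms average seek).

```shell
# Effective read rate when each I/O pays one seek:
# rate = kb_per_io / (seek_time + kb_per_io / peak_rate)
effective_rate() {
    # usage: effective_rate kb_per_io seek_sec peak_kb_per_sec
    awk -v kb="$1" -v seek="$2" -v rate="$3" \
        'BEGIN { printf "%.0f KB/sec\n", kb / (seek + kb / rate) }'
}

effective_rate 2 0.010 4000     # one 2 KB page per seek
effective_rate 60 0.010 4000    # 30 pages (60 KB) per seek
```

The seek time dominates small reads, which is exactly why amortizing it over 30 pages yields the 10 to 1 gain.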
Light Scans
Used for sequential scans of tables larger than the buffer pool.

(Figure: scan threads reading from rootdbs and dbspaces 1 through 3 into light scan buffers in the virtual portion of shared memory, bypassing the resident portion)
A potential problem for DSS environments is that a very large table scan might use all or most of the pages available in the resident buffer pool. This could negate the benefits of resident buffer pool caching by forcing pages that are still in use to be flushed and replaced with new pages. To avoid this problem and allow the Dynamic Server to support concurrent OLTP and DSS operations, the Dynamic Server provides light scans. Light scans are called light because pages are read into specially created light scan buffers in the virtual segment of the Dynamic Server's shared memory. Performance advantages include the following:
- Larger blocks of data are transferred in one operation, usually 64 KB or 128 KB.
- The overhead of the buffer pool and the associated LRU queue management is bypassed.
- Frequently accessed pages are not forced out of the buffer pool when many sequential pages are read for a single query, especially if the optimizer has chosen a sequential scan of the table.
Light scans occur under the following conditions:
- The optimizer chooses a sequential scan of the table.
- The table must be larger than the resident buffer pool.
- The application must be performing a read-only operation.
- The read isolation level must be one of:
  - Dirty read
  - Repeatable read with a shared or exclusive table lock
  - Committed read with a shared table lock
DSS environments require a large virtual segment and may benefit from a small resident buffer pool. Light scans, which allow large sequential scans to bypass the LRU queue and buffer pool, require that the table being scanned is larger than the resident buffer pool. To ensure that more tables may be read using light scans, you may want to configure an artificially small resident buffer pool. You can determine if light scans are being utilized with the onstat command: onstat -g lsc.
Tip When the table being scanned is smaller than the resident buffer pool, but meets the other criteria for light scans, you may force light scans by setting the environment variable LIGHT_SCANS to the value FORCE. For example, using ksh: export LIGHT_SCANS=FORCE
Eliminate Disk Bottlenecks
IBM Informix Dynamic Server allows you to virtually eliminate disk bottlenecks through a combination of techniques.
- For large tables with heavily used indexes, you can distribute the table or index across multiple disks using fragmentation.
- Take advantage of disk striping to maximize throughput for tables that are too small to fragment but are very read or write intensive.
The Informix Dynamic Server supports intelligent partitioning of tables and indexes using the FRAGMENT clause in CREATE TABLE and CREATE INDEX statements. Intelligent partitioning of data and indexes allows you to:
- Evenly distribute I/O requests across multiple disks.
- Reduce I/O by eliminating unneeded fragments.
- Optimize I/O by scanning disks in parallel.
For tables that are not good candidates for partitioning but are very heavily accessed, the disk striping technology available from most hardware vendors can be used to optimize placement on disk and I/O performance.
Warning! Do not use disk striping as a substitute for intelligent fragmentation. Also, different tables that are heavily accessed should be on different disks, and if possible, different controllers.
Parallel I/O Operations
Intelligent partitioning allows reads and writes to be performed in parallel.

CREATE TABLE customer (cust_num integer, ...)
    FRAGMENT BY EXPRESSION
        cust_num < 1000000 and cust_num > 0 in db1,
        cust_num < 2000000 and cust_num >= 1000000 in db2,
        cust_num < 3000000 and cust_num >= 2000000 in db3,
        ...;

SELECT * from customer;

(Figure: parallel scan threads reading dbspaces db1 through db6)
Intelligent partitioning allows you to place each table or index fragment in a specific dbspace. For each query, the server can determine:
- Which dbspaces contain fragments required to satisfy the query.
- Which dbspaces do not contain required fragments and may be skipped to eliminate unnecessary I/O.
- Which dbspaces may be scanned in parallel.
When performing parallel scans for a query, the Dynamic Server will spawn up to one scan thread per dbspace per table. The formula for calculating scan threads is: scan_threads = min( nfrags, ( DS_MAX_SCANS * ( pdqpriority / 100 ) * ( MAX_PDQPRIORITY / 100 ) ) )
To fully exploit this parallel scan feature, you must be able to identify the optimum number of concurrent scan threads, and therefore the optimum number of dbspaces to create on each disk.
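The scan-thread formula above can be sketched in shell, with awk handling the fractional arithmetic. The parameter values in the calls are illustrative only, not recommended settings.

```shell
# scan_threads = min(nfrags, DS_MAX_SCANS * (pdqpriority/100) * (MAX_PDQPRIORITY/100))
scan_threads() {
    # usage: scan_threads nfrags ds_max_scans pdqpriority max_pdqpriority
    awk -v n="$1" -v dms="$2" -v pdq="$3" -v mpdq="$4" 'BEGIN {
        t = dms * (pdq / 100) * (mpdq / 100)
        v = (n < t) ? n : int(t)
        print v
    }'
}

scan_threads 12 1048576 50 100    # large DS_MAX_SCANS: limited by fragment count
scan_threads 12 4 50 100          # limited by DS_MAX_SCANS and the PDQ settings
```

In the first call the fragment count (12) is the limit; in the second, the PDQ terms reduce the thread count to 2 even though twelve fragments exist.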
Optimizing Device I/O
pfread can help you maximize I/O throughput by determining the optimum number of concurrent scan threads per disk:
- one pfread process: 200 KB/sec
- two pfread processes: 300 KB/sec
- three pfread processes: 396 KB/sec
Most I/O subsystems are designed to support multiple concurrent read and write processes. A given disk might deliver 200 KB per second throughput for a single read process. With two concurrent read threads or processes, the same disk might provide 300 KB per second throughput (150 KB per second per thread). Three concurrent read threads might allow you to read 396 KB per second (132 KB per second per thread). Even though the amount of data read by each thread decreases with additional threads, the total throughput may continue to increase. One of the tools you will use in this course is a utility called pfread, which will help you measure read throughput for individual disks. From the information provided by pfread, you will be able to determine the optimum number of concurrent scan threads per disk. Once you have determined the optimum number of parallel scan threads per disk, you know the optimum number of dbspaces to create on each disk. Pfread is a simple application that executes as a single-threaded read process and will perform various types of reads simulating the normal page reads and big buffer reads performed by the IBM Informix Dynamic Server.
pfread
- Program to measure disk read performance.
- Measures:
  - Random and sequential reads
  - Regular buffer reads, big buffer reads, and light scan reads
- You must specify the disk to be read.
- Optionally, you may specify:
  - the read buffer size
  - the number of reads to perform
  - the minimum and maximum offsets
pfread is a C program that simulates Dynamic Server disk reads in order to measure disk performance and optimize dbspace layouts for Informix Dynamic Server installations.(1) pfread expects a number of command line arguments, as documented below:

Usage: pfread [-b bufsize] [-n nloops] [-m min] [-M Max] [-v] [-s seed|=seq|file] file

The three read modes are:
  -s seed    do random reads, seeding the generator with seed
  -s =seq    do sequential reads
  -s file    read file to get buffer numbers to read

The default is random reads with a seed of zero (0). The -v verbose option sends the output of each read to standard output. pfread will use default values for all arguments except the disk to be read. The default buffer size is 2048, the default number of loops is 1000, the default minimum offset is 0, the default maximum offset is 51200, and the default read mode is random.

(1) pfread must be run on raw devices to get valid results.
Measuring I/O Throughput
The following command:

pfread /dev/rdsk/dev210

will produce output similar to:

/dev/rdsk/dev210 2048 1000 11 seconds 181KB/sec

where the fields are the device measured, the buffer size, the number of loops, the total elapsed time for the reads, and the average KB per second throughput.
The pfread command produces output including the device name, the buffer size, the number of buffer reads, the elapsed time, and the average KB per second throughput ((( buffer size / 1024 ) * number of buffer reads) / elapsed time). The average KB per second number tells you the read throughput for a single thread. To generate a sum of KB read per second for multiple concurrent threads, you can use a shell script like the one shown on the next page.
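The throughput arithmetic above can be reproduced with a small helper. The assumption that the result is truncated rather than rounded is mine, made so the figure matches the 181KB/sec sample output.

```shell
# Average throughput = ((bufsize / 1024) * number of reads) / elapsed seconds.
throughput() {
    # usage: throughput bufsize_bytes nreads elapsed_sec
    awk -v b="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%dKB/sec\n", int((b / 1024) * n / t) }'
}

throughput 2048 1000 11    # the sample pfread run shown above
```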
Code Sample

## excerpts from pfread.sh
## a simple shell script to generate
## read performance statistics for multiple
## parallel read processes
(( c = 0 ))
outfile=/tmp/$$
while (( c < $1 ))
do
    ## execute a pfread process in the background
    pfread $2 >> $outfile &
    (( c = $c + 1 ))
done
## on Sun OS, a wait with no argument waits for all
## background processes to complete
wait
## We have already divided the total KB read/thread
## by the duration of execution for a KB/sec/thread
## Here, we will calculate the KB/sec sum.
(( sum = 0 ))
for k in `awk '{print $6}' $outfile`
do
    (( sum = $sum + $k ))
done
echo "$2:\t$1 concurrent read threads\t$sum KB/sec."
pfread.sh
The command:

pfread.sh 5 /dev/rdsk/dev204

will produce the sum of the average throughput per thread for five concurrent threads:

/dev/rdsk/dev204: 5 concurrent read threads 249 KB/sec.
By creating a simple shell script, you can generate a sum of the average KB read per second per thread. To make your task even easier, you might execute pfread.sh in a loop and generate statistics for different numbers of concurrent read threads for comparison. For example:

$ for i in 1 2 3 4 5 6 7 8 9
> do
>     pfread.sh $i /dev/rdsk/dev204 >> pfread_dev204.out
> done
$ more pfread_dev204.out
/dev/rdsk/dev204: 1 concurrent read threads 142 KB/sec.
/dev/rdsk/dev204: 2 concurrent read threads 166 KB/sec.
/dev/rdsk/dev204: 3 concurrent read threads 198 KB/sec.
/dev/rdsk/dev204: 4 concurrent read threads 200 KB/sec.
/dev/rdsk/dev204: 5 concurrent read threads 249 KB/sec.
/dev/rdsk/dev204: 6 concurrent read threads 341 KB/sec.
/dev/rdsk/dev204: 7 concurrent read threads 367 KB/sec.
/dev/rdsk/dev204: 8 concurrent read threads 405 KB/sec.
/dev/rdsk/dev204: 9 concurrent read threads 400 KB/sec.
Disk Arrays
Most large database systems use disk arrays (Redundant Array of Inexpensive Disks, or RAID) for performance, hardware mirroring, and transparent failover after disk drive failures.

(Figure: logical volumes 1 through 5 presented over the physical disks of an array)
RAID terms:
- Chunk: a contiguous area of virtual disk storage mapped to physically contiguous disk storage on a single member disk in the array.
- Stripe: a set of chunks in corresponding positions across member disks in the disk array. A stripe is a logical unit (LUN) in the disk array. The RAID software should allow logical partitioning of the disks and user configuration of chunk and stripe sizes.
The database server cannot distinguish raw disk allocations on RAID virtual disks from allocations on traditional disks. Instead, logical volumes should be defined so that they are properly aligned across the physical disk members.
The simplest method is to specify a volume size that is a multiple of the RAID chunk size, and to specify a starting address of the volume that is a multiple of the chunk size. For example, in the RAID system shown in the above slide, all logical units or volumes are made up of chunks of the same size, so that the offset of each chunk in the volume is a multiple of the chunk size. To take advantage of I/O efficiency and fragment elimination, create chunks for database server storage spaces so that they match the RAID chunk size. Use the appropriate multiple of the chunk size as the offset into the logical volume. If you had four disk drives in your system for example, you could consider creating chunk sizes of 500 MB, so that the logical volume (the stripe across the four disks) would be two GB in size.
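The alignment rule above — both the starting offset and the volume size should be multiples of the RAID chunk size — can be sketched as a quick check. Sizes are in KB and the example values are hypothetical.

```shell
# Report whether a proposed offset and volume size fall on RAID chunk
# boundaries, using plain POSIX shell integer arithmetic.
aligned() {
    # usage: aligned chunk_kb offset_kb size_kb
    chunk=$1; offset=$2; size=$3
    if [ $((offset % chunk)) -eq 0 ] && [ $((size % chunk)) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

aligned 512000 1024000 2048000    # 500 MB chunks: 1000 MB offset, 4-chunk volume
aligned 512000 300000 2048000     # offset does not fall on a chunk boundary
```

The second call fails the check because the starting address is not a multiple of the chunk size, even though the volume size is.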
(Figure: a stripe of blocks across disks 1 through 5)
When placing table fragments on RAID devices, consider the logical unit level instead of the physical level. Place fragments of the same table on separate logical units, and also place fragments of tables that are commonly joined on separate logical units. Fault tolerance for data protection is provided by different types of RAID architecture (shown on the following slide). Each RAID level offers different trade-offs in fault tolerant features and performance.
RAID Levels
- RAID 1: disk mirroring
- RAID 5: parity checking; uses storage capacity equivalent to one disk in the array to write parity information
- RAID 1+0 (RAID 1 + RAID 0): provides reliability through mirroring and high performance through disk striping

(Figure: array management software presenting a virtual disk over the physical disks of a disk array)
The three most common RAID configurations are:
- RAID level 1 is disk mirroring. RAID level 1 should be used to store critical data for which availability and reliability are the number one concern. RAID level 1 is appropriate for the root dbspace, the logical log files, and the physical log file.
- RAID level 5 is a RAID implementation that uses the equivalent storage space of one disk in the array to store parity data. The parity data allows the recovery of any data stored on the rest of the disk array. RAID level 5 is an inexpensive way to provide increased availability of disks. However, in exchange for the increased availability comes added overhead associated with write requests. Each write request to the array requires a physical write to the appropriate member disk or disks and a physical write to the parity data. This performance cost may be offset by exploiting the write cache capability of the array management software. In one test, the write cache delivered a 90% increase in data load performance over the same load without the cache.
- RAID level 1+0 is really RAID level 1 (mirroring) used in combination with disk striping (sometimes referred to as RAID 0). RAID level 0 provides no redundancy but allows data to be distributed across multiple disks to avoid I/O bottlenecks; that is, it offers higher I/O performance but no fault tolerance.
CPU
IBM IDS multi-threading and parallel capabilities allow you to fully exploit symmetric multiprocessor machines. More work can be accomplished if you:
- Maximize CPU user state time
- Minimize CPU system state time
- Minimize or eliminate wait for I/O
The second performance factor to consider is CPU. The internal processing speed of CPUs varies greatly and ultimately determines, assuming adequate memory and disk, how quickly you can perform work. All Unix System V vendors report on four states of CPU activity: user, system, wait for I/O, and idle.
- The user state time is the amount of time the CPU spends running the application program in user state. It is under the programmer's control and reflects program language effectiveness and programmer efficiency.
- The system state time is the amount of time the CPU spends executing system (kernel) code on behalf of the application program. This includes time spent executing system calls, performing administrative functions for the application program, and network calls. Normal system state time should account for 5 to 25% of total CPU time.
- CPU utilization is the proportion of time that the CPU is busy in either user or system state. Normally, the CPU will be idle only when there is no work to be performed or all processes are waiting for I/O to complete. CPU utilization is the CPU busy time divided by the elapsed time. If the CPU were busy 30 seconds over a 100 second interval, the utilization would be 30%.
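The utilization arithmetic above can be captured as a one-line helper; the 30-second and 100-second figures are the ones from the example.

```shell
# CPU utilization = busy time / elapsed time, expressed as a percentage.
cpu_utilization() {
    # usage: cpu_utilization busy_sec elapsed_sec
    awk -v b="$1" -v e="$2" 'BEGIN { printf "%.0f%%\n", 100 * b / e }'
}

cpu_utilization 30 100    # busy 30 s over a 100 s interval
```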
Optimizing CPU Utilization by CPUVPs
- Optimize dispatching priority with NOAGE
  - NOAGE: turns off priority aging
- Optimize scheduling by enabling processor affinity
  - AFF_NPROCS: specifies the number of CPUVPs to bind to physical CPUs
  - AFF_SPROC: specifies the first CPU to bind to

Informix recommends that you use VPCLASS instead of the above parameters. The following is an example ONCONFIG file entry:

VPCLASS cpu,num=8,aff=0-7,noage=1
IBM IDS provides three configuration parameters that allow you to optimize CPU utilization by the CPUVPs. To understand their usefulness, you need to first review how Unix performs process scheduling.
Dispatching Priority Dispatching priority refers to the priority of each Unix process. When a CPU becomes available, Unix must decide which process should run next. The highest priority task which is ready to run will be given control of the CPU. All things being equal, higher priority tasks will complete faster than lower priority tasks.
Priority Aging With some operating systems, as a process accumulates more CPU time, its priority is reduced. This behavior is called priority aging. While this strategy works well for long running monitor processes, it is not desirable for business critical processes such as oninit. To optimize CPU utilization for oninit processes, you can disable priority aging. Set NOAGE to 1 to disable priority aging on operating systems that support this feature.
Processor Affinity Processor affinity customizes the scheduling process by allowing you to bind specific processes to specific processors. The processor may still execute other processes, but the bound process will execute exclusively on the specified CPU. IBM IDS allows you to bind or affinitize CPUVPs. The configuration parameter, AFF_NPROCS, specifies the number of CPUVPs to bind and the configuration parameter, AFF_SPROC, specifies on which CPU to start the binding. CPUVPs will be bound to CPUs in sequential order. For example, if you have a four CPU machine and you are running 3 CPUVPs, you might set AFF_NPROCS to 3 and AFF_SPROC to 0. This would bind your CPUVPs consecutively to processor 0, 1, and 2. Remember, processors are always numbered starting with 0. Some SMP platforms may reserve one CPU for handling all interrupts. If this is the case, you should avoid binding a CPUVP to that processor. For example, if processor 0 handles all interrupts, you may want to set AFF_NPROCS to 3 and AFF_SPROC to 1 so that your CPUVPs are bound to processors 1, 2, and 3. On Sun platforms, the Unix command mpstat will allow you to determine which CPUs are processing interrupt requests.
Tip If your operating system supports processor affinity but your IBM IDS port does not, you may try to bind the oninit processes to physical CPUs using Unix commands.
Multiprocessor Configuration

Single processor system (one CPU VP):
    VPCLASS cpu,num=1
    MULTIPROCESSOR 0
    SINGLE_CPU_VP 1

Multiprocessor system (four CPU VPs):
    VPCLASS cpu,num=4
    MULTIPROCESSOR 1
    SINGLE_CPU_VP 0
The MULTIPROCESSOR configuration parameter specifies whether you wish to turn on specific multiprocessor features in the server system, such as changes in default parameters for VPs, default read-ahead parameters, etc. A parameter value of 1 activates these features. This parameter also determines if the server can use spin locks. If you set MULTIPROCESSOR to 1, server threads that are waiting for locks (known as mutexes in IBM Informix servers) in some cases spin (keep trying at short intervals) instead of being put on a wait queue. Setting the SINGLE_CPU_VP configuration parameter to 1 indicates that you intend to use only one CPU VP on your server. This prevents the server from being started with more than one CPU VP and does not allow you to dynamically add CPU or user-defined VPs. If the server is restricted to only one CPU VP, it can bypass locking of some internal data structures (with mutex calls) because no other process is using these structures.
Multiprocessor Configuration Guidelines For single processor machines, set the MULTIPROCESSOR parameter to 0 and the SINGLE_CPU_VP parameter to 1. For machines with more than one processor, set MULTIPROCESSOR to 1 and SINGLE_CPU_VP to 0. Two-processor systems might benefit by setting SINGLE_CPU_VP to 1 because of the increased performance from avoiding the additional internal locking.
Guidelines for Configuring Number of CPU VPs
If you are working on a single processor machine, or a multiprocessor machine with only two processors, set the num option of VPCLASS to 1. For multiprocessor machines, you can start with a few CPU VPs (relative to the total number of CPUs on your system) and add more VPs after monitoring the server during average loads. The following table summarizes various possible hardware and software configurations, and the recommended settings for NUMCPUVPS and SINGLE_CPU_VP.

Configuring CPUVPs

Number of      Client on       Workload   NUMCPUVPS                       SINGLE_CPU_VP
Physical CPUs  same machine?
1              N/A             N/A        1                               1
2              No              N/A        2                               0
2              Yes             OLTP       1                               1
2              Yes             DSS        2                               0
3-7            No              N/A        number of physical CPUs         0
3-7            Yes             OLTP       number of physical CPUs - 1     0
3-7            Yes             DSS        number of physical CPUs         0
>7             No              N/A        number of physical CPUs         0
>7             Yes             OLTP       number of physical CPUs * 0.7   0
>7             Yes             DSS        number of physical CPUs         0
Note Never configure more CPU VPs than actual CPUs on your system.
Shared Memory Components
When tuning your system for optimum performance, you must resolve two issues:
- Is there adequate memory available on your hardware?
- Is the server configured to use memory efficiently?

- Resident portion: buffer pool and other system data structures
- Virtual portion: session and thread information
- Message portion: used for shared memory communication (optional)
The third performance factor to consider is memory utilization. It is important that your system has adequate memory for the demands that will be placed upon it, and that you are using the shared memory allocated to IBM IDS efficiently. Shared memory in the database server is divided into three portions:
- The resident portion contains the buffer pool and other system information. This portion of shared memory can be configured to remain resident in main memory (if this capability is supported by the operating system).
- The virtual portion contains information about the threads and sessions, and the data that is used by them. This information grows and shrinks constantly, so the database server is responsible for handling the allocation and deallocation of the memory in this portion.
- The message portion is optional; it holds the message buffers that are used in communication between the client and the server when the communication method is shared memory.
System Memory
You can determine whether your system has adequate memory by evaluating virtual memory management.
- Paging: selected pages of a process are moved between real memory and disk. Consistent paging out indicates a memory shortage in the system.
- Swapping: the entire working set of a process is moved out of real memory. Swapping will significantly degrade performance of the system.
In virtual memory or virtual storage systems, only a small portion of the procedures and data of any specific process is in real memory at once. This small set of pages is called the process’s working set, and the movement of pages in and out of memory is called paging. The paging algorithm keeps track of which pages were least recently used, and it moves these pages from memory out to disk, and then uses that freed memory for other processes. This is because as a process’s working set changes or other processes are executed, the memory requirements for each process’s working set may exceed available real memory. Unix systems allow you to track these page outs, the number of pages that are moved out of memory per second. If page outs increment consistently, this indicates a memory shortage on the system. To manage paging and to keep a minimum number of free pages available, the operating system will scan memory looking for pages that can be freed. Frequent scanning may also have a negative impact on performance. If you see that page scan values are incrementing on your system but you believe that sufficient memory is available, you may want to consult your operating system documentation or hardware vendor about the kernel parameters that affect the page scan rates. When a memory shortage is severe, a process’s entire working set may be moved out of real memory. This process is called swapping.
Swapping sometimes occurs as a normal housekeeping process. Programs that sleep for twenty seconds or more may be considered idle by the OS, and swapped out at any time to minimize cluttering of system memory. However, the swapping that occurs to combat extreme memory shortages is called desperation swapping, or thrashing. This type of swapping will significantly degrade the performance of the system. This is because the next time the process needs to run, the memory management system has to copy the entire working set of the swapped out process back into memory, and to do so, it will first need to copy the entire working set of a process that is already in memory back to disk. The system degenerates to spending almost all of its resources moving entire working sets in and out of memory rather than executing the processes themselves.
OLTP Shared Memory Tuning
- Size the resident segment buffer pool for 95% or better read caching and 85% or better write caching
- Keep the virtual segment small
The Informix Dynamic Server shared memory configuration and tuning will vary greatly, depending on whether you are configuring your system for OLTP or DSS activities.
OLTP
For OLTP applications, the largest portion of shared memory will be dedicated to the resident segment buffer pool. A good rule of thumb for initial sizing of the resident buffer pool is 20% of physical memory. For example, on an OLTP system with 512 MB physical memory, the initial setting for BUFFERS would be approximately 52429 pages, or 104,858 KB, on a 2 KB page machine. Continue to increase BUFFERS until increases in read and write caching are insignificant, or the system begins to experience paging. If you are tuning an OLTP client/server environment where the database server and the client application are executing on separate hardware, you may be able to allocate as much as 50 to 80% of physical memory to the resident buffer pool.
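The 20% rule of thumb above can be sketched as arithmetic; the machine size and page size are the values used in the example, and int() truncates the fractional page.

```shell
# Initial BUFFERS estimate: a percentage of physical memory, in pages.
initial_buffers() {
    # usage: initial_buffers phys_mem_kb page_kb percent
    awk -v m="$1" -v p="$2" -v pct="$3" \
        'BEGIN { print int(m * pct / 100 / p) }'
}

initial_buffers 524288 2 20    # 512 MB machine, 2 KB pages, 20% rule of thumb
```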
DSS Shared Memory Tuning
- Large virtual segment
  - SHMVIRTSIZE
  - SHMADD
  - DS_TOTAL_MEMORY
- The goal of tuning SHMVIRTSIZE is to have as few segments as possible without wasting shared memory

(Figure: a small resident portion and a large virtual portion containing light scan buffers)
DSS applications put different demands on the IBM IDS system and its memory utilization. In DSS applications, typically, a very small number of users are accessing very large amounts of data. Manipulating these large amounts of data will frequently cause additional memory pools, such as sort pools and hash pools, to be allocated in the virtual segment of the Dynamic Server’s shared memory. In optimizing DSS environments, SHMVIRTSIZE, the size of the initial virtual memory segment, may be set as high as 75% of physical memory, as long as this does not induce paging. DS_TOTAL_MEMORY, the total memory available for decision support tasks, should be set to 90% of SHMVIRTSIZE.
Tip In versions prior to 7.2, IBM IDS will not allow DS_TOTAL_MEMORY to be greater than 50% of SHMVIRTSIZE at initialization. You must use onmode -M to set DS_TOTAL_MEMORY to 90% of SHMVIRTSIZE after initialization.
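A sketch of applying the tip: compute 90% of SHMVIRTSIZE and pass the result to onmode -M. The SHMVIRTSIZE value is an example, and the onmode call itself is left commented out because it must run as user informix against a live instance.

```shell
# Set DS_TOTAL_MEMORY to 90% of SHMVIRTSIZE after initialization.
SHMVIRTSIZE_KB=400000                          # example ONCONFIG value, in KB
DS_TOTAL_KB=$(( SHMVIRTSIZE_KB * 90 / 100 ))   # 90% of the virtual segment

echo "onmode -M $DS_TOTAL_KB"
# onmode -M $DS_TOTAL_KB    # uncomment on a running instance, as user informix
```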
Tuning SHMVIRTSIZE
To tune SHMVIRTSIZE:
n Increase SHMVIRTSIZE to avoid adding additional segments
n Allow an average of 400 blocks free (4 KB per block)

onstat -g seg
...
id    key         addr     size     ovhd  class  blkused  blkfree
6000  1386825729  a000000  876544   872   R      200      14
6701  1386825730  a0d6000  4096000  656   V      990      10
6602  1386825731  a4be000  8388608  720   V      206      1842

Two virtual segments are allocated, so increase SHMVIRTSIZE to 6464. The second segment has > 400 blocks free.
You can measure the efficiency of SHMVIRTSIZE by monitoring the allocation of additional segments and monitoring the number of free blocks in the virtual segment using onstat -g seg. If additional segments are being allocated, you may want to increase the SHMVIRTSIZE setting in the ONCONFIG file. Use the following formula to calculate a new SHMVIRTSIZE value: SHMVIRTSIZE = SHMVIRTSIZE + (SHMADD * number of segments added)
If you notice a consistently high number of free memory blocks (> 400) when monitoring the virtual memory segment, then decrease SHMVIRTSIZE or SHMADD accordingly:
segsize = segsize - (( number of blocks free - 400 ) * 4 )
In the preceding example, the server has been configured with these values:
SHMVIRTSIZE 4000 # initial virtual shared memory segment size (KBytes)
SHMADD 8192 # size of new shared memory segment (KBytes)
If the onstat -g seg output has stayed consistent with several samplings, you can determine that only 206 blocks of the second segment are being used and that you need to reduce the size of this segment. You can apply the formula above to calculate a new size for the second segment (SHMADD): new_segsize = 8192 - (( 1842 - 400 ) * 4 ) = 2424
Since it is preferable to allocate only one segment, you can increase SHMVIRTSIZE using the new SHMADD value: SHMVIRTSIZE = 4000 + ( 2424 * 1 ) = 6464
Note The following command frees up unused virtual memory blocks: onmode -F
Network
Tuning the network and IBM Informix Dynamic Server communications:
n Does the application communicate with the Dynamic Server efficiently?
n Are the Dynamic Server’s communication mechanisms configured optimally?
n How can you accommodate a large number of remote users?
n Is the network adequate?
Network issues can significantly impact client/server and distributed database computing performance, but even local applications require adequate communication resources for optimum performance. The module covering client/server communication discusses how to optimize communications between applications and the IBM IDS server, how to efficiently service a large number of remote users, and how to diagnose insufficient network resources. The Unix command netstat -i, and the onstat -g ntt, onstat -g ntd and onstat -g ntu commands are also discussed in that module.
Application/Database Design and Implementation
A well-tuned database server will not compensate for a poorly designed or poorly implemented application!
A well-designed application (which includes a well-designed database!) can dramatically increase overall system performance.
Although application design and implementation are beyond the scope of this course, the importance of this issue must be recognized by everyone involved in the overall performance of any database system, including systems analysts, system architects, software architects, software engineers, database administrators, and, of course, the management team. In order of importance, the following steps should be followed when tuning for overall database system performance:
n Database design
n Primary and foreign keys: logical design
n Indexes: physical realisation
n Fragmentation/partitioning: dbspace design
n Locking strategy: row/page/table level
n Program design: application programming
n Transaction design: grouping of SQL statements
n DB Server tuning
n OS tuning
Exercises
Exercise 1
In this exercise, each team will familiarize themselves with their instance of an IBM Informix Dynamic Server that has previously been initialized by the instructor as part of the class setup procedure. Throughout this exercise, and other exercises in this course, please replace the xxx with one team member’s student number. For example, if your team is using student id stu201, replace stuxxx with stu201.
1.1 Verify that an sqlhosts file has been created in your home directory. Your instructor will provide you with the host name of the training box.
HOSTNAME: _________________________________________________
Also, be sure to check the release notes to find out if tli or sockets are supported.

Dbserver Name   Nettype    Host Name   Service Name   Options
stuxxx          onipcshm   trainx      stuxxx
stuxxx_tcp      ontlitcp   trainx      stuxxx_tcp
1.2 Verify that INFORMIXSQLHOSTS is set to the path and file name of this sqlhosts file. If not, modify your .profile environment file, and then source it by typing in the following:
stuxxx $ . ./.profile
1.3 Verify that INFORMIXSERVER is set to stuxxx, where xxx is your team’s student number. If not, modify your .profile environment file, and then source it by typing in the following:
stuxxx $ . ./.profile
1.4 Using Informix Server Administrator (ISA), click on the link Manage Another Server, and then click on the link to your “Server” in the left-hand column, such as stu201_tcp, for example. Then click on the link Configuration in the left-hand column. Next, click on the link -ONCONFIG.
1.5 Verify that your instance has been configured as follows:
w ROOTSIZE is 100000 (KB)
w PHYSFILE is 1000 (KB)
w LOGFILES is 4
w LOGSIZE is 1000 (KB)
w MSGPATH is /export/home/stuxxx/iif.log
w Both the backup and logical log backup tape devices are set to /dev/null.
w WARNING: DO NOT CHANGE YOUR “SERVERNUM”!!!
w VPCLASS is cpu,num=1
w NUMAIOVPS is 2
w CLEANERS is 2
w LRUS is 8
w LRU_MAX_DIRTY is 60
w LRU_MIN_DIRTY is 50
1.6 Click on the “Edit” button and make the following changes to your ONCONFIG file:
w Change RESIDENT to 1
w Change BUFFERS to 1500
w Click on the “Save” button
1.7 Bounce your engine:
w Click on the link named Mode
w Click on “Off-Line”, then “Yes”
w Click on Mode again
w Click on “On-Line”
Exercise 2
In this exercise, you will use pfread.sh and pfread to determine the optimum number of dbspaces per disk. Assume that your system will perform primarily 2 KB page reads. Also assume that your system is a predominantly read-only system, and therefore, you are primarily concerned with distributing your dbspaces and data to enhance read performance.
2.1 The instructor will assign each team a disk to measure. For accurate results, each team needs exclusive access to their disk for the duration of their test.
DISK = __________________________________________________
2.2 Capture measurements for increasing numbers of read threads. You may do this by placing the pfread.sh command in a loop like the one shown here:

for i in 1 2 3 4 5 6 7 8 9 10
do
    pfread.sh $i /dev/rdsk/disk_to_measure >> pfread.out
done

This loop will measure throughput for 1 to 10 concurrent read threads and provide output on the total KB read by each group. You can monitor what the script is doing by typing in the following:
tail -f pfread.out
2.3 You may graph your findings to share with the class. The class may discover that different disks exhibit different performance.
Module 2 Monitoring Utilities
Monitoring Utilities 07-2002 © 2002 International Business Machines Corporation
2-1
Objectives
At the end of this module, you will be able to:
n Identify the onstat commands for monitoring specific performance metrics
n Write SQL to capture the same data via the System Monitoring Interface (SMI)
n Identify the appropriate oncheck commands for monitoring specific disk issues
Informix Monitoring Utilities
IBM IDS offers several utilities for monitoring and tuning system performance.
The most useful of these utilities for performance tuning are onstat, the System Monitoring Interface (SMI), and oncheck.
onstat
Provides a snapshot of IBM IDS shared memory data structures, useful for:
n Identifying immediate problems such as latch or lock contention
n Monitoring resource utilization and potential inadequacies over a period of time
n Monitoring users and SQL activity on the system
n Monitoring PDQ utilization
The onstat utility provides a snapshot of the Informix Dynamic Server’s shared memory data structures. Because the information is a point-in-time snapshot, two successive onstat commands may return different results. Most of onstat’s commands are documented in the Informix Dynamic Server Administrator’s Guide. The onstat utility is particularly useful during performance tuning because it is very efficient. Since onstat gathers its information from shared memory, no disk I/O is necessary. Since it is a character based tool, it requires much less CPU resources than graphical tools. You will rely on the onstat utility for many of the tuning observations you will make in this course.
Tip You can run onstat repeatedly using the “-r” option. For example, to execute onstat -g ioq every five seconds (the default), you would run it as onstat -gr ioq.
SMI
The System Monitoring Interface (SMI) provides very similar information to onstat with a few added benefits:
n Provides SQL access to shared memory structures
n Provides profile information for specific user sessions
Like onstat, the SMI provides you with point-in-time information about the contents of the IBM IDS shared memory data structures. The SMI is implemented via the sysmaster database. There is one sysmaster database for each IBM IDS instance. The sysmaster database contains its own system catalog tables and a set of virtual tables that serve as pointers to shared memory data. The SMI is useful because it simplifies the task of programming monitoring functions for repetitive use.
Note The sysmaster database is documented in the Informix Dynamic Server Administrator’s Guide. The sysmaster database schema is also provided and commented in the $INFORMIXDIR/etc/sysmaster.sql file.
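As a sketch of the SMI in use, the query below reads the sysprofile table (the name/value counter pairs behind onstat -p, as described in $INFORMIXDIR/etc/sysmaster.sql). The dbaccess invocation is commented out since it needs a running instance; the script only prepares and prints the query text.

```shell
# Read buffer-cache counters through the SMI instead of parsing onstat -p.
SQL="SELECT name, value FROM sysprofile
     WHERE name IN ('dskreads','bufreads','dskwrites','bufwrites');"

echo "$SQL"
# echo "$SQL" | dbaccess sysmaster -    # run against a live instance
```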
oncheck
n Check IDS disk structures for inconsistencies
n Repair index corruption
n Report on space available within extents
The oncheck utility allows you to view and check IDS disk structures. With oncheck, you can view the reserved pages for an instance, look at individual pages within a tablespace, display the number of free pages allocated to a table extent, display an index structure, repair a corrupted index structure, and even view the individual rows as they are stored on a page. It is often used to monitor the number of extents within individual tables. This utility provides a wealth of information that can assist in both troubleshooting and performance tuning the IDS server.
Warning! Note that oncheck places a shared lock on the table (if its lock level is page) when it runs, and if it finds that the table is corrupted, the lock is upgraded to exclusive. Because of this oncheck can only be run by user informix, and is generally run in production environments only when there is minimal activity in the database.
ipcs
Prints information about inter-process communication

$ ipcs
IPC status from xxx as of Mon Nov 25 14:11:52 2001
Message Queue facility not in system.
Shared Memory:
m 2301 0x529d4801 --rw-rw---- root informix
m 2302 0x529d4802 --rw-rw---- root informix
m 1803 0x529d4803 --rw-rw-rw- root informix
Semaphores:
s 1245185 0x00000000 --ra-ra-ra- root informix
s 1769475 0x00000000 --ra-ra-ra- root informix
The ipcs command prints information about active inter-process communication facilities, including shared memory segments, semaphores, and message queues. This command is frequently used by IDS administrators to monitor shared memory segments and semaphores associated with the Dynamic Server instance. When the Dynamic Server instance is running, it has two or more shared memory segments associated with it. The first segment allocated is the resident segment. Its shmkey will end with 4801. If shared memory communications have been configured, the third segment allocated will be the message segment. Message segments are always allocated with read and write permissions for public. You may calculate the initial shared memory id associated with your instance using the following formula: ( 0x52564801 + ( hex( SERVERNUM ) * 0x10000 ))
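The formula can be evaluated directly in shell arithmetic. SERVERNUM=71 is an example value, chosen because it reproduces the 0x529d4801 key shown in the ipcs output above.

```shell
# Initial resident-segment shmkey for a given SERVERNUM.
SERVERNUM=71
KEY=$(printf '0x%x' $(( 0x52564801 + SERVERNUM * 0x10000 )))
echo "$KEY"    # 0x529d4801 for SERVERNUM 71
```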
Cleaning Up after an Engine Crash
If the IBM IDS engine crashes, it may abort without releasing all shared memory segments!
n The administrator (as root) must use ipcrm to release these segments prior to restarting the server:
ipcrm -m [segment id]
Occasionally, IBM IDS may be caused to abort without sufficient warning to clean up or release its shared memory segments. If this happens, the engine may not be restarted because the shared memory id it requests will still be allocated. In these cases, the administrator (as root) must use the ipcrm command to release the segments prior to executing oninit to restart the server (sid is segment id): # ipcrm -m [sid]
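A sketch of generating the cleanup commands by parsing ipcs output. The sample text stands in for live `ipcs -m` output (it reuses the segments from the previous page, and the column layout varies by platform); on a real system you would pipe ipcs in and run the resulting commands as root, after confirming the engine really is down.

```shell
# Build "ipcrm -m <id>" commands for shared memory segments owned by informix.
SAMPLE='m 2301 0x529d4801 --rw-rw---- root informix
m 2302 0x529d4802 --rw-rw---- root informix
m 1803 0x529d4803 --rw-rw-rw- root informix'

CMDS=$(echo "$SAMPLE" | awk '$1 == "m" && $6 == "informix" {print "ipcrm -m " $2}')
echo "$CMDS"
```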
Unix Monitoring Utilities
n sar
n time or timex
n ps
n iostat
n vmstat
In order to tune the IBM IDS server, you must be able to monitor and tune the operating system and hardware on which it runs. Different flavors of Unix offer many different monitoring utilities. Some are standard and available on nearly all Unix implementations; others are proprietary and only available on particular platforms.
Tip To find out which utilities are available to you, consult your system man pages, or refer to the operating system documentation.
System Activity Reporter (sar)
The sar command is useful for monitoring CPU utilization, disk activity and memory utilization.

$ sar -u 60 10
SunOS amber 5.6 Generic_105181-05 sun4m    03/10/01

09:42:17    %usr    %sys    %wio    %idle
09:43:17       1       5       0       94
09:44:17       1       4       0       95
09:45:17       5       3       0       92
09:46:17       4       6       1       89
....

The example shown monitors CPU utilization at intervals of 60 seconds for 10 iterations.
The sar command is a useful and versatile tool that can be used to monitor nearly every aspect of system utilization. You can use sar to measure CPU utilization by the oninit processes, specifically the CPU VPs, and to monitor the run queue length on busy systems. On systems that do not offer tools specifically for monitoring disk and memory utilization, you can use sar to measure the activity on your disks and to monitor memory allocation and paging. Some of the options include the following:

sar -b                 buffer activity
sar -d                 disk activity
sar -g and sar -p      paging activity
sar -r                 unused memory pages and disk blocks
sar -u (the default)   CPU activity: % time in user mode, % time in system mode, and % time idle (including items blocked waiting for I/O)
sar -w                 system swapping and switching activity
Performance History via SAR
To identify potential or actual performance bottlenecks or problems, it is necessary to continuously monitor your system and capture these data points over an extended period of time. The Unix sar command is particularly useful for creating and maintaining long-term performance histories:
n sar is available on all IBM-Informix IDS hardware platforms
n when run with a named output file, a single binary-encoded file is created from which all sar reports can be generated at a later time
To build a performance history and profile of your system, take regular snapshots of resource-utilization information. For example, if you chart the CPU utilization, paging-out rate, and the I/O transfer rates for the various disks on your system, you can begin to identify peak-use levels, peak-use intervals, and heavily loaded resources. If you monitor fragment use, you can determine whether your fragmentation scheme is correctly configured. Monitor all computer resource use as appropriate for your database server configuration and the applications that run on it. You can run sar so that it stores all system activity data in a file for later analysis. For example, the following command runs sar at a one-minute interval for 24 hours, and stores the information in binary form in a file named march10_2001.
sar -o march10_2001 60 1440
You can later run a specific report (such as CPU utilization) and redirect it to an output file (named in this example for the day and the type of report) by typing in the following: sar -u -f march10_2001 > cpu_03_10_2001
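The capture-and-report cycle above can be wrapped in a small script. The one-file-per-day naming scheme is an assumption matching the example; the sar invocations are printed rather than executed so the naming logic stands alone.

```shell
# Daily sar history: capture once in binary form, generate named reports later.
DAY=march10_2001        # e.g. derive from `date` in a real script
RAWFILE="$DAY"          # binary sar output, one file per day
CPUREPORT="cpu_$DAY"    # per-report-type text file

echo "sar -o $RAWFILE 60 1440"            # capture: 1-minute samples for 24 hours
echo "sar -u -f $RAWFILE > $CPUREPORT"    # later: extract the CPU report
```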
time and timex
The time and timex commands allow you to time the running of a process. Both time and timex report on real time as well as user and system CPU time.
The time and timex commands allow you to measure the duration of a running process and CPU utilization for both system and user time. Remember when using time or timex to measure an application process, such as dbaccess, that a large percentage of the work is being done by the oninit process. The CPU utilization for the oninit process will not be accounted for in the output generated. The time or timex command is still useful for measuring the real time required to execute the request. Be sure to consult your man pages to find out which command is available on your system.
Example To measure the real time to run the query in the script q1.sql, you would run the following:
$ time dbaccess stores_demo q1.sql
Process Status (ps)
The ps command is a good source of snapshot information about system processes currently running.

$ ps -el
S  UID    PID  PPID  STIME     TTY  TIME  CMD
T  root     0     0  08:59:54  ?    0:01  sched
S  root     1     0  08:59:57  ?    0:00  /etc/init
S  root   434     1  09:03:51  ?    0:05  oninit
S  root   435   434  09:03:53  ?    0:00  oninit
S  root   445   434  09:06:02  ?    0:00  oninit

(PID is the process ID number, STIME the process start time, and TIME the accumulated CPU time for the process.)
The ps command is a good source of snapshot information about processes currently running. You can cross reference output from onstat -g glo and the ps command to see how much CPU time virtual processors (oninit) are accumulating, or to monitor the priority the operating system has assigned to your oninit processes. You can also use ps to monitor your application processes. Consult the system man pages for information on the ps options available.
The iostat Command
The iostat command provides highly accurate measures of throughput, utilization, queue lengths, transaction rates and service times.

$ iostat -x 5 1
                    extended device statistics
device  r/s  w/s  Kr/s  Kw/s  wait  actv  svc_t  %w  %b
sd0     6.2  0.0  21.5   0.0   0.0   0.1   24.1   0  15
sd1     1.8  0.0  14.3   0.0   0.0   0.1   41.6   0   7
sd6     0.0  0.0   0.0   0.0   0.0   0.0    0.0   0   0
sd30    0.2  0.2  25.7   0.2   0.0   0.1   22.5   0  13
The iostat command is used to monitor disk performance and utilization. To compute this information, the system counts seeks, data transfer completions, and the number of words transferred for each disk (bytes read and bytes written). Also, the state of each disk is examined at predefined intervals (check your operating system documentation), and a tally is made if the disk is active. These numbers can be combined with the transfer rates of each device to determine average service times (in milliseconds) for each device. Sample output from the iostat command with the -d option is shown below, in this case to display four reports with an interval of 60 seconds.

iostat -d 60 4
      sd0              sd1              sd6              sd30
kps tps serv   kps tps serv   kps tps serv   kps   tps serv
  5   1   42     0   0    0     0   0    0   6028  388    5
  2   0   12     0   0    0     0   0    0   8674  354   21
  2   0   20     0   0    0     0   0    0   8809  349   22
  2   0   19     0   0    0     0   0    0   9407  363   19
Some of the common options are shown in the table below:

iostat -d    For each disk, the number of kilobytes transferred per second (kps), the number of transfers per second (tps), and the average service time in milliseconds (serv).
iostat -D    For each disk, reads per second (rps), writes per second (wps), and percent disk utilization (util).
iostat -x    For each disk, a report of extended disk statistics in tabular form: reads per second (r/s), writes per second (w/s), the average number of transactions waiting for service, or queue length (wait), the average number of transactions actively being serviced, that is, removed from the queue but not yet completed (actv), the percent of time there are transactions waiting for service, that is, the queue is not empty (%w), and the percent of time the disk is busy with transactions in progress (%b).
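One way to put the %b column to work is a small awk filter that flags disks busier than a threshold. The sample lines reproduce the iostat -x data above, and the 10% threshold is an arbitrary example value.

```shell
# Flag devices whose %b (percent busy, field 10 of iostat -x output)
# exceeds a limit; pipe real `iostat -x` output in instead of the sample.
SAMPLE='sd0  6.2 0.0 21.5 0.0 0.0 0.1 24.1 0 15
sd1  1.8 0.0 14.3 0.0 0.0 0.1 41.6 0  7
sd6  0.0 0.0  0.0 0.0 0.0 0.0  0.0 0  0
sd30 0.2 0.2 25.7 0.2 0.0 0.1 22.5 0 13'

BUSY=$(echo "$SAMPLE" | awk -v limit=10 '$10 + 0 > limit {print $1}')
echo "$BUSY"
```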
Virtual Memory Statistics (vmstat)
Provides information about the status of processes, memory utilization, paging statistics, system activity, and CPU usage.

$ vmstat 5 4
 procs      memory        page           ...    disk     faults  ...
 r b w    swap    fre  re mf  pi  po  ...  d0 d1 d2   in sy  ...
 4 1 0  104512  1792   8  0 288 192  ...   4  6  1   23 15  ...
 2 3 0  105184  1952   4  0  96 128  ...   0  2  1   14 15  ...
 3 2 0  106688  1952   7  0 256 224  ...   4  4 12   29 21  ...
 4 2 0  104672  6240   4  0 384  32  ...   3  1  3   32 75  ...

(In this display, swap is the Kbytes of swap space available, w is the number of processes swapped out, the disk columns show the number of disk operations per second, and po shows page outs.)
The vmstat command provides information about the status of processes, memory utilization, paging statistics, system activity, and CPU usage. When tuning the Informix Dynamic Server, vmstat is especially useful for monitoring system paging. Paging may indicate that too much memory has been allocated to either the resident segment (OLTP) or the virtual segment (DSS) of shared memory, or that more memory should be added to the hardware configuration. The vmstat utility may also be used to monitor page scan rates, the amount of free memory, the number of context switches occurring, disk activity, and cpu activity. A subset of the columns from sample output in Unix System V are shown in the slide above (from “vmstat 5 4”):
The fields of the vmstat display are as follows:
procs    r = runnable processes (in run queue)
         b = processes blocked for high-speed I/O
         w = processes swapped out (but runnable)
memory   swap = amount of swap space currently available (Kbytes)
         fre = size of the free list (Kbytes)
page     re = page reclaims
         mf = minor faults
         pi = page in (Kbytes)
         po = page out (Kbytes)
         fr = freed (Kbytes)
         de = anticipated short-term memory shortfall (Kbytes)
         sr = pages scanned by clock algorithm
disk     Under each disk, the number of disk operations per second. There are slots for up to four disks. Disks labeled "s" are SCSI; the number is the logical unit number.
faults   in = non-clock device interrupts
         sy = system calls
         cs = CPU context switches
cpu      us = user time
         sy = system time
         id = idle time
         Note: these are averages across all processors
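Since sustained page-outs are the signal that IDS shared memory may be oversized for the machine, a small awk filter over vmstat output can count the intervals that paged. The sample rows reuse the System V layout above, where po is the ninth field; column positions vary by platform, so treat this as a sketch.

```shell
# Count vmstat sample intervals showing page-out (po > 0) activity.
SAMPLE='4 1 0 104512 1792 8 0 288 192
2 3 0 105184 1952 4 0 96 128
3 2 0 106688 1952 7 0 256 224'

PAGING=$(echo "$SAMPLE" | awk '$9 + 0 > 0 {n++} END {print n+0}')
echo "intervals with page-outs: $PAGING"
```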
Exercises
Exercise 1
Team 1
Write an onstat and/or SMI SQL command to do the following:
n Monitor I/O by chunk ______________________________
n Monitor I/O by table or fragment ______________________________
n Monitor I/O by user/thread ______________________________
Team 2
Write an onstat and/or SMI SQL command to do the following:
n Monitor I/O by process (VP) _______________________________
n Determine if sufficient AIO VPs are configured for a system using Informix AIO libraries. _______________________________
If maximum queue length for all the AIO VPs consistently exceeds 25 during non-checkpoint activity, consider increasing the number of AIO VPs.
Team 3
Identify the onstat options necessary to tune CPU utilization:
n Determine CPU utilization by the CPU VPs _______________________________
n Determine if additional CPU VPs should be started (assuming that #CPU VPs < #CPUs and the physical CPUs are not already fully utilized). _______________________________
Team 4
Write an onstat and/or SMI SQL command to capture statistics on:
n Network read and write activity by thread ________________________________
The reads and writes columns indicate the number of network packets received and sent by the IBM IDS server.
n Rejected network connections ________________________________
Note You may want to refer to the IBM IDS Administrator’s Guide as a reference for these exercises.
Exercise 2
Use the man pages to determine which Unix utilities are available on the training system and write a script to monitor:
2.1 Memory - page outs and page scans _______________________________
2.2 Disk throughput _______________________________
2.3 CPU utilization by CPU VP _______________________________
Solutions
Solution 1
Team 1
Write an onstat and/or SMI SQL command to do the following:
n Monitor I/O by chunk: onstat -g iof
n Monitor I/O by table or fragment: onstat -g ppf
n Monitor I/O by user/thread: onstat -g tpf
Team 2
Write an onstat and/or SMI SQL command to do the following:
n Monitor I/O by process (VP): onstat -g iov
n Determine if sufficient AIO VPs are configured for a system using Informix AIO libraries: onstat -g ioq
Team 3
Identify the onstat options necessary to tune CPU utilization:
n Determine CPU utilization by the CPU VPs: onstat -g glo
The onstat -g glo command displays cumulative user, system, and total CPU utilization by the CPU VPs. To determine how busy the CPU VPs are, compare onstat -g glo output over regular intervals of time and calculate the amount of CPU time accumulated as a percentage of real time.
n Determine if additional CPU VPs should be started (assuming that #CPU VPs < #CPUs and the physical CPUs are not already fully utilized): onstat -gr rea
If the number of CPU class threads on the ready queue is consistently greater than the number of CPU VPs, you may need another CPU VP.
Team 4
Write an onstat and/or SMI SQL command to capture statistics on:
n Network read and write activity by thread: onstat -g ntu
The reads and writes columns indicate the number of network packets received and sent by the IBM IDS server.
n Rejected network connections: onstat -g ntd
The reject column increments each time a connection receives a timeout error or if a user table overflow occurs. (You can monitor for user table overflow by checking onstat -p for values greater than 0 in the ovuserthreads entry.)
Solution 2
Use the man pages to determine which Unix utilities are available on the training system and write a script to monitor:
2.1 Memory - page outs and page scans
vmstat 2 5
(This will monitor the number of page outs and page scans at 2 second intervals for 10 seconds.)
or
sar -g 2 5
(This will report on memory page outs and scan rate.)
2.2 Disk throughput
iostat -d 2 5
(On Solaris, this will report Kbytes read per second for four disks, at two second intervals for 10 seconds.)
2.3 CPU utilization by CPU VP

for i in `onstat -g glo | awk '{if ($1 ~ /Ind/) {getline;getline} if ($3 == "cpu") {print $2}}'`
do
    ps -lfp $i
done
(On Solaris and HP, this will produce a ps listing, including CPU time for each CPU VP running.)
Module 3 Optimizing Load Performance
Optimizing Load Performance 07-2002 © 2002 International Business Machines Corporation
3-1
Objectives
At the end of this module, you will be able to:
n Understand the impact of transaction logging on load performance
n Understand the impact of integrity constraints and indexes on load performance
n Optimize SQL DDL statements for data loading
n Tune the Dynamic Server configuration parameters to optimize load performance
Data Loading Discussion
What problems have you experienced with loading data?
What problems have students experienced with loading data?
Wisconsin Benchmark Specification
n Three tables: onektup, tenktup1, and tenktup2
n All tables share the same schema; each has 13 integer columns and 3 x 52 byte character columns
n The onektup table contains 1/10 the number of rows in the tenktup1 (or tenktup2) table
Because we will be using a modified version of the Wisconsin Benchmark database for various exercises throughout this course, now is a good time to introduce the Wisconsin Benchmark. The original Wisconsin Benchmark was developed by David J. DeWitt of the University of Wisconsin. Its specification incorporates the criteria shown in the above slide. IBM Informix adopted and modified the Wisconsin database schema and uses it to demonstrate performance issues and solutions for decision support environments. In this course, you will create a modified Wisconsin Benchmark database. You will use these tables to test performance of loads, index builds, OLTP queries, and decision support queries.
Loading Data
n SQL LOAD (dbaccess)
n DBLOAD
n ESQL/C INSERT CURSOR
n HPL
w Deluxe mode
w Express mode
Informix offers several mechanisms for loading data. This module will introduce each and discuss its pros and cons so that you will be prepared to select the correct tool for the requirements of each load you attempt. In the lab exercises, we will be using an ESQL/C program to generate data and load it via an insert cursor.
SQL LOAD
The SQL LOAD command offers the following functionality:

Ease of use      High - is run from within dbaccess
Speed            Moderate to high
Flexibility      Low
Considerations   n Load file must be delimited, ASCII
                 n Load file loaded as one transaction
                 n No field substitutions
                 n Cannot skip fields in load file
                 n No parallel load capability
The SQL LOAD command is a very easy and quick way to load data, but it lacks flexibility. The SQL LOAD command will work well if:
n The data to be loaded exists in a delimited ASCII file.
n Every field is being loaded into a column in the table.
n The file is small enough to be loaded in a single transaction.
Generally this is not the case, and other tools will have to be used to accomplish the load.
Running SQL LOAD

LOAD FROM 'new_cust' INSERT INTO customer (customer_num, fname, lname)

The new_cust load file:
0|Peter|Pan|
0||Hook|

A column list is specified in the INSERT INTO clause. Use a zero as a place holder for serial values, and use two delimiters with no space in between to insert a null value.
You can specify columns in your LOAD statement if the file you are loading data from does not contain data for all of the columns in your table. In the above example, the file contains data only for the customer_num, fname and lname columns of the customer table. The columns specified must match the data in your file. If you do not include a list of columns in the INSERT INTO clause, the fields in the file you are loading data from must match the columns in the specified table.
Note
If no value is specified for a column, a null value is inserted. However, a serial column will not accept null values. For serial columns, use a zero as a placeholder in the load file, and the system will assign the next available value. Or, you can omit the serial column from both the INSERT INTO clause and the load file.
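The alternative in the note can be written out as a short sketch. The file name new_cust2 and its contents are hypothetical, not from the course materials:

```sql
-- new_cust2 omits the serial column entirely, e.g.:
--   Peter|Pan|
--   |Hook|        (empty first field loads a NULL fname)
LOAD FROM 'new_cust2'
    INSERT INTO customer (fname, lname);
```

Because customer_num appears in neither the column list nor the file, the server assigns the next serial value automatically.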
dbload
The dbload utility offers the following functionality:
Ease of use: Moderate
Speed: Moderate to low
Flexibility: Moderate to high
Considerations:
n Requires a command file
n Supports commit interval
n Does not perform parallel loads
n Supports field mapping
n Supports limited field substitution
n Loads delimited and fixed-length data
The dbload utility is a relatively robust load tool. It supports error logging and a user-specified commit interval, and it allows users to map input fields and do limited field substitution. dbload will also load both delimited and fixed-length data. Its primary limitations are that it requires the user to define a load command file, and it does not support parallel loads. The dbload utility uses a fixed 4096-byte buffer during loads; it does not utilize FET_BUF_SIZE.
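As a sketch of how these features appear on the command line (the command file and log file names are hypothetical; verify the option letters against the usage output for your version of dbload):

```shell
# -d database, -c command file, -n commit interval (rows per transaction),
# -e number of bad rows to tolerate, -l error log file
dbload -d stores7 -c load_stock.cmd -n 500 -e 10 -l stock.err
```

Committing every 500 rows keeps the logical log from filling during a large load, at a small cost in throughput.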
Running dbload
$ dbload -d stores7 -c load_stock.cmd

Contents of load_stock.cmd:
FILE "/export/home/trainer3/stock.unl" DELIMITER "|" 6;
INSERT INTO stock;
You must first create a command file before you run dbload. Command files may use delimiters or character positions. For the delimiter form, the fields in the data file will be separated by a delimiter that you specify in your FILE statement within quotes, and you also specify the number of columns in the input file, as shown in the slide above. The character-position file statement assumes that the load file is fixed length, and each field starts at a specific character position within each line. The example below describes five values, time_of_day, usr, sys, wio and idle. Each value is given a start position and a stop position:
Example
FILE "/export/home/trainer3/cpu_04_09_2001.dat"
    (time_of_day 1-8, usr 15-16, sys 23-24, wio 31-32, idle 38-40);
INSERT INTO cpu_history VALUES (time_of_day, usr, sys, wio, idle);
The environment variable FET_BUF_SIZE and/or the C variable "extern int2 FetBufSize" control the size of the buffer for INSERT CURSORS.
ESQL/C INSERT CURSOR
You may also load data with an Informix-ESQL/C program using insert cursors. ESQL/C provides the following:
Ease of use: Low, requires coding
Speed: Moderate to high
Flexibility: High
Considerations: Error handling coding requirements
By writing a program using the INSERT CURSOR statement, the developer can incorporate any degree of parallelism, field mapping, data substitution, and data conversion necessary. The only drawback is that writing complex code can be time consuming, and the developer must take special care with the error handling routines. Insert cursors buffer rows before writing them to the database. This buffering provides the high performance. Unfortunately, because rows are buffered, if an error is encountered during the load, all buffered rows following the last successfully inserted row are discarded. This requires that error handling routines provide mechanisms to identify the last successfully loaded row and restart inserts with the next row.
Note
With the advent of the HPL, such laborious effort as writing an ESQL/C program for high speed loading is no longer necessary.
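One hedged sketch of such a restart mechanism, not from the course materials: ESQL/C reports the number of rows successfully processed in sqlca.sqlerrd[2], which a load program can record to reposition its input after a failure. Variable names here are illustrative:

```c
/* Sketch only: restart bookkeeping for a buffered insert cursor.
   Assumes the usual "EXEC SQL include sqlca;" in the containing program. */
EXEC SQL FLUSH ins_row;
if (SQLCODE < 0)
{
    long rows_inserted = sqlca.sqlerrd[2];  /* rows actually inserted */
    /* Log rows_inserted, skip that many records in the input file,
       then reopen ins_row and resume with the next record. */
}
```

The key point is the one made above: only rows confirmed inserted can be skipped on restart; everything after the last flush must be replayed.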
ESQL/C Code Sample
EXEC SQL DECLARE dup_row CURSOR FOR
    SELECT * FROM orders main INTO :ord_row
    WHERE 1 < (SELECT COUNT(*) FROM orders minor
               WHERE main.order_num = minor.order_num)
    ORDER BY order_date;

EXEC SQL DECLARE ins_row CURSOR FOR
    INSERT INTO new_orders VALUES (:ord_row);

EXEC SQL BEGIN WORK;
EXEC SQL OPEN dup_row;
EXEC SQL OPEN ins_row;
while (SQLCODE == 0)
{
    EXEC SQL FETCH dup_row;
    if (SQLCODE == 0)
    {
        if (ord_row.o_num != last_ord)
            EXEC SQL PUT ins_row;
        last_ord = ord_row.o_num;
        continue;
    }
    break;
}
The code above declares two cursors.
n dup_row returns the duplicate rows in the table. Because dup_row is for input only, it can use the ORDER BY clause to impose some order on the duplicates other than the physical record order. In this example, the duplicate rows are ordered by their dates (the oldest one remains).
n ins_row is an insert cursor. This cursor takes advantage of the ability to use a C structure, ord_row, to supply values for all columns in the row.
The remainder of the code examines the rows that are returned through dup_row. It inserts the first one from each group of duplicates into the new table and disregards the rest.
For More Information
See Chapter 9, Modifying Data Through SQL Programs, in the IBM Informix Guide to SQL: Tutorial for a detailed discussion of this subject.
High Performance Loader
The IBM IDS High Performance Loader provides two load modes:
n Deluxe
n Express
The HPL offers these special features:
n Parallel load capabilities
n Data conversion
n Field mapping
The High Performance Loader (HPL) is the latest load utility introduced by IBM Informix. It is the most flexible, most powerful, and the fastest of the load utilities. The High Performance Loader offers two different load modes, express and deluxe. The HPL also offers several other very powerful features:

Parallel loads: The HPL can parallelize all aspects of the load process, including data conversion, filtering, and I/O.

Data conversion: The HPL can do data conversion, including case conversion, null substitution, and even EBCDIC-to-ASCII conversions.

Field mapping: The HPL offers sophisticated field mapping capabilities. These capabilities make it easy to load one file into multiple tables, or only a portion of the data in a file into a table. In addition, noncontiguous fields may be loaded into a single column.
HPL: The Data Load Process
(Diagram: input records flow from the data files or tapes in the device array through the format, filter, and map components; the selected records are loaded into the table.)
When you prepare to run a data load using the HPL, you describe the actions that the HPL must take by defining a set of metadata components. The components describe different aspects of the load process. The slide above shows the data load process. The HPL uses information from:
n The device array to find the set of source-data files.
n The format to define the data types and layout of the source data.
n The filter to select the records from the source data that should be written to the database table.
n The map to modify and reorganize the data.
HPL: Component Parts
(Diagram: ipload and onpload make client/server connections to the database server; onpload moves data between the database and tape, file, or pipe devices, using the onpload database for its job definitions.)
The ipload utility (an X-windows utility) has the following features:
n Creates and manages the onpload database
n Creates and stores information for onpload
n Lets you create, edit, and group the components of the load and unload jobs
n Stores information about the load components in the database
The onpload utility has the following features:
n Converts, filters, and moves data between a database and a storage device
n Uses information from the onpload database to run the load and unload jobs and to convert the data
n Records information during a load about data records that do not meet the load criteria
The slide above shows the relationships among the parts of the HPL. The ipload utility connects to the database server to populate the onpload database. ipload controls the onpload utility, which uses the multi-threaded architecture to make multiple connections to the database server and multiple I/O connections to tapes, files, or pipes.
HPL: Deluxe Load Mode
The High Performance Loader deluxe load mode provides:
Ease of use: Moderate
Speed: Moderate to high
Flexibility: High
Considerations:
n Data is logged during the load. The user may specify a commit interval as appropriate.
n Tables are not locked during loads.
n Constraints, indexes, and triggers are set to FILTERING WITHOUT ERROR during the load operation.
Deluxe load mode is the most flexible choice for loading data. Deluxe load mode simulates an array of insert cursors, one per device. Deluxe load mode:
n Does not restrict the type of data that can be loaded.
n Allows the table to be in use during the load.
n Does not disable indexes, constraints, or triggers during the load.
Since indexes, constraints, and triggers are not disabled, they do not have to be enabled when the load completes.
Tip
Deluxe load mode may be the fastest choice for loading data into a table that already contains a large number of rows.
HPL: Express Load Mode
The High Performance Loader express load mode provides:
Ease of use: Moderate to high
Speed: High
Flexibility: Moderate to high
Considerations:
n No logging is performed.
n Constraints, indexes, and triggers are disabled.
n Blobs are not supported.
n Row size must be less than page size.
n Holds an exclusive table lock for the duration of the load.
Express load mode provides very fast performance because it does not perform logging, and the data bypasses the SQL layer of the IBM IDS server code. Using express load, whole pages of rows are moved directly from light append buffers in shared memory to extent pages on disk. During the load, these extent pages are not part of the tablespace being loaded; only at the end of the load are the newly loaded extents appended to the table.
Light Append Buffers
The light append buffers are located in the virtual segment of shared memory. You may monitor light append buffer utilization with the onstat -g lap command.
Tip
For best load performance, use express load mode whenever possible.
PLCONFIG Tuning with the HPL
CONVERTVPS: One convert VP per CPU
CONVERTTHREADS: 2 threads/device
STRMBUFFSIZE: Should be some multiple of AIOBUFSIZE. One or two times AIOBUFSIZE is sufficient.
STRMBUFFERS: CONVERTTHREADS + 4
AIOBUFSIZE: Choose a buffer size to match the best block size for the tape drive. Large buffers increase performance if sufficient memory is available.
AIOBUFFERS: CONVERTTHREADS + 4 or (2 * CONVERTTHREADS), whichever is larger.
The onpload configuration parameters are stored in the file specified by the PLCONFIG environment variable. These parameters will affect the performance of load operations.

CONVERTVPS: Number of available physical CPUs minus one. For example, consider a machine with eight physical CPUs. If IBM IDS is configured for four CPUVPS, you might want to configure three CONVERTVPS, allocating four CPUs for the IDS CPUVPS, one CPU for other system work, and three CPUs for the load conversion process. To avoid the cost of context switching between CPUVPS and converter VPs, we consider available CPUs to be those not allocated for IDS CPUVPS.

CONVERTTHREADS: Two or more for jobs that require conversions, one if no data conversion must be performed. While loading character data is relatively inexpensive in terms of conversion, decimal and money types are stored internally as C structures and are much more expensive to convert. When loading a table with a large number of decimal columns, more converter threads should be started.

STRMBUFFSIZE: Specifies the size of the buffers allocated for transporting data between the loader and IBM IDS. STRMBUFFSIZE should be a multiple of the page size and should be configured as large as possible for optimum performance (32 to 64 KB).
Optimizing Load Performance 3-17
STRMBUFFERS: Specifies the number of stream buffers per device. For ASCII loads, set STRMBUFFERS to (CONVERTTHREADS + 4).

AIOBUFSIZE: Specifies the size of the internal AIO buffer used by onpload to perform device I/O. AIOBUFSIZE should be at least as large as the device block size. If memory is available, increasing AIOBUFSIZE in multiples of the block size may be beneficial.

AIOBUFFERS: Specifies the number of internal AIO buffers allocated per I/O device. Set AIOBUFFERS to (3 * CONVERTTHREADS). For jobs that perform no conversion, configure four AIO buffers.
Note
Each session can have its own plconfig file as specified by the PLCONFIG environment variable. These files are located in $INFORMIXDIR/etc.
Tip
The PDQPRIORITY environment variable is set to 100. A setting of 100 means that HPL uses all available resources to process queries in parallel. Users cannot change the setting. For more information on this environment variable, see the Informix Guide to SQL: Reference.
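Applied to, say, a four-CPU machine with two CPU VPs and a 64 KB tape block size, the guidelines on this page might produce a plconfig file like the following sketch. The values are illustrative, the annotations are for this page only, and the buffer-size units should be verified against your onpload documentation:

```
CONVERTVPS      1       available CPUs minus those used by IDS
CONVERTTHREADS  2       conversions are needed for this job
STRMBUFFERS     6       CONVERTTHREADS + 4
STRMBUFFSIZE    64      one AIOBUFSIZE
AIOBUFFERS      6       max(CONVERTTHREADS + 4, 2 * CONVERTTHREADS)
AIOBUFSIZE      64      matches the tape block size
```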
HPL Demonstration
Your instructor will demonstrate the High Performance Loader and show you how to use it for the labs in this course.
The graphical user interface to the High Performance Loader is started with the ipload command. The ipload executable will start and connect to the IBM IDS system that is specified with the INFORMIXSERVER environment variable. If the onpload database does not exist on this system, it will be automatically created. If you wish to use the onpload database on a different IBM IDS system, you can do so by choosing the Configure:Server menu option.
Exercise 1
The instructions for this lab exercise will lead you through the steps for loading your database tables with an insert cursor program designed for the Wisconsin Benchmark database. The goal of this exercise is to allow you to perform table loads without any optimization of the IBM IDS configuration, and to capture the time required for the load. Later, we will compare the time required for this load with the time required to load data after optimizing the IBM IDS configuration.

1.1 Create a new schema file for the database. Make a copy of the wisc.sql file, named wisc_3_1.sql, for example. Notice in the proposed solution below that the new schema file creates the tables only. Indexes will be created in a later exercise. Add a fragmentation scheme, so that you will load each of these tables into your four dbspaces.
Note
Your instance of the engine already has four 50 MB dbspaces configured for it: dbspace1, dbspace2, dbspace3, and dbspace4.
drop database wisc_db;
create database wisc_db;

create table onektup (
    unique1 INTEGER NOT NULL, unique2 INTEGER NOT NULL,
    two INTEGER NOT NULL, four INTEGER NOT NULL,
    ten INTEGER NOT NULL, twenty INTEGER NOT NULL,
    onePercent INTEGER NOT NULL, tenPercent INTEGER NOT NULL,
    twentyPercent INTEGER NOT NULL, fiftyPercent INTEGER NOT NULL,
    unique3 INTEGER NOT NULL, evenOnePercent INTEGER NOT NULL,
    oddOnePercent INTEGER NOT NULL, stringu1 CHAR(52) NOT NULL,
    stringu2 CHAR(52) NOT NULL, string4 CHAR(52) NOT NULL
) fragment by round robin in dbspace1, dbspace2, dbspace3, dbspace4;

create table tenktup1 (
    unique1 INTEGER NOT NULL, unique2 INTEGER NOT NULL,
    two INTEGER NOT NULL, four INTEGER NOT NULL,
    ten INTEGER NOT NULL, twenty INTEGER NOT NULL,
    onePercent INTEGER NOT NULL, tenPercent INTEGER NOT NULL,
    twentyPercent INTEGER NOT NULL, fiftyPercent INTEGER NOT NULL,
    unique3 INTEGER NOT NULL, evenOnePercent INTEGER NOT NULL,
    oddOnePercent INTEGER NOT NULL, stringu1 CHAR(52) NOT NULL,
    stringu2 CHAR(52) NOT NULL, string4 CHAR(52) NOT NULL
) fragment by round robin in dbspace1, dbspace2, dbspace3, dbspace4;

create table tenktup2 (
    unique1 INTEGER NOT NULL, unique2 INTEGER NOT NULL,
    two INTEGER NOT NULL, four INTEGER NOT NULL,
    ten INTEGER NOT NULL, twenty INTEGER NOT NULL,
    onePercent INTEGER NOT NULL, tenPercent INTEGER NOT NULL,
    twentyPercent INTEGER NOT NULL, fiftyPercent INTEGER NOT NULL,
    unique3 INTEGER NOT NULL, evenOnePercent INTEGER NOT NULL,
    oddOnePercent INTEGER NOT NULL, stringu1 CHAR(52) NOT NULL,
    stringu2 CHAR(52) NOT NULL, string4 CHAR(52) NOT NULL
) fragment by round robin in dbspace1, dbspace2, dbspace3, dbspace4;
1.2 To create the database and tables, execute:
dbaccess sysmaster wisc_3_1.sql

1.3 Bring up the High Performance Loader (HPL):
w Start Exceed for X-windows emulation. The application, when running, should appear in the task bar.
w Obtain the IP address of your computer:
  • Click on Start->Run... to bring up the Run window.
  • Type in cmd and click on OK. This will bring up an MS-DOS window.
  • Then type in ipconfig. You will see the IP address displayed.
w Run the HPL:
  • In a Unix window, at the $ prompt, type in export DISPLAY=<IP address>:0.0, for example: export DISPLAY=192.147.102.34:0.0
  • In the same window, type in ipload. This will cause the HPL GUI to be displayed on your screen.

1.4 The benchmark uses the Wisconsin database. It has 3 tables:
onektup      10,000 rows
tenktup1     100,000 rows
tenktup2     100,000 rows

The following load files are stored on the server:
/stage1/perf_tune_data/
    onektup.all    10,000 rows
    onektup.1      2,500 rows
    onektup.2      2,500 rows
    tenktup.all    100,000 rows
    tenktup.1      25,000 rows
    tenktup.2      25,000 rows
/stage2/perf_tune_data/
    onektup.3      2,500 rows
    onektup.4      2,500 rows
    tenktup.3      25,000 rows
    tenktup.4      25,000 rows
Define three load jobs with the HPL. Load from one file for each table and time the loads:
/stage1/perf_tune_data/onektup.all into onektup
/stage1/perf_tune_data/tenktup.all into tenktup1
/stage1/perf_tune_data/tenktup.all into tenktup2
Note
Since the tenktup1 and tenktup2 tables share the same schema and are both the same size, the same load file (tenktup.all) is used to load both of these tables.
Use HPL to get the load times for the three jobs:
$ time onpload -j <jobname> -fl
w Load Time: __________
Verify the number of rows loaded using dbaccess, by running these SQL commands:
SELECT count(*) FROM onektup;
SELECT count(*) FROM tenktup1;
SELECT count(*) FROM tenktup2;
Save a copy of your environment, ONCONFIG file, and load times.

1.5 Recreate the tables and load from four files into each table:

onektup:
    /stage1/perf_tune_data/onektup.1
    /stage1/perf_tune_data/onektup.2
    /stage2/perf_tune_data/onektup.3
    /stage2/perf_tune_data/onektup.4
tenktup1 and tenktup2:
    /stage1/perf_tune_data/tenktup.1
    /stage1/perf_tune_data/tenktup.2
    /stage2/perf_tune_data/tenktup.3
    /stage2/perf_tune_data/tenktup.4
Use HPL to get the load times for the three jobs:
$ time onpload -j <jobname> -fl
w Load Time: __________
Verify the number of rows loaded using dbaccess, by running these SQL commands:
SELECT count(*) FROM onektup;
SELECT count(*) FROM tenktup1;
SELECT count(*) FROM tenktup2;
Save a copy of your environment, ONCONFIG file, and load times.
Indexes, Constraints, Logging, and Extent Sizes
For optimal load performance:
n Drop or disable indexes and constraints. For example, to disable all objects for the orders table:
  SET CONSTRAINTS, INDEXES, TRIGGERS FOR orders DISABLED;
n Make the table type RAW (next slide) or turn off logging for the database.
n Appropriately size first extents.
When a table is loaded with indexes enabled, each row requires a write to the data page on which the row will be stored and at least one write to each index that is affected, in order to add the key value to the leaf page. If the addition of the new key value causes the index page to split, multiple pages may have to be written. Constraints that require indexes generate the same overhead. You can speed loads significantly by dropping or disabling all indexes and constraints prior to loading new data. When a table in a logged database is loaded, the new row must be inserted into the data page and a copy of it must be written to the logical log. You can avoid the overhead of the extra write by converting the table to RAW during the load, and then converting it back to STANDARD (see next slide.) Finally, be sure to correctly size extents for the tables being loaded. The HPL’s express load mode builds table extents in memory and moves the data directly from memory into disk extents. A small extent size, such as the default eight pages, may slow down the load process.
Turning Logging Off at the Table Level
Existing standard tables can be temporarily converted to raw tables for loading of data:
n Drop any indexes and constraints
n ALTER TABLE customer TYPE(RAW);
n ALTER TABLE customer TYPE(STANDARD);
n Recreate indexes and constraints
(Diagram: the customer table goes from standard to raw for the load, then back to standard.)
As of IBM IDS 7.31 and 9.21, an additional type of table, a raw permanent table, is available. Raw tables are not logged even though the database has logging. In prior versions of IBM IDS only temporary tables were non-logging in a logged database. A normal table in a logged database is referred to as a standard permanent table. This feature can be used with any data loading utility, but it is particularly useful with SQL Load for loading medium size tables relatively quickly, since SQL LOAD loads tables as one transaction.
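Putting the slide's steps together for the customer table gives a sequence like the sketch below. The index name and load file are hypothetical:

```sql
DROP INDEX cust_num_ix;                 -- drop indexes and constraints first
ALTER TABLE customer TYPE(RAW);         -- suspend logging for this table
LOAD FROM 'new_cust' INSERT INTO customer;
ALTER TABLE customer TYPE(STANDARD);    -- restore normal logging
CREATE UNIQUE INDEX cust_num_ix ON customer (customer_num);
```

Rebuilding the index after the load also lets the server use its parallel index build, as discussed on the following pages.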
Violations and Diagnostics Tables
n Disable indexes and constraints during loads. For example:
  SET CONSTRAINTS, INDEXES FOR customer DISABLED;
n Use violations and diagnostics tables to correct index and constraint violations when re-enabling the indexes and constraints.
n Use SQL to locate duplicates for unique indexes in versions prior to 7.10.UD1.
(Diagram: when SET CONSTRAINTS ... ENABLED runs, each row that causes a violation is written to the violations table, and the constraints that were violated are recorded in the diagnostics table; one row can have multiple violations.)
For optimum performance, always disable or drop constraints and indexes during loads, and enable or create them after all data is loaded. This will speed up the load and the index build. Index builds performed after the load can take advantage of the parallel index build capability of the Informix Dynamic Server. If you load data using the HPL’s express mode, indexes and constraints will be automatically disabled for you, then enabled in filtering mode following the load. If you are using a 7.1 or greater version of IBM IDS, the violations and diagnostics tables provide an easy mechanism for identifying and correcting index and constraint violations that cause index builds to fail. If you are using a version prior to 7.1, you can identify violations using SQL statements. See the examples below:
Example
To identify duplicates in a column with a unique index, you may use a self join like the one shown here:
SELECT a.col1, a.rowid, b.col1, b.rowid
FROM table1 a, table1 b
WHERE a.col1 = b.col1 AND a.rowid != b.rowid;

You can also use a correlated subquery to find duplicate column values. The next example demonstrates a correlated subquery used to retrieve duplicate values that prevent the successful creation of a unique compound index.
SELECT *
FROM table1 a
WHERE EXISTS (
    SELECT col1, col2 FROM table1 b
    WHERE a.col1 = b.col1 AND a.col2 = b.col2
    GROUP BY b.col1, b.col2
    HAVING count(*) > 1
);
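For the 7.1-and-later path described above, the violations and diagnostics tables are populated while objects are re-enabled. A sketch follows; it assumes the default table names <table>_vio and <table>_dia, so verify the exact syntax in your SQL reference:

```sql
START VIOLATIONS TABLE FOR customer;            -- creates customer_vio and customer_dia
SET CONSTRAINTS, INDEXES FOR customer ENABLED;  -- offending rows land in customer_vio
SELECT * FROM customer_vio;                     -- inspect and correct the bad rows
STOP VIOLATIONS TABLE FOR customer;
```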
Reminder
If you are loading a table that already contains data and determine that it is more efficient to perform the load with indexes and constraints enabled than to rebuild them, be sure to load foreign key rows in sorted order to minimize the number of disk reads performed.
Fragmentation Strategies: Loading Data
If a fast load is the highest priority, large tables will load more quickly if fragmented round robin.
When data load speed is the primary performance concern, we can optimize loads by distributing the write activity across as many devices as possible and keeping fragment evaluation or selection as simple as possible. The simplest way to accomplish this is to use round robin fragmentation. Round robin fragmentation allows the server to randomly place rows in fragments, evenly distributing the I/O.
Tip
If the data is loaded with an INSERT statement, the server performs a hash function on a random number to determine which fragment the row should be loaded into. If the data is loaded using the HPL, the I/O is equally balanced, but the row distribution will be a function of the number of devices and the number and size of the buffers specified in the PLCONFIG file.
Optimizing Loads
Recommended configuration parameter settings for loading data include the following:
NUMCPUVPS: Set to the number of physical processors.
NETTYPE: Configure one poll thread for each CPU VP, and be sure to run the poll threads on CPU VPs. Do not specify more user connections than required, as this will cause unnecessary work for the poll threads. Remember that NETTYPE specifies the number of users per poll thread: a NETTYPE of (ipcshm,8,10,CPU) will support 80 connections.
The IBM IDS configuration parameters that can most significantly affect loading are listed above with recommended settings. If you are using the HPL to load, some CPUs may be reserved for the onpload converter VPs. In this case, subtract the number of converter VPs from the number of physical CPUs to calculate the number of available CPUs.
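For a machine with four physical CPUs and no converter VPs reserved, the two recommendations above might look like this ONCONFIG sketch (values are illustrative):

```
NUMCPUVPS   4                # one CPU VP per physical processor
NETTYPE     ipcshm,4,10,CPU  # 4 poll threads on CPU VPs, 10 users each = 40 connections
```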
Optimizing Loads, Continued
If you are not loading with the HPL express mode, modify these configuration parameters:
BUFFERS: Maximize.
CLEANERS: Set to the number of chunks allocated for the largest table.
CKPTINTVL: Set to 3000.
LRUS: Allocate 1 queue pair per 500 to 750 buffers.
LRU_MAX_DIRTY: Set to 80.
LRU_MIN_DIRTY: Set to 70.
PHYSFILE: Large enough so checkpoints don't trigger often.
BUFFERS: If data loading is the only activity on the system, do not be afraid to allocate 50% or more of real memory to the resident buffer cache.

LRUS: Configure enough LRU queue pairs to eliminate contention and to provide a manageable queue length. One rule of thumb is to allocate 1 LRU queue pair per 500 to 750 buffers, up to the maximum of 128. For example, if you configure 50000 buffers, you may want to configure 100 LRU queue pairs; for 100000 buffers, 128 LRU queue pairs. To monitor contention and ensure that adequate queues are allocated, use onstat -g spi.

LRU_MAX_DIRTY: Load benchmarks have demonstrated that it is preferable to allow nearly all of the resident segment buffers to fill up before beginning the process of flushing them to disk via the page cleaner threads. Set LRU_MAX_DIRTY to 80 so that page cleaning will begin when the resident buffer pool becomes 80% used (dirty).

LRU_MIN_DIRTY: Stop cleaning at 70% to avoid flushing pages unnecessarily.

CKPTINTVL: Set CKPTINTVL to 3000. A large checkpoint interval will allow the size of the physical log and, as a result, the amount of work accomplished, to determine when checkpoints occur.
Other Tips for Loading Data
n Distribute input files across multiple devices to eliminate bottlenecks.
n Load from multiple input files or processes to increase parallelism.
n Set FET_BUF_SIZE to 32767 (insert cursors only).
n Force a checkpoint prior to beginning any loads.
n Use the HPL (onpload) for optimum performance and flexibility.
n Force a checkpoint immediately following a large load: onmode -c
Other tips to optimize your loads are:
n To eliminate I/O bottlenecks, distribute input files across multiple tape or disk devices.
n If you are loading data using the HPL or an SQL insert cursor, set the environment variable FET_BUF_SIZE to 32 KB.
n If the dbspaces that you are loading data into previously held tables that have been dropped, and the dropped tables were logged, be sure to force a checkpoint after dropping the tables and before loading the new data. The checkpoint flushes all modified pages to disk, providing your load with a clean buffer cache to work with.
n Force a checkpoint immediately after any large load process. This will prevent the possibility that a system failure might require an expensive fast recovery of the load process since the last checkpoint. The last checkpoint may have occurred several minutes or even an hour or more prior to the load, depending on the size of the physical log and the checkpoint interval specified.
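The checkpoint advice above can be bracketed around a timed HPL run. The job name is hypothetical:

```shell
onmode -c                          # force a checkpoint before the load
time onpload -j load_tenktup1 -fl  # run and time the load job
onmode -c                          # force a checkpoint after the load
```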
Exercise 2
In this exercise, you will attempt to improve load performance by modifying configuration parameters and environment variables and by applying the other tips that we have discussed. Below are some possible assumptions about your hardware environment:
w 4 CPU machine
w kernel AIO
w 128 MB RAM
Assume that loads are being done via the insert cursor program. You may want to use the following environment variable: FET_BUF_SIZE=32767
Note
Force a checkpoint after dropping the database and prior to reloading it with: onmode -c
Force a checkpoint immediately after the load completes.
2.1 Review load times from the previous exercise.

2.2 To drop your original database and to recreate the database and tables, execute:
dbaccess sysmaster wisc_3_1.sql
Make sure you have the statement "drop database wisc_db;" in the schema file, so the old database is dropped when you run it.

2.3 Modify your configuration parameters or environment variables as desired to improve load performance, and then stop and restart your instance. For example:
VPCLASS         cpu,num=2
BUFFERS         64000
LRUS            16
CLEANERS        16
LRU_MAX_DIRTY   80
LRU_MIN_DIRTY   70
CKPTINTVL       3000
2.4 Reload the tables and capture timings. Rerun the three load jobs with the HPL. Again, load from one file for each table and time the loads:
/stage1/perf_tune_data/onektup.all into onektup
/stage1/perf_tune_data/tenktup.all into tenktup1
/stage1/perf_tune_data/tenktup.all into tenktup2

Use HPL to get the load times for the three jobs:
$ time onpload -j <jobname> -fl
w Load Time: __________

Verify the number of rows loaded using dbaccess, by running these SQL commands:
SELECT count(*) FROM onektup;
SELECT count(*) FROM tenktup1;
SELECT count(*) FROM tenktup2;
2.5 Save the load times, the schema for the database (including dbspace information and extent sizes if appropriate), a copy of your ONCONFIG file, and your environment.

2.6 Repeat the above sequence of steps, modifying an additional environment variable or changing your configuration.
w Your 2nd change: Load Time: __________
w Your 3rd change: Load Time: __________
Load Performance Discussion
Discuss the changes in performance observed in the various load runs.
Take this opportunity to discuss the changes in the various loads and the performance impact, and answer student questions.
Module 4 Indexing Performance Issues
Indexing Performance Issues 07-2002 © 2002 International Business Machines Corporation
Objectives
At the end of this module, you will be able to:
n Understand the benefits and costs of indexes
n Understand how indexes are implemented
n Select an appropriate FILLFACTOR for an index
n Identify when index clustering is appropriate
n Use DBSPACETEMP and PDQPRIORITY to speed index builds
This module discusses the benefits and costs of indexes, the impact of referential and entity integrity constraints, and methods to optimize the index build process. To understand each of these issues, however, it is first important to understand what an IBM IDS index is. The next few pages provide a brief overview.
4-2 Indexing Performance Issues
Benefits and Costs of Indexing
Benefits:
n Faster access
n Enforce constraints and business rules
Costs:
n Disk space requirements
n Update, delete, and insert maintenance
Indexes are indispensable in OLTP environments where quick access to a single row or small set of rows is required. They are also useful in enforcing uniqueness among rows in a table and maintaining relationships between tables. Associated with these benefits, however, are the costs of creating and maintaining indexes. These are discussed in more detail later in this module.
B+ Tree Indexes
An index is arranged as a hierarchy of pages in a data structure called a B+ tree.
(Diagram: a root node at the top, branch nodes below it, and leaf nodes at the bottom pointing into the table.)
The database server uses an index to find a row quickly. In IBM IDS, indexes are organized in B+ trees, a set of nodes that contain keys and pointers that are arranged in a balanced tree structure hierarchy. The leaves are linked sequentially for rapid sequential access to data. The B+ tree is organized into levels. The leaf nodes contain pointers, or addresses, to the actual data. The other levels contain pointers to nodes on lower levels that contain keys that are less than or equal to the key in the higher level.
(Diagram: a root node and branch nodes holding key values such as 294 and 387, with leaf nodes holding sorted keys in the range 385-392 that point to the data rows.)
The Leaf Node
Contains sorted index keys and pointers to table rows.
[Figure: leaf pages holding key value / rowid / delete flag (df) entries, linked to adjacent leaf pages]
Leaf nodes are the end of the tree hierarchy. Leaf nodes contain index items and pointers to adjacent leaf nodes.
An index item is a key value and one or more rowid and delete flag pairings. The rowid is the pointer to the actual data row. Only leaf nodes point to data rows, and all leaf nodes must be at the same level.
B+ Tree Management
[Figure: before and after a key deletion. Before: leaf nodes (Adams, Brown, Downing) and (Johnson, Smith) under a branch node. After: leaf nodes (Adams, Downing) and (Johnson, Smith)]
B+ Tree management requires that nodes are split, merged, and shuffled to maintain an efficient tree while accommodating inserts, updates, and deletes of key items. Informix has built very sophisticated node management techniques to minimize the performance impact of B+ Tree management.
Delete Compression To free index pages and maintain a compact tree, IBM IDS evaluates each node after physical deletes of index items to determine if it is a candidate for compression. If the node is not a root node and it has fewer than three index items, the node becomes a compression candidate.
Merging If either the right or left sibling can accommodate the index keys remaining in the compression candidate, the index items will be merged to the sibling node and the candidate node will be freed for reuse.
Shuffling Otherwise, if neither sibling can accommodate the index items, IBM IDS will attempt to balance the nodes by selecting the sibling node with the most items and shuffling some of those items to the compression candidate so that both nodes have an equal or nearly equal number of keys. Shuffling helps maintain a balanced tree and prevents a node from becoming full if its adjacent nodes are not; this in turn helps reduce the likelihood that node splitting will be required.
Splits
When a new key value must be added to a full index node, the node must split to make room for the new key. To perform a node split, the server must write at least three pages and potentially more. The goal of index management is to minimize splits, merges, and shuffles because:
n Extra processing is required: the affected index pages must be locked and written by the server.
n Disk space requirements increase: after a split, you have new pages that are not full. The partially full nodes increase the disk space requirements for storing your index.
n A large number of splits reduces caching effectiveness: partially full nodes do not contain as many key values as full nodes. If one full node held 200 key values, after a split you may have to cache 3 pages in memory to have access to the same 200 keys.
FILLFACTOR
FILLFACTOR allows you to specify the percent fullness of each index page created during an index build.
n SELECT only: set FILLFACTOR 100.
n SELECT and DELETE only: set FILLFACTOR 100.
n INSERT and UPDATE: set FILLFACTOR 50 to 70.
The index fill factor specifies the percent fullness of each node at the time the index is created. FILLFACTOR is not maintained over the life of the index. A high fill factor will initially produce a more compact index, providing more efficient caching, and potentially, reducing the number of page reads required to service an index search. For indexes on read only tables, be sure to use a fill factor of 100. Also, indexes on tables that will receive SELECT and DELETEs only should be created with a fill factor of 100 to reduce the likelihood that merging and shuffling will be required. On the flip side, for tables that grow quickly receiving a high number of inserts, creating indexes with a low fill factor can delay the need for node splits. Use a FILLFACTOR between 50 and 70% for indexes on tables that receive a very large number of inserts.
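To see the space trade-off concretely, here is a rough back-of-the-envelope sketch. This is not an Informix formula; the keys-per-page capacity is an assumed, illustrative number.

```python
import math

def leaf_pages(total_keys, keys_per_full_page, fillfactor):
    """Estimate the initial leaf page count for an index built at a given
    FILLFACTOR. keys_per_full_page is a hypothetical 100%-full capacity."""
    keys_per_page = max(1, int(keys_per_full_page * fillfactor / 100))
    return math.ceil(total_keys / keys_per_page)

# 1,000,000 keys at an assumed 200 keys per full page:
# FILLFACTOR 100 packs the index into half the pages of FILLFACTOR 50,
# but leaves no room for inserts before nodes start splitting.
print(leaf_pages(1_000_000, 200, 100))  # 5000 pages
print(leaf_pages(1_000_000, 200, 70))   # more pages, but insert headroom
print(leaf_pages(1_000_000, 200, 50))   # 10000 pages
```

The smaller index from a high fill factor also caches better, which is why read-only tables get FILLFACTOR 100 while insert-heavy tables get 50 to 70.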
Note The dbschema utility will not list the fill factor specified for an index.
Pre-9.x Tablespace Architecture
Prior to 9.x, the default behavior was for index pages to be created within the same extent as data pages.
[Figure: dbspace2 with interleaved pages: bitmap, index, index, data, data, index, data, free]
In versions of IBM IDS prior to the 9.x series, when an index was built on a standard (not partitioned or fragmented) table, the index and data pages were allocated in the same tablespace. This type of index for these versions of IBM IDS was said to be 'attached' to the table. The major performance issue is that table scans (when they occur) are slowed down because of the increased movement of the electromechanical disk drive heads. The interleaved index pages must be skipped over by the heads, and additional seek time is required to move to the other tracks that contain additional data pages. This effect becomes more pronounced with larger table sizes and with tables that contain multiple indexes and/or indexes that require more index pages, such as composite indexes. Remember that in an OLTP environment it is not uncommon for there to be as many index pages as there are data pages for a typical table. Thus a large indexed table may be distributed over roughly twice as many tracks on a disk as a table that is not indexed. Furthermore, an active indexed table with numerous inserts, updates and deletions will over time cause the index and data pages to be almost evenly interleaved, as newly freed pages are randomly allocated as either a data or index page during these changes to the table. So in addition to the increased total seek time required for a partial or full table scan, there is also an increase in the total rotational latency for this operation.
9.x: Separate Tablespaces for Data and Indexes
[Figure: dbspace2 now holds two tablespaces. The dataspace contains bitmap, data and free pages only; the indexspace contains bitmap, index and free pages only]
With the introduction of 9.x the default behavior of IBM IDS is to create separate tablespaces (collections of extents) for the data pages (which we can call dataspace) and for the index pages (indexspace) as shown in the slide above. This increases overall throughput for SQL statements, especially when applied against large, heavily indexed tables in an OLTP environment.
Environment Variable
DEFAULT_ATTACH
The pre-9.x behavior of allocating both data and index pages from the same extent can be restored in 9.x by setting this environment variable to 1.
[Figure: with DEFAULT_ATTACH set, a single tablespace again interleaves pages: bitmap, data, index, data, data, free, data, free]
Explicit Indexes
Allows you to balance I/O by specifying a separate dbspace and a different index FILLFACTOR if desired.

CREATE TABLE tab1 (
    col1 INTEGER,
    col2 INTEGER,
    col3 CHAR(25))
    IN dbspace1;

CREATE INDEX index1 ON tab1 (col1)
    IN indexdbs1 FILLFACTOR 70;
Even though the default behavior of IBM IDS now separates the data pages and index pages into their own tablespaces, the index will still be created within the same dbspace that the table itself (the data pages) was created in. With explicit index creation, you have additional control over the structure and location of the index:
n You can specify a separate dbspace, on a different disk, in which to store the index tablespace. This can significantly increase overall table performance, especially if the two disk drives each have their own disk controllers.
n You can specify a FILLFACTOR for the index pages of a specific table that differs from other tables in the database server.
w If FILLFACTOR is not specified, the index pages use the default value for the entire instance of the database server as specified in the ONCONFIG file, which is typically 90%.
w If you have a highly active table with numerous inserts and deletes, a FILLFACTOR of 60 to 70% may be preferred for the index pages.
Implicit Indexes
n Implicit indexes are created when a constraint is defined that cannot use an existing index.
n You cannot specify a dbspace location, fragmentation strategy or fill factor for the index.
n Implicit indexes are created in the database dbspace.

CREATE TABLE tab1 (
    col1 INTEGER,
    col2 INTEGER,
    col3 CHAR(25),
    PRIMARY KEY (col1))
    IN table1dbs;
Recall that constraints defined for a table are implemented by the server through the creation of either unique or duplicate indexes, depending upon the type of constraint. The PRIMARY KEY constraint in the slide above, for example, is actually enforced through the system creation of a unique index on the key col1. Implicit indexes are created when a constraint is defined that cannot use an existing index. Implicit indexes do not allow you to specify a dbspace location or FILLFACTOR for the index (implicit indexes use the default FILLFACTOR). Implicit indexes are created in the dbspace where the database was created, not the dbspace where the table was created. This introduces additional management issues related to disk storage and performance as discussed in the previous slides.
Use Explicit Indexes for Constraints

Syntax
CREATE TABLE tab1 (
    cola INTEGER,
    colb CHAR(25))
    IN dbspace4;

CREATE UNIQUE INDEX cola_ix ON tab1 (cola)
    IN dbspace5 FILLFACTOR 70;

ALTER TABLE tab1 ADD CONSTRAINT
    PRIMARY KEY (cola) CONSTRAINT cola_pk;
It is recommended to explicitly create indexes that exactly match the referential constraint and then use the ALTER TABLE statement to add the constraint. The constraint will use the existing index instead of implicitly creating a new one. So you should use the procedure demonstrated in the slide above rather than the implicit index example from the previous slide. Remember, indexes are also used for foreign key and unique constraints.
Clustered Indexes

CREATE CLUSTER INDEX index1 ON table_a (col1)
    IN indexdbs1 FILLFACTOR 100;

n Creating a clustered index reorders the data rows in the order of the index
n A table can have only one clustered index

[Figure: rows stored in arbitrary order (107, 104, 109, ...) are rewritten in index key order (101, 102, 103, ...)]
The syntax CREATE CLUSTER INDEX and ALTER INDEX ... TO CLUSTER is somewhat misleading. The real effect of these statements is to cluster or reorder the actual rows in the table. Clustered indexes are most beneficial when a large number of rows are being read for output in sorted order and most of the columns in the table are needed, eliminating the opportunity to perform a key-only read. Data access via a clustered index will:
n exploit the read-ahead capability to minimize the total number of disk reads required and maximize the number of data pages read by each read operation
n allow the query to be resolved without a sort operation
n allow data to be sequentially scanned
Tip You do not need to create a clustered index if the data has been newly loaded in the order specified by the index. Since the data is already in sorted order, the performance benefits anticipated from a clustered index are already provided.
Monitoring Clustered Indexes
Smaller numbers in the clust column of the sysindexes table correspond to greater clustering.
Assuming a customer table with 2000 rows stored on 20 data pages:
n a completely unclustered table: sysindexes.clust = 2000
n a fully clustered table: sysindexes.clust = 20
The sysindexes.clust column is an INTEGER that indicates the degree of clustering: smaller numbers correspond to greater clustering. The maximum value corresponds to a table that is not clustered at all, and represents the number of rows in the table. The minimum value corresponds to the number of data pages in the table. This column is blank until the UPDATE STATISTICS statement is run on the table.
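As an illustration only (this normalization is not an Informix metric, just a convenient way to read the clust value between its two extremes), the slide's numbers can be scaled to a 0-to-1 figure:

```python
def clustering_degree(clust, nrows, npages):
    """Map sysindexes.clust onto 0..1: 1.0 = fully clustered
    (clust == data pages), 0.0 = completely unclustered (clust == rows)."""
    if nrows == npages:
        return 1.0  # one row per page: ordering cannot matter
    return (nrows - clust) / (nrows - npages)

# The customer table from the slide: 2000 rows on 20 data pages.
print(clustering_degree(20, 2000, 20))    # 1.0, fully clustered
print(clustering_degree(2000, 2000, 20))  # 0.0, completely unclustered
```

A value drifting toward 0 over time suggests the table is a candidate for reclustering.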
Exercise 1

Syntax
Once you have determined that the degree of clustering of your table has deteriorated enough to noticeably impact performance, you can recluster your table using the following syntax:
ALTER INDEX index_name TO CLUSTER;
In this exercise, you will drop your database and recreate it as before (fragmented ROUND ROBIN across four dbspaces). But this time you will load it with only 10% of the rows. Next you will create detached indexes on your three tables using the IN DBSPACE X clause.
1.1 Create another dbspace in your instance, named dbspace5. Make this dbspace 50 MB in size, by specifying a size of 50000 KB, with an offset of 50000 KB. Use the device /dev/rdsk/stuxxxD for this dbspace.
Warning! Do not use the stuxxxA, stuxxxB, stuxxxC or stuxxxF devices - these are already completely allocated to your instance!!! In addition, the first 50 MB of stuxxxD and stuxxxE are also already allocated to your instance!
1.2 Create another dbspace in your instance, named dbspace6. Make this dbspace 50 MB in size, by specifying a size of 50000 KB, with an offset of 50000 KB. Use the device /dev/rdsk/stuxxxE for this dbspace.
1.3 Run the following command to perform a level 0 backup:
ontape -s
1.4 Create an sql file named onektup_ix.sql, and enter the following SQL statements:
create index onekidx1 on onektup (unique1) in dbspace5;
create index onekidx2 on onektup (unique2) in dbspace5;
create index onekidx3 on onektup (onePercent) in dbspace5;
Use time (SUN) to capture timings for the index build. Record the timings for later comparison. (time dbaccess wisc_db onektup_ix.sql) > onektup_ix.time1 2>&1
1.5 Create an sql file named tenktup1_ix.sql, and enter the following SQL statements:
create index tenk1idx1 on tenktup1 (unique1) in dbspace5;
create index tenk1idx2 on tenktup1 (unique2) in dbspace5;
create index tenk1idx3 on tenktup1 (onePercent) in dbspace5;
Use time (SUN) to capture timings for the index build. Record the timings for later comparison. (time dbaccess wisc_db tenktup1_ix.sql) > tenktup1_ix.time1 2>&1
1.6 Create an sql file named tenktup2_ix.sql. Note that this explicit index will go into dbspace6.
create index tenk2idx1 on tenktup2 (unique1) in dbspace6;
create index tenk2idx2 on tenktup2 (unique2) in dbspace6;
create index tenk2idx3 on tenktup2 (onePercent) in dbspace6;
Use time (SUN) to capture timings for the index build. Record the timings for later comparison. (time dbaccess wisc_db tenktup2_ix.sql) > tenktup2_ix.time1 2>&1
Parallel Index Builds
[Figure: scan threads feed parallel sort threads through an exchange; the sort threads feed B-tree appender threads through another exchange, and a master B-tree appender thread combines their output]
The parallel index build operation is broken up into three subtasks. This division of duties is referred to as vertical parallelism. First, scan threads read the data from disk, one scan thread per chunk. Next, the data is passed to the sort threads (see the next slide). Finally, the sorted sets are appended into a single index tree by the b-tree appender thread. Beginning with IBM IDS version 7.2, all indexes are built using the parallel algorithms (i.e. using the bottom up approach). For earlier versions, IBM IDS attempts to perform parallel sorts to speed up the index build process for indexes on any tables containing 50,000 rows or more, even if PDQ is not turned on.
Index Builds Are From the Bottom Up
Creating an index on a table that has previously been loaded with data is dramatically faster than loading an empty table that already has indexes defined on it.
Creating an index on a table that has previously been loaded with data is dramatically faster than loading an empty table which already has indexes defined on that table. The primary reason for this is the way in which an index is built on an existing, loaded table.
n During the index build, the data in the table is read and sorted by the key for the index. The location of each row (the rowid) is associated with each key during this process.
n An index page, either a root node or a branch node, is created. This node will have pointers to index leaf pages, once those leaf nodes are created.
n Next, index leaf pages are created with the sorted key data, and filled up to the FILLFACTOR. As each leaf page is created, a pointer to this leaf page is inserted into the root or branch node.
n If the root or branch node is filled up to the FILLFACTOR with pointers to leaf pages, then another level of the index structure is added. Another branch node is created to receive pointers to subsequent leaf pages, and then a parent root or branch node is created, with pointers now to the two sibling branch nodes.
Calculating Parallel Index Build Threads
n Scan threads
w one per chunk
w named xchg_1.X
n Sort threads
w btappenders * PSORT_NPROCS (or 2, whichever is greater)
w named psortproc
n Btappender threads
w the lesser of (NUMCPUVPS + 1) or the number specified in the table below
You can calculate the number of threads for a parallel index build using the following formula:
Index build threads = scan threads + sort threads + btappender threads + sqlexec
n scan threads = one per chunk
n sort threads = btappenders * (the greater of PSORT_NPROCS or 2)
n btappender threads = the lesser of (NUMCPUVPs + 1) or the value from the table below
Number of Btappenders   Projected Number of Index Pages
1                       < 400
2                       400 - 4 K
3                       4 K - 40 K
4                       40 K - 200 K
5                       200 K - 1 million
6                       > 1 million
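Putting the formula and the table together, the arithmetic can be sketched as follows. The function and variable names are assumptions, the table's range boundaries are treated as half-open here, and an unset PSORT_NPROCS is modeled as 0 so that the "greater of PSORT_NPROCS or 2" rule applies.

```python
def btappenders_from_pages(projected_index_pages):
    """Look up the btappender count from the projected-index-pages table."""
    for limit, count in [(400, 1), (4_000, 2), (40_000, 3),
                         (200_000, 4), (1_000_000, 5)]:
        if projected_index_pages < limit:
            return count
    return 6

def index_build_threads(chunks, psort_nprocs, numcpuvps, projected_index_pages):
    scan_threads = chunks                      # one scan thread per chunk
    btappenders = min(numcpuvps + 1,
                      btappenders_from_pages(projected_index_pages))
    sort_threads = btappenders * max(psort_nprocs, 2)
    return scan_threads + sort_threads + btappenders + 1  # +1 for sqlexec

# 4 chunks, PSORT_NPROCS unset, 4 CPU VPs, ~10,000 projected index pages:
print(index_build_threads(4, 0, 4, 10_000))  # 4 scan + 6 sort + 3 btappender + 1 = 14
```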
Environment Variable Settings

PDQPRIORITY     Set to 100.
PSORT_NPROCS    Do not set (see below).
DBSPACETEMP     To temporarily allocate additional temporary spaces for the index build.

[Figure: sort threads read data from the shared memory buffers and put the rows into sort buffers; overflow spills to temporary dbspaces tempdbs1 and tempdbs2]
During an index build the keys are sorted prior to insertion into the index. The sort threads read data from the buffer cache and sort the rows in sort buffers located in the virtual segment of shared memory. The fastest parallel sort is obtained when the entire sorting operation can be performed in the virtual segment of shared memory. If the space required exceeds the available virtual shared memory, then the sort will overflow to temporary dbspaces on disk, slowing down the sort operation.
n PDQPRIORITY - Set to a value of 100 in the environment of the process spawning the index build. PDQPRIORITY affects a number of factors, including the memory allocated for the parallel sorts. The maximum memory available for the index build sort operations will be divided evenly across the number of sort threads allocated.
n PSORT_NPROCS - IBM Informix recommends that you not set this environment variable! If you have a host system with multiple CPUs and PSORT_NPROCS is not set, IBM IDS will automatically use two threads per sort when it sorts the index keys.
n DBSPACETEMP - You can set this in the environment of the process executing the CREATE INDEX statement if you want to temporarily allocate additional temporary dbspaces beyond those already allocated to the system via the DBSPACETEMP configuration parameter.
Configuration Parameters

DBSPACETEMP     Set to many temporary dbspace names.
PHYSFILE        Set large enough so checkpoints fire infrequently.
BUFFERS         Maximize: set to 25% of available memory.
SHMVIRTSIZE     Maximize: set to 75% of available memory.
LRUS            1 LRU queue pair per 750 to 1000 buffers.
LRU_MAX_DIRTY   Set to 80.
LRU_MIN_DIRTY   Set to 70.
CKPTINTVL       Set to 3000.
You may also improve index build performance by setting the above configuration parameters. Additional information is shown below:

DBSPACETEMP
You should have a large amount of temporary dbspace allocated over multiple devices. If your IBM IDS port supports kernel AIO, locate these temporary dbspaces on raw devices to take advantage of kernel AIO to speed up the disk I/O. If adequate memory is available and intermediate merging is not required, then this temporary dbspace should be adequate.
LRUS
Eliminate contention for acquiring a queue on which to place the address of a modified buffer and provide a manageable queue length by configuring one LRU queue pair per 750 to 1000 buffers up to the maximum of 128. To monitor contention and ensure that adequate queues are allocated, use onstat -g spi.
LRU_MAX_DIRTY
Do not initiate page cleaner flushes until 80% of the buffers in a queue have been modified.
LRU_MIN_DIRTY
Stop cleaning at 70% to avoid flushing pages unnecessarily.
CKPTINTVL
A large checkpoint interval will allow the size of the physical log and, as a result, the amount of work accomplished to determine when checkpoints occur.
Using Onmode for Index Builds

onmode -D   Temporarily changes MAX_PDQPRIORITY. Set to 100 prior to the index build.
onmode -M   Temporarily changes DS_TOTAL_MEMORY. Set to 90% of SHMVIRTSIZE prior to the index build.

[Figure: after the onmode changes, 90% of virtual shared memory is available for the index build]
You can use the onmode command-line utility to make temporary changes to a number of configuration parameters without having to modify the ONCONFIG file and then restart the database server. The two that are particularly useful for an index build are described below:
onmode -D
Temporarily setting MAX_PDQPRIORITY to 100 allows for the allocation of all available server resources for the index build via a PDQPRIORITY setting of 100, set either:
n as an environment variable
n with the SQL statement SET PDQPRIORITY 100;

onmode -M
Temporarily changing DS_TOTAL_MEMORY to 90% of SHMVIRTSIZE will allow the index build to utilize almost all of the virtual memory available on the system. If SHMVIRTSIZE is large, it may be possible to execute all of the sorts in memory and not spill over into the temporary dbspaces (which slows down the sort).
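The arithmetic behind the 90% figure is simple but worth spelling out, since both SHMVIRTSIZE and the onmode -M argument are expressed in KB. The helper name is an assumption for illustration.

```python
def ds_total_memory_kb(shmvirtsize_kb, fraction=0.90):
    """DS_TOTAL_MEMORY value (KB) as a fraction of SHMVIRTSIZE (KB)."""
    return int(shmvirtsize_kb * fraction)

# With SHMVIRTSIZE of 20480 KB (20 MB), 90% is 18432 KB, so you would run:
#   onmode -M 18432
print(ds_total_memory_kb(20480))  # 18432
```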
Exercise 2
In this exercise, you will use the recommended configuration parameters to improve the performance of your index build processes.
2.1 Drop the indexes created in the previous exercise.
Syntax
To drop the indexes, enter the following commands into an SQL file and run them using dbaccess:
drop index onekidx1;
drop index onekidx2;
drop index onekidx3;
drop index tenk1idx1;
drop index tenk1idx2;
drop index tenk1idx3;
drop index tenk2idx1;
drop index tenk2idx2;
drop index tenk2idx3;
Create an SQL file, such as drop_indexes.sql, with the above commands in it, and run it: dbaccess wisc_db drop_indexes.sql
2.2 Use ISA and edit the ONCONFIG file to make the following change:
SHMVIRTSIZE 20480   # note that this is KBytes
Bounce your instance.
2.3 Use onmode to set MAX_PDQPRIORITY to 100 and DS_TOTAL_MEMORY to 90% of virtual memory (18,432 KBytes = 18 MBytes):
onmode -D 100
onmode -M 18432
2.4 Modify your index building SQL files onektup_ix.sql, tenktup1_ix.sql and tenktup2_ix.sql so that the first statement sets the PDQPRIORITY to 100. For example:
set PDQPRIORITY 100;
create index onekidx1 on onektup (unique1) in dbspace5;
create index onekidx2 on onektup (unique2) in dbspace5;
...
2.5 Recreate the indexes and capture timings to new files:
w For table onektup:
(time dbaccess wisc_db onektup_ix.sql) > onektup_ix.time2 2>&1
w For table tenktup1:
(time dbaccess wisc_db tenktup1_ix.sql) > tenktup1_ix.time2 2>&1
w For table tenktup2:
(time dbaccess wisc_db tenktup2_ix.sql) > tenktup2_ix.time2 2>&1
How Did Index Creation Times Change?
Discuss any observed changes in performance.
Students may give a brief presentation comparing the performance of the first index build to the second.
Module 5 Optimizing Update Statistics
Optimizing Update Statistics 02-2002 © 2002 International Business Machines Corporation
Objectives
At the end of this module, you will be able to:
n Understand data distributions, resolution and confidence
n Describe the update statistics modes
n List the new performance enhancements for the most recent version of IBM IDS
The Query Optimizer and UPDATE STATISTICS
The IBM IDS query optimizer is a cost-based optimizer that determines how to process a query most efficiently. It utilizes the following:
n Data dictionary information
w table names, column names and types, available indexes, restrictions, triggers
n Table statistics
w number of rows, btree depth, leaf pages, unique values, max/min column values
w data distributions
UPDATE STATISTICS is the command used to populate the system catalog tables!
The query optimizer utilizes data dictionary information from the following system catalog tables:
n systables
n syscolumns
n sysindexes
In addition, the query optimizer utilizes the data-distribution statistics that are stored in the sysdistrib system catalog table to determine the lowest-cost query plan.
Important!
TO ACHIEVE AND MAINTAIN GOOD PERFORMANCE IN IBM IDS YOU MUST RUN UPDATE STATISTICS ON A PERIODIC BASIS!
[Figure: UPDATE STATISTICS populates the systables, syscolumns, sysindexes and sysdistrib system catalog tables; histogram of number of customers across United States zip codes 00001 through 99999]
Make sure that you keep these statistics up-to-date so that the optimizer can select the best query plan. Whenever tables are updated after they are created and populated, or updated by the addition of many rows in a table load, run UPDATE STATISTICS as part of the update process. If tables are updated by insertions, deletions, or update of individual rows, run UPDATE STATISTICS on a periodic basis.
Note To generate a data distribution for storage in the sysdistrib system catalog, you must run UPDATE STATISTICS MEDIUM or UPDATE STATISTICS HIGH. Running only UPDATE STATISTICS will not generate a data distribution!
Data Distributions
Column values are extracted from tables and sorted into bins to create a statistical distribution of the data across its domain.
[Figure: histogram of number of customers across United States zip codes 00001 through 99999]
Data distributions are generated by the server for individual columns of a table, and are stored as an encoded histogram in the encdat column of the sysdistrib system catalog. Each row in this table stores a 256-byte “chunk” of the encoded histogram, so for any but the smallest tables multiple rows are required to store the full histogram. The sequence column is used to track entries in the sysdistrib table when more than one row is required to hold the column information. You should create data distributions on columns which you use in the WHERE clauses of your SQL statements.
Resolution
This is the percentage of data that is placed within each bin: the lower the resolution, the greater the number of bins.
[Figure: histogram of number of customers across United States zip codes 00001 through 99999, divided into finer bins]
You can use the resolution to specify the number of bins to use. The formula for calculating the number of bins is: 100/resolution = number of bins
A resolution of 1 means that one percent of the data goes into each bin, up to a maximum of 100 bins (100/1 = 100 bins are created). A resolution of 10 means that ten percent of the data goes into each bin (10 bins are created). You cannot have fewer than 10 bins. The default resolutions for UPDATE STATISTICS are shown in the table below:

MODE            Default Resolution   Number of Bins
LOW (default)   n.a.                 n.a.
MEDIUM          2.5                  40
HIGH            0.5                  200
The resolution can be as low as 0.005, to give you a maximum of 20,000 bins (although you cannot specify a number less than 1/(rows in the table)). In addition, an OVERFLOW bin is created for values that have a large number of duplicates. For example, a column with values “M” and “F” would cause the optimizer to pull these values into overflow, and not into the data distribution bins specified for that column.
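The bin arithmetic from this page can be sketched in a few lines. The 10-bin minimum and 20,000-bin maximum clamps come from the text above; the per-table lower bound (you cannot specify a resolution below 1/(rows in the table)) is not modeled here, and the function name is an assumption.

```python
def bins_for_resolution(resolution):
    """Number of distribution bins for a given RESOLUTION (percent of data
    per bin): bins = 100 / resolution, clamped to the documented limits."""
    bins = round(100 / resolution)
    return max(10, min(20_000, bins))

print(bins_for_resolution(2.5))    # 40, the MEDIUM default
print(bins_for_resolution(0.5))    # 200, the HIGH default
print(bins_for_resolution(25))     # clamped up to the 10-bin minimum
print(bins_for_resolution(0.005))  # 20000, the maximum
```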
More Accurate Data Distributions
The lower the resolution, the closer the data distribution in the system catalogs approaches the actual distribution of the data.
[Figure: histogram of number of customers across United States zip codes 00001 through 99999, with many narrow bins closely tracking the actual distribution]
The distribution for the zipcode column for our customer table might appear as shown in the slide above if we ran the following SQL command from dbaccess:
UPDATE STATISTICS MEDIUM FOR TABLE customer(zipcode) RESOLUTION 2.5;
The default number of bins may be adequate if the data is uniformly distributed across the domain of values for that column. However, if the data is highly skewed, then a larger number of bins (smaller RESOLUTION) is required to statistically capture this skew within the system catalogs. For example, a company selling a specialized product might find that they have thousands of customers in a single zipcode located in a large urban center on the west coast for every few customers in a single zipcode located in a sparsely populated area of the far west. In that case, a much lower resolution is appropriate:
UPDATE STATISTICS MEDIUM FOR TABLE customer(zipcode) RESOLUTION .05;
Confidence
Confidence is a statistical measure of the reliability of the sample for UPDATE STATISTICS MEDIUM.
UPDATE STATISTICS MEDIUM FOR TABLE customer RESOLUTION .05 .99;

The confidence is expressed as a value between .80 and .99. The default value is .95.
UPDATE STATISTICS MEDIUM uses a statistical sampling method to create distributions. Confidence is used when sampling the column data in the table. Both the resolution and confidence are used by the database server to determine the sample size. By increasing the confidence or decreasing the resolution value, the sample size increases.
Update Statistics Modes

HIGH     Update statistics and create distributions representing every value in the column.
MEDIUM   Update statistics and create distributions representing an 85 to 99% accurate representation of the values in the column.
LOW      Update statistics in the systables, syscolumns and sysindexes tables only. Does not build data distributions.

Updating the database statistics is the single most important statement for improving performance. This populates the database system catalog tables with information required by the optimizer to determine the lowest-cost access path for retrieving data.
Update statistics should be run after initially loading the data and creating indexes. It should also be run after any significant changes to the database tables including any large scale insert, update and delete operations. If you do not update statistics the optimizer will have inaccurate data to determine access paths.
Important! The UPDATE STATISTICS command default mode is LOW which only updates statistics in systables, syscolumns and sysindexes.
It does not build data distributions! The HIGH and MEDIUM modes also create distributions and populate the sysdistrib table. Data distributions are helpful when you cannot determine an evenly distributed expression-based fragmentation scheme. Data that is initially loaded using a round robin fragmentation scheme can now be analyzed using the data distributions. Once UPDATE STATISTICS is run, the distributions may be obtained by using the -hd option of the dbschema utility. For example:
dbschema -d database_name -hd table_name
Sample Size
Sample size is controlled by the DBA:
n For the HIGH mode the sample size is equal to the number of rows in the table.
n For the MEDIUM mode sample size is controlled by confidence level and RESOLUTION only (it is independent of table size).
n Increasing the sample size increases the overall accuracy of the data distribution.
Data distributions are generated when the server samples the data in a column, sorts the data, builds distribution bins and then inserts the sorted samples into the sysdistrib system catalog. If we have set the resolution to give 200 bins (RESOLUTION 0.5, the default for HIGH mode), and if the table has one hundred million rows, then UPDATE STATISTICS will use the following sample sizes for any one column.
Resolution  Confidence  MODE    Sample Size
0.5         .95         MEDIUM  74,064
0.5         .99         MEDIUM  106,276
0.5         1.00        HIGH    100,000,000
Note that with skewed data with a large domain (range of possible values) we can get a more accurate distribution of data by increasing the number of bins by another order of magnitude. With 2,000 bins (RESOLUTION 0.05) the sample sizes would be as follows:
Resolution  Confidence  MODE    Sample Size
0.05        .95         MEDIUM  7,406,375
0.05        .99         MEDIUM  10,627,600
0.05        1.00        HIGH    100,000,000
For a column with skewed data values our MEDIUM sample with 2,000 bins and a 99% confidence level is probably much more accurate than our HIGH sample discussed on the previous page with the default RESOLUTION of 0.5 (200 bins) and a 100% confidence. And note that 90% fewer values were required to be read, sorted and stored with our MEDIUM sample. The following table summarizes the sample size utilized by the server for six different resolutions and two different confidence levels:
Resolution  Confidence  Sample Size
2.5         .95         2,963
2.5         .99         4,273
1.0         .95         18,516
1.0         .99         26,569
0.5         .95         74,064
0.5         .99         106,276
.25         .95         296,255
.25         .99         425,104
.1          .95         1,851,593
.1          .99         2,656,900
.05         .95         7,406,375
.05         .99         10,627,600
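Two relationships tie these tables together: the number of bins is 100 / RESOLUTION, and at a fixed confidence level the MEDIUM sample sizes scale roughly with 1 / RESOLUTION². A quick sketch of both (the quadratic scaling is inferred from the tabulated values; it is not the server's exact sampling formula):

```python
def bins(resolution):
    """Number of distribution bins for a given RESOLUTION (in percent)."""
    return round(100 / resolution)

def scaled_sample(base_sample, base_resolution, resolution):
    """Approximate MEDIUM sample size at a new resolution, scaling a
    tabulated value by (base_resolution / resolution)**2. Illustrative
    only; the server derives the exact size from confidence and
    resolution internally."""
    return round(base_sample * (base_resolution / resolution) ** 2)

print(bins(0.5))                        # 200 (HIGH default)
print(bins(2.5))                        # 40 (MEDIUM default)
print(scaled_sample(74064, 0.5, 0.05))  # 7406400
```

The last call reproduces (to within rounding in the manual's table) the jump from 74,064 samples at RESOLUTION 0.5 to roughly 7.4 million at RESOLUTION 0.05.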
Example
One IBM Informix customer with a large installation found that setting RESOLUTION to .05 (2,000 bins) with a .98 confidence level gave significantly better query performance than a RESOLUTION of 1 (100 bins) with a .95 confidence level.
New Performance Enhancements
Update statistics performance changes were introduced into versions 7.31.UD2 and 9.30.UC2:
n throughput
n memory utilization
n information reported by the engine
A number of significant internal algorithm modifications to improve update statistics performance have been introduced into the above listed versions of IBM IDS.
Improved Throughput
Reducing the number of table scans required to generate the data distributions for the sysdistrib catalog provided the largest performance improvement.
[Figure: a single table scan feeds each column (column 1 through column N) to its own parallel sort, followed by a build-and-insert step.]
A single table scan is now able to produce data distributions for several columns in a table. The table scan parcels out the data extracted from each column being processed to an independent invocation of the parallel sort package. The second throughput feature can be utilized by customers who have fragmented indexes and run UPDATE STATISTICS without the MEDIUM or HIGH qualifier (i.e., LOW). IBM IDS 9.3 will scan each fragment in parallel and then sum up each set of statistics. The percentage of index fragments that will be scanned at one time is 10 times the PDQPRIORITY, up to a maximum of 100%.
Example
A PDQPRIORITY of 0 will perform a serial scan, while a PDQPRIORITY of 5 will scan 50% of the index fragments at one time (i.e., 5 * 10 = 50%). Any PDQPRIORITY of 10 or higher will cause all index fragments to be scanned at once. Finally, light scans are used to build the data distributions: the column data is scanned asynchronously into private buffers, and set-oriented reads allow blocks of rows to be processed at one time.
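The 10-times-PDQPRIORITY rule above can be expressed as a one-line calculation (an illustrative helper, not an Informix utility):

```python
def fragment_scan_percent(pdqpriority):
    """Percentage of index fragments scanned concurrently by
    UPDATE STATISTICS LOW: 10 x PDQPRIORITY, capped at 100%.
    A PDQPRIORITY of 0 means a serial scan."""
    if pdqpriority <= 0:
        return 0  # serial scan, one fragment at a time
    return min(100, pdqpriority * 10)

print(fragment_scan_percent(5))   # 50
print(fragment_scan_percent(10))  # 100
```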
Tuning Memory Utilization
n The default sort memory for update statistics has been increased from 4 MB to 15 MB.
n The environment variable DBUPSPACE has been expanded to include control over this default sort memory (up to 50 MB) when PDQPRIORITY is not set.
n More than 50 MB of sort memory can be allocated by using PDQPRIORITY.
In order to improve the speed at which data distributions are built, it is imperative to tune the availability of memory for sorting data, especially on large tables. The default sort memory for update statistics has been raised from 4 MB to 15 MB, and this may be increased up to 50 MB by use of the environment variable DBUPSPACE. Previously, DBUPSPACE controlled only the maximum amount of disk space used by update statistics. Now DBUPSPACE also controls the default memory used by update statistics when PDQPRIORITY is not set. The format of the DBUPSPACE environment variable is {max disk space}:{default memory}. To increase the memory to 50 MB, the user would set DBUPSPACE to the value 0:50. To give update statistics more than 50 MB of memory, set PDQPRIORITY, allowing update statistics to use the MGM memory that you have configured. It is recommended to always use the “FOR TABLE” clause when building distributions to avoid accidentally compiling your stored procedures with PDQPRIORITY turned on1. The following statement will build data distributions for all tables in the entire database:

UPDATE STATISTICS MEDIUM FOR TABLE;
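The two-part {max disk space}:{default memory} format can be illustrated with a small parser. This is a sketch of the documented format only; the helper name is invented and the parsing is not Informix code:

```python
def parse_dbupspace(value):
    """Split a DBUPSPACE setting of the form {max disk space}:{default memory}
    into its two integer components (both in MB). Hypothetical helper for
    illustrating the documented format."""
    disk, memory = value.split(":")
    return int(disk), int(memory)

# Give update statistics 50 MB of sort memory, as in the text's 0:50 example:
print(parse_dbupspace("0:50"))  # (0, 50)
```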
1. If you run UPDATE STATISTICS with PDQPRIORITY set to 50 and build data distributions without specifying FOR TABLE, the stored procedures will execute with the compile-time PDQPRIORITY and not the PDQPRIORITY specified by the user who later executes the procedure.
Information Reported to DBA
Sample output with set explain enabled:

Table: dbs1.cust_tab  Mode: HIGH
Number of Bins: 267   Bin size: 1082
Sort data: 101.4 MB   Sort memory granted: 15.0 MB
Estimated number of table scans: 5
PASS #1  c9, c8, c10, c5, c7
PASS #2  c6, c1
PASS #3  c3
PASS #4  c2
PASS #5  c4
...
Ten column table, 215,000 rows, varying column sizes
Allowing the DBA to view what update statistics is doing is an important improvement that aids in performance tuning. When update statistics is executing and set explain is enabled, the scan plan and resource information are placed into the sqexplain.out file. An example of this IBM IDS 9.3 output is shown in the above slide, illustrating how the new update statistics performs on a table with 215,000 rows and varying column sizes. From the output we can see that UPDATE STATISTICS HIGH was executed on the table cust_tab, where update statistics has 101 MB of data to sort and 15 MB of memory to use. Based on the different column sizes in the table, update statistics will try to group columns together in order to make the most of the sort memory for a single scan of the table. Update statistics determined that 5 scans of the table would be optimal for the 10 columns in the table. In the example above, the first scan of the table processes columns c9, c8, c10, c5, and c7, while the second scan processes columns c6 and c1. The last three columns are done individually, as their size prevents them from being combined with other columns.
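The pass-planning behavior described above can be approximated as a simple bin-packing sketch: greedily combine columns whose sort data fits within the granted sort memory. This is illustrative only — the server's actual grouping logic is internal, and the per-column sizes below are invented for the example:

```python
def plan_passes(column_sort_mb, memory_mb):
    """Group columns into table-scan passes so that the total sort data
    per pass fits within the sort-memory budget (greedy first-fit
    decreasing). Columns too large to share a pass get their own pass."""
    passes = []  # each entry: [column names, MB used]
    for name, size in sorted(column_sort_mb.items(), key=lambda kv: -kv[1]):
        for entry in passes:
            if entry[1] + size <= memory_mb:
                entry[0].append(name)
                entry[1] += size
                break
        else:
            passes.append([[name], size])
    return [entry[0] for entry in passes]

# Made-up per-column sort-data sizes (MB) for a ten-column table:
cols = {"c1": 6, "c2": 14, "c3": 13, "c4": 12, "c5": 2,
        "c6": 7, "c7": 3, "c8": 2, "c9": 2, "c10": 2}
print(plan_passes(cols, 15))  # five passes for these sizes
```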
Important! If your table is very large, UPDATE STATISTICS with the HIGH mode can take a very long time to execute, and will require large amounts of storage space for the sysdistrib system catalog!
Guidelines
n Run UPDATE STATISTICS (LOW) when the amount of data or the data distribution changes by 10 to 25%.
n Create low-RESOLUTION (.1 or .05) data distributions for columns involved in queries by using the MEDIUM or HIGH mode:
w Columns that head an index, and join columns.
w Columns queried with equality filters (=).
n Factor in table size when choosing HIGH vs. MEDIUM mode for the above-mentioned columns:
w For relatively small tables, HIGH mode may be used.
w For larger tables (especially large to very large tables), use MEDIUM mode with confidence .98 or .99.
Running UPDATE STATISTICS or UPDATE STATISTICS LOW will collect basic table statistics for the entire database, such as the number of rows, the number of pages used to store the data, the second largest and second smallest value for a column, and so on. In addition, this command should be run at the table level any time significant changes have been made to that table. Once UPDATE STATISTICS is run for a static table, it does not need to be run again. For example, assume the following tables have been created for a DSS environment and then loaded with 500 million rows each:

CREATE TABLE table_a (col1 INTEGER, col2 INTEGER, col3 CHAR(25)) IN dbspace1;
CREATE TABLE table_b (col1 INTEGER, col2 INTEGER, col3 CHAR(25)) IN dbspace2;
The following command would create basic table statistics:

UPDATE STATISTICS FOR TABLE;
If these two tables were subject to various ad-hoc queries that might join any column or combination of columns from these tables, then the following commands would generate data distributions in addition to the basic table statistics mentioned above:

UPDATE STATISTICS MEDIUM FOR TABLE table_a;
UPDATE STATISTICS MEDIUM FOR TABLE table_b;
Note that this will require the server to sample, sort, and store about 3,000 data points per column, and will keep this data in forty “buckets” per column (the default MEDIUM RESOLUTION of 2.5). The default confidence level of 95% (which we get by not specifying a confidence level) should be sufficient for this purpose.
Now assume that these tables are part of an OLTP system, that performance-critical queries are executed against columns 1 and 2 in table_a and column 1 in table_b, and that 100 million rows have been loaded into table_a and 50 thousand rows have been loaded into table_b:

CREATE INDEX ix1 ON table_a(col1) IN idx1dbs;
CREATE INDEX ix2 ON table_a(col2) IN idx2dbs;
CREATE INDEX ix3 ON table_b(col1) IN idx3dbs;
In this case you might want to give the optimizer a much larger number of bins (1,000) to increase the accuracy of the distribution, and perhaps also increase the confidence level to 99%. Taking the table sizes into account, you should choose MEDIUM mode for table_a and HIGH mode for table_b:

UPDATE STATISTICS MEDIUM FOR TABLE table_a(col1) RESOLUTION .1 .99;
UPDATE STATISTICS MEDIUM FOR TABLE table_a(col2) RESOLUTION .1 .99;
UPDATE STATISTICS HIGH FOR TABLE table_b(col1) RESOLUTION .1;
Each of the first two commands would sample, sort, and store 2.66 million data points from the 100 million rows in table_a, and the third command would sample all 50 thousand data points from the 50 thousand rows in table_b. Note that HIGH mode has a default resolution of .5, which creates 200 bins (100 / RESOLUTION). Increasing the number of bins from the default of 40 (MEDIUM mode) or 200 (HIGH mode) to 1,000 or 2,000 bins (RESOLUTION .1 or .05) is far more effective in creating an accurate data distribution for highly skewed data than increasing the confidence from 95% (MEDIUM default) to 100% (HIGH default).
Tuning Parameters for Updating Statistics
n Configuration parameters:
w BUFFERS and SHMVIRTSIZE: distribute the available memory between the buffer cache (BUFFERS) and the initial virtual portion of shared memory (SHMVIRTSIZE).
w RA_PAGES: set to 32 (16 for 4K page systems).
w RA_THRESHOLD: set to 30 (15 for 4K page systems).
n Environment variables:
w Set PDQPRIORITY to 100.
n onmode:
w Use onmode -D 100 to maximize MAX_PDQPRIORITY.
w Use onmode -M 90_percent_SHMVIRTSIZE to change DS_TOTAL_MEMORY.
Distribute the available memory between the buffer cache and the initial virtual segment. Keep in mind that the tables are scanned and pages read into the buffer cache, but the sort process is performed in the virtual segment. Configure a large amount of temporary dbspace, designated by the DBSPACETEMP configuration parameter. You should have many temporary dbspaces of equal size, on separate devices if possible.
Exercises
Exercise 1
1.1 Select the values nrows, leaves, npused, and levels from the systables and sysindexes tables for the table onektup in the wisc_db database. You may use the following SQL:

select nrows, npused, leaves, levels
from systables, sysindexes
where tabname = "onektup"
and systables.tabid = sysindexes.tabid;
Note the values for each of the columns.
nrows ________________________
npused _______________________
levels ________________________
leaves ________________________

1.2 Execute the following query:

set explain on;
select * from onektup where unique1 < 20;
1.3 Review the optimizer’s access path choice. Did the optimizer choose a sequential or index path?

QUERY:
------
select * from onektup where unique1 < 20

Estimated Cost: 353
Estimated # of Rows Returned: 2667
Maximum Threads: 4

1) stu202.onektup: SEQUENTIAL SCAN (Parallel, fragments: ALL)
   Filters: stu202.onektup.unique1 < 20
1.4 Using the following command, update statistics for the onektup table:

UPDATE STATISTICS HIGH FOR TABLE onektup;
1.5 Repeat the query in step two. Did the access path change? Why?

QUERY:
------
select * from onektup where unique1 < 20

Estimated Cost: 5
Estimated # of Rows Returned: 20
Maximum Threads: 1

1) stu202.onektup: INDEX PATH
   (1) Index Keys: unique1 (Parallel, fragments: ALL)
       Upper Index Filter: stu202.onektup.unique1 < 20
Exercise 2
2.1 Run UPDATE STATISTICS for the remaining two tables, tenktup1 and tenktup2.

UPDATE STATISTICS MEDIUM FOR TABLE tenktup1;
UPDATE STATISTICS MEDIUM FOR TABLE tenktup2;
2.2 Run dbschema -d wisc_db -hd onektup > onektup.out.
2.3 Review the output.
What Is the Impact On Performance?
In Exercise 1, why are all the values zero or null?
In Exercise 1, why are the values of nrows, npused, leaves, and levels all zero or null for a table that you know contains thousands of rows? Why did the access path for the same query change on subsequent executions?
Module 6 OLTP Performance Tuning
OLTP Performance Tuning 07-2002 © 2002 International Business Machines Corporation
Objectives
At the end of this module, you will be able to:
n Identify the performance tuning issues for OLTP environments
n Understand how to perform checkpoint tuning
n Tune systems to maximize transaction processing throughput
n Monitor tuning adjustments
6-2 OLTP Performance Tuning
The OLTP vs. DSS Spectrum

OLTP
n Many users
n Reads and writes
n Few rows read per transaction

DSS
n Few users
n Read only
n Many rows read per query
Very few systems service purely on-line transaction processing (OLTP) or decision support (DSS) applications. On-line transaction processing applications usually have a batch component with DSS characteristics, and DSS systems must be refreshed on a regular basis, requiring mass updates, inserts, and deletes. However, to optimally tune IDS systems, you need to identify the predominant application characteristics, whether OLTP or DSS. For systems that run both OLTP and DSS applications at different times, you may use different ONCONFIG files to configure the system for each type of processing. For systems that run OLTP and DSS processes concurrently, you must try to balance the available resources between the two processing requirements. This module discusses various tuning tips for optimizing OLTP-type activities.
OLTP Performance Tuning
To properly tune OLTP systems, you must address:
n Read and write cache rates
n LRU queue management
n Checkpoint activity
n Efficient use of physical and logical log buffers
n Read-ahead usage
OLTP systems present several challenges. Because small but often overlapping sets of data are requested by very large numbers of users, you must provide a way to efficiently use and share the available memory resources among these users. Because most of the user requests will modify data, you must allow these modifications to be done very quickly and efficiently, while ensuring that all users have consistent and up to date copies of the data. Additionally, since unavailability of data often means lost revenue, the system must provide around the clock availability and be capable of performing most administrative tasks in an on-line mode. The IBM Informix Dynamic Server offers sophisticated features that allow you to accomplish each of these tasks and achieve record setting performance in OLTP environments.
Optimizing Read Caching
The IBM IDS data buffer cache allows all users to share the same set of data, drastically reducing physical disk reads.
n Tune for a read cache rate of 95% or better
In traditional database systems, when user A executes a SELECT statement to read a row from the database, row 100 for example, a copy of the page containing the row is brought into memory for use by user A. If 200 other users also request row 100, 200 additional copies of the page containing row 100 are brought into memory, one for each user, generating 200 additional disk reads. By using data caching, IBM IDS allows the page containing row 100 to be brought into the shared buffer pool for access by all users, reducing to 1 disk read what could have taken 200. For OLTP applications where there are many users reading small sets of data, the goal is to achieve a read cache rate of 95% or better. In other words, for every 100 reads, only five of them should require physical reads from disk. You can increase the read cache rate by increasing the number of data buffers available using the BUFFERS parameter. The buffer cache (or buffer pool) is located in the resident segment of the IBM IDS shared memory. Use onstat -p to monitor read caching percentages. The first row of the output from this command appears similar to that shown below:
dskreads  pagereads  bufreads  %cached  dskwrits  pagwrits  bufwrits  %cached
450       1083       1893      97.62    586       1217      8076      92.74
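The %cached columns are derived from the physical and buffer counters; the usual formula is 100 × (buffer operations − disk operations) / buffer operations. A sketch applied to the write counters in the sample row (the helper name is invented):

```python
def cache_rate(disk_ops, buffer_ops):
    """Cache hit percentage: the share of buffer operations that did not
    require a physical disk operation, rounded to two decimals."""
    if buffer_ops == 0:
        return 0.0
    return round(100.0 * (buffer_ops - disk_ops) / buffer_ops, 2)

# Write-cache rate from the sample onstat -p row (586 dskwrits, 8076 bufwrits):
print(cache_rate(586, 8076))  # 92.74
```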
Write Cache Rate
n Tune for a write cache rate of 85% or better
Write cache rate is the ratio of write requests serviced in memory to the write requests requiring a physical I/O. In OLTP environments, your target should be a write cache rate of 85% or better. You can tune write cache rates using the following configuration parameters:
n LRUS
n LRU_MAX_DIRTY
n LRU_MIN_DIRTY
n CKPTINTVL
n PHYSLOG
n CLEANERS
Use onstat -p to monitor write caching percentages. The first row of the output from this command appears similar to that shown below:
dskreads  pagereads  bufreads  %cached  dskwrits  pagwrits  bufwrits  %cached
450       1083       1893      97.62    586       1217      8076      92.74
How Many Buffers?
[Chart: read cache percentage versus number of BUFFERS. The curve climbs steeply and then flattens; the goal is to find the point on the curve where adding more BUFFERS only marginally improves performance.]
It is important to size the buffer cache correctly. The goal is to find the point on this curve where adding more buffers no longer offers significant performance gains, yet removing buffers would hurt performance. Having too few buffers keeps read cache rates low, and read performance suffers with them. Having too many buffers wastes memory that could be used elsewhere. In addition, a grossly oversized buffer pool can cause checkpoint problems with the LRU_MIN_DIRTY and LRU_MAX_DIRTY percentages.
LRU Queues (LRUs)
[Figure: the buffers of the buffer pool (numbered 1 through 12) linked into LRU queue pairs; as buffers are modified, their entries move from the free queues to the modified queues.]
The Least Recently Used queues (LRU queues) manage access to the buffers in the resident segment buffer pool. If the ONCONFIG file has LRUS set to 4, then at initialization IBM IDS will create 4 LRU queue pairs. Each pair contains a free (clean) queue and a modified (dirty) queue. The queues are actually linked lists containing the addresses of the buffers in the buffer pool. A buffer page that has not been used, or that has been used but has not been modified, will be assigned to the free queue. At initialization, all the buffers are assigned to free queues. As buffer pages are modified, the address of the modified buffer is moved to the modified queue. As a good rule of thumb, configure one LRU queue pair per 750 to 1000 buffers. Remember that a CPUVP will place a mutex on the LRU queue being accessed, and having too few queues increases the possibility of contention.
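The rule of thumb above (one LRU queue pair per 750 to 1000 buffers) can be turned into a quick calculation. This is an illustrative helper, not an Informix utility:

```python
def recommended_lrus(buffers, per_pair=1000):
    """Rule-of-thumb LRUS setting: one LRU queue pair per 750-1000
    buffers. per_pair may be anywhere in that range; the result is the
    ceiling of buffers / per_pair."""
    return max(1, -(-buffers // per_pair))  # ceiling division

print(recommended_lrus(200000))       # 200 pairs at 1000 buffers per pair
print(recommended_lrus(200000, 750))  # 267 pairs at 750 buffers per pair
```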
Managing LRU Queues
[Figure: a queue pair at LRU_MAX_DIRTY with 1,600 unmodified and 400 modified buffers, and the same pair after cleaning down to LRU_MIN_DIRTY with 1,920 unmodified and 80 modified buffers; cleaner threads place write requests on the kaio queue for the kaio thread.]
The configuration parameters LRU_MAX_DIRTY and LRU_MIN_DIRTY allow you to specify when modified buffers should be cleaned (written to disk) and returned to the free queue. LRU_MAX_DIRTY specifies the percentage of buffers listed in a queue pair at which point the page cleaner threads should start cleaning the modified queue. If your IBM IDS system has 200000 buffers and LRUS is 100, each LRU queue pair will be responsible for approximately 2000 buffers. If LRU_MAX_DIRTY is 20, then when any queue pair has 400 buffers in its modified queue, the page cleaner threads, called flush_sub(n), will be alerted to begin placing write requests on the I/O queues. After the page cleaner threads place the write requests in the I/O queues, the kaio or aio threads are responsible for writing the data to disk. Note that only the queue that has reached the LRU_MAX_DIRTY value is “cleaned”. LRU_MIN_DIRTY specifies the percentage of buffers listed in a queue pair at which point the page cleaner threads should stop cleaning the modified queue. During normal page cleaning operations, you may not want to write the most recently modified pages to disk because it is very likely that additional changes (updates, inserts, deletes) are still being made to those pages. Writing the page to disk prematurely will reduce write cache rates and induce more physical writes. If, for example, there are 2000 buffers per queue pair and LRU_MIN_DIRTY is set to 4, IBM IDS will stop flushing modified pages when the modified queue has only 80 buffer addresses, or 4% of the buffers assigned to the queue pair.
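The arithmetic in the example above can be sketched as follows (illustrative helper; the function name is invented):

```python
def lru_thresholds(buffers, lrus, max_dirty_pct, min_dirty_pct):
    """Per-queue-pair buffer count and the modified-buffer counts at
    which cleaning starts (LRU_MAX_DIRTY) and stops (LRU_MIN_DIRTY)."""
    per_pair = buffers // lrus
    start_cleaning = per_pair * max_dirty_pct // 100
    stop_cleaning = per_pair * min_dirty_pct // 100
    return per_pair, start_cleaning, stop_cleaning

# 200000 buffers across 100 queue pairs, LRU_MAX_DIRTY 20, LRU_MIN_DIRTY 4:
print(lru_thresholds(200000, 100, 20, 4))  # (2000, 400, 80)
```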
Monitoring LRU Queues
onstat -R

#f/m  pair total  % of   length  LOW  MED_LOW  MED_HIGH  HIGH
0 f   2500        68.0%  1700    100  1400     200       0
1 m               32.0%  800     0    800      0         0
2 f   2500        64.0%  1600    100  1500     0         0
3 m               36.0%  900     0    900      0         0
4 f   2500        68.0%  1700    100  1600     0         0
5 m               32.0%  800     0    800      0         0
6 f   2500        68.0%  1700    100  1500     100       0
7 m               32.0%  800     0    800      0         0
Use onstat -R to monitor queue lengths. Note that the queues are arranged in pairs of free (f) and modified (m) queues, with their numbering starting at zero. The total column shows how many buffers (actually, pointers to buffers) have been allocated to each queue pair. The “% of” column is the most important of the columns. It shows the distribution in percentages of the buffers allocated to the queue pair between free or unmodified and dirty or modified.
[Figure: four LRU queue pairs (LRU 0 through LRU 3), each holding linked lists of free and modified buffers.]

At the end of the onstat -R output is a summary section. An example is shown below:

6400 dirty, 20000 queued, 20000 total, 256 hash buckets, 2048 buffer size
start clean at 60% (of pair total) dirty, or 1500 buffs dirty, stop at 50%
0 priority downgrades, 0 priority upgrades
Monitoring LRU Writes
Usually limited to determining whether there are too many foreground writes; use onstat -F to monitor:

Fg Writes  LRU Writes  Chunk Writes
11         318         201
Use onstat -F to monitor write request activity by the page cleaner thread. Write requests generated by the LRU_MAX_DIRTY threshold are counted in the column labeled LRU Writes. Note that LRU writes are done one page at a time. A foreground write (under the column Fg Writes) occurs when an sqlexec thread searches through the LRU queues on behalf of a user but cannot locate an empty or unmodified buffer. To make space, the sqlexec thread flushes pages, one at a time, to disk. If not minimized, this can have a negative impact on system performance. Chunk Writes are described in the slide “Full (Sync) Checkpoints” later in this module.
Checkpoints
[Figure: during a checkpoint, cleaner threads place sorted write requests on the I/O request queue, which the kaio thread services.]
During checkpoints, page cleaners behave somewhat differently. First, a page cleaner thread, flush_sub(n), is assigned to each chunk in the system. The page cleaner thread reads through the buffer header table (located in the resident segment of IBM IDS memory) and locates all modified pages for its chunk. Then, a page cleaner places a write request for each modified buffer on the I/O queue, sorted by disk page number. This sorting eliminates unnecessary disk seek time by writing pages to disk in physical order.
Fuzzy Checkpoints
The frequency at which fuzzy checkpoints occur is determined by:
n the CKPTINTVL configuration parameter
n the size of the physical log
[Figure: the physical log at 75% full and CKPTINTVL set to 300 seconds, either of which can trigger a checkpoint.]
During a fuzzy checkpoint, those pages that were modified by insert, update, and delete operations on standard built-in data types are not copied to disk, whereas all other modified pages are copied to disk. Fuzzy checkpoints will occur when:
n the number of seconds specified by the CKPTINTVL configuration parameter has elapsed, and one or more modifications have occurred since the last checkpoint
n the physical log becomes 75% full
n the next logical-log file to become current contains the most recent checkpoint record
n the DBA enters the command onmode -c fuzzy
n a chunk or dbspace is added
Because the physical log fills up as pages are updated, allowing the physical log to initiate checkpoints guarantees that enough work has occurred to justify a checkpoint. Additionally, sizing the physical log according to your checkpoint requirements ensures that you are utilizing disk efficiently. You can determine what is triggering checkpoints by monitoring the checkpoint interval using onstat -m and using onstat -l to monitor the fullness of the physical log. You can disable fuzzy checkpoints so that only full checkpoints are performed by the database server by setting the ONCONFIG parameter NOFUZZYCKPT to 1.
Full (Sync) Checkpoints
[Figure: during a full checkpoint, cleaner threads flush the buffer cache through the kaio queue and kaio thread to the chunks.]

onstat -F

Fg Writes  LRU Writes  Chunk Writes
11         318         201
The database server performs a full checkpoint when:
n the server is shut down with onmode -ky
n a checkpoint is manually initiated with onmode -c
n a backup or restore is done with ontape or ON-Bar
n fast recovery or full recovery completes
Full checkpoints have one major performance drawback. During a checkpoint, certain update activities are blocked. The blocking during checkpoints may be perceived by users as a system lock-up. In OLTP systems, you may want to minimize the blocking effect of checkpoints even if the result is less efficient writing. You can minimize the blocking by minimizing the duration of checkpoints. Page cleaner write requests during checkpoints are called chunk writes and can be monitored using onstat -F. Because of the sorting, and because chunk writes use big buffers to write (eight pages per I/O instead of one), chunk writes are much more efficient than LRU writes.
Checkpoint Duration
The duration of a checkpoint is affected by:
n the number of modified pages that must be written to disk
n the number of page cleaners configured
n the CPU resources available for processing kaio or aio write requests
n the speed and number of disk drives available

$ onstat -m
10:53:54 Checkpoint Completed: duration was 2 seconds
10:53:52 Checkpoint loguniq 1, logpos 0xc0
Checkpoint duration is a function of several factors, as shown in the above slide. Each of these factors and the methods for monitoring them will be discussed on the next few pages. Tips for minimizing the duration of checkpoints in order to minimize their impact on users will also be introduced.
Disk and Controllers
n Monitor disk performance using:
w sar
w iostat
n If disks are 100% busy or the controller is saturated:
w reorganize the database to distribute I/O more evenly
w install additional disks and/or controllers
Use Unix commands such as sar and iostat to monitor disk capacity and performance. If disks are 100% busy, you may try to resolve the bottleneck by reorganizing the data in the database to distribute I/O more evenly, or you may increase the number of disks and controllers on the system.
Balancing I/O Across Disks
$ onstat -g iof
Informix Dynamic Server Version x.xx.xxx -- On-Line --
AIO global files:

gfd  pathname          totalops  dskread  dskwrite  io/s
3    /dev/rdsk/dev201  85        43       42        0.0
4    /dev/rdsk/dev109  14        3        11        0.0
5    /dev/rdsk/dev101  556       369      187       0.1
6    /dev/rdsk/dev107  1072      595      477       0.2
7    /dev/rdsk/dev109  2         2        0         0.0

(dskread shows system read calls; dskwrite shows system write calls.)
If the number of read and write calls for a particular chunk is significantly higher than for the others, try to distribute the I/O across multiple chunks on multiple disks. If the chunk contains multiple tables, distribute the tables. If the chunk contains a single table, the table may be a candidate for fragmentation. You can also monitor reads and writes to open tables using onstat -g ppf.
Note Make sure the ONCONFIG parameter TBLSPACE_STATS is set to 1 for the onstat -g ppf command to work.
Once physical I/O has been distributed evenly across the available disks, you can monitor and tune IBM IDS I/O activity.
Monitor IBM IDS I/O Activity
onstat -g ioq

q name/id  len  maxlen  totalops  dskread   dskwrite  dskcopy
kio 0      0    16      23753730  23720733  32997     0
adt 0      0    0       0         0         0         0
msc 0      0    1       21        0         0         0
aio 0      0    1       387       348       2         0
pio 0      0    0       0         0         0         0
lio 0      0    1       1         0         1         0
gfd 3      0    6       102       96        6         0
gfd 4      0    1       3         3         0         0
Monitor IBM IDS I/O activity using onstat -g ioq. The output lists one queue per kaio thread, one queue per aio VP, and one queue per chunk. When using IBM IDS AIO instead of kernel AIO, a long maximum queue length (> 25) or a consistent queue length (> 10) for the gfd(n) queues indicates that I/O requests are not being serviced fast enough. The maximum queue length is shown in the onstat column maxlen. A length of 32 for a kaio queue is not as meaningful because IBM IDS frequently places big buffer requests (32 pages) on the queue. If you have adequate disk and CPU resources, adding additional AIOVPs may allow requests to be serviced more quickly and reduce the length of the queues. The aio and gfd queues are used for writes to cooked chunks, and for writes to raw chunks when kernel AIO is not supported for the platform or is turned off using the KAIOOFF environment variable. If the AIO queues are keeping up with requests and the disks are not saturated, but performance is less than desirable, the problem may be that I/O requests are not being delivered to the I/O queues quickly enough. The page cleaner threads are responsible for placing requests on the queues. If onstat -F indicates that the cleaners are all active, you may want to configure more cleaners. Note that an additional benefit of kernel asynchronous I/O is that there are fewer oninit processes and therefore less CPU and memory resource contention (fewer context switches, etc.).
Tuning LRU Queues
Finally, to minimize checkpoint duration, you must minimize the writes that are performed by checkpoints, using:
n LRU_MAX_DIRTY
n LRU_MIN_DIRTY
[Figure: a queue pair with 1,600 unmodified and 400 modified buffers before cleaning, and 1,920 unmodified and 80 modified buffers after.]
Once you have addressed disk and CPU capacity and the Dynamic Server’s physical I/O layer, you can attempt to minimize checkpoint duration by performing more writes between checkpoints. You can do this by lowering the settings of LRU_MAX_DIRTY and LRU_MIN_DIRTY.
Example
If BUFFERS is 30000, LRU_MAX_DIRTY is 60, and LRU_MIN_DIRTY is 50, a checkpoint may force as many as 18000 pages to be written to disk. By lowering LRU_MAX_DIRTY and LRU_MIN_DIRTY to 10 and 5, you reduce the checkpoint workload to between 1500 and 3000 page writes.
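The example above can be sketched as a quick calculation (illustrative helper; the function name is invented):

```python
def checkpoint_write_range(buffers, lru_max_dirty, lru_min_dirty):
    """Approximate range of pages a checkpoint may have to flush: the
    cleaners keep each queue pair between LRU_MIN_DIRTY and
    LRU_MAX_DIRTY percent modified, so the pool-wide dirty-page count
    falls between those two shares of BUFFERS."""
    low = buffers * lru_min_dirty // 100
    high = buffers * lru_max_dirty // 100
    return low, high

print(checkpoint_write_range(30000, 60, 50))  # (15000, 18000)
print(checkpoint_write_range(30000, 10, 5))   # (1500, 3000)
```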
AIO vs. Kernel Asynchronous I/O
[Slide diagram: an sqlexec thread on a CPU VP checks the buffer cache. If the page is found, the request completes. If not, the request goes either to a chunk's gfd request queue, where an AIO thread on an AIO VP services it against a cooked chunk, or to the kaio queue, where a kaio thread hands it to the kernel for a raw chunk. While waiting, the sqlexec thread sits on the sleep queue and is moved to the ready queue when the I/O completes.]
Informix supports two methods of I/O: IBM Informix Asynchronous I/O (AIO) and kernel asynchronous I/O (KAIO).
Asynchronous I/O When an sqlexec thread running on a CPUVP receives an SQL request, it parses and optimizes the request, then checks to see if the requested data is available in the shared-memory buffer pool. If the data is already in memory, the sqlexec thread returns it, completing the request. If the data is not in memory, the sqlexec thread must place an I/O request on the AIO request queue for the appropriate chunk. There is one request queue for every chunk; use onstat -g ioq to view the request queues. Next, the sqlexec thread puts itself to sleep on the CPUVP sleep queue. The CPUVP may now continue work on other ready requests. An AIOVP picks up the request from the AIO request queue, services the I/O request, and brings the data into memory. Once the data has been read into memory, the AIOVP signals for the sqlexec thread that submitted the request to be moved to the CPU ready queue. The sqlexec thread can then be picked up by the next available CPUVP to complete processing. AIO is always used to write to the message log and to write to cooked chunks.
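The queueing pattern described above (requester posts to a per-chunk queue, blocks, and is woken when a worker finishes the read) can be modeled with a toy producer/consumer sketch. This is only an illustration of the pattern, not Informix code; all names are hypothetical.

```python
import queue
import threading

request_q = queue.Queue()   # stands in for one chunk's AIO request queue
results = {}                # stands in for the shared-memory buffer cache

def aio_vp():
    """Worker playing the AIO VP role: service queued reads, wake requesters."""
    while True:
        page, done = request_q.get()
        if page is None:                      # shutdown sentinel
            break
        results[page] = f"data-{page}"        # pretend disk read into the cache
        done.set()                            # wake the waiting "sqlexec" thread

def sqlexec_read(page):
    """Requester playing the sqlexec role: queue the I/O, sleep until signaled."""
    done = threading.Event()
    request_q.put((page, done))   # place the request on the chunk's queue...
    done.wait()                   # ...and sleep until the worker signals us
    return results[page]

worker = threading.Thread(target=aio_vp)
worker.start()
print(sqlexec_read(42))           # prints "data-42"
request_q.put((None, None))       # stop the worker
worker.join()
```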
6-20 OLTP Performance Tuning
Kernel Asynchronous I/O Kernel AIO is coordinated by the UNIX system kernel. All requests are passed to the kernel by the kaio thread. If your system supports kernel AIO, onstat -g ath will show one kaio thread per CPUVP. There will also be one kaio request queue per thread, as shown by onstat -g ioq. When a user session submits a request for data that is not in the buffer cache, the request is placed on the kaio request queue. Next the kaio thread, which like the sqlexec thread runs on the CPUVP, sends the request to the operating system. The kaio thread then yields control of the CPUVP, but continues to poll the kernel to determine when the operation is complete. When the read is complete, the kaio thread places the data pages in the buffer cache and wakes up the sqlexec thread.
Environment Variable
KAIOOFF: On platforms that support kernel AIO, CPUVPs and kaio threads handle all reads and writes to raw chunks. To disable kernel AIO, set this environment variable prior to starting the server.
Note Kernel AIO is only available on some IBM IDS platforms. To find out if it is available on your platform, consult your product release notes.
OLTP Performance Tuning 6-21
Monitoring Disk & Buffer Activity
Use onstat -g ppf to monitor both disk and buffer activity:
n columns isrd and iswrt for isam reads and writes
n bfrd and bfwrt for buffer reads and writes

onstat -g ppf
partnum    lkrqs ... isrd  iswrt  isrwt  isdel  bfrd  bfwrt
0x100001       0 ...    0      0      0      0     0      0
0x100002      31 ...   15      1      0      0    38      3
0x100003     807 ...  350      0      0      0   924     41
0x100004      43 ...   20      0      0      0    49      1
0x100005       7 ...    3      0      0      0   135     59
0x100006       5 ...    2      0      0      0     6      0
You can monitor system input/output by using the onstat -g ppf command. The output above shows system reads and writes within two of the database server software layers. Note that as each layer presents a simplified interface to the layer directly above it, there is increasing activity as you descend to lower software levels in the system hierarchy.
[Slide diagram: the statement DELETE FROM customer WHERE city = "Palo Alto" passes from the SQL layer through the ISAM layer down to the chunk data on disk.]
6-22 OLTP Performance Tuning
Configuring AIOVPs
[Slide flowchart: if kernel I/O is not available, or if the chunks are cooked, configure 2 AIOVPs plus 1 per active chunk.]
Remember, the goal in configuring AIOVPs is to ensure that the I/O request queues are consistently short (len < 10), which indicates that I/O requests are being processed as fast as they occur. You can monitor these values with onstat -g ioq.
Tip Monitor the length of the aio queues to ensure that adequate AIOVPs are configured.
Generally, it is not detrimental to allocate too many AIOVPs because AIOVPs place very little demand on the CPU resources. In rare cases, where the ratio of disks to CPUs is very high, you may want to reduce the number of AIOVPs from the recommendations given here. You can correlate the line in “onstat -g ioq” with the actual chunk because the chunks are in the same order as in the “onstat -d” output.
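The sizing rule from the slide above can be sketched as a small helper. The rule of thumb (2 AIOVPs plus 1 per active chunk when AIOVPs do the chunk I/O) is from this course; the minimal allocation of 2 when kernel AIO services all raw chunks is an inference from the flowchart, so treat it as an assumption and verify against onstat -g ioq afterwards.

```python
def recommended_aiovps(active_chunks, kernel_aio=False, all_raw=False):
    """Rule-of-thumb AIOVP count from the course flowchart.

    When kernel AIO handles all chunk I/O (raw chunks only), keep a small
    minimum; otherwise allocate 2 plus one per active chunk. The minimum of
    2 on the kernel-AIO path is an assumption, not a documented value.
    """
    if kernel_aio and all_raw:
        return 2
    return 2 + active_chunks

print(recommended_aiovps(6))                                 # 8
print(recommended_aiovps(6, kernel_aio=True, all_raw=True))  # 2
```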
OLTP Performance Tuning 6-23
Tuning the Log Buffers
[Slide diagram: the logical log buffers in shared memory flush to Logical Log Files 1 through 4 on disk; the physical log buffers flush to the Physical Log File.]
The logical and physical log buffers improve performance by reducing the number of physical writes to disk required for transaction logging and the recording of before-images of pages that provide fault tolerance. Your goal is to buffer as many writes as possible, without wasting memory on unnecessarily large buffers. To determine how well you have sized these log buffers, you can use onstat -l. The first part of the onstat -l output displays information about the physical log buffers:

Physical Logging
Buffer bufused   bufsize  numpages  numwrits  pages/io
  P-1  0         16       265       21        12.62
       phybegin  physize  phypos    phyused   %used
       100107    1000     266       0         0.00

The second part of the onstat -l output displays information about the logical log buffers:

Logical Logging
Buffer bufused  bufsize  numrecs  numpages  numwrits  recs/pages  pages/io
  L-2  0        16       6766     394       28        17.2        14.1
...
6-24 OLTP Performance Tuning
A physical log buffer is flushed when it is full, before a page it contains is written to disk by an LRU write, and at each checkpoint. The logical log buffer is flushed when a buffer fills (for databases with buffered logging) or when a commit or rollback record is written (for mode ANSI databases or databases with unbuffered logging). Your goal is to ensure that your physical and logical log buffers are about 75% full when they are flushed. In other words, the average pages per I/O should be about 75% of the buffer size.
Configuration
If the average pages per I/O is greater than 75% of the buffer size, you may be able to increase the buffer size and further reduce physical I/O. If it is less than 75%, you can reduce the size of the buffers and free unused memory.
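The 75% rule above can be sketched as a small check: compare the average pages per I/O reported by onstat -l against the configured buffer size and suggest a tuning direction.

```python
def log_buffer_advice(pages_per_io, bufsize):
    """Apply the 75%-full rule of thumb to a log buffer."""
    fill = pages_per_io / bufsize
    if fill > 0.75:
        return "consider a larger buffer"
    if fill < 0.75:
        return "consider a smaller buffer"
    return "well sized"

# Values from the sample onstat -l output above:
print(log_buffer_advice(12.62, 16))   # physical log buffer
print(log_buffer_advice(14.1, 16))    # logical log buffer
```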
OLTP Performance Tuning 6-25
Read-Ahead
RA_PAGES = 32
RA_THRESHOLD = 28
[Slide diagram of the buffer cache: 1) the first group of 32 pages is read into the buffer pool; 2) when processing reaches the RA_THRESHOLD point within that group, the next set of 32 pages is read into the pool; 3) the second set of 32 pages is read into the pool starting where the first set ended.]
The read-ahead parameters affect users that are performing sequential reads of data. For example, an index build or sequential scan by a SELECT statement will cause a read-ahead to occur. The RA_PAGES configuration parameter specifies how many pages will be read ahead if IBM IDS detects a sequential read. The RA_THRESHOLD parameter specifies when the next read ahead will occur. For example, if RA_PAGES is configured at 128 and RA_THRESHOLD is 100, the next read ahead will be performed when 28 pages from the previous read ahead have been processed. RA_PAGES and RA_THRESHOLD also determine the number and size of the light scan buffers allocated in virtual memory, as described earlier in this manual. For OLTP systems that perform light scans, do not set RA_PAGES higher than 32. (Light scans are generally for DSS type large reads.) The values of RA_PAGES and RA_THRESHOLD are typically configured to be fairly close. For example, you might set RA_PAGES to 32 and RA_THRESHOLD to 28, but if bufwaits (onstat -p) increase, reduce the value of RA_THRESHOLD to allow more pages to be processed before performing the next read.
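The trigger arithmetic in the paragraph above reduces to a one-line calculation: with RA_PAGES pages fetched per read-ahead, the next read-ahead fires once (RA_PAGES - RA_THRESHOLD) pages of the previous batch have been processed.

```python
def pages_before_next_readahead(ra_pages, ra_threshold):
    """Pages that must be processed before the next read-ahead is triggered."""
    return ra_pages - ra_threshold

print(pages_before_next_readahead(128, 100))   # 28, as in the example above
print(pages_before_next_readahead(32, 28))     # 4, with the OLTP-style settings
```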
6-26 OLTP Performance Tuning
Key-Only Read-Aheads
Read-aheads with key-only searches and sequential scans that use an index work differently than read-aheads with sequential scans. Read-ahead begins when the database server starts at the page above the leaf node, and reads all the leaf nodes that page points to. The read-ahead is complete when one of the following occurs:
n The database server reaches the end of the key range for the query.
n The database server reads ahead all of the leaf pages pointed to by the node above the leaf node.
n The limit on the number of pages to be read ahead is reached. This limit is defined as the greater of the following calculations:
w (# buffers / # sessions)
w 75% of the total number of buffers
If both calculations yield a number less than the setting of RA_PAGES, then RA_PAGES is used as the number of pages to read ahead.
Read-Aheads with Scans Using an Index
Read-aheads with sequential scans that use an index must read ahead both index pages and data pages. Read-ahead begins when the database server starts at the leaf page and reads all the data pages the leaf page points to. The read-ahead is complete when one of the following occurs:
n The database server reaches the end of the key range for the query.
n The database server has read all the data pages pointed to by the current leaf page.
n The limit on the number of pages to be read ahead is reached. This limit is defined as the greater of the following calculations:
w (# buffers / # sessions)
w 75% of the total number of buffers
If both calculations yield a number less than the setting of RA_PAGES, then RA_PAGES is used as the number of pages to read ahead. To turn off read-ahead, set RA_PAGES to 0.
OLTP Performance Tuning 6-27
Monitoring Read-Aheads, onstat -p
A lower read-cache rate can mean read-aheads are set too high. Monitor bufwaits for any unusual increase. Totally effective read-aheads mean RA-pgsused is close to the sum of (ixda-RA + idx-RA + da-RA).

Profile
dskreads  pagreads  bufreads  %cached  dskwrits  pagwrits  bufwrits  %cached
1867691   1911069   83167604  97.75    326759    468993    5194276   93.71
...
bufwaits  lokwaits  lockreqs   deadlks  dltouts  ckpwaits  compress  seqscans
89640     0         181975387  0        0        167       78715     97451
ixda-RA   idx-RA    da-RA      RA-pgsused  lchreqs  lchwaits
461764    906       1065206    1508234     ...      ...
There are several ways to monitor the effectiveness of the read-ahead configuration parameters:
n If the read-ahead parameters are set too high, you may see a decrease in the read-cache rate. Too many pages read from disk for one user can cause pages needed by other users to be flushed.
n If a read-ahead operation is still being performed by the AIO subsystem when the user needs a specific page, the bufwaits field of onstat -p is incremented. If the bufwaits field is unusually high, the RA_PAGES parameter may be too high, or the difference between RA_PAGES and RA_THRESHOLD may be too small.
n Totally effective read-aheads mean that all pages read ahead from disk are used by the session. If this is true, then RA-pgsused (pages used by sessions) equals the sum of the ixda-RA, idx-RA, and da-RA fields. In reality, you may see RA-pgsused somewhat lower than that.
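The effectiveness check above can be sketched as a ratio of pages used to pages fetched by read-ahead, using the onstat -p fields named in the text.

```python
def readahead_effectiveness(ixda_ra, idx_ra, da_ra, ra_pgsused):
    """Fraction of read-ahead pages that sessions actually used."""
    fetched = ixda_ra + idx_ra + da_ra
    return ra_pgsused / fetched if fetched else 1.0

# Values from the sample onstat -p output above:
eff = readahead_effectiveness(461764, 906, 1065206, 1508234)
print(f"{eff:.3f}")   # close to 1.0 means nearly all read-ahead pages were used
```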
6-28 OLTP Performance Tuning
Exercises
OLTP Performance Tuning 6-29
Exercise 1
This exercise demonstrates techniques for LRU, checkpoint, and physical log tuning. You will run a program to simulate an OLTP environment, and you will monitor the IBM IDS instance for 10 minutes. The tuning goals are:
n Read cache percentage of 98% or greater
n Checkpoint interval of 5 minutes
n Checkpoint duration of 0 seconds (less than one second)
1.1 Restart your IBM IDS instance with the following configuration:
PHYSFILE        2400
VPCLASS         cpu,num=1
BUFFERS         1500
NUMAIOVPS       2
CLEANERS        1
CKPTINVL        300
LRUS            8
LRU_MAX_DIRTY   60
LRU_MIN_DIRTY   50
1.2 Run the following SQL command to grant database access for the wisc_db database to all student ids: GRANT DBA TO PUBLIC
1.3 Create a script to run the exercise, named something like "run_it.sh":
#!/bin/ksh
start_all
sleep 600
kill_all
1.4 Be sure to run your script as your student_id, not as user informix.
1.5 Review the following statistics to see if the end-user requirements have been met:
    Read cache percentage
    Physical log % used at checkpoint time
    Checkpoint interval
    Checkpoint duration
    Simulation start time
    Number of loops executed
6-30 OLTP Performance Tuning
Tip The relationship between perceived user performance and overall application efficiency can be seen by comparing the number of loops per second for each run.
1.6 Modify one of the parameters above and restart the server.
1.7 Repeat steps 1.2 through 1.6 until all end-user requirements are met.
OLTP Performance Tuning 6-31
6-32 OLTP Performance Tuning
Module 7 Fragmentation Strategies
Fragmentation Strategies 07-2002 © 2002 International Business Machines Corporation
7-1
Objectives
At the end of this module, you will be able to:
n Efficiently plan expression-based fragmentation
n Use fragmentation to improve load performance
n Use fragmentation to improve data availability
n Monitor fragmentation utilization and effectiveness
n Optimize fragmentation for OLTP environments
n Optimize fragmentation for DSS environments
7-2 Fragmentation Strategies
What is Fragmentation?
Fragmentation is the distribution of data from one table across multiple dbspaces.
IBM IDS supports intelligent horizontal table and index partitioning. The IBM Informix term for this is table and index fragmentation. Fragmentation allows you to create a table that is treated as a single table in SQL statements, but is actually created as multiple tablespaces on multiple dbspaces.
Fragmentation Strategies 7-3
Advantages of Fragmentation
Partition (fragment) tables to improve:
n DSS performance
n OLTP performance
n Data availability
n Load performance
n Granularity of backup and recovery
The ability to partition different tables in different ways across multiple dbspaces allows potentially dramatic increases in total throughput and/or greatly improved response times to many types of queries.
[Slide diagram: two databases sharing three dbspaces (dbspace1 through dbspace3). Table 1 occupies a single dbspace, Table 3 is fragmented into 3a and 3b, and Table 2 is fragmented into 2a, 2b, and 2c across the three dbspaces.]
7-4 Fragmentation Strategies
Basic Fragmentation Issues
n Exploiting the performance benefits of fragmentation requires investment in fixed storage (the more drives, the better).
w Plan on placing each fragment in a separate dbspace, and each dbspace preferably on a different device (disk).
n Within the same database, different tables may require different fragmentation strategies (next slide).
n Storage requirements for fragmented tables:
w Size extents appropriately for the fragment size, not the table size.
w Fragmented indexes generally require eight bytes for the pointer (fragment ID and ROWID) plus one byte for the delete flag, for each row.
Some additional details:
n Remember that each fragment is its own tablespace. Fragmentation allows you to create one logical table (for the purposes of SQL) which really consists of multiple physical tables sharing an identical schema. Each fragment (physical table) has its own tablespace and its own tablespace (fragment) id.
n Size extents appropriately for the fragment size, not the table size. When sizing table extents, remember that index pages for fragmented tables are never stored in the same extent as the data pages. For attached indexes (indexes created without the IN DBSPACE or FRAGMENT BY clauses), the index pages reside in index extents within the same dbspace as the rows. For detached indexes, index pages are created in index extents in the dbspace specified. The index extent size is calculated internally based on the key size and table size.
n Fragmented indexes usually require eight bytes per key value for storage of the fragment id and rowid combination. When allocating dbspace for indexes, allow eight bytes for each key value for the storage of the rowid and fragment id combination, plus one byte for the delete flag.
Fragmentation Strategies 7-5
Distribution Schemes
n Round Robin Fragmentation
w This type of fragmentation places rows one after another in fragments, rotating through the series of fragments to distribute the rows evenly.
n Expression-Based Fragmentation
w This type of fragmentation puts rows that contain specified values in the same fragment: you specify a fragmentation expression that defines criteria for assigning a set of rows to each fragment.
w Most appropriate for tables in which only a portion of the rows are read.
One example of a useful application of round robin fragmentation is for smart large objects: you can specify multiple sbspaces in the PUT clause of the CREATE TABLE or ALTER TABLE statement to distribute smart large objects in a round-robin distribution scheme so that the number of smart large objects in each space is approximately equal.
n For INSERT statements, the database server uses a hash function on a random number to determine the fragment in which to place the row.
n For INSERT cursors, the database server places the first row in a random fragment, the second in the next fragment sequentially, and so on. If one of the fragments is full, it is skipped.
For expression-based fragmentation, you can specify a remainder fragment that holds all rows that do not match the criteria for any other fragment. However, a remainder fragment reduces the efficiency of the expression-based distribution scheme.
7-6 Fragmentation Strategies
Round Robin Fragmentation Guidelines
n Fragment by round robin those tables in which all the rows are consistently read.
w Balance I/O across all fragments (reads and writes)
w Enable parallel scans
Ideally, table fragmentation will provide a linear speed-up of all table scans, but this speed-up is dependent on three hardware factors:
n The number of disks available
n The number and speed of the CPUs
n The bus bandwidth
To fully exploit Informix’s fragmentation capabilities, you must first determine the capabilities of your hardware and use this information in conjunction with the requirements of your data.
Fragmentation Strategies 7-7
Expression Fragmentation Guidelines
n Place fragments from tables that are consistently joined together on separate devices.
n Keep fragmentation expressions simple; arrange them from highest to lowest selectivity.
n Place the most restrictive part first to reduce the number of expression evaluations.
n Create non-overlapping, contiguous expressions to achieve fragment elimination.
[Slide diagram: fragments customer1 and customer2, and fragments orders1 and orders2, each placed in separate dbspaces so that fragments joined together do not share a device.]
Fragments for a given table should be placed on different devices, as should fragments from tables that are frequently joined together. By examining the WHERE clause of critical queries you can determine the correct columns to use in the expression and which table fragments should be separated on different devices. Arrange the expression from highest to lowest selectivity and place the most restrictive part first to reduce the number of expressions that must be evaluated in most cases.
Instead of this:
x >= 1 and x <= 10 in dbs1,
x > 10 and x <= 20 in dbs2,
x > 20 and x <= 30 in dbs3,
remainder in dbs4
Do this:
x <= 10 and x >= 1 in dbs1,
x <= 20 and x > 10 in dbs2,
x <= 30 and x > 20 in dbs3,
remainder in dbs4
Also, try to avoid any expression that must perform a data type conversion. For example, a DATE data type must be converted to an integer internally.
7-8 Fragmentation Strategies
Fragmentation and DSS
DSS Environment Characteristics:
n Low volume of long-running queries
n Queries commonly perform multiple table joins
n Queries access most of the rows in each table
n Very few indexes are generally required
n Preferred access method is the sequential scan
n Preferred join method is the hash join
The goal is to scan in parallel while eliminating data that does not need to be read, and to reduce contention by not placing fragments that are accessed at the same time on the same device.
When using fragmentation in a DSS environment with Parallel Data Query (PDQ), the IBM IDS server can read the required fragments in parallel. A scan thread is started for each fragment that is read. In addition, any fragments containing data that is not required to satisfy the query are eliminated from the query plan. By examining the SET EXPLAIN output you can determine the access method and identify which fragments have been eliminated from the query plan. Fragment ID numbers begin at 0. When using fragmentation with DSS, be sure to configure the use of light scans. The overall goal is to use the proper number of fragments per table to reduce contention by not placing fragments that are frequently accessed together on the same device. Initially, this may take a bit of guessing, so it is very important to monitor the I/O and make adjustments accordingly. To aid in monitoring these tables, isolate the fragments in their own dbspaces. Use the onstat -D output to see the location and volume of I/O. You want to see balanced I/O across all fragments in the table. Use sar -d or a similar utility to monitor I/O activity at the system level. Look for very busy drives with low transfer rates. This may indicate that multiple queries are accessing the drives, causing an increase in seek time and reducing the data transfer rate. Investigate whether tables accessed at the same time are located on the same drives, and separate them.
Fragmentation Strategies 7-9
DSS Fragmentation Matrix
Use the DSS fragmentation matrix to determine when to fragment by round robin and when to fragment by expression.
[Slide: an empty matrix with one row per query (Query 1 through Query 4) and one column per table (Table 1 through Table 3, each with its row count); each cell records the percentage of the table's rows the query accesses.]
When building a fragmentation scheme for your DSS applications, you must consider how each table will be accessed.
n Fragment by round robin when queries consistently read all or most of the rows in the table.
n Fragment by expression when not all rows are read.
n When fragmenting by expression, keep expressions simple and list conditions from most to least restrictive.
n When fragmenting by expression, use data distribution information to distribute I/O and eliminate hot spots.
7-10 Fragmentation Strategies
DSS Queries
n Query 2
SELECT customer_num, order_num, item_num
FROM customer, orders, items
WHERE customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num
AND zipcode BETWEEN "94025" AND "94086";
n Query 4
SELECT quantity, total_price, manu_name
FROM customer, orders, items, manufact
WHERE customer_status = "preferred"
AND customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num
AND items.manu_code = manufact.manu_code;
From the previous slide, let us assume that queries 2 and 4 are as shown above. In query 4, assume that customer_status is a 10-byte character column with a very small domain. The only legal values for customer_status are "preferred", "delinquent", and NULL.
Fragmentation Strategies 7-11
DSS Fragmentation Example
Consider the following example:
[Slide: a matrix of the percentage of rows each query reads from Customer (1M rows), Orders (4M), Items (6M), and Manufact (tiny); table entries are in percentages. Query 1 reads 100% of Customer and Orders; Query 2 reads 20% of Customer and 60% of Orders.]
Queries 2 and 4 indicate that the customer table could be fragmented by expression. To determine the expression conditions, look at the WHERE clauses for the queries. The WHERE clauses must use filter conditions to limit the number of rows returned. The filter columns are the columns that you will use in your fragmentation expressions.
7-12 Fragmentation Strategies
DSS Fragmentation Example, Continued
Given this example, you might choose to fragment your table by zipcode only:
FRAGMENT BY EXPRESSION
zipcode <= "33333" AND zipcode > "00000" IN dbs1,
zipcode <= "74005" AND zipcode > "33333" IN dbs2,
zipcode <= "99999" AND zipcode > "74005" IN dbs3;
Because query 2 filters on zipcode and query 4 filters on customer_status, you could build a fragmentation expression on both zipcode and customer_status; but because customer_status has a very small domain, the additional overhead of evaluating a more complex expression may offset any benefits gained. Instead, choose a simple expression based on the zipcode value only.
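As an illustration of the scheme above, a hypothetical router mirroring the FRAGMENT BY EXPRESSION clause shows how the non-overlapping, contiguous ranges decide which dbspace a row lands in. The dbspace names dbs1 through dbs3 come from the example; the function itself is only a sketch.

```python
def zipcode_dbspace(zipcode):
    """Mirror the example's FRAGMENT BY EXPRESSION ranges (string comparison)."""
    if "00000" < zipcode <= "33333":
        return "dbs1"
    if "33333" < zipcode <= "74005":
        return "dbs2"
    if "74005" < zipcode <= "99999":
        return "dbs3"
    return None   # the example defines no remainder fragment

print(zipcode_dbspace("94025"))   # dbs3, the range query 2 filters on
print(zipcode_dbspace("33333"))   # dbs1: the boundary belongs to the lower range
```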
Fragmentation Strategies 7-13
Fragmentation and OLTP
OLTP Environment Characteristics:
n High volume of short transactions
n Each transaction accesses a few rows
n Index access method is used
For this environment:
n Fragment the data by round robin or expression.
n For large tables that receive a large percentage of transaction activity, fragment the indexes using an expression-based fragmentation strategy.
OLTP environments are characterized by a large number of users performing a large number of short transactions. The key to data layout is to distribute the data and indexes to eliminate I/O bottlenecks and maximize throughput. Fragment the data by round robin or expression. For indexes, you will want to create detached indexes or use an expression-based fragmentation scheme. Create expression-based fragmented indexes for large tables that receive a large percentage of activity. Fragmenting the indexes of large tables can reduce query time by reducing the number of index pages that must be searched. Limit the number of index fragments to reduce the overhead of expression evaluation. Indexes fragmented by round robin are not supported by the CREATE INDEX statement and do not make sense. An expression-based fragmented index is created in the dbspaces specified in the FRAGMENT BY EXPRESSION clause.
7-14 Fragmentation Strategies
OLTP Fragmentation Matrix
Use this OLTP fragmentation matrix to determine how to distribute activity and separate join columns:
[Slide: a matrix with one row per query (Query1 through Query3) and one column per table (Table1 through Table3); each cell records the access type the query performs on the table: S - Select, U - Update, I - Insert.]
In planning fragmentation for your OLTP and batch applications, you can build a fragmentation matrix similar to the one above. The matrix will identify the type of activity that will be performed on each table by each query. Using the visual representation of transaction activity, you can easily distribute the work across your available devices. Additionally, you can identify tables that will be joined and place them on different devices for optimum performance.
Fragmentation Strategies 7-15
OLTP Query Example
SELECT *
FROM customer, orders, items
WHERE customer_num = 105
AND customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num;
[Slide diagram: the customer, orders, and items tables placed in dbspace1, dbspace2, and dbspace3, respectively.]
For example, consider the above query. From this query, you can determine that you should place your customer, orders, and items tables on different disks to speed up joins. If you plan to fragment the tables, you need to review the WHERE clauses of this and other queries to determine if fragmentation by expression is appropriate.
7-16 Fragmentation Strategies
OLTP Example
n Matrix entries for these queries would look like:
[Slide: the OLTP matrix filled in for the Customer (1M rows), Orders (4M), Items (12M), Stock (10K), and Catalog (10K) tables; Query 1 performs selects across the joined tables, while Query 2 mixes selects with an insert.]
Perhaps many of our queries, like query 1, contain a filter on the customer_num column, and our goal is to evenly distribute the customer rows across available disks while maintaining the ability to eliminate fragments that are not needed by a query. We could accomplish this using a hash function on customer_num, such as:
FRAGMENT BY EXPRESSION
MOD (customer_num, 3) = 0 IN dbspace1,
MOD (customer_num, 3) = 1 IN dbspace2,
MOD (customer_num, 3) = 2 IN dbspace3...
Note Hash functions are an excellent way to evenly distribute serial and integer column values while maintaining the ability to eliminate fragments. The server will not eliminate fragments from tables fragmented by round robin.
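The following sketch shows why the MOD expression above spreads rows evenly: serial customer numbers cycle through the three dbspaces, so each receives an equal share. The dbspace naming matches the example.

```python
from collections import Counter

def mod_dbspace(customer_num, nfrags=3):
    """Mirror the example's MOD(customer_num, 3) fragmentation expression."""
    return f"dbspace{customer_num % nfrags + 1}"

# 30000 serial customer numbers land evenly, 10000 per dbspace.
counts = Counter(mod_dbspace(n) for n in range(1, 30001))
print(sorted(counts.items()))
```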
Fragmentation Strategies 7-17
Attached Indexes
Attached indexes are indexes created when the storage option clause of the CREATE INDEX statement is not used.
Characteristics:
n Index fragment(s) will always be in the same dbspace(s) as the table fragment(s)
n Indexes have a four-byte pointer to the corresponding data row (row-id only) plus a delete flag
n An index fragment will only point to data in the table fragment occupying the same dbspace
To create an attached index, you do not specify the fragmentation strategy or a storage option in the CREATE INDEX statement:
Syntax
CREATE TABLE tbl_one (a int)
FRAGMENT BY EXPRESSION
    (a <= 100000) IN dbspace1,
    (a <= 200000 AND a > 100000) IN dbspace2,
    (a <= 300000 AND a > 200000) IN dbspace3;
...
CREATE INDEX idx1_tbl_one ON tbl_one (a);
7-18 Fragmentation Strategies
Detached Indexes
Detached indexes are indexes created when the storage option clause of the CREATE INDEX statement is used.
Characteristics:
n Index fragment(s) may be in different dbspace(s) from the table fragment(s)
n Indexes may have an eight-byte pointer to the corresponding data row (fragment ID and ROWID) plus the delete flag
n An index fragment may point to data in table fragment(s) occupying different dbspace(s)
A detached index is an index with a fragmentation strategy that is separate from the fragmentation strategy of the table. The index fragmentation strategy is set up explicitly with the CREATE INDEX statement:
Syntax
CREATE TABLE tbl_one (a int)
FRAGMENT BY EXPRESSION
    (a <= 100000) IN dbspace1,
    (a <= 200000) AND (a > 100000) IN dbspace2,
    (a <= 300000) AND (a > 200000) IN dbspace3;
CREATE INDEX tbl_one_idx1 ON tbl_one (a)
FRAGMENT BY EXPRESSION
    (a <= 100000) IN dbspace_ix1,
    (a <= 200000) AND (a > 100000) IN dbspace_ix2,
    (a <= 300000) AND (a > 200000) IN dbspace_ix3;
Fragmentation Strategies 7-19
Fragmented Index Scenarios
n Attached index on a non-fragmented table
n Attached index on a fragmented table
n Detached index on a non-fragmented table
n Detached fragmented index on a non-fragmented table
n Detached index on a fragmented table
n Detached fragmented index on a fragmented table
We will look at these six index types in detail in the next several pages. We will give an example of how to create each of the above indexes along with the performance reasons for using or avoiding each one.
7-20 Fragmentation Strategies
Attached Index on a Non-fragmented Table
Small tables in a DSS or OLTP environment.
n No parallelism
n No fragment elimination
n Smaller index pointer (ROWID only)
[Slide diagram: the data pages and the index pages, each with its own bitmap, residing in the same dbspace.]
By default, tables are created non-fragmented. If a table is non-fragmented and an index is created on that table, and the CREATE INDEX statement has no storage clause, an attached index is created. This type of index is usually created on small tables. The attached index resides in the same dbspace as the corresponding table data, but in its own separate tblspace. Why might you want this index? It takes up less space than a detached index: four fewer bytes per row.
Example
CREATE TABLE customer (....) IN dbspace1;
CREATE INDEX cust_idx ON customer(customer_no);
Fragmentation Strategies 7-21
Attached Index on a Fragmented Table
Large table DSS or OLTP environment.
n Index parallel scans
n Index fragment elimination and smaller btrees
n Scans on data pages in parallel
n Balanced I/O for indexes and data pages
If a table is created with a fragmentation strategy, then any index created on that table by default follows the same fragmentation strategy. The index pages are stored in their own extents, but in the same dbspaces as the table fragments. These index fragments are tightly coupled to the data fragments they point to: an index pointer always points to a data row that exists in the same dbspace. Advantages of this index type: if the index column(s) are the same as the fragmentation column(s), then fragment elimination will occur at the index level. This makes the index smaller, and the optimizer will be more inclined to use it. This type of index is generally used for large OLTP and DSS environments when the index fragmentation is not required to be different from the table's fragmentation.
Example
CREATE TABLE customer (...)
FRAGMENT BY EXPRESSION
    customer_no BETWEEN 1 AND 1000 IN dbspace1,
    customer_no BETWEEN 1001 AND 2000 IN dbspace2;
CREATE INDEX cust_idx ON customer(customer_no);
7-22 Fragmentation Strategies
Attached Index on a Fragmented Table - Example 1
[Slide diagram: the sales table and its attached index, each fragmented by week (Week 1 through Week 5). For a query on a single week, all other fragments of both the table and the index are marked eliminated (X).]
Assume we have a sales table fragmented by week. There are 100 weeks with 1 million rows of data per week. There are 50 evenly distributed departments.
Example
CREATE TABLE sales (
    week int,
    dept int,
    qty int,
    price decimal)
FRAGMENT BY EXPRESSION
    week = 1 IN dbs1,
    week = 2 IN dbs2...;
SELECT SUM (price * qty) FROM sales WHERE week = 2 AND dept = 10;
If we scan this table, we will eliminate 99 fragments. That still leaves 1 fragment with 1 million rows to filter. We add an index fragmented the same way as the table:

CREATE INDEX dept_idx ON sales (dept);
This causes index fragment elimination of 99 index fragments, leaving 1 index fragment in which to search for dept 10. Since the department filter selects 2% of that fragment, and the fragment holds 1% of the data, we perform 2% of 1% of the data, or 20,000, index lookups. Significantly fewer than 1 million rows scanned.
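The arithmetic above can be checked with a short sketch (the figures are the example's stated assumptions; the variable names are invented for illustration):

```python
# Fragment-elimination arithmetic from the example above (values are
# the example's assumptions, not measured figures).
total_weeks = 100
rows_per_week = 1_000_000
departments = 50

total_rows = total_weeks * rows_per_week       # 100 million rows in sales

# WHERE week = 2 eliminates all but one table/index fragment:
rows_after_week_filter = rows_per_week         # 1,000,000 rows remain

# Within that fragment, dept = 10 matches 1/50 (2%) of the keys:
index_lookups = rows_after_week_filter // departments

print(index_lookups)   # 20000 lookups vs. 1,000,000 rows scanned
```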
Attached Index on a Fragmented Table - Example 2

[Figure: sales table fragments for Weeks 1-5 with attached index fragments; no fragments are eliminated, so all index fragments are scanned in parallel]
Assume we have a sales table fragmented by week. There are 100 weeks with 1 million rows of data per week. There are 50 evenly distributed departments.
Example
CREATE TABLE sales (
    week int,
    dept int,
    qty int,
    price decimal)
FRAGMENT BY EXPRESSION
    week = 1 IN dbs1,
    week = 2 IN dbs2...;
SELECT SUM (price * qty) FROM sales WHERE dept = 10;
If we scan this table, we will eliminate no fragments. That still leaves 100 fragments with 100 million rows to filter. We add an index fragmented the same way as the table:

CREATE INDEX dept_idx ON sales (dept);

This creates an index in which each fragment contains key values for a particular week, but each fragment contains all departments' activity for that week. This causes parallel index scans of 100 index fragments. Since we are looking for 2% of the data, we do 2 million index lookups. These happen in parallel across 100 fragments. Significantly fewer than 100 million rows scanned.
Detached Index on a Non-fragmented Table

Small tables in a DSS or OLTP environment with high hit rates on the index and data pages at the same time.
n Separate index and data page hits on different disks
Detached indexes which are not fragmented may be placed in any dbspace, including the one that contains the non-fragmented table. Advantages of this index type: the index and data pages can be placed on separate disks if different dbspaces are used and different disks are assigned to those dbspaces. This helps performance if the index and data pages are often used together, especially for index scans involving data page reads. This index type may also be advantageous because the index will look less attractive to the optimizer than a fragmented index; therefore, when scanning is desired over index lookups, the optimizer is more likely to ignore the index. Generally used if index and data pages are required to be separated for performance reasons.
Example
CREATE TABLE customer (...) IN dbspace1; CREATE INDEX cust_idx ON customer (customer_no) IN dbspace2;
Detached Fragmented Index, Non-fragmented Table

OLTP environment with high index hits vs. data page hits.
n Index scans in parallel
n Index lookups, fragment elimination, smaller btrees
Detached fragmented indexes pointing at a non-fragmented table also require an 8-byte pointer from the index page to the data page. This type of index gives the same benefits that an attached index would on a fragmented table. If the index is fragmented on the same column(s) as the index key value, then fragment elimination can occur. If the index is fragmented on a different column than the index key value, then parallel index scans can occur; fragment elimination could still occur if the WHERE clause of the query contains a filter on the fragmentation column(s). This index type might be used in an OLTP environment with key-only reads or a low number of data page reads vs. index page reads. It can be used to spread the I/O of the index across multiple disks.
Example
CREATE TABLE customer (...) IN dbspace1;
CREATE INDEX cust_idx ON customer(customer_no)
FRAGMENT BY EXPRESSION
    customer_no BETWEEN 1 AND 1000 IN dbspace1,
    customer_no BETWEEN 1001 AND 2000 IN dbspace2;
Detached Index on a Fragmented Table

DSS environment with some selective queries.
n Data pages scanned in parallel
n Serial index read
This index type could be used in a DSS environment when an index is needed for some very selective queries, but generally scans are preferred. The table looks attractive for scanning because it can be done in parallel. Index scans will be less attractive because the btree is large and no parallelism can occur. An eight byte index pointer (fragid, rowid) is required because rows could exist in any one of many dbspaces.
Example

CREATE TABLE customer (...)
FRAGMENT BY EXPRESSION
    customer_no BETWEEN 1 AND 1000 IN dbspace1,
    customer_no BETWEEN 1001 AND 2000 IN dbspace2;
CREATE INDEX cust_idx ON customer(customer_no) IN dbspace_idx1;
Detached Fragmented Index on a Fragmented Table

Mixed OLTP and DSS environments with data fragmented for DSS and the index fragmented for OLTP, or selective queries and nonselective queries on different columns in a DSS environment.
n Index parallel scans
n Index fragment elimination and smaller btrees
n Scans on data pages in parallel
n Balanced I/O for indexes and data pages
Example

CREATE TABLE customer (...)
FRAGMENT BY EXPRESSION
    customer_no BETWEEN 1 AND 1000 IN dbspace1,
    customer_no BETWEEN 1001 AND 2000 IN dbspace2;
CREATE INDEX cust_idx ON customer(zipcode)
FRAGMENT BY EXPRESSION
    zipcode BETWEEN 1 AND 1000 IN dbspace1,
    zipcode BETWEEN 1001 AND 2000 IN dbspace2;
Detached Index on a Fragmented Table - Example

[Figure: sales table fragments for Weeks 1-5 and a detached index fragmented by department (Dept 1-5); eliminated index fragments are marked X]
Assume we have a sales table fragmented by week. There are 100 weeks with 1 million rows of data per week. There are 50 evenly distributed departments.
Example
CREATE TABLE sales (
    week int,
    dept int,
    qty int,
    price decimal)
FRAGMENT BY EXPRESSION
    week = 1 IN dbs1,
    week = 2 IN dbs2...;
SELECT SUM (price * qty) FROM sales WHERE dept = 2;

CREATE INDEX dept_idx ON sales (dept)
FRAGMENT BY EXPRESSION
    dept = 1 IN dbs1,
    dept = 2 IN dbs2...;
If we scan this table, we will scan all fragments, leaving 100 million rows to filter. If we add an index fragmented differently from the table, as shown in the example above, index fragment elimination removes 99 index fragments, leaving one index fragment to search for dept 2. Since this is 2% of the index, we search through only two million key values. Significantly fewer than 100 million rows scanned.
Planning Expression Based Fragmentation 1. Load the table by round robin or into a single dbspace. 2. Run UPDATE STATISTICS MEDIUM to build distributions. 3. Execute dbschema -d database -hd tablename. 4. Use distribution information and SQL to determine the best fragmentation strategy. 5. Know your application and know your data.
If your SQL statements frequently use filters (<, >, <=, >=, IN, BETWEEN), you may want to use the following technique to come up with an appropriate fragmentation strategy. First, load each table into a single dbspace, or into multiple dbspaces using round robin fragmentation. Next, run UPDATE STATISTICS MEDIUM on each filter column to produce data distribution information. Once you have updated statistics for the table, execute the dbschema command with the -hd option to produce output detailing the data values for each column. Finally, use the distribution information in conjunction with the SQL statements that will access the table to determine an appropriate fragmentation strategy for the table. Remember, your goals when fragmenting a table are to:
n Reduce the number of pages read to resolve a query by eliminating fragments.
n Evenly distribute disk reads and writes across all available disks.
Fragment Elimination

n Non-overlapping fragments on a single column: fragments can be eliminated with a range expression or an equality expression.
n Overlapping or non-contiguous fragments on a single column: fragments cannot be eliminated with a range expression, but can be eliminated with an equality expression.
n Non-overlapping fragments on multiple columns: fragments cannot be eliminated with a range expression, but can be eliminated with an equality expression.
The optimal fragmentation strategy is to create non-overlapping fragments on a single column so fragments can be eliminated with either a range or equality expression. An example of overlapping fragments is:

unique1 <= 2000 in dbs1,
unique1 <= 4000 in dbs2,
unique1 <= 6000 in dbs3,
unique1 <= 8000 in dbs4;
To create non-overlapping fragments, bind the values on both sides:

unique1 <= 2000 and unique1 >= 1 in dbs1,
unique1 <= 4000 and unique1 > 2000 in dbs2,
unique1 <= 6000 and unique1 > 4000 in dbs3,
unique1 <= 8000 and unique1 > 6000 in dbs4;

A non-contiguous fragmentation scheme means that some data values cannot be accounted for. Fragments can be eliminated on an equality search but not on a range search. Examine the SET EXPLAIN output to determine if your fragmentation strategy is effective in eliminating fragments from the query plan. Again, it is important to monitor the I/O activity to verify your strategy. The output of onstat -D and onstat -g iof displays I/O statistics by chunk, and is only meaningful if you created a single fragment per chunk. The onstat -g ppf command displays the number of read and write calls per fragment.
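The elimination behavior can be modeled with a short sketch. This is not server code; it only illustrates why bounded, non-overlapping single-column ranges allow elimination for both range and equality filters (the fragment bounds follow the bounded expression above; all names are invented for illustration):

```python
# Model each fragment as a closed [low, high] range on unique1,
# matching the bounded, non-overlapping expression above.
FRAGMENTS = {
    "dbs1": (1, 2000),
    "dbs2": (2001, 4000),
    "dbs3": (4001, 6000),
    "dbs4": (6001, 8000),
}

def fragments_for_range(lo, hi):
    """Fragments a range predicate lo <= unique1 <= hi could touch."""
    return [name for name, (f_lo, f_hi) in FRAGMENTS.items()
            if not (hi < f_lo or lo > f_hi)]   # keep only overlapping ranges

def fragments_for_equality(value):
    """Fragments an equality predicate unique1 = value could touch."""
    return fragments_for_range(value, value)

# A range predicate eliminates every fragment it cannot overlap:
print(fragments_for_range(2500, 4500))   # ['dbs2', 'dbs3']
# An equality predicate touches exactly one fragment:
print(fragments_for_equality(2500))      # ['dbs2']
```

With overlapping or non-contiguous ranges the overlap test above could not safely discard fragments for a range filter, which is the behavior summarized in the elimination table.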
Improving Data Availability

To improve data availability through fragmentation on very large tables:
n leave indexes attached to fragmented tables
n when applications can use specific subsets of data if the entire table is not available, fragment by expression to group those subsets on the same disks
n use the DATASKIP configuration parameter
n use the SET DATASKIP SQL statement
n for tables that are primarily insert-only, use round robin fragmentation
Configuration
The DATASKIP configuration parameter allows you to specify which dbspaces, if any, queries can skip when those dbspaces are unavailable as the result of a disk failure. You can list specific dbspaces and turn data skipping on or off for all dbspaces.
Avoid Deletes with ALTER FRAGMENT

Often in OLTP and DSS environments, large amounts of data must be purged from existing tables.
n In cases when the WHERE condition used to purge the data matches one of the conditions used to fragment the table, you can avoid the costly and invasive delete operation by using the ALTER FRAGMENT...DETACH syntax.

[Figure: rolling quarterly dbspaces Q4_99db, Q1_00db, Q2_00db, Q3_00db, with Q4_00db being added]
The DETACH clause is used to separate a table into two tables. Once a fragment is detached, the table that is created may be dropped. This is particularly useful in situations where a rolling set of fragments is being maintained over time, with new fragments being added and old fragments being removed. Index rebuilds on the original table will not be necessary if the index fragmentation strategy of the detached fragment is identical to the table fragmentation. In this case, the index fragments corresponding to the detached fragment will simply be dropped. Consider the following: a company keeps orders in their system for one year; every quarter they add an additional dbspace to their system for the coming quarter, and then detach the oldest dbspace from the table once the new quarter has started. Near the end of the third quarter of the year 2000, the DBA creates another dbspace named Q4_00db and adds it to the table:

ALTER FRAGMENT ON TABLE orders
    ADD order_date >= "10/01/2000" AND order_date < "01/01/2001"
    IN Q4_00db BEFORE Q3_00db;
Once into the fourth quarter the DBA detaches the Q4_99db fragment into a separate table: ALTER FRAGMENT ON TABLE orders DETACH Q4_99db old_q4_99_orders;
Exercises
Exercise 1 The purpose of this exercise is to compare query performance between two different types of table fragmentation. However, to notice a difference we will need to load a much larger number of rows into a single table. Therefore you will rerun the wisc_3_1.sql schema to drop the database and recreate it (the tables still being fragmented ROUND ROBIN), and then load 400,000 rows into the tenktup1 table only. You will create a detached index on this table, and then run a modified version of the q2.sql query and capture the timing. Next you will drop the database and recreate the tenktup1 table fragmented by expression across the same four dbspaces, load 400,000 rows again, build an attached index on this table, and then run the same modified version of the q2.sql query and capture the timing. 1.1 Run the wisc_3_1.sql schema again as previously: dbaccess sysmaster wisc_3_1.sql
1.2 Load the tenktup1 table with 400,000 rows: load_tab 400000 0 tenktup1
1.3 Create a detached index tenk1idx1 on the tenktup1 table: create index tenk1idx1 on tenktup1 (unique1) in dbspace5;
1.4 Copy your q2.sql query file into a new file, q2a.sql, and modify it as follows: SELECT tenPercent, SUM (unique1), SUM (unique2), count(*) FROM tenktup1 WHERE unique1 BETWEEN 0 AND 100000 GROUP BY 1;
1.5 Execute q2a.sql and capture timings. For example: (time dbaccess wisc_db q2a.sql) > q2a.time1 2>&1
1.6 Since the tenktup1 table is filtered primarily on the unique1 column, your next step will be to recreate the tenktup1 table, but this time fragment it by expression on unique1. Make a copy of your wisc_3_1.sql file and name it wisc_6_1.sql. Leave onektup and tenktup2 table fragmentation as is. Fragment tenktup1 by expression across four fragments in the same four dbspaces (dbspace1, dbspace2, dbspace3 and dbspace4) as shown below: drop database wisc_db; create database wisc_db; create table tenktup1
( unique1 INTEGER NOT NULL, unique2 INTEGER NOT NULL, two INTEGER NOT NULL, four INTEGER NOT NULL, ten INTEGER NOT NULL, twenty INTEGER NOT NULL, onePercent INTEGER NOT NULL, tenPercent INTEGER NOT NULL, twentyPercent INTEGER NOT NULL, fiftyPercent INTEGER NOT NULL, unique3 INTEGER NOT NULL, evenOnePercent INTEGER NOT NULL, oddOnePercent INTEGER NOT NULL, stringu1 CHAR(52) NOT NULL, stringu2 CHAR(52) NOT NULL, string4 CHAR(52) NOT NULL ) fragment by expression unique1 <= 100000 in dbspace1, unique1 <= 200000 and unique1 > 100000 in dbspace2, unique1 <= 300000 and unique1 > 200000 in dbspace3, unique1 <= 400000 and unique1 > 300000 in dbspace4;
Note The fragments must not overlap.
1.7 Run this file: dbaccess sysmaster wisc_6_1.sql;
1.8 Execute the following: load_tab 400000 0 tenktup1
1.9 Now create an attached index on the table tenktup1. Note that the index will be created as an attached index because we have not specified any dbspaces in the index creation sql. Run the following from inside dbaccess: create index tenk1idx1 on tenktup1 (unique1);
1.10 Execute the following: (time dbaccess wisc_db q2a.sql) > q2a.time1 2>&1
1.11 Did the query execution time change? Why?
Module 8 Managing Query Resources
Managing Query Resources 07-2002 © 2002 International Business Machines Corporation
Objectives At the end of this module, you will be able to: n Optimize parallel query operations n Ensure that inserts are performed in parallel n Use the Memory Grant Manager to monitor and tune parallel queries
8-2 Managing Query Resources
Parallel Database Query

Parallel Database Query divides large queries into multiple parallel tasks:
n Scans
n Joins
n Sorts
n Aggregates
n Group by
n Inserts

[Figure: query plan with exchange operators between parallel stages]
Parallel Database Query (PDQ) improves the performance of large complex queries by allocating a significant amount of memory and allowing a single query to utilize multiple CPU VPs. PDQ consists of the parallel components shown in the above slide. To perform a parallel query, multiple threads are started for a parallel component, each performing a subset of the work. These threads can run on different CPU VPs, which, in turn, can run on different physical CPUs. This allows multiple physical processors to perform work on behalf of a single request. In addition to query parallelization, parallel inserts are performed for statements of the form SELECT ... INTO TEMP and INSERT INTO ... SELECT FROM. The scan portion of a DELETE statement will be performed in parallel if the table is not referenced by a foreign key constraint with the ON DELETE CASCADE feature. Parallel operations will not be performed in the following cases:
n The query has no parallel PDQ operations, as listed above.
n The query is optimized with CURSOR STABILITY isolation mode.
n The query uses a cursor declared FOR UPDATE or WITH HOLD.
n The query is a parent query of a correlated subquery.
n The query contains an SPL call; for example: SELECT col FROM tab WHERE col2 = PROC(col3);
How to Turn on PDQ

To identify a query as a DSS query to the IBM IDS server, use the SET PDQPRIORITY SQL statement:

PDQPRIORITY  Description
0            Default. The query runs as an OLTP query with no parallelism.
1            Parallel scan threads only.
2 - 100      The query runs as a DSS query with this degree of parallelism. Parallelism is determined by the number of PDQ threads and the amount of DS_TOTAL_MEMORY granted to the query.
Parallel Data Query (PDQ) can be turned on for a particular query or set of queries using the SQL SET PDQPRIORITY statement. For example, SET PDQPRIORITY 85;
Also, it may be turned on for all queries run by particular users using the PDQPRIORITY environment variable: export PDQPRIORITY=85
It is recommended that you not do this, since running all queries at the same PDQPRIORITY level for a single user is likely to result in an unfair allocation of computer resources. While the parallel capabilities of the Informix Dynamic Server are extremely powerful, there is associated overhead that can slow down traditional OLTP queries. For that reason, Informix allows you to enable PDQ only when appropriate. Typically, you want to enable PDQ for large batch operations, such as loads and index builds, and for all DSS queries. The administrator may limit PDQ priority for all users using the MAX_PDQPRIORITY configuration parameter. A user's effective priority is: (pdqpriority/100) * (MAX_PDQPRIORITY/100)
where pdqpriority is the value set by either the SET PDQPRIORITY statement or the PDQPRIORITY environment variable.
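As a rough sketch, the effective-priority calculation can be written out as follows (Python is used purely for illustration; the function name is invented):

```python
# Effective PDQ priority as described above: the user's PDQPRIORITY is
# scaled by the administrator's MAX_PDQPRIORITY governor.
def effective_pdqpriority(pdqpriority, max_pdqpriority):
    """Return the effective priority as a fraction of total PDQ resources."""
    return (pdqpriority / 100) * (max_pdqpriority / 100)

# A user asking for 85 under a MAX_PDQPRIORITY of 50 effectively
# receives 42.5% of the PDQ resources:
print(effective_pdqpriority(85, 50))   # 0.425
```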
Example

It is recommended that when writing SPL routines, you turn PDQ priority off when you enter a procedure, turn it on for specific statements, and then turn it off again (see the warning below):

CREATE PROCEDURE my_proc (a INT, b INT, c INT)
RETURNING INT, INT, INT;
SET PDQPRIORITY 0;
...
SET PDQPRIORITY 85;
(big SELECT statement)
SET PDQPRIORITY 0;
...;
The SQL statement SET PDQPRIORITY takes precedence over the environment variable, PDQPRIORITY. Parallel secondary threads (join, insert) will only be spawned when the PDQ priority for the query is greater than 1.
Warning! Do NOT have PDQPRIORITY set when running UPDATE STATISTICS for stored procedures, or the procedure will compile and run with that PDQPRIORITY setting! This is another reason to avoid using the PDQPRIORITY environment variable.
PDQ and Parallel Sorts

Parallel sorts typically require:
n PSORT_NPROCS set equal to 0 or greater than 1
n PDQPRIORITY > 1

[Figure: sort threads feeding an exchange operator]
To ensure that multiple sort threads will service the query, be sure to set the environment variable PSORT_NPROCS, and set PDQPRIORITY for the query to a value greater than 0. Assuming that adequate memory is allocated, IDS will spawn parallel sort threads according to the following guidelines for ordinary queries:
n If PSORT_NPROCS is set to 0, IDS will spawn three sort threads.
n If PSORT_NPROCS is set to a value greater than 1, IDS will spawn that number of sort threads.
n If PSORT_NPROCS is not set, parallel sorts will not be performed for queries.
Warning! While parallel sorts are not explicitly disabled by a PDQPRIORITY of 0, the maximum memory granted for sorts will be 128 KB. The limited memory will usually prevent more than one sort thread from being spawned.
Exercise 1 1.1 Run the wisc_3_1.sql schema again as previously: dbaccess sysmaster wisc_3_1.sql
1.2 Load the tables with 10,000 rows, 100,000 rows and 100,000 rows using your shell script from a previous exercise: ./loadtables_3_1.sh onmode -c
1.3 Build your indexes as in a previous exercise; dbaccess wisc_db onektup_ix.sql dbaccess wisc_db tenktup1_ix.sql dbaccess wisc_db tenktup2_ix.sql
1.4 Turn off PDQ. This can be done from the command line by setting MAX_PDQPRIORITY to 0: onmode -D 0
1.5 Verify that PDQ is off by looking at the Memory Grant Manager output: onstat -g mgm | more
1.6 Make a copy of the Wisconsin benchmark query file q3.sql called q3a.sql, and modify q3a.sql to include the SQL statement SET EXPLAIN ON for optimizer information in the sqexplain.out file: SET EXPLAIN ON; SELECT tenktup1.tenPercent, SUM (tenktup1.unique1), SUM (tenktup2.unique2), count(*) FROM tenktup1, tenktup2 WHERE tenktup1.unique1 = tenktup2.unique1 AND tenktup1.unique1 BETWEEN 0 AND 80000 GROUP BY 1;
1.7 Run the query and capture timings: (time dbaccess wisc_db q3a.sql) > q3a.time1 2>&1
1.8 Turn on PDQ, setting MAX_PDQPRIORITY to 100. This can be done from the command line by setting MAX_PDQPRIORITY to 100: onmode -D 100
1.9 Verify that PDQ is on by looking at the Memory Grant Manager output: onstat -g mgm | more
1.10 Prepare to rerun the query with a pdqpriority of 100, by modifying the q3a.sql file to include the statement “SET PDQPRIORITY 100;”. Be sure to use SET EXPLAIN ON
to capture the optimizer information to the sqexplain.out file. SET EXPLAIN ON; SET PDQPRIORITY 100; SELECT tenktup1.tenPercent, SUM (tenktup1.unique1), SUM (tenktup2.unique2), count(*) FROM tenktup1, tenktup2 WHERE tenktup1.unique1 = tenktup2.unique1 AND tenktup1.unique1 BETWEEN 0 AND 80000 GROUP BY 1;
1.11 Run the query and capture timings: (time dbaccess wisc_db q3a.sql) > q3a.time2 2>&1
PDQ Administration

PDQ administration involves managing resources allocated for parallel operations through the Informix Dynamic Server's:
n Configuration parameters
n Memory Grant Manager
n onmode command

Queries:      Active    Ready    Maximum
              3         0        5
Memory: (KB)  Total     Free     Quantum
              4000      3872     800
Scans:        Total     Free     Quantum
              10        8        1
The Informix Dynamic Server system administrator is responsible for managing the system resources used by the parallel features of the engine. The administrator may control the amount of memory and CPU resources available to a query by setting the PDQ limits in the ONCONFIG file. Once these limits are set, they may be changed dynamically using the onmode command. The Memory Grant Manager allows the administrator to monitor how PDQ queries are being managed under the current resource limits. The ability to control and manage PDQ resources allows the administrator to optimize parallel operations and to tune and intelligently balance the resources available when both OLTP and DSS queries are run simultaneously. The ability to accommodate hybrid DSS and OLTP environments significantly differentiates the Informix Dynamic Server from other RDBMS servers.
Allocating Shared Memory for PDQ

Decide on the percentage of virtual shared memory to be used for PDQ.

[Figure: shared memory divided into resident and virtual portions; within the virtual portion, memory is split between OLTP queries and DSS queries (DS_TOTAL_MEMORY)]
Decision support queries use large amounts of the virtual portion of shared memory to perform join, sort, and group operations. Considering the virtual shared memory available on your system, you must decide what portion is to be used for PDQ. If you must balance resources for OLTP and DSS environments, use the DS_TOTAL_MEMORY configuration parameter to limit the virtual shared memory available for decision-support queries. Monitor the virtual shared memory used by the OLTP queries. If you determine that the OLTP queries do not fully utilize the allocated virtual-portion memory, then consider configuring shared memory for your DSS queries. For DS_TOTAL_MEMORY, the following settings are recommended for the range of operational environments from OLTP to DSS:

Environment   DS_TOTAL_MEMORY
OLTP          50% of virtual memory size
Hybrid        Between 50 and 80% of virtual memory size
DSS           90% of virtual memory size
PDQ Configuration Parameters
MAX_PDQPRIORITY: Limits the value of PDQPRIORITY. This is a governor for the value of PDQPRIORITY supplied by the end user.
DS_TOTAL_MEMORY: Maximum amount of virtual memory available for decision support queries.
DS_MAX_QUERIES: Maximum number of concurrent decision support queries.
DS_MAX_SCANS: The maximum number of concurrent scan threads for decision support queries. It is currently not implemented.
The administrator will determine system wide limits on the amount of resources dedicated to parallel operations using PDQ configuration parameters.
Changing Parameters Dynamically

n Dynamically change DS_TOTAL_MEMORY: onmode -M kbytes
n Dynamically set DS_MAX_QUERIES: onmode -Q max_queries
n Dynamically set MAX_PDQPRIORITY: onmode -D priority
n Dynamically set DS_MAX_SCANS: onmode -S max_number_scan_threads
You can change the PDQ configuration parameters dynamically by using the onmode utility options listed above. The DS_TOTAL_MEMORY and DS_MAX_QUERIES parameters will not be re-initialized with new values until all parallel database queries currently running are complete. Once the queries have completed, the new parameters will take effect. Any new parallel queries that register with the MGM will be prevented from running until the reinitialization of the new parameter is complete.
Note These changes are not written to the ONCONFIG file and will not be maintained when the instance is shut down and restarted.
Light Scan Buffer Size and PDQ

[Figure: light scan buffers in the virtual portion of shared memory; the read-ahead parameters determine the number of light scan buffers]

light_scan_buffers = roundup((RA_PAGES + RA_THRESHOLD) / (MAXAIOSIZE / PAGESIZE))

# System Configuration
RA_PAGES      128
RA_THRESHOLD  120
Increasing the values of the read ahead parameters may increase performance of parallel data queries by increasing the number of light scan buffers in the virtual portion of shared memory. MAXAIOSIZE is a constant determined when IBM IDS is ported to a specific platform. Example: with RA_PAGES = 128, RA_THRESHOLD = 120, PAGESIZE = 2 KB, and MAXAIOSIZE = 60:

light_scan_buffers = roundup((128 + 120)/(60/2)) = roundup(248/30) = roundup(8.2666) = 9 buffers
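The worked example can be reproduced with a short sketch (math.ceil plays the role of roundup; the parameter units follow the example above, and the function name is invented):

```python
import math

# Number of light scan buffers, per the formula above. MAXAIOSIZE is a
# platform-specific constant; 60 matches the worked example.
def light_scan_buffers(ra_pages, ra_threshold, maxaiosize_kb, pagesize_kb):
    # roundup((RA_PAGES + RA_THRESHOLD) / (MAXAIOSIZE / PAGESIZE))
    return math.ceil((ra_pages + ra_threshold) / (maxaiosize_kb / pagesize_kb))

# RA_PAGES = 128, RA_THRESHOLD = 120, MAXAIOSIZE = 60, PAGESIZE = 2 KB:
print(light_scan_buffers(128, 120, 60, 2))   # 9
```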
Monitor light scans using the onstat -g lsc command. The output displays a line for each light scan buffer, and the pages left column displays the number of pages still to be read. Light scans can significantly improve the speed of large PDQ queries, but remember that light scans do not depend on PDQPRIORITY. Light scans can occur even if PDQPRIORITY is set to 0. To force light scans, you may set the environment variable LIGHT_SCANS: export LIGHT_SCANS=FORCE
The Memory Grant Manager (MGM)

The Memory Grant Manager (MGM) controls and reserves resources used by parallel database queries:
n The number of concurrent parallel database queries
n The number of scan threads started for the IBM IDS system, and the number of scan threads started for each parallel database query
n The number of PDQ threads that are started for each query
n The amount of virtual memory and number of CPU VPs available for each query
The Memory Grant Manager (MGM) is a database server component that coordinates the use of memory, CPU virtual processors (VPs), disk I/O, and scan threads among decision-support queries. The MGM uses the DS_MAX_QUERIES, DS_TOTAL_MEMORY, DS_MAX_SCANS, and MAX_PDQPRIORITY configuration parameters to determine the quantity of these PDQ resources that can be granted to a decision-support query. When your database server system has heavy OLTP use, and you find performance is degrading, you can use the MGM facilities to limit the resources committed to decision-support queries. During off-peak hours, you can designate a larger proportion of the resources to parallel processing, which achieves higher throughput for decision-support queries. The MGM grants memory to a query for such activities as sorts, hash joins, and processing of GROUP BY clauses. The amount of memory that decision-support queries use cannot exceed DS_TOTAL_MEMORY.
How Memory is Granted

n Memory is granted in units called a quantum.
  w Effective PDQPRIORITY = (PDQPRIORITY / 100) * (MAX_PDQPRIORITY / 100)
  w Memory for Query = DS_TOTAL_MEMORY * Effective PDQPRIORITY, rounded down to the nearest quantum.
  w A quantum is calculated as follows: quantum = DS_TOTAL_MEMORY / DS_MAX_QUERIES
Memory is granted to a query by the Memory Grant Manager for activities such as sorts, hash joins, and GROUP BY operations. The cumulative amount of memory granted to all concurrent parallel database queries cannot be more than DS_TOTAL_MEMORY. The MGM grants memory to queries in quantum units. When a query registers with the MGM, the MGM reserves a percentage of DS_TOTAL_MEMORY for the query. The amount of memory reserved for the query is influenced by several factors, but cannot be more than: DS_TOTAL_MEMORY * (PDQPRIORITY / 100) * (MAX_PDQPRIORITY / 100) rounded down to the nearest quantum, but not less than one quantum.
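A minimal sketch of this grant calculation, using the onstat example values (DS_TOTAL_MEMORY = 4000 KB, DS_MAX_QUERIES = 5); the function names are invented, and the round-down and minimum-quantum behavior follow the description above:

```python
# Memory reserved for a registering query, per the description above:
# capped by effective priority, rounded down to a quantum, never less
# than one quantum.
def quantum_kb(ds_total_memory, ds_max_queries):
    return ds_total_memory // ds_max_queries

def memory_grant_kb(ds_total_memory, ds_max_queries,
                    pdqpriority, max_pdqpriority):
    q = quantum_kb(ds_total_memory, ds_max_queries)
    cap = ds_total_memory * (pdqpriority / 100) * (max_pdqpriority / 100)
    return max(q, int(cap // q) * q)   # floor to quantum, minimum one quantum

# With a quantum of 4000/5 = 800 KB, a query at PDQPRIORITY 50 under
# MAX_PDQPRIORITY 100 is capped at 2000 KB, rounded down to 1600 KB:
print(memory_grant_kb(4000, 5, 50, 100))   # 1600
```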
Scan Threads and Secondary Threads

n The number of scan threads allocated will be the minimum of either:
  w the number of fragments being scanned, or
  w (PDQPRIORITY/100) * (MAX_PDQPRIORITY/100) * DS_MAX_SCANS
n The number of secondary threads allocated is determined by:
  w (PDQPRIORITY/100) * (MAX_PDQPRIORITY/100) * NUMCPUVPS
For very large systems, the ability to limit the number of PDQ threads allocated may be very important. Managing threads requires thread context switching by the CPUVPs. Additionally, a thread control block must be allocated in virtual memory for each thread. In situations where tables with hundreds of fragments must be scanned, generating thousands of concurrent PDQ threads, the system may be overwhelmed by the thread switching and memory requirements.
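These two formulas can be sketched as follows (the truncation to whole threads is an assumption for illustration; the function names are invented):

```python
# Thread counts per the two formulas above. Integer truncation is used
# here as an illustrative assumption for "whole threads".
def scan_threads(nfragments, pdqpriority, max_pdqpriority, ds_max_scans):
    eff = (pdqpriority / 100) * (max_pdqpriority / 100)
    # minimum of fragment count and the priority-scaled scan limit
    return min(nfragments, int(eff * ds_max_scans))

def secondary_threads(pdqpriority, max_pdqpriority, numcpuvps):
    eff = (pdqpriority / 100) * (max_pdqpriority / 100)
    return int(eff * numcpuvps)

# 100 fragments, PDQPRIORITY 50, MAX_PDQPRIORITY 100, DS_MAX_SCANS 10:
print(scan_threads(100, 50, 100, 10))    # 5
# 8 CPU VPs at the same priorities:
print(secondary_threads(50, 100, 8))     # 4
```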
Monitoring MGM

onstat -g mgm
...
MAX_PDQ_PRIORITY:  100
DS_MAX_QUERIES:    5
DS_MAX_SCANS:      10
DS_TOTAL_MEMORY:   4000 KB

Queries:      Active    Ready    Maximum
              3         0        5
Memory: (KB)  Total     Free     Quantum
              4000      3872     800
Scans:        Total     Free     Quantum
              10        8        1

All parallel database queries with a PDQPRIORITY > 0 must pass through the MGM.
The first part of the output shows the values of the PDQ configuration parameters. The second part describes MGM internal control information:

Queries
  Active:   Number of PDQ queries that are currently executing.
  Ready:    Number of user queries ready to run but whose execution the database server deferred for load-control reasons.
  Maximum:  Maximum number of queries that the database server permits to be active. Reflects DS_MAX_QUERIES.

Memory
  Total:    KB of memory available for use by PDQ queries. Reflects DS_TOTAL_MEMORY.
  Free:     KB of memory for PDQ queries not currently in use.
  Quantum:  KB of memory in a memory quantum.

Scans
  Total:    The total number of scan threads. Reflects DS_MAX_SCANS.
  Free:     Number of scan threads currently available for DSS queries.
  Quantum:  The number of scan threads in a scan-thread quantum.
onstat -g mgm (continued)

Load Control:    (Memory)  (Scans)  (Priority)  (Max Queries)  (Reinit)
                  Gate 1    Gate 2    Gate 3       Gate 4       Gate 5
(Queue Length)        0         0         0            0            0
Active Queries:
Session   Query    Priority   Thread   Memory   Scans   Gate
7         a3d0c0   1          a8adcc   0/0      1/1     -
7         a56eb0   1          ae6800   0/0      1/1     -
9         a751d4   0          96b1b8   16/16    0/0     -

Ready Queries: None
...
The next part of the output describes MGM Load Control; it shows the number of queries waiting at each gate:

Column        Description
Memory        Number of queries that are waiting for memory.
Scans         Number of queries that are waiting for scans.
Priority      Number of queries that are waiting for queries with higher PDQ priority to run.
Max Queries   Number of queries that are waiting for a query slot.
Reinit        Number of queries that are waiting for running queries to complete after an onmode -M or -Q command.

The next part of the output, Active Queries, describes the MGM active and ready queues.
The description of these columns is shown below:

Column     Description
Session    The session ID for the session that initiated the query.
Query      Address of the internal control block associated with the query.
Priority   PDQ priority assigned to the query.
Thread     Thread that registered the query with the MGM.
Memory     Memory currently granted to the query, or memory reserved for the query (unit is MGM pages = 4 KB).
Scans      Number of scan threads currently used by the query, or number of scan threads allocated to the query.
Gate       Gate number at which the query is waiting.
onstat -g mgm (continued)

Free Resource   Average #       Minimum #
-------------   -------------   ---------
Memory          489.2 +- 28.7   400
Scans             8.5 +-  0.5     8

Queries         Average #       Maximum #   Total #
-------------   -------------   ---------   -------
Active          1.7 +- 0.7      3           23
Ready           0.0 +- 0.0      0            0

Resource/Lock Cycle Prevention count: 0
Free Resources: this part of the output provides statistics for MGM free resources.

Column    Description
Average   Average amount of free memory (KB) and average number of free scan threads.
Minimum   Minimum available memory (KB) and minimum number of scans.

Queries: this last portion provides statistics concerning MGM queries.

Column    Description
Average   Average active and ready queue lengths.
Maximum   Maximum active and ready queue lengths.
Total     Total number of queries that have passed through the active and ready queues.

The numbers shown in the two tables above reflect statistics since system initialization, or since the last onmode -Q, -M, or -S command.
Concurrent Query Environments

What are we timing?
- Time to finish all queries as quickly as possible.
- Finish as many queries as possible in a time period.

(The slide diagrams the two goals with example schedules for queries Q1 through Q4.)
PDQPRIORITY can be used to gate queries. If the goal is for all queries to finish as quickly as possible, give the longer-running queries a higher PDQPRIORITY and the quicker queries a lower one; this speeds up the long-running queries and slows down the fast ones, with the aim of having all queries finish at the same time. If the goal is to complete as many queries as possible within a certain amount of time, gate the longer-running queries behind the faster queries so that the shorter queries run first. A combination of these two goals may be used to run the faster queries as quickly as possible while the long-running query slowly runs in the background.
Factors to Consider in Tuning PDQ

- How many users will be running decision support queries at one time?
- Do some queries have higher priority than others?
- Should all DSS queries run at once or should some wait?
- Is OLTP SQL running at the same time as DSS queries?
- How much memory is available?
Before deciding how to set the PDQ parameters, consider some of the following factors:

- How many users will be running decision support queries at one time? By estimating how many users are running DSS queries at any one time, you can make better decisions on how to set the PDQ parameters.
- Do some queries have higher priority than others? Less important queries (for example, queries that are run in the background) can be run with a lower PDQPRIORITY.
- Should all DSS queries run at once, or should some wait? By increasing the PDQPRIORITY of some queries, you may cause other queries to wait. Because the running queries will have more resources, they may finish sooner; however, the performance of the waiting queries may suffer. To determine the best mix of running to waiting queries, consider running a benchmark test, varying the PDQ parameters to determine the best settings.
- Is OLTP SQL running at the same time as DSS queries? If so, does OLTP SQL have a higher priority? DSS queries can use a large amount of system resources, which can decrease the performance of OLTP SQL. If OLTP SQL is important, decrease DS_TOTAL_MEMORY and DS_MAX_SCANS.
Exercises
Exercise 2 The following exercise is intended to familiarize you with the onstat -g mgm output and help you to understand and anticipate how the Memory Grant Manager will affect parallel queries. 2.1 Run the wisc_3_1.sql schema again as previously: dbaccess sysmaster wisc_3_1.sql
2.2 Load the tables with 10,000 rows, 100,000 rows and 100,000 rows using your shell script from a previous exercise: ./loadtables_3_1.sh
2.3 Using onmode, set MAX_PDQPRIORITY to 100 and DS_MAX_QUERIES to 2. onmode -D 100 onmode -Q 2
2.4 Open three additional terminal windows. 2.5 In each window, use dbaccess to run the following query: SET PDQPRIORITY 50; SELECT * FROM onektup ORDER BY string4;
2.6 Are any of the queries gated? Why or why not? _____________________________________________________________ 2.7 How much memory does the Memory Grant Manager report as being allocated and used by each query? Why? ______________________________________________________________ 2.8 Use onmode to set MAX_PDQPRIORITY to 33. onmode -D 33
2.9 Restart the queries in all four windows. How many queries run concurrently? Why? _____________________________________________________________ _____________________________________________________________ 2.10 Using onmode, set DS_TOTAL_MEMORY to 1024 KB and DS_MAX_QUERIES to 5. Note that DS_TOTAL_MEMORY must be at least 128 KB times DS_MAX_QUERIES. onmode -M 1024 onmode -Q 5
2.11 Open a fourth window. 2.12 In window 1, run the query with a PDQPRIORITY of 20. 2.13 In window 2, run the query with a PDQPRIORITY of 40. 2.14 In window 3, run the query with a PDQPRIORITY of 40.
2.15 In window 4, run the query with a PDQPRIORITY of 10. 2.16 What happens? ____________________________________________________________ ____________________________________________________________ 2.17 Open another window and run the query with PDQPRIORITY 50. 2.18 What does the onstat -g mgm output show? ____________________________________________________________ ____________________________________________________________
Module 9 Tuning Client Server Communication
Tuning Client Server Communication 07-2002 © 2002 International Business Machines Corporation
9-1
Objectives

At the end of this module, you will be able to:
- Identify techniques for reducing message passing between the client and server
- Appropriately configure poll and listen threads
- Understand how to configure a multiple subnet LAN
- Understand how to monitor network communications using onstat
Efficient Client Server Communication

- Prepare SQL statements that are used many times.
- Use stored procedures for tasks that have multiple SQL statements executed as a group.
- Tune the SQL Statement Cache (IBM IDS 9.21 or higher) where multiple sessions are executing identical statements.
- Use referential integrity and triggers to reduce SQL messages sent from the client to the server.
- Return only the data (rows, columns) necessary in the application.
Prepare SQL statements that are used many times. When an SQL statement is not prepared, it must be parsed and optimized each time it is executed. The work of parsing and optimizing requires messages to be passed between the client and the server. When a statement will be executed multiple times, you can reduce communications between the client and the server by preparing the SQL statement. The PREPARE statement allows the server to parse and optimize the statement once, then reuse the prepared statement for successive executions, reducing the total number of communications required.

Use stored procedures for tasks that have multiple SQL statements executed as a group. Using stored procedures enables you to reduce the number of messages passed between the application and the server by grouping SQL statements together. Stored procedures are also more efficient because the SQL statements do not have to be dynamically parsed and optimized at each execution.

Configure the SQL Statement Cache where multiple sessions are executing identical statements. With IBM IDS 9.x, the engine allows sessions to share the SQL statement information that is generated. Therefore, multiple sessions that are executing identical statements may share the information stored in internal data structures. This feature reduces memory utilization at the session level and eliminates the need to re-parse the statement.
SQL Statement Cache

The SQL Statement Cache (SSC) improves OLTP performance where multiple sessions are executing identical statements. Tunable configuration parameters include:

- STMT_CACHE
  0 = Disabled (default)
  1 = Enabled, Sessions OFF
  2 = Enabled, Sessions ON
- STMT_CACHE_SIZE
  512 KB default
If the ONCONFIG parameter STMT_CACHE = 1 (enabled), then the SQL statement cache:

- can be controlled at the session level using an environment variable:
  export STMT_CACHE={0|1}
- can be controlled at the application level with an SQL statement:
  set STMT_CACHE ON;
  set STMT_CACHE OFF;

Use the ONCONFIG parameter STMT_CACHE_SIZE to change the size from the default value of 512 KB. Note that new statements will cause the cache to grow beyond this size, but SQL statements that are no longer being used will be flushed from the cache until STMT_CACHE_SIZE is no longer exceeded. Use onstat to monitor the SQL statement cache:

onstat -g cac stmt

Once your production system has been running for a while and the cache has settled down to a stable size (reported under POOLSIZE), you can use this value for STMT_CACHE_SIZE.
Using Cursors

- Use select and insert cursors to manipulate sets of rows. When multiple rows are involved, cursors are more efficient than individual SQL INSERT and SELECT statements.
- Use EXECUTE INTO, rather than a cursor, when selecting one row. This avoids the overhead of opening and closing a cursor.
If a possibility exists that a query could return more than one row, the application must execute the query differently. Multirow queries are handled in two stages. First, the program starts the query, but does not return any data at this point. Then the program requests the data, one row at a time. These operations are performed using a special data object called a cursor. A cursor is a data structure that represents the current state of a query. The following list shows the general sequence of program operations:

1. The program declares the cursor and its associated SELECT statement, which allocates memory to hold the cursor.
2. The program opens the cursor and starts the execution of the associated SELECT statement. Any errors in the SELECT statement are detected at this time.
3. The program fetches a row of data into host variables and processes it.
4. The program closes the cursor after the last row is fetched.
5. When the cursor is no longer needed, the program frees the cursor to deallocate the resources it uses.

These operations are performed with SQL statements named DECLARE, OPEN, FETCH, CLOSE, and FREE.
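The same declare/open/fetch/close lifecycle can be sketched with Python's generic DB-API. Note the stand-in: sqlite3 is used here purely to give a runnable illustration of the sequence; it is not the Informix ESQL/C API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")            # stand-in database, not Informix
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [("Ann",), ("Bob",)])

rows = []
cur = conn.cursor()                           # 1. declare: allocate the cursor
cur.execute("SELECT name FROM customer")      # 2. open: start the SELECT
while True:
    row = cur.fetchone()                      # 3. fetch one row at a time
    if row is None:                           # 4. last row has been fetched
        break
    rows.append(row[0])
cur.close()                                   # 5. free the cursor's resources
conn.close()
print(rows)   # -> ['Ann', 'Bob']
```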
FET_BUF_SIZE

- FET_BUF_SIZE determines the size in bytes of the fetch buffer used internally for tuples:
  setenv FET_BUF_SIZE 16384
- One payroll application benchmark saw a 5% performance improvement with:
  FET_BUF_SIZE 32767
  OPTOFC 1
- NOTE: To change the size of the communication buffer (the number of bytes transferred in one packet between client and server), adjust the b option in the sqlhosts file.
The FET_BUF_SIZE parameter allows you to override the size of the fetch buffer that is used for all data except BLOB data. If you are transferring many large rows, a larger FET_BUF_SIZE might improve performance slightly. However, if memory is constrained, the extra memory allocations can degrade performance. If memory is constrained and the application must open a large number of cursors, setting FET_BUF_SIZE to a value less than 4096 may free memory and improve performance. Conversely, when memory is available and a cursor is used to retrieve very large amounts of data, increasing FET_BUF_SIZE above 4096 may improve performance. You must test your environment to determine the most efficient settings. The default FET_BUF_SIZE is 4096. If you set FET_BUF_SIZE lower than the default value, it is ignored. Valid FET_BUF_SIZE values are small integer values (the maximum value is 32767).
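As a back-of-the-envelope illustration of why the buffer size matters, the sketch below estimates fetch-buffer refills for a result set. The function name and row counts are invented for illustration, and the arithmetic ignores per-message protocol overhead, so treat the numbers as rough:

```python
import math

def fetch_buffer_refills(nrows, row_size_bytes, fet_buf_size=4096):
    """Rough number of fetch-buffer refills needed to read nrows rows."""
    rows_per_buffer = max(fet_buf_size // row_size_bytes, 1)
    return math.ceil(nrows / rows_per_buffer)

# 100,000 rows of 200 bytes: default 4096-byte buffer vs. the 32767 maximum
print(fetch_buffer_refills(100_000, 200))          # -> 5000
print(fetch_buffer_refills(100_000, 200, 32767))   # -> 614
```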
OPTOFC

- OPTimize Open, Fetch, Close
- Reduces client-server round-trip messages by two
- Does not open the cursor until the first fetch
- Automatically closes the cursor at the last fetch
- export OPTOFC=1
- Especially helpful for PeopleSoft client-server performance!
- Requires that the SELECT is prepared
OPTOFC means “OPTimize Open, Fetch, Close.” Set this in the application’s environment to reduce communication overhead for cursors. If OPTOFC is enabled, the server will open a cursor on the first fetch and close the cursor after the last row is fetched, eliminating message passing for the OPEN and CLOSE statements. export OPTOFC=1
The OPTOFC variable causes execution of the OPEN CURSOR statement to be delayed until the first FETCH is executed. Because both OPEN and FETCH are reduced to a single client-server communication, network traffic is reduced and performance might benefit. This variable is useful for queries executed by way of cursors. OPTOFC requires Dynamic Server, Version 7.14 or greater, and I-Connect, 5.01.WJ2 or greater.
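A rough message count shows where the two saved round trips come from; this is illustrative arithmetic only, and the function name is invented:

```python
def cursor_round_trips(fetch_count, optofc=False):
    """Illustrative round-trip count for one cursor's lifetime.

    Without OPTOFC, OPEN and CLOSE each cost an extra round trip;
    with OPTOFC, OPEN rides on the first fetch and CLOSE on the last.
    """
    return fetch_count if optofc else fetch_count + 2

print(cursor_round_trips(5))               # -> 7
print(cursor_round_trips(5, optofc=True))  # -> 5
```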
IFX_DEFERRED_PREPARE

- Allows the PREPARE statement to be executed just before the cursor is opened
- Reduces network messages
- export IFX_DEFERRED_PREPARE=1
- Improves performance for active client-server environments
To enable the deferred-prepare feature for ESQL/C applications that contain dynamic SQL statements with PREPARE, DECLARE, and OPEN statement blocks, set the IFX_DEFERRED_PREPARE environment variable to 1. This defers execution of the PREPARE statement until the cursor is opened, reducing the number of round-trip messages transferred between the client and the server. The deferred-prepare feature is useful for preparing:

- SELECT statements (select cursors)
- EXECUTE FUNCTION statements (function cursors)
- INSERT statements (insert cursors)

IFX_DEFERRED_PREPARE is available with I-Connect and Dynamic Server, Version 7.23 and greater.
OPTMSG

Enables and disables optimized message transfers. (The slide contrasts two small statements, SELECT VERSION FROM MYLOCK and SELECT CURRENT YEAR TO FRACTION(3) FROM MYCLOCK, sent as separate packets against the same statements chained into one packet under OPTMSG 1.)
OPTMSG allows a client application to chain small messages together, reducing the number of communications packets transferred between the client and the server. Enabling OPTMSG will improve the efficiency of network utilization for clients and is likely to improve performance. OPTMSG requires Dynamic Server, Version 7.22 or greater, and I-Connect, 7.20 or greater.
Client Server Communication

Understanding how messages are communicated between your applications and the IDS server is helpful in tuning application response time. (The slide diagrams several application processes exchanging messages with the IBM IDS oninit processes.)
Understanding how your application processes communicate with the IDS server can help you do a better job of tuning. In the rest of this module, you will review each of the mechanisms that IDS uses to retrieve messages and return data, and discuss how to monitor and adjust each of the necessary parameters to improve the efficiency of these communications.
Poll Threads and Connection Requests

The poll thread receives the connection request from the client application.

In a typical server configuration, there are at least two poll threads:
- one listens to the tcp port
- one monitors shared memory
The listen thread will open the port and request a poll thread for the appropriate interface/protocol (for example, tcp/ip). The poll thread will be responsible for monitoring that port for client requests. Poll threads can run in one of two ways:

- in-line on CPU virtual processors: in general, and particularly on a single-processor computer, this is more efficient.
- on network virtual processors (depending on the connection type): this may be more efficient on a multiprocessor system with a large number of remote users.
Poll Threads and Listen Threads

The poll thread then passes on the connection request that it received from the client to the listen thread for the port. (The slide diagrams the flow: the poll thread receives the connection request from the application and notifies the listen thread, which starts an sqlexec thread and accepts the connection.)
When a request for a new client connection is received, the poll thread retrieves the message and notifies the listen thread. The listen thread will perform user authentication, set up the session and thread control blocks, and start an sqlexec thread for the new session.
Monitoring Listen Threads

onstat -g ath
 8   a2370a8   0   2   running            1cpu   tlitcppoll
 9   a237780   0   2   running            3cpu   sm_poll
10   a23a338   0   2   running            4cpu   ipcstrpoll
11   a23aa10   0   3   sleeping forever   1cpu   tlitcplst
12   a23b138   0   3   sleeping forever   1cpu   tlitcplst
13   a23bea0   0   3   sleeping forever   1cpu   tlitcplst
14   a1eddf8   0   2   sleeping forever   3cpu   sm_listen
15   a24ee70   0   2   sleeping secs: 1   3cpu   sm_discon
16   a24f4f8   0   3   sleeping forever   1cpu   ipcstrlst
...

Above, threads 11, 12, 13, 14, and 16 are the listen threads.
When you start the IDS server, the oninit process starts a listen thread for each DBSERVERNAME and DBSERVERALIAS listed in the ONCONFIG file. Each name and alias entry must have a corresponding entry in the sqlhosts file to specify the protocol, host, and port for the listen thread. Each listen thread will be assigned to a port defined by the unique combination of a host name and service name entry in the sqlhosts file.
# ONCONFIG
DBSERVERNAME      durgashm
DBSERVERALIASES   durgasoc

(These names correspond to the listen threads shown in the output above, such as tlitcplst, sm_listen, and ipcstrlst.)
Listen Thread and sqlexec Thread

The listen thread:
- authenticates the user
- establishes a connection to the database server
- starts an sqlexec thread
The listen thread will perform user authentication, set up the session and thread control blocks, and start an sqlexec thread for the new session.
Poll and sqlexec Threads

The poll and sqlexec threads work together, reading from and writing to the client. (The slide diagrams the flow: the poll thread reads data from the client and passes the request and data to the sqlexec thread; the sqlexec thread processes the request, sends data back to the application, and waits for the next client request.)
Once a user is connected and a session control block and thread control block have been set up, the poll threads and sqlexec threads will service all requests. Network poll threads will check for incoming messages on the specified port, place the message in the global pool of the IBM IDS virtual segment, and then signal the sqlexec thread. The global pool stores structures that are global to the database server, including the message queues where poll threads for network communications deposit messages from clients. Since shared memory connections allow the application to execute a system call and attach directly to the IDS message segment, shared memory poll threads simply poll the appropriate shared memory addresses for messages and signal the sqlexec thread. Once the requested data is retrieved, the sqlexec thread is generally responsible for writing the data to the client, without the help of the poll thread.
Shared Memory Connections

For ipcshm connections, the client application attaches directly to the shared memory message portion. (The slide diagrams two application processes attached to the Message Segment in shared memory.)
If IBM IDS is configured to allow shared memory connections (ipcshm), at start up a message segment is allocated with public read and write permissions.

1. Each client is allocated a limited number of buffers in the Message Segment. The server is allocated the same number of buffers for communication with the client.
2. The size of the shared memory segment is determined by the number of users configured per poll thread.
3. The number of message segments is determined by the number of shared memory poll threads configured.
In-Line Poll Threads

(The slide contrasts two configurations: a poll thread running on a NET class VP oninit process, which deposits client messages in the global pool for the sqlexec threads, and an in-line poll thread running on a CPUVP oninit process, with the application processes attached to the shared memory message segment.)
A poll thread may be run on a CPUVP or a NET class VP. The NET class VPs are SHMVP, TLIVP, STRVP, and SOCVP. Each VP may run only one poll thread, so if you configure 4 poll threads to run on NET class VPs for tli communications, IDS will start 4 TLIVPs. When a poll thread is run on a CPU class VP, it is called an in-line poll thread. If multiple CPUVPs are configured (NUMCPUVPS > 1), multiple poll threads may be run in-line. Poll threads run in-line may sometimes be more efficient because the CPUVP may be able to receive the message and perform the thread switch without having to signal another process. (Remember, each VP is a process at the Unix level.)

Note
The poll threads that run on CPUVPs (in-line poll threads) can all be of the same protocol or of different protocols. If you configure more poll threads to run on CPUVPs than there are CPUVPs, the engine will automatically run the extras on NET class VPs.
Tune Poll Threads For OLTP Applications

For OLTP applications, ensure that you have configured adequate poll threads for your users.
For OLTP applications, you want to ensure that you have configured adequate poll threads for your users. OLTP applications generally send a large number of messages and put more demands on the poll thread. For OLTP applications, one poll thread should be configured for each CPUVP, and that poll thread should be run in-line, that is, on a CPU class VP. For example, a server with 4 CPUVPs supporting 200 users connecting via shared memory might have the NETTYPE configuration:

NETTYPE ipcshm,4,50,CPU

For DSS applications, which generally submit a very small number of very large requests, one poll thread is generally adequate. Remember, the setting for NETTYPE represents the communication protocol, the number of poll threads, the number of users per poll thread, and the VP class on which to run the poll threads.

Tip
In the example above, 4 poll threads, each accommodating 50 users, meet the requirement of 200 users.
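The NETTYPE sizing arithmetic can be checked mechanically; a tiny sketch with an invented function name:

```python
def nettype_capacity(poll_threads, users_per_thread):
    """Connections one NETTYPE entry can service: threads x users/thread."""
    return poll_threads * users_per_thread

# NETTYPE ipcshm,4,50,CPU -> 4 in-line poll threads, 50 users each
capacity = nettype_capacity(4, 50)
print(capacity)          # -> 200
assert capacity >= 200   # meets the 200-user requirement
```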
Adding Multiple Listen Threads

$INFORMIXDIR/etc/sqlhosts (or INFORMIXSQLHOSTS=/home/informix/sqlhosts1):

durga_tcp1   ontlitcp   denver1   port1
durga_tcp2   ontlitcp   denver1   port2
durga_tcp3   ontlitcp   denver1   port3

$INFORMIXDIR/etc/onconfig.durga (with matching entries in /etc/services):

DBSERVERNAME      durga_tcp1
DBSERVERALIASES   durga_tcp2,durga_tcp3
Occasionally, if a large number of clients attempts to connect concurrently through a single network port, the listen thread may not be able to service all the new connection requests quickly enough to satisfy user expectations. To resolve this problem, you can start multiple listen threads to service new connections. To start additional listen threads, you must add multiple DBSERVERALIASES for that protocol and corresponding entries in your sqlhosts file. Each sqlhosts entry will include a unique DBSERVERNAME and port number. The preceding example demonstrates the setup. Notice in the example that even though three ports are listed, they are all associated with the same host name and protocol.
Tuning Client Server Communication 9-19
Exercise 1 1.1 Modify your configuration and start a poll thread for each protocol supported by your instance. 1.2 Run one poll thread in-line and the others on NET class VPs. 1.3 Restart your instance and use onstat -g glo to determine the total number of VPs running. How many are NET class VPs? _____________________________________________________________ 1.4 Use onstat -g ath to determine the total number of network related threads running. List them below. _____________________________________________________________ _____________________________________________________________ _____________________________________________________________ 1.5 Modify your configuration and run all poll threads in-line. 1.6 Using the reference materials provided, identify the onstat options for monitoring client server communications. List them below. _____________________________________________________________ _____________________________________________________________ _____________________________________________________________
Remote Client Server Challenges

In remote environments, network capacity introduces additional opportunities for performance tuning:
- Is network capacity sufficient?
- Can additional network cards and subnets be added?
- Are multiple poll and/or listen threads required?
In remote client server environments, the physical capacity of the network constrains the performance potential. First, you must understand how to monitor network capacity, and second, how to optimize the network setup for communications between the Informix Dynamic Server and the client applications.
Monitoring Network Capacity

Use the UNIX netstat utility to monitor the network:
- Collisions >= 1% of output packets indicates too high a collision rate. Identify ways to reduce or redistribute network traffic, or increase bandwidth.
- Input errors greater than 0.25% of input packets may indicate faulty hardware.
- Output errors greater than 0.25% of output packets may indicate a problem with the network interface of the machine on which netstat was run.
An overloaded network can delay messages passed between the client application and database server. When a network is overloaded, systems will most likely try to send packets over the network at the same time, causing collisions. If the network software detects a collision, it will resend the packet. A certain amount of collisions within a network is normal. However, if network collisions are excessive over a long period of time, consider reconfiguring your network. You can use the Unix netstat utility to monitor network traffic. Check your Unix system documentation for the correct syntax. The tuning items to monitor are input packets (ipkts), output packets (opkts), input errors (ierrs), output errors (oerrs), and collisions (coll). Unix netstat statistics are cumulative since the last system boot. To calculate ratios for tuning purposes, you will need to compare before and after netstat statistics for the time period being monitored.
Tip On Sun Solaris platforms, you can use netstat -a to verify established connections. Use netstat -i to monitor collisions, input errors, and output errors.
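The thresholds above are easy to apply to the deltas between two netstat samples. A minimal sketch; the function name and the sample numbers are invented for illustration:

```python
def netstat_warnings(ipkts, opkts, ierrs, oerrs, colls):
    """Apply the rule-of-thumb thresholds to deltas between two samples."""
    warnings = []
    if opkts and colls / opkts >= 0.01:
        warnings.append("collisions >= 1% of output packets")
    if ipkts and ierrs / ipkts > 0.0025:
        warnings.append("input errors > 0.25% of input packets")
    if opkts and oerrs / opkts > 0.0025:
        warnings.append("output errors > 0.25% of output packets")
    return warnings

# Illustrative deltas for one monitored interval
print(netstat_warnings(ipkts=500_000, opkts=400_000,
                       ierrs=100, oerrs=2_000, colls=8_000))
```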
Monitoring Network Activity for User/Threads

onstat -g ntu
global network information:
  #netscb   connects   read     write    q-free   q-limits  ...
   9/ 19    5311       218021   208021   10/ 13   135/ 10   ...

Individual thread network information (basic):
  netscb    type     thread name   sid    fd   poll   reads   writes  ...
  c107880   tlitcp   sqlexec       5223   7    5      1128    1132    ...
  be91e20   tlitcp   sqlexec       5096   4    5      60516   60516   ...
  d0fbaa0   tlitcp   sqlexec       4990   6    5      139     139     ...
  be9dbd0   tlitcp   sqlexec       4676   9    5      1539    1545    ...

In this output, type is the communication protocol for the thread, sid is the user session ID, and reads and writes count the network packets to and from IDS.
The command onstat -g ntu displays network activity for active user threads. By monitoring the output over time, you can determine the high activity users and protocols.
onstat -g ntt

onstat -g ntt
global network information:
  #netscb   connects   read   write   q-free   q-limits  ...
   8/ 8     23         2649   2561    2/ 2     135/ 10   ...

Individual thread network information (times):
  netscb    thread name   sid   open       read       write
  a4ddfe0   sqlexec       35    12:35:23   12:40:35   12:40:35
  a4f3a88   sqlexec       34    12:35:15   12:35:16   12:35:16
  a4db7c8   sqlexec       28    12:34:31   12:34:31   12:34:31
  a4dc250   sm_discon     8     08:20:46

In this output, netscb is the address of the network control block, sid is the user session ID, open (the sorted column) shows when the client connected to the server, and read and write show the most recent read or write from or to the server.
If a client contacts the system or database administrator and reports that the system appears “frozen”, you can use onstat -g ntt to investigate. The output shows when (open) a client (sid) connected, and when the last communication (read and write) from this client occurred. The output will show if the client is currently communicating, or that the client has not communicated with the database server for a while. The second observation might mean a dropped connection, or that the client application has gone into an infinite loop. And because the display is sorted current to oldest, it is easy to determine which connections are the most current.
Note
Two other useful onstat options show the number of buffers allocated and the number of reads performed by the client or session:

- onstat -g nsc for monitoring shared memory communications by client, and
- onstat -g nss for monitoring shared memory communications by session.
Checking For IBM IDS Reject Entries

onstat -g ntd
  Client Type   Calls   Accepted   Rejected   Read     Write
  sqlexec       yes     5300       0          212694   205363
  ...
  oncheck       yes     1          0          6        12
  ...
  srvstat       yes     10         0          10       20
  listener      yes     0          0          5311     0
  onutil        yes     0          0          0        0
  Totals                5311       0          218021   205395

Client Type specifies the type of client connection; Accepted and Rejected specify the number of accepted and rejected requests.
You can use onstat -g ntd to check for a large number of reject entries. These occur when: n
The user cannot connect because of a user table overflow. You may verify user table overflow by looking for ovuserthreads greater than 0 in onstat -p output.
n
The connection receives a network time-out error.
If network time-outs are occurring because the network is busy, you may increase the number of times that the application will retry the connection by setting the following variables in the client’s environment prior to the attempt to connect: INFORMIXCONTIME
specifies the time limit, in seconds, during which an application will try to connect to the server. The default value is 15 seconds.
INFORMIXCONRETRY
specifies the number of times the application will retry the attempt to connect.
The INFORMIXCONTIME and INFORMIXCONRETRY values work together. For example, if the values are set as follows:

export INFORMIXCONTIME=600
export INFORMIXCONRETRY=10

the client will retry 10 times, roughly once every 60 seconds.
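The interaction of the two variables above can be sanity-checked with a quick calculation (a minimal sketch; the 600 and 10 values are just the example figures, not recommendations):

```shell
# Example client-side settings: keep trying for up to 600 seconds,
# retrying the connection 10 times within that window.
export INFORMIXCONTIME=600   # total connection time limit, in seconds
export INFORMIXCONRETRY=10   # number of retry attempts within that limit

# Approximate interval between retries:
interval=$(( INFORMIXCONTIME / INFORMIXCONRETRY ))
echo "retry roughly every ${interval}s"   # retry roughly every 60s
```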
Network Configuration Options

If network capacity has been reached, consider using subnets.
One way to reduce network congestion is to distribute the network traffic across multiple subnets. This configuration can also improve overall fault tolerance by providing multiple access paths to the host. Connecting multiple subnets usually requires adding multiple Ethernet cards to the server. Once the subnets have been added to the configuration, you have two options for configuring IBM IDS to communicate across all available subnets. The next two pages provide details on each option.
Multiple Network Card Configuration: Method 1

$INFORMIXDIR/etc/sqlhosts or INFORMIXSQLHOSTS=/home/informix/sqlhosts1:

durga_tcp1  ontlitcp  denver1  port1
durga_tcp2  ontlitcp  denver2  port2

$INFORMIXDIR/etc/onconfig.durga (with matching entries in /etc/hosts and /etc/services):

DBSERVERNAME     durga_tcp1
DBSERVERALIASES  durga_tcp2

INFORMIXSERVER=durga_tcp1
The first method is required for setting up communications through multiple network cards for versions of IBM IDS prior to 7.10.UD1. Using this method, the DBA must:
- Put one entry per network card in /etc/hosts, each with a separate address:
    192.146.101.1 hostname1
    192.146.102.2 hostname2
Additionally, IBM IDS requires a unique service to be associated with each host name:
- Add one entry per Ethernet card in /etc/services:
    service1 2100/tcp
    service2 2101/tcp
Next, associate each service and host name combination with an IBM IDS connection:
- Add an entry to the sqlhosts file for each subnet. Refer to the preceding example.
- In the ONCONFIG file, set DBSERVERNAME to one of the sqlhosts entries and add a DBSERVERALIASES entry for the other.
Once this configuration is in place, the application communicates through the Ethernet card assigned to the dbservername given by the INFORMIXSERVER environment variable. The primary benefit of this method is that one listen thread is automatically started for each port.
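Putting the pieces of Method 1 together, the files involved might look like the following (a sketch that reuses the example hostnames, services, and server names from the steps above; actual addresses and port numbers are site-specific):

```shell
# /etc/hosts -- one entry per network card:
#   192.146.101.1  hostname1
#   192.146.102.2  hostname2

# /etc/services -- one unique service per card:
#   service1  2100/tcp
#   service2  2101/tcp

# $INFORMIXDIR/etc/sqlhosts -- one entry per subnet:
#   durga_tcp1  ontlitcp  hostname1  service1
#   durga_tcp2  ontlitcp  hostname2  service2

# ONCONFIG file:
#   DBSERVERNAME     durga_tcp1
#   DBSERVERALIASES  durga_tcp2

# The client selects a card by choosing a server name:
export INFORMIXSERVER=durga_tcp1
```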
Multiple Network Card Configuration: Method 2

Use * to instruct IDS to make a TCP system call to find all IP addresses to be polled.

$INFORMIXSQLHOSTS or $INFORMIXDIR/etc/sqlhosts:

stu201_tcp  ontlitcp  *192.146.101.1  1525
The method above is only supported with releases 7.10.UD1 and greater. While setup for multiple ports is much simpler with this method, it has one significant drawback: only one listen thread will be started. Since the listen thread is responsible for user authentication and setup of the session control block and thread control block in the virtual segment of shared memory, a single listen thread may not be sufficient for a large multi-user environment. To start multiple listen threads, one for each network card, use the method shown on the previous slide.
Tuning Network Buffers

Tuning the network buffer(s) in the sqlhosts file:

stu201      onipcshm  train5  stu201
stu201_tcp  ontlitcp  train5  stu201_tcp  b=8192

Environment variables:
- IFX_NETBUF_SIZE
- IFX_NETBUF_PVTPOOL_SIZE

Be sure to configure the same size communications buffer on both the client and the server.
Beginning with Informix Dynamic Server version 7.10.UD1, you may specify the size of the network buffer used by the client and server over the TCP/IP protocol. A fifth, optional field in the sqlhosts file specifies the buffer size in bytes. The default size is 4096 bytes, which is generally sufficient. However, for applications that retrieve many BLOBs or large rows, you may improve performance by increasing the network buffer size. For example, an application that selects many BLOBs stored in 10 KB blobpages might benefit from a buffer setting of 10240. Note that you may also increase the size of the network buffer with the IFX_NETBUF_SIZE environment variable.

The number of network buffers can be increased above the default value of one with the IFX_NETBUF_PVTPOOL_SIZE environment variable. The advantages of multiple private network buffers include less contention for the common network buffer pool, and fewer CPU resources spent allocating and deallocating network buffers from the common pool for network data transfers.

Be aware that the network buffer is allocated for each user. Specifying a network buffer of 16384 bytes, or 16 KB, on a system with 1000 users might cause the server to allocate up to 16000 KB, nearly 16 MB, just for network communications. If insufficient memory is available to allocate the buffer when a user requests a connection, the user may receive a connection-reject error. For many Unix systems, the maximum supported TCP/IP buffer size is 16 KB. Do not set the network buffer size larger than the maximum supported by your system.
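The per-user memory impact described above is easy to estimate before changing the buffer size (a minimal sketch; the 1000-user count is just the example figure):

```shell
# Each connected user gets its own network buffer, so the worst-case
# allocation is simply users * buffer size.
bufsize=16384   # 16 KB network buffer, in bytes
users=1000
total_kb=$(( users * bufsize / 1024 ))
echo "up to ${total_kb} KB for network buffers"   # up to 16000 KB, nearly 16 MB
```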
Exercise 2

As you recall, the FET_BUF_SIZE environment variable determines the size of the fetch buffer allocated for prepared cursors. A larger FET_BUF_SIZE may improve performance slightly for cursors that select many very large tuples, but too large a FET_BUF_SIZE may offset performance gains by using too much memory. The default size of the cursor fetch buffer is 4096 bytes. In this exercise, you will use the utility prepmem to test the memory utilization for various settings of the FET_BUF_SIZE environment variable. Prepmem allows you to specify an SQL statement to prepare and the number of times to prepare it. Prepmem calculates the amount of memory that will be allocated in both the client application and the server based on various settings of FET_BUF_SIZE.

2.1 Create an instance of the demonstration database with logging using this syntax:
    dbaccessdemo7 stores7 -dbspace dbspace1 -log
    When prompted to create sample sql scripts, type n.
2.2 Execute the following command:
    prepmem -d stores7 -s "select * from customer" -c -n30
    This will prepare the SQL statement, "select * from customer", 30 times using the default fetch buffer size of 4096.
2.3 Record the output from prepmem.
2.4 Increase the FET_BUF_SIZE to 8192. If you are using ksh, you may execute:
    export FET_BUF_SIZE=8192
2.5 Repeat steps 2.2 and 2.3.
2.6 Increase the FET_BUF_SIZE to 16384:
    export FET_BUF_SIZE=16384
2.7 Repeat steps 2.2 and 2.3.
2.8 Review the output from each successive execution of prepmem.
2.9 How does the amount of memory allocated by the application and the amount of memory allocated by the server differ?
2.10 When would increasing FET_BUF_SIZE above the default value of 4096 make sense?
2.11 When would a larger FET_BUF_SIZE not be a good idea?
Exercise 3

In this exercise, you will add additional listen threads for the TCP/IP connection to your server.

3.1 Your instructor will provide you with two additional TCP/IP services or port numbers:
    _____________________________________________________
    _____________________________________________________
3.2 Add two new entries to your $INFORMIXSQLHOSTS file referencing these ports or services.
3.3 Add corresponding DBSERVERALIASES entries to your $ONCONFIG file.
3.4 Restart your server.
3.5 Use onstat -g ath to verify that three TCP/IP listen threads are running.
3.6 Execute the command onstat -g ntt. What can you tell from the output?
Distributed Database Applications
CONNECT TO storesA@siteA_tcp; SELECT * FROM customer, storesB@siteB_tcp:orders WHERE customer.customer_num = orders.customer_num;
storesA@siteA:customer
storesB@siteB:orders
Distributed database communications offer special performance challenges. The coordinating site (the site that originally received the query) is responsible for parsing the query and passing the portion referencing the remote site(s) to that site for processing. For optimum performance in distributed database environments:
- Optimize table and database distribution by locating tables at the site where they are most frequently referenced. For example, if the distributed databases manage equipment rentals, localize the tables such that each location tracks the equipment stored locally. This way a distributed join or select is only required when the equipment is not available at the current location.
- Duplicate highly active reference tables across sites to reduce distributed join overhead. Given the equipment rental example above, even if equipment rental rates are the same nationwide, replicate the rate table at each site to eliminate a distributed join for every rental item. The Enterprise Replication features of the Informix Dynamic Server can simplify the task of synchronizing the replicated tables.
Exercises
Exercise 4

4.1 Pairing team 1 with team 2 and team 3 with team 4, execute the following commands. (Note: Each team must build the stores7 database in their instance using the command dbaccessdemo7 team# so that each database has a unique name.)

    set explain on;
    select company, order_num, sum(total_price) order_amt
    from customer c, orders o, @:items i
    where c.customer_num = o.customer_num
    and o.order_num = i.order_num
    group by order_num, company;
4.2 What does the output of the SET EXPLAIN command tell you about how the query will be processed?
Solutions
Solution 2

2.1 Create an instance of the demonstration database with logging using this syntax:
    dbaccessdemo7 stores7 -dbspace dbspace1 -log
    When prompted to create sample sql scripts, type n.
2.2 Execute the following command:
    prepmem -d stores7 -s "select * from customer" -c -n30
This will prepare the SQL statement, "select * from customer", 30 times using the default fetch buffer size of 4096.
2.3 Record the output from prepmem. The sample output shown here is from a test conducted by an Informix ATG member with version 7.1. Your results may vary slightly.

FET_BUF_SIZE = 4096
N    sbrk      Bytes    K bytes  M bytes  Sess Bytes
10   4001D000  45056    44.00    0.04     76936
20   40028000  90112    88.00    0.09     150904
29   40033000  135168   132.00   0.13     215320

Average bytes per application prepare    4505
Average bytes per engine prepare         7177
2.4 Increase the FET_BUF_SIZE to 8192. If you are using ksh, you may execute:
    export FET_BUF_SIZE=8192
2.5 Repeat steps 2.2 and 2.3.

FET_BUF_SIZE = 8192
N    sbrk      Bytes    K bytes  M bytes  Sess Bytes
10   40029000  94208    92.00    0.09     76936
20   4003E000  180224   176.00   0.17     150904
29   40052000  262144   256.00   0.25     215344

Average bytes per application prepare    8738
Average bytes per engine prepare         7178
2.6 Increase the FET_BUF_SIZE to 16384:
    export FET_BUF_SIZE=16384
2.7 Repeat steps 2.2 and 2.3.

FET_BUF_SIZE = 16384
N    sbrk      Bytes    K bytes  M bytes  Sess Bytes
10   40041000  184320   180.00   0.18     76936
20   4006A000  352256   344.00   0.34     150904
29   40090000  507904   496.00   0.48     215344

Average bytes per application prepare    16930
Average bytes per engine prepare         7178
2.8 Review the output from each successive execution of prepmem.
2.9 How does the amount of memory allocated by the application and the amount of memory allocated by the server differ?
    The memory allocated in the server is consistent, but memory allocated in the application increases with each increase in FET_BUF_SIZE.
2.10 When would increasing FET_BUF_SIZE above the default value of 4096 make sense?
    When an application uses a cursor to retrieve many large rows.
2.11 When would a larger FET_BUF_SIZE not be a good idea?
    When memory on the computer executing the application is limited and the application maintains many open cursors. Typically, the default buffer size will provide the optimal performance.
Module 10 Managing Large Object Space
Managing Large Object Space 07-2002 © 2002 International Business Machines Corporation
Objectives

At the end of this module, you will be able to:
- Understand the three different types of large object storage space
- Understand the basics of smart large objects and how they are processed by the database server
- Understand the structure of sbspaces
- Understand how to monitor sbspaces
Extensibility, which includes smart large objects, was introduced with IBM Informix Dynamic Server 9.x.
Information Explosion
- Business, government, and educational institutions are now being overwhelmed with complex digital computer files containing documents, graphics, spreadsheets, HTML pages, video clips, etc.
- Traditional databases are not well suited to storing these as objects that can be accessed easily.
The introduction of relatively inexpensive computer systems into an information-based economy has led to an explosion of digitally-encoded computer files for storing a plethora of types of data:
- Documents in numerous incompatible formats: MS Word, MS PowerPoint, Adobe FrameMaker
- Electronic spreadsheets: MS Excel, Lotus 1-2-3
- Graphic images: JPEG, GIF
- Video clips: MPEG, MPEG-II and MPEG-III files
Large Object Storage Options
- Large objects can be stored in any combination of the following:
  - File system of host operating system
  - Blobspaces: TEXT and BYTE
  - Sbspaces: CLOB and BLOB
- Requirements will determine your choice:
  - features
  - load balancing
  - system throughput
  - average response times
  - storage efficiency
IBM IDS 9.x is a highly flexible database management system that provides a number of different mechanisms for storing digitally-encoded computer files, which we will refer to as objects. This powerful capability not only allows for the efficient management of the wide diversity of object types currently in use, but also provides a flexible mechanism for incorporating types of objects that will be developed in the future.
- You can identify an object as a file and provide a pathname to it.
- You can identify a file as a simple large object and store it either within the pages of a database table or in a separate type of dbspace specifically designed for this purpose (a blobspace).
- You can identify an object as a smart large object and store it in a dbspace specifically designed for this purpose (an sbspace). Smart large objects are a category of large objects, including BLOB and CLOB, that store text and images, are stored and retrieved in pieces, and have database properties such as crash recovery and transaction rollback.
Host File System Example

Large, non-volatile binary files may be efficiently stored in the host file system:
- The University of Pennsylvania Online Books Page stores 16,000 books in electronic form. The PDF of Charles Dickens' "Bleak House" is 3.7 MB; "Pickwick Papers" is 3.44 MB; "Great Expectations" is 2.1 MB.
- Book files are searchable by standard table attributes: Author, Title, Subject.
- Content of the book files is not searchable.
Large files stored on the file system of the host operating system can be useful when the files are:
- large
- static: rarely or never updated
- not required to be searchable by content
- long-lived: generally not subject to deletion

Disadvantages:
- File system space is generally managed by the system administrator, not the database administrator.

The diagram at right shows the structure of files under Unix. Large files generally will be allocated in several data blocks on disk.
Simple Large Objects

TEXT for storing character-based information:
- Source code
- Other flat ASCII files

BYTE for storing non-character-based information:
- Digitized images
- Digitized sound
- Spreadsheets
- Any file of arbitrary byte values
Simple large objects, or binary large objects (blobs), are streams of bytes of arbitrary value and length. A simple large object might be a digitized image or sound, an object module, or a source code file. Anything that you can store in a file system of a computer can be stored in a simple large object. The theoretical limit to their size is 2 gigabytes; this size is based on the highest value that can be held in a 4-byte integer. There are two types of simple large objects: TEXT and BYTE. The TEXT data type is used for the storage of printable ASCII text such as source code and scripts. The BYTE data type is used for storing any kind of binary data such as saved spreadsheets, program load modules, and digitized images or sound.
Blobspaces

A blobspace is a logical collection of chunks:
- A pool of disk space that tables created by the server can use to store simple large objects
  - TEXT: ASCII file, such as source code
  - BYTE: binary-encoded file, such as an MS Word or JPEG file
- A blobspace must have at least one chunk
- A blobspace can have many chunks
When using BYTE and TEXT data types in the server, you have the option of locating them in blobspaces. Blobspaces are similar to dbspaces in that they consist of chunks. The difference is that instead of storing rows and index pages, blobspaces are dedicated to storing simple large objects. A blobspace can hold blobs from one or more tables, just as dbspaces can hold tblspaces from one or more tables. When you create a blobspace, you specify the size of all the blobpages within that blobspace. Thus, all the blobpages in a blobspace are the same size. Storing simple large objects in blobspaces is generally more efficient because I/O is done as a group or block of pages, rather than as individual server pages.
Blob Pages
- 36-byte blobpage header
- blobpage size = 20 KB (10 x 2K system pages)
- blob data = 17 KB
- unused space
A blobpage is the basic unit of storage for blob data types stored in a blobspace. The size of the blobpage can be configured to be a multiple of the system page size; that is, a single blobpage can be made up of several system pages. Because blobpages are allocated as contiguous space, it is more efficient to store blob data types in blobpages that are as close to the size of the blob data as possible. This can decrease the amount of I/O required to read the blob. When a blob is stored, as many blobpages as needed to hold that blob value are allocated. A blob value on disk can be spread across many blobpages. Only the space within a single blobpage is guaranteed to be contiguous; if multiple blobpages are needed to store a blob value, the entire value might not be stored contiguously on disk. At most only one object can be stored per blob page (or part of one object if many blob pages are required to store the object.)
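The allocation rule above (whole blobpages, contiguous within a page) can be sketched as a quick calculation, using the 20 KB blobpage and 17 KB blob from the slide:

```shell
# Pages needed = ceiling(blob size / blobpage size); anything left over
# on the last page is unusable by other blobs.
blobpage=20480   # 20 KB blobpage (10 x 2 KB system pages), in bytes
blob=17408       # a 17 KB blob
pages=$(( (blob + blobpage - 1) / blobpage ))
waste=$(( pages * blobpage - blob ))
echo "$pages blobpage(s), $waste bytes unused"   # 1 blobpage(s), 3072 bytes unused
```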
Blobspace Storage Sizing

Blobspaces have fixed page sizes: use different blobspaces to handle different size ranges of blobs.
- photo_spc1: 8 KB thumbnail
- photo_spc2: 50 KB standard image size (average)
- photo_spc3: 120 KB (800 x 600 size image) (average)
When you are evaluating blobspace storage strategy, you can measure efficiency by two criteria:
- Blobpage fullness
- Blobpages required per simple large object

Blobpage fullness refers to the amount of data within each blobpage. Individual TEXT and BYTE objects stored in a blobspace cannot share blobpages. Therefore, if a single simple large object requires only 20 percent of a blobpage, the remaining 80 percent of the page is unavailable for use. However, avoid making the blobpages too small. When several blobpages are needed to store each simple large object, you increase the overhead cost of retrieval time, because a separate I/O request is required for each blobpage.
Blobspace Page Sizing

Size the blobspace so that 90+% of the objects in the same size range can be stored in one blobpage.

[Histogram: number of images (up to about 2000) versus image size in KB (28K-76K) for medium-range images, with one and two standard deviations marked around the mean.]
Once you have determined the general size ranges of your blobspaces, you should calculate a blobpage size that allows a single page to accommodate a large majority of the objects within that size range. There are statistical programs that will read an input file of numbers and generate the mean and standard deviation for the sample. You could write an awk script to generate these numbers from the Unix ls -al command against a directory that contains your binary objects. The slide above shows an example of what might be computed for a directory and its subdirectories of about 20,000 images. A statistical analysis determines that the mean is 52 KB and the standard deviation is 8 KB.
- If you choose a blobpage size of mean + 1 standard deviation (52 KB + 8 KB), or 60 KB, then approximately two-thirds of your objects will fit on one blobpage and the remaining third will fit in two blobpages. This is a relatively efficient use of storage space but has a performance penalty of a second I/O request for one-third of the objects.
- If you choose a blobpage size of mean + 2 standard deviations (52 KB + 16 KB), or 68 KB, then perhaps 95% of your objects will fit on one blobpage. This results in an overall performance improvement but comes at an increase in storage cost.
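The two candidate page sizes above follow directly from the sample statistics (a minimal sketch; mean 52 KB and standard deviation 8 KB are the example figures from the analysis):

```shell
mean=52   # KB, mean object size from the statistical analysis
sd=8      # KB, standard deviation
one_sd=$(( mean + sd ))       # covers roughly 2/3 of the objects per the text
two_sd=$(( mean + 2 * sd ))   # covers perhaps 95% of the objects
echo "candidate blobpage sizes: ${one_sd} KB or ${two_sd} KB"   # 60 KB or 68 KB
```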
Storing TEXT and BYTE Data

You can store data for a TEXT or BYTE column with the table or in a separate blobspace:

CREATE TABLE employee_resume (
    fname        CHAR(15),
    lname        CHAR(15),
    phone        CHAR(18),
    recd_date    DATETIME YEAR TO HOUR,
    contact_date DATETIME YEAR TO HOUR,
    comments     VARCHAR(250, 100),
    vita         TEXT IN TABLE,
    photo        BYTE IN photo_spc2
) IN dbspace1;
A TEXT or BYTE value is always stored apart from the rows of the table; only a 56-byte descriptor is stored with the row. However, a simple large object occupies at least one disk page. The simple large object to which the descriptor points can reside in the same set of extents on disk as the table rows (in the same tblspace) or in a separate blobspace. When simple large objects are stored in the tblspace, the pages of their data are interspersed among the pages that contain rows, which can greatly increase the size of the table. When the database server reads only the rows and not the simple large objects, the disk arm must move farther than when the blobpages are stored apart. For best performance, store a simple-large-object column in a blobspace in either of the following circumstances:
- When single data items are larger than one or two pages each
- When the number of pages of TEXT or BYTE data is more than half the number of pages of row data
Blobpage Storage Statistics

oncheck -pB stores_demo:sriram.catalog

BLOBSpace usage:
Space    Page      Percent Full
Name     Number    Pages   0-25%  26-50%  51-75%  76-100%
---------------------------------------------------------
blobPIC  0x300080  1       x
blobPIC  0x300082  2       x
    Page Size is 6144
bspc1    0x2000b2  2       x
bspc1    0x2000b6  2       x
    Page Size is 2048
The oncheck -pB command displays statistics that describe the average fullness of blobpages. These statistics provide a measure of storage efficiency for individual simple large objects in a database or table. If you find that the statistics for a significant number of simple large objects show a low percentage of fullness, the database server might benefit from changing the size of the blobpage in the blobspace. Both the oncheck -pB and onstat -d update outputs display the same information about the number of free blobpages. For information about onstat -d update, see managing disk space in the Administrator’s Guide. Execute oncheck -pB with either a database name or a table name as a parameter. The example shown in the slide above retrieves storage information for all simple large objects stored in the table sriram.catalog in the stores_demo database.
Using oncheck -pT

oncheck -pT mydemo:sriram.catalog

Type              Pages   Empty  Semi-Full  Full  Very-Full
Free              7
...
Data (Home)       9
Data (Remainder)  0
Tblspace BLOBs    5       0      0          4     1
                  -----
Total Pages       24

Unused Space Summary
    Unused data bytes in Home pages              3564
    Unused data bytes in Remainder pages         0
    Unused data bytes in Tblspace Blob pages     1430
Use oncheck -pT to monitor dbspaces to determine the number of dbspace pages that TEXT and BYTE data use. This command takes a database name or a table name as a parameter. For each table in the database, or for the specified table, the database server displays a general tblspace report. Following the general report is a detailed breakdown of page use in the extent, by page type. See the Type column for information on TEXT and BYTE data.
Smart Large Objects

Smart large objects include:
1. CLOB and BLOB built-in types
2. Third-party indexes
3. User-defined data types (UDTs)

[Diagram: example smart-large-object applications - logical network, bill of materials, graphical node location, documents]
There are three primary uses for smart large objects in IBM IDS:
- The CLOB and BLOB types are built-in data types for storing a large amount of data. These types are created automatically for each database and are stored in the sysxtdtypes catalog.
- Third-party applications might use smart large objects as a place to store data for an index. This index would be manipulated only by the DataBlade that created it. The new byte-level locking makes this an even more viable use for smart large objects.
- Smart large objects provide an ideal storage method for large user-defined data types (UDTs).
Smart Large Object Processing

[Diagram: smart large objects are read from the smart large object space into the buffer cache in the resident portion of shared memory]
Prior to IBM IDS 9.x, large digitized assets such as documents or images were usually stored in blobspace chunks as opaque objects, and information about these objects (metadata) was limited to what could be stored as column values in standard data tables. With IBM IDS 9.x, the digitized large object can now be stored within a special type of dbspace called a smart space (sbspace). In this case the large object is treated much like the actual data within the rows of a table and hence is "visible", or transparent, to the server. The object is read into the server's buffer cache in pages that are the same size as those of standard database tables. As such, the internal contents of the object can be indexed, scanned, or otherwise examined in detail by the server.

A smart LO page has the same format as a dbspace data page that contains rows of data: it has a twenty-four-byte header and a four-byte timestamp trailer. A smart LO can span multiple pages within the sbspace; the LO is simply stored in pieces that fit on a page. The pages are read and written by the smart LO subsystem via the dynamic buffer manager. I/O for the metadata subsystem, on the other hand, is handled by the regular RSAM I/O subsystem.
Sbspace

An sbspace is a logical collection of chunks:
- A pool of disk space that tables created by the server can use to store smart large objects
  - CLOB: ASCII file, such as source code
  - BLOB: binary-encoded file, such as a PDF or GIF file
  - User-defined data types
- An sbspace must have at least one chunk
- An sbspace can have many chunks
Like other storage spaces in Dynamic Server, sbspaces are composed of one or more chunks.
Advantages of Sbspaces
- They have read, write, and seek properties similar to a standard Unix file
- They are recoverable
- They obey transaction isolation modes
- Smart large objects within table rows do not need to be retrieved in one statement
Sbspaces have a number of advantages over blobspaces:
- Sbspaces have read, write, and seek properties similar to a standard UNIX file, so programmers can use functions similar to UNIX and Windows functions to read, write, and seek smart large objects. Dynamic Server provides this smart-large-object interface in the following:
  - DataBlade API (see the DataBlade API Function Reference manual)
  - ESQL/C programming interface (see the IBM INFORMIX-ESQL/C Programmer's Manual)
- Sbspaces are recoverable, so you can log all write operations on data stored in sbspaces. You can commit or roll back changes if a failure occurs during a transaction.
- Sbspaces obey transaction isolation modes, so you can lock smart large objects at different levels of granularity, and the lock durations obey the rules for transaction isolation levels.
- Smart large objects within table rows do not need to be retrieved in one statement, so an application can store or retrieve smart large objects in pieces using either the DataBlade API or the ESQL/C programming interface.
Sbspace Extent

An sbspace extent consists of a collection of contiguous smart-large-object data pages.

sbspace1:
Page 0: Bitmap    Pages 1-8: Data    Pages 9-15: Free
Smart large object pages have the same size and format as standard system data pages (page header, slot table, and timestamp trailer), and these system pages are allocated in contiguous units called extents, just as for standard tables. The database server allocates the entire smart large object as one extent.
Using Sbspace: DataBlades
- DataBlades extend the functionality of your database server
- New data types (spatial, text, video, etc.)
- Plug-ins that function as an integral part of the server
- Incorporate new data types into your business without having to start from scratch
- Example DataBlades:
  - Excalibur Image and Text Search
  - Geodetic and Spatial
  - TimeSeries, NAG and RealTime
  - Video and Image Foundation
  - Text, C-ISAM and Web
A DataBlade module is a collection of database objects and code that extends IBM Informix Dynamic Server by adding new functionality. A DataBlade module can enable the server to provide the same level of support for new data types as it provides for built-in data types. Developing a DataBlade module involves defining data types and the routines that operate on them, writing support code for the data types and client applications, testing, documenting, and packaging the module.
Sbspace Logging
- DataBlade modules support only logged tables
- By default, sbspaces are not logged
  - The metadata area of all sbspaces is always logged
- Sbspaces for the Excalibur Text Search DataBlade module must be logged
  - Specify the -Df LOGGING=ON option of the onspaces command when you create the sbspace
By default, sbspaces are NOT logged. Lack of logging can be detrimental:
- DataBlade modules support only logged tables.
- Rollback functionality. The index for an update will not be rolled back if a rollback occurs during the update operation. The data, however, is rolled back.
- Index and table out of sync. By logging both the table and the sbspace, the table and index are kept in sync. If they fall out of sync, erroneous results can be returned.

You can turn logging on and off for an sbspace using the onspaces utility. To turn on logging for an sbspace, add it as a default option (-Df) when you create the sbspace:

onspaces -Df 'LOGGING=ON'

The Excalibur Text Search DataBlade module automatically turns logging off while an index is being created. Once the index has been created, logging for the sbspace is automatically turned back on.
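Expanding the -Df fragment above into a complete creation command might look like the following (a hypothetical sketch: the sbspace name, chunk path, offset, and size are invented for illustration; check the onspaces syntax for your IDS version):

```shell
# Create an sbspace with logging turned on from the start.
#   -c -S   create an sbspace with the given name
#   -p      path to the raw device or cooked file for the first chunk
#   -o      offset into the device, in KB
#   -s      chunk size, in KB
#   -Df     default option list; LOGGING=ON logs smart-large-object writes
onspaces -c -S sbsp1 -p /chunks/sb_chunk1 -o 0 -s 100000 -Df "LOGGING=ON"
```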
Sbspace Metadata

Chunk one layout:
- Chunk header pages
- User data
- Metadata area:
  - Sbspace-descriptor partition
  - Chunk-adjunct partition
  - Level 1 archive partition
  - Level 2 archive partition
  - Chunk one LO header partition
  - Chunk one user-data free-list partition
- User data
A smart blobspace, or sbspace, is a logical collection of chunks that is used to store smart large objects (also called smart LOs or smart blobs). When an sbspace is initially created, it assigns space for header data, metadata, and user data. There is always a metadata area stored within the first chunk of an sbspace. The default location is near the middle of the chunk to optimize access time. The location and size can be determined during the creation of the sbspace. A metadata area can contain information for one or more chunks within the sbspace. Once it is allocated, the metadata area cannot be changed in size or location. Any space remaining after the header pages and the metadata are allocated is used for storing user data, that is, the smart large objects.
Managing Large Object Space 10-21
Allocating Additional Chunks to Sbspace If you do not specify the onspaces -U option, the database server allocates another metadata area on the new chunk (the default).
[Diagram: an sbspace containing Chunk1 and Chunk2, each with its own metadata area.]
The first chunk of an sbspace must have a metadata area. When you add smart large objects, the database server adds more control information to this metadata area. If you add a chunk to the sbspace after the initial allocation, the database server allocates another metadata area on the new chunk by default. This provides the following advantages:
- It is easier, because the database server automatically calculates and allocates a new metadata area on the added chunk based on the average smart-large-object size.
- It distributes I/O operations on the metadata area across multiple disks. The metadata pages in an sbspace contain information about the location of the smart large objects in the sbspace; typically, these pages are read intensive.

In addition, the database server reserves 40 percent of the user area in case the metadata area runs out of space. If the allocated metadata area becomes full, the database server starts using this reserved space in the user area for additional control information. You can let the database server calculate the size of the metadata area for you on the initial chunk and on each added chunk.
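The 40 percent reserve can be illustrated with a quick calculation. This is a minimal sketch, assuming a 2500-page chunk with a 218-page metadata area (figures borrowed from the onstat -d example in this chapter) and ignoring chunk header pages; the server's actual allocation may differ:

```shell
# Hypothetical figures; the real allocation is computed by the server.
chunk_pages=2500       # total pages in the chunk
metadata_pages=218     # pages already set aside for metadata
user_pages=$((chunk_pages - metadata_pages))
# 40 percent of the user area is reserved as overflow for metadata:
reserved_pages=$((user_pages * 40 / 100))
echo "user area: $user_pages pages"
echo "reserved:  $reserved_pages pages"
```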
10-22 Managing Large Object Space
Sbspace Example

Dbspaces
address  number  flags  fchunk  nchunks  flags  owner     name
a1ce558  1       1      1       1        N      informix  rootdbs
a20c6c8  2       8001   2       1        N S    informix  sbspace1
a20c810  3       1      3       1        N      informix  dbs2
a20c958  4       8001   4       1        N S    informix  s9_sbspc
 4 active, 2047 maximum

Chunks
address  chk/dbs  offset  size   free  bpages  flags  pathname
a1ce6a0  1   1    0       10000  720           PO     /chunks/chunk1
a20c2a8  2   2    0       2500   2229  2229    POS    /chunks/chunk2
 Metadata 218 171 218
a20c408  3   3    0       1000   927           PO     /chunks/chunk3
a20c568  4   4    0       5000   4495  4542    POS    /chunks/chunk4
 Metadata 405 327 405
 4 active, 2047 maximum
Above is a sample of output from an onstat -d command. An oncheck -pe report for the sbspace in this system is shown on the following page. The information for the user data free list was obtained right after the sbspace was created and appears as FREE USER DATA in an oncheck -pe report.
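Output like this lends itself to quick scripting. Below is a minimal awk sketch that pulls the Metadata lines out of captured onstat -d text; the sample here-document is pasted from the report above, and in practice you would pipe onstat -d directly. The Metadata figures are assumed to align under the size/free/bpages columns — verify the column meanings on your server version:

```shell
# Sample chunk lines captured from onstat -d (sbspace chunks carry an
# indented Metadata line under the size/free/bpages columns).
sample='a20c2a8 2 2 0 2500 2229 2229 POS
 Metadata 218 171 218
a20c568 4 4 0 5000 4495 4542 POS
 Metadata 405 327 405'
# Report metadata size and free pages for each sbspace chunk.
echo "$sample" | awk '$1 == "Metadata" {print "metadata: size " $2 " pages, free " $3}'
```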
Managing Large Object Space 10-23
DBspace Usage Report: s9_sbspc    Owner: informix    Created: 10/30/2001

Chunk Pathname            Size  Used       Free
4     /chunks/sbspace2    1000  see below  see below

Description                               Offset  Size
----------------------------------------  ------  ----
RESERVED PAGES                                 0     2
CHUNK FREELIST PAGE                            2     1
s9_sbspc:'informix'.TBLSpace                   3    50
SBLOBSpace LO [4,4,1]                         53   100
SBLOBSpace LO [4,4,2]                        153    20
SBLOBSpace LO [4,4,3]                        173   100
SBLOBSpace LO [4,4,8]                        273    15
SBLOBSpace LO [4,4,9]                        288    94
SBLOBSpace LO [4,4,10]                       382    20
SBLOBSpace FREE USER DATA (AREA 1)           402    74
s9_sbspc:'informix'.sbspace_desc             476     4
s9_sbspc:'informix'.chunk_adjunc             480     4
s9_sbspc:'informix'.LO_ud_free               484    10
s9_sbspc:'informix'.LO_hdr_partn             494    13
SBLOBSpace FREE META DATA                    507    69
SBLOBSpace RESERVED USER DATA (AREA 2)       576   188
SBLOBSpace LO [4,4,4]                        764    20
SBLOBSpace LO [4,4,5]                        784   100
SBLOBSpace LO [4,4,6]                        884    20
SBLOBSpace LO [4,4,7]                        904    96

Total Used: 669
Total SBLOBSpace FREE META DATA: 69
Total SBLOBSpace FREE USER DATA: 262
Smart LO handle: [sbspace, chunk, logical sequence number]

The three values in brackets after each LO listing together form the smart LO handle, which uniquely identifies every large object on the database server. In the example above, you can see smart LOs whose physical order differs from their logical order: the database server tries to balance the load on the sbspace by inserting into both user-data areas of a chunk.
10-24 Managing Large Object Space
Monitoring Sbspaces: oncheck

Use oncheck to view:
- metadata information
- pages on disk
- general chunk layout
The oncheck utility provides options to generate reports on sbspaces and elements within sbspaces.

-cs sbspace                   Checks metadata information.
-cS sbspace                   Checks metadata and extent information.
-ps/-pS sbspace               Prints metadata information.
-pp partnum logical_offset    Views a page on disk.
-pP chunk page_offset         Views a page on disk.
-pd/-pD partnum               Views the contents of metadata tables. The partnum for the LO header partition is needed and can be obtained from the -cs option. For example: oncheck -pD 0x200004
-ce/-pe sbspace               General chunk layout.
Managing Large Object Space 10-25
The oncheck -pS command displays tblspace information for the metadata tables (like oncheck -pt) and additional information about smart LO extents. Here is an example:

$ oncheck -pS
...
TBLSpace Report for s9_sbspc:'informix'.LO_hdr_partn

    TBLspace Flags             200802   Row Locking
                                        TBLspace use 4 bit bit-maps
    Partition partnum          0x400005
    Rowsize                    456
    Number of special columns  0
    Number of keys             0
    Number of extents          1
    Current serial value       1
    First extent size          13
    Next extent size           13
    Number of pages allocated  13
    Number of pages used       4
    Number of data pages       3
    Number of rows             11
    Partition lockid           4194309
    Optical Cluster Partnum    -1
    Current SERIAL8 value      1
    Current REFID value        1
    Created                    Tue Oct 30 12:30:17 2001

Large Objects

    Space Chunk Page = [4,4,1]       Object ID = 1004467118
    LO SW Version                    4
    LO Object Version                1
    Created by Txid                  7
    Flags                            0x32 LO_NOLOG LO_NOKEEP_LASTACCESS_TIME LO_HIGH_INTEG
    Data Type                        0
    Extent Size                      100
    IO Size                          0
    Created                          Tue Oct 30 12:33:31 2001
    Last Time Modified               Tue Oct 30 12:33:32 2001
    Last Time Accessed               Tue Oct 30 12:33:32 2001
    Last Time Attributes Modified    Tue Oct 30 12:33:32 2001
    Ref Count                        1
    Create Flags                     0x32 LO_NOLOG LO_NOKEEP_LASTACCESS_TIME LO_HIGH_INTEG
    Status Flags                     0x400 LO_FROM_SERVER
    Size (Bytes)                     15736
    Size Limit                       -1
    Total Estimated Size             -1
    Deleting TxId                    -1
    LO Map Size                      200
    LO Map Last Row                  -1
    LO Map Extents                   2
    LO Map User Pages                100

(The bracketed Space Chunk Page value is the smart LO handle, and the Object ID is an internal LO identifier.)
10-26 Managing Large Object Space
onstat -g smb

Syntax: onstat -g smb [s | c | h | e | t | lod | fdd]

onstat -g smb        Command usage.
onstat -g smb s      Sbspaces.
onstat -g smb c      Sbspace chunks.
onstat -g smb h      LO header table header.
onstat -g smb e      LO header and FD entries.
onstat -g smb t      Timing statistics for enabled server.
onstat -g smb lod    LO header table header and entries.
onstat -g smb fdd    LO file descriptor entries.
The onstat -g smb command can be used with any of the above options to obtain statistical information about sbspaces. The e and lod options provide a list of the entries in the LO header table. These represent the smart LOs that have been stored in sbspaces. Here is an example: $ onstat -g smb e
Informix Dynamic Server Version 9.30.UC1 -- On-Line -- Up 1 days 22:26:14 -- 29696 Kbytes

Lo Header Table Entries
opns refs  size   ha_status   h_status    [ sbs,chk,seq(rid),oid ]
0    1     15736  0x00000004  0x00000400  [ 4,4,1  (0x102),1004467118 ]
0    1     98     0x00000005  0x00000400  [ 4,4,2  (0x103),1004467119 ]
0    1     21454  0x00000004  0x00000400  [ 4,4,3  (0x104),1004467120 ]
0    1     154    0x00000005  0x00000400  [ 4,4,4  (0x201),1004467121 ]
0    1     11751  0x00000004  0x00000400  [ 4,4,5  (0x202),1004467122 ]
0    1     123    0x00000005  0x00000400  [ 4,4,6  (0x203),1004467123 ]
0    1     25610  0x00000004  0x00000400  [ 4,4,7  (0x204),1004467124 ]
0    1     22     0x00000005  0x00000400  [ 4,4,8  (0x301),1004467125 ]
0    1     7937   0x00000004  0x00000400  [ 4,4,9  (0x302),1004467126 ]
0    1     147    0x00000005  0x00000400  [ 4,4,10 (0x303),1004467127 ]

(The size column is the smart LO size in bytes; sbs,chk,seq(rid) is the smart LO handle with its LO-header table rowid, and oid is the object ID, an internal identifier.)
Managing Large Object Space 10-27
10-28 Managing Large Object Space
Appendixes
Appendix A IBM IDS Utilities
IBM IDS Utilities 07-2002 © 2002 International Business Machines Corporation
A-1
The Oninit Utility

Syntax: oninit [-s] [-i] [-p] [-y]

oninit       Brings IBM IDS from Off-line to On-line mode.
oninit -s    Brings IBM IDS from Off-line to Quiescent mode.
oninit -i    Initializes the root dbspace of IBM IDS.
-p           Does not search for and delete temporary tables during shared-memory initialization.
-y           Automatically answers yes to any prompts.
The oninit utility is used to change the operating modes of the IBM IDS system at the command line. The -i option is used to initialize the IBM IDS system's root dbspace. Remember that initializing the root dbspace destroys all other dbspace information. If you want the IBM IDS system to be brought on-line when your machine is rebooted, place the oninit command with no arguments in the system startup file (/etc/rc on many UNIX platforms). Be sure to properly set the environment in the startup file, too. To see the usage statement for oninit, enter oninit --
at the command line.
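A sketch of such a startup fragment follows. All paths and names are hypothetical, and the final command is echoed here as a dry run — replace the echo with a real oninit call in an actual rc script:

```shell
#!/bin/sh
# Hypothetical environment for an IDS instance; adjust for your install.
INFORMIXDIR=/usr/informix;     export INFORMIXDIR
INFORMIXSERVER=ids_server;     export INFORMIXSERVER
ONCONFIG=onconfig.ids_server;  export ONCONFIG
PATH=$INFORMIXDIR/bin:$PATH;   export PATH
# Dry run: print the command an rc script would execute at boot.
echo oninit
```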
A-2 IBM IDS Utilities
The Onmode Utility

Syntax: onmode [-k] [-m] [-s] [-u] [-c] [-y]

onmode -k    Performs an immediate shutdown and brings the system offline.
onmode -m    Brings the system from quiescent to online.
onmode -s    Performs a graceful shutdown to quiescent.
onmode -u    Performs an immediate shutdown to quiescent.
onmode -c    Forces a checkpoint.
-y           Automatically answers yes to any prompts.
The onmode utility is used to change the operating modes of the IBM IDS system at the command line. There are many other arguments for the onmode utility that are not related to the operating modes of the IBM IDS system. To see the usage statement for onmode, enter onmode --
at the command line.
IBM IDS Utilities A-3
Creating Dbspaces with Onspaces

Syntax: onspaces -c [-d] [-b] [-g] [-p] [-o] [-s] [-m] [-t]

-c                Create a dbspace or blobspace.
-d dbspace        The name of the dbspace to be created.
-b blobspace      The name of the blobspace to be created.
-g blobpgsize     Page size of the blobspace.
-p pathname       Disk partition or device name.
-o offset         Offset in kilobytes.
-s size           The size of the initial chunk in kilobytes.
-m path offset    Mirror path name and offset.
-t                Indicates a temporary dbspace.
The onspaces utility is used to create a dbspace, temporary dbspace, or blobspace. To obtain a usage statement from the command line, enter onspaces --. To see the dbspaces in the IBM IDS system, enter onstat -D or onstat -d. This output displays important information such as chunk status, the number of free pages, and the number of reads and writes per chunk. IBM IDS systems may have many dbspaces, especially if fragmentation is used. It is highly recommended that a script be used to create the dbspaces.
Examples An example using onspaces to create a dbspace is: onspaces -c -d dbs1 -p /dev/rvol3 -o 0 -s 6000
An example using onspaces to create a temporary dbspace is: onspaces -c -d tmp1 -t -p /dev/rvol5 -o 0 -s 8000
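The recommendation to script dbspace creation can be sketched as follows. Names, devices, and sizes are hypothetical, and each command is echoed as a dry run — remove the echo once the values have been verified against your devices:

```shell
#!/bin/sh
# Dry-run sketch: build onspaces commands from a table of spaces.
create_dbspace() {
    # $1=name  $2=device  $3=offset(KB)  $4=size(KB)  $5=extra flags (e.g. -t)
    echo onspaces -c -d "$1" $5 -p "$2" -o "$3" -s "$4"
}
create_dbspace dbs1 /dev/rvol3 0 6000
create_dbspace dbs2 /dev/rvol4 0 6000
create_dbspace tmp1 /dev/rvol5 0 8000 -t
```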
A-4 IBM IDS Utilities
Temporary dbspaces are important in the IBM IDS system. Typically you will have multiple temporary dbspaces located on separate physical devices. The onspaces utility is also used to drop dbspaces, temporary dbspaces, or blobspaces. To drop a dbspace, it must be unused: all tables must be dropped from the dbspace before the IBM IDS system will allow you to drop it.
Syntax To drop a dbspace enter: onspaces -d dbspace_name
IBM IDS Utilities A-5
Adding and Dropping Chunks

Syntax: onspaces [-a] [-d] [-p] [-o] [-s] [-m]

-a spacename      Add a chunk to the dbspace.
-d spacename      Drop a chunk from the dbspace.
-p pathname       Disk partition or device name.
-o offset         The offset in kilobytes.
-s size           Size of the chunk.
-m path offset    Mirror pathname and offset.
As space is used within the initial chunk of a dbspace, more space may be required. It is recommended that initial space requirements be calculated prior to the creation of a dbspace so that a single chunk can accommodate them. Keep in mind that additional chunks may not be contiguous with the initial chunk, which can increase seek time when reading data spread across chunks. The onspaces utility is used to add chunks to or drop chunks from a dbspace. You may also start mirroring, stop mirroring, or change the status of a chunk using the onspaces utility.
Examples An example of adding a chunk to a dbspace is: onspaces -a -d dbs1 -p /dev/rvol3 -o 6000 -s 6000
An example of dropping a chunk from a dbspace is: onspaces -d dbs1 -p /dev/rvol3 -o 6000
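The offsets in the examples above follow from simple arithmetic: when several chunks are carved out of one device, each new chunk starts where the previous one ends. A minimal sketch, using the sizes from the examples:

```shell
# First chunk: offset 0 KB, size 6000 KB (values from the examples above).
prev_offset=0
prev_size=6000
# The added chunk must start past the end of the first one:
next_offset=$((prev_offset + prev_size))
echo "add the next chunk at offset $next_offset KB"
```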
A-6 IBM IDS Utilities
The Onparams Utility

Syntax: onparams [-a] [-p] [-d] [-s] [-l]

-a            Adds a logical log.
-p            Changes the physical log.
-d dbspace    The dbspace name where the log will be created.
-s size       The size of the new log in kilobytes.
-d            Drops a logical log.
-l logid      Specifies the logical log to be dropped.
The logical and physical logs are created by default in the root dbspace at initialization. Since the logs receive a large amount of activity, particularly in an OLTP environment, it is recommended that they be physically separated on different devices. An easy approach is to create the desired number of logical logs in the root dbspace along with a small physical log, then move the physical log to a different device after initialization. New logical logs added to the IBM IDS system are identified with an A flag in the onstat -l output. A simple method to activate the logs is to perform a fake system archive: set the onconfig TAPEDEV parameter to /dev/null and run ontape -s. If you are using the onbar utility, a fake archive is performed by entering onbar -b -F. Take care when performing fake archives, since they do not truly back up any information; be sure to take a real system archive as soon as possible. Logical logs may be dropped only if they are available for reuse. Logical logs are not available for reuse as long as their contents are needed for transaction rollback or fast recovery. To be reused, a log must be backed up and must not contain any open transactions or the last checkpoint record. Output from an onstat -l command shows a C flag for the current log and an L flag for the log that contains the last checkpoint record. You may force the system to the next log with onmode -l, and then force a checkpoint by entering onmode -c.
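The sequence described above — add new logical logs, activate them with a fake archive, then move the physical log — can be sketched as a dry-run script. The dbspace names and sizes are hypothetical, each command is echoed rather than executed, and the fake archive assumes TAPEDEV has already been set to /dev/null in the onconfig file:

```shell
#!/bin/sh
# Dry-run sketch of moving logs out of the root dbspace after initialization.
run() { echo "$@"; }                 # swap for direct execution when ready

run onparams -a -d logdbs -s 10000   # add a logical log in dbspace logdbs
run onparams -a -d logdbs -s 10000   # ...repeat for as many logs as needed
run ontape -s                        # fake archive (TAPEDEV /dev/null) activates the new logs
run onparams -p -d physdbs -s 20000  # move the physical log to dbspace physdbs
run onmode -c                        # force a checkpoint
```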
IBM IDS Utilities A-7
The Onstat Utility
- Lists what is in IDS shared-memory structures.
- Minimal performance impact.
- Provides valuable information about the IBM IDS system.
- The utility of choice for interactive monitoring of the IBM IDS system at the command line.
The onstat utility is a valuable real-time system monitoring and reporting tool. Onstat reads directly from shared-memory structures and reports their contents at the moment it is run. Generally, onstat does not perform disk I/O. It places no locks on system resources and therefore has a minimal performance impact. Onstat is the most common Informix utility for interactive IBM IDS system monitoring. Some helpful options are:

onstat --               Lists all options.
onstat -i               Places you in interactive mode.
onstat -                Displays the operating mode and status of the engine.
onstat -g sub_option    Runs multithreaded options.
onstat -r seconds       Repeats the options every given number of seconds.

A-8 IBM IDS Utilities
Examples

onstat -g act         Displays all active threads.
onstat -g ath -r 2    Displays all threads, repeating the display every 2 seconds.
onstat -l             Displays logging information.
IBM IDS Utilities A-9
The System Monitoring Interface
- The sysmaster database is created automatically at initialization.
- The database contains virtual tables that point to IBM IDS shared-memory structures.
- Provides snapshot performance and system-status information.
- Provides an SQL interface to the data dictionary information.
- Allows an administrator to automate aspects of system monitoring.
- Is useful when performing repetitive monitoring tasks.
The System Monitoring Interface (SMI) is a convenient method of accessing administrative information in an IBM IDS system through a supported SQL interface. The sysmaster database is automatically created in the root dbspace when the IBM IDS system is initialized; there is one sysmaster database per IBM IDS system. Most of the sysmaster tables are not real tables but point to data structures in shared memory. SMI is helpful for automating aspects of system monitoring, particularly when performing repetitive tasks. The sysmaster database provides read-only access: you cannot perform INSERT, UPDATE, or DELETE operations or lock any SMI "tables". All IBM IDS users have permission to query the sysmaster database tables. It is important to verify the successful creation of the sysmaster database at initialization time. Messages related to its creation are written to the IBM IDS message log file. Keep in mind that all DDL statements are logged in the IBM IDS system, so you must configure enough logical-log space to support the creation of the sysmaster database. Dbschema does not work against the sysmaster database; its schema is in the $INFORMIXDIR/etc/sysmaster.sql file.
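As a sketch of the repetitive-monitoring idea, the following writes a simple SMI query to a file for use with dbaccess. The syschunks table and its chknum, nfree, and chksize columns are assumptions to verify against your version's $INFORMIXDIR/etc/sysmaster.sql, and running the query itself requires a live server:

```shell
#!/bin/sh
# Generate a reusable SMI query for per-chunk free space.
cat > chunk_free.sql <<'EOF'
SELECT chknum, nfree, chksize
  FROM syschunks
 ORDER BY chknum;
EOF
echo "wrote chunk_free.sql"
# To run it against a live server:  dbaccess sysmaster chunk_free.sql
```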
A-10 IBM IDS Utilities
The Oncheck Utility
- Examines IDS data structures on disk.
- Locates index and data page corruption.
- Performs I/O.
- Some options place locks on the table being processed.
- Can fix index page corruption.
The oncheck utility is used to repair index and data page corruption on disk. It is also used to examine and print reports on IBM IDS data structures. Care must be taken when running oncheck, since some options, like oncheck -cd or oncheck -cD, place a shared lock on the table being examined. To prevent corruption, the database server performs automatic consistency checks on data whenever it is manipulated in shared memory. When a page fails a consistency check or an index problem is detected, a message is written to the IBM IDS message log file. The message details the problem and suggests the appropriate oncheck option to run; the ISAM error code will be -105. If you receive a failed consistency check message, shut down the IBM IDS system immediately and run the oncheck option indicated in the message log file. If oncheck is unable to repair the data pages, you can try to unload and reload the table. If corruption occurs in an index, drop and rebuild the index. If these steps do not fix the corruption, restore from an archive. There are two primary options, -c and -p, each with its own sub-options. The -c option indicates that data structures should be checked. The -p option indicates that data structure information should be printed to the screen, and with some sub-options it is also checked. A very helpful option that provides extent information is oncheck -pe.
IBM IDS Utilities A-11
A-12 IBM IDS Utilities
Appendix B Informix Server Administrator
Informix Server Administrator 07-2002 © 2002 International Business Machines Corporation
B-1
Informix Server Administrator is a tool that gives you the ability to perform database server administrative tasks from any remote location that has a web browser and network access to your database server’s host machine. Throughout this course, instructions are provided on how to use ISA to perform a variety of tasks. The purpose of this appendix is to provide some additional information to help you get started with installing and configuring ISA.
B-2 Informix Server Administrator
Installing Informix Server Administrator

Before installing ISA, it is important to note the following:
- ISA must be installed on the same machine as your database server.
- You must be logged in as root to run the installation program.
- A port number must be available to which web connections can be made to the ISA server.
- Refer to the README file provided with the product for additional information about installation and features of ISA.

To install ISA on your computer system, follow these steps:
1. Create a directory where you want to install the ISA files.
2. Extract the ISA files from the cpio file provided using the command:
       cpio -icvdumB < filename
   where filename is the name of the cpio file.
3. Log in as user root using the su command.
4. Run the installation program using the command:
       ./installisa
5. Respond appropriately to the prompts during installation. Provide a port number that is not in use (not in /etc/services) when prompted. This is the number that the administrator uses to connect to the ISA server from a web browser.

An example of an installation is provided on the following pages. User input is in boldface print. Nonliteral user input is in italics.
Sample Installation Session

# ./installisa
This is the Informix Server Administrator install program.
Checking manifest...                        done

Informix Server Administrator includes its own copy of the Apache HTTP server, which ISA can install and configure for you. Alternatively, instructions are provided for using other HTTP servers.
Do you want to use ISA's HTTP server? [yes]: yes
Informix Server Administrator B-3
Informix Server Administrator includes its own copy of Perl, which ISA can install and configure for you. Alternatively, instructions are provided for using other Perl installations.
Do you want to use ISA's Perl environment? [yes]: yes

First, enter the port number for the ISA server. This port number must be available for use on this machine. It should be above 1024 and below 65536. A list of port numbers that are probably in use is available from /etc/services, but this list may not be complete for your installation.
Port number: 4000

Second, ISA needs to know the correct name of this machine. To use the value that appears in [square brackets] press the ENTER key.
Hostname [eclipse]:

Third, ISA needs an e-mail address for problem reports. Informix recommends you create an alias "isa-admin" for this purpose.
E-mail address [isa-admin@eclipse]:

Fourth, enter the HTTP password for user "informix". You are prompted to confirm the password that you enter. The username and password are stored in /httpd/etc/passwd.

When a user accesses ISA through the web browser, the browser prompts for a username and password. The user should log in as "informix" with the password that you supplied here.

Note: The HTTP server password does not have to be the same as the system password for user informix, but anyone who can access ISA with this username and password can do anything that user informix can do from the database server command line.
New password: <passwd>
Re-type new password: <passwd>
Adding password for user informix

You can add users with read-only access to ISA. They will be able to monitor the server but they will not be able to change the mode of the server, add or remove storage, or perform other administrative tasks.
B-4 Informix Server Administrator
Do you wish to add a read-only user [yes]: no
Modifying httpd.conf file...                done
Modifying access files...                   done
Modifying httpd.conf file for jserv...      done
Modifying jserv.conf file...                done
Modifying jserv.properties file...          done
Modifying zone.properties file...           done
Generating control program...               done
Generating isacfg file...                   done
Generating environment setting program...   done
Generating isa.properties...                done
Generating custom opt.conf file...          done

-----------------------------------------------------------
ISA is now installed. To start the ISA server, become informix (or root) and run the command
    /prods/isa1.3/sbin/isactl start
Or answer yes to the question below.

To start ISA and finish configuration use your web browser to access the following URL:
    http://eclipse:4000/
Log in with the username and password you provided earlier.
-----------------------------------------------------------
Start ISA server [no]: yes
/prods/isa1.3/sbin/isactl start: httpd started
Starting the ISA Server If you did not start the ISA web server during the installation process, you can do so by executing the following command as root: pathname/sbin/isactl start
where pathname is the name of the directory where ISA is installed. To stop the server, run the command: pathname/sbin/isactl stop
To automatically start and stop the ISA server on your computer when it is rebooted, place these commands in the appropriate startup and shutdown scripts. Refer to your operating system documentation for information on where these scripts are located.
Informix Server Administrator B-5
Accessing the ISA Server As long as the ISA server is running, you can access the ISA server from any machine that has access to the computer where it is installed. Use the following URL format to access the ISA server: http://hostname:port_number
where hostname is the name of the computer where the ISA server resides and port_number is the port number you provided during installation. The opening ISA webpage provides some basic information about ISA and includes links to additional documentation. To access the ISA web utility, select Use ISA. This brings up the Informix Servers on hostname page, which displays a list of Informix database servers that have been configured for ISA. If this is the first time that you have accessed ISA since installing it, only sample database servers are displayed.
Configuring Your Database Server in ISA

To include your database server in the list shown on the Informix Servers on hostname page, press the Configure button. This brings up a text-editor window where you can view and modify the ISA configuration file. The ISA configuration file is in isa_dir/etc/isacfg and can also be edited on the computer that hosts the ISA server. To configure your database server, simply list the environment variables required to access your database server, including INFORMIXSERVER, INFORMIXDIR, ONCONFIG, and INFORMIXSQLHOSTS. Sample server entries are included in the file. You may want to simply substitute your server information in place of the example server information. Here is an example of the configuration parameters for a server called server1:

INFORMIXSERVER    server1
INFORMIXDIR       /prods/ids9.21
ONCONFIG          onconfig.server1
INFORMIXSQLHOSTS  /home/stu100/sqlhosts
END
Note that the keyword END must follow your list of variables to terminate the entry for a single server. To save your configuration and return to the Informix Servers on hostname page, press the Save button. Your database server should now be included in the list of available servers. Now that you have configured your server, you are ready to use ISA to perform a variety of administrative tasks. Select your server by clicking on the server name and explore some of the commands provided in the ISA command list.
B-6 Informix Server Administrator
Index
Index A access time 1-13 ALTER FRAGMENT...DETACH clause 7-33
B Blob BYTE data type 10-6 TEXT data type 10-6 Blobspace blobpage 10-8
STMT_CACHE 9-4 STMT_CACHE_SIZE 9-4 TBLSPACE_STATS 6-17 Configuration parameter (PLCONFIG) AIOBUFFERS 3-17 AIOBUFSIZE 3-17 CONVERTTHREADS 3-17 CONVERTVPS 3-17 STRMBUFFERS 3-17 STRMBUFFSIZE 3-17 CPU system state time 1-28 user state time 1-28 utilization 1-28
C Checkpoint duration 6-15 Chunk writes 6-14 Configuration parameter AFF_NPROCS 1-30 AFF_SPROC 1-30 BUFFERS 3-30, 4-22, 5-18 CKPTINTVL 3-30, 4-22 , 6-13 CLEANERS 3-30 DATASKIP 7-32 DBSERVERALIAS 9-13 DBSERVERALIASES 9-13 DBSERVERNAME 9-13 DBSPACETEMP 4-22 , 5-18 DS_MAX_QUERIES 8-11 DS_MAX_SCANS 8-11 DS_TOTAL_MEMORY 1-37, 8-10, 8-11 FILLFACTOR 4-8 LRU_MAX_DIRTY 3-30, 4-22, 6-7, 6-9 LRU_MIN_DIRTY 3-30 , 4-22, 6-7, 6-9 LRUS 3-30, 4-22 , 6-8 MAX_PDQPRIORITY 8-4 , 8-11 MULTIPROCESSOR 1-31 NETTYPE 3-29 NOAGE 1-29 NOFUZZYCKPT 6-13 NUMCPUVPS 1-32 , 3-29, 9-17 PHYSFILE 4-22 RA_PAGES 5-18, 6-26 RA_THRESHOLD 5-18, 6-26 SHMADD 1-39 SHMVIRTSIZE 1-37, 4-22 , 4-24, 5-18 SINGLE_CPU_VP 1-31 , 1-32
D dbload description of 3-8 desperation swapping 1-35 Dispatching priority 1-29 DSS 8-10
E Environment variable DBUPSPACE 5-14 FET_BUF_SIZE 3-31 IFX_NETBUF_PVTPOOL_SIZE 9-29 INFORMIXCONRETRY 9-25 INFORMIXCONTIME 9-25 INFORMIXSERVER 9-27 INFX_NETBUF_SIZE 9-29 KAIOOFF 6-21 LIGHT_SCANS 1-17 OPTOFC 9-7 PDQPRIORITY 8-5
F Foreground write 6-11 Fragmentation hash function 7-17 Fuzzy checkpoint 6-13
G global pool 9-15
Index-1
H High Performance Loader AIOBUFFERS 3-18 AIOBUFSIZE 3-18 CONVERTTHREADS 3-17 CONVERTVPS 3-17 Deluxe load mode 3-15 description of 3-12 Express load mode 3-16 PLCONFIG 3-17 STRMBUFFERS 3-18 STRMBUFSIZE 3-17
I Implicit index 4-12 Informix Server Administrator accessing B-6 configuration of B-6 description of B-2 installation of B-3 starting B-5 stopping B-5 In-line poll thread 9-17 in-line poll thread 9-17 Installation Informix Server Administrator B-3 ipcrm 2-8 ipcs 2-7
L latency 1-13 Least Recently Used (LRU) queues 6-8 light scan 1-16 listen thread 9-11 Logical Unit 1-25 LRU writes 6-11
M Memory Grant Manager 8-14 mpstat 1-30
O OLTP 8-10 oncheck -pe 10-24, A-11 -pS 10-26 -pt A-11 onektup 3-4
Index-2
oninit -i A-2 -p A-2 -s A-2 -y A-2 onmode -c 3-32, 6-14, A-3 -c fuzzy 6-13 -D 8-12 -F 1-39 -k A-3 -M 1-37, 8-12 -m A-3 -Q 8-12 -S 8-12 -s A-3 -u A-3 onparams -a A-7 -d A-7 -l A-7 -p A-7 -s A-7 onspaces -a A-6 -b A-4 -c A-4 -d A-4 -g A-4 -m A-4 -o A-4 -p A-4 -s A-4 -t A-4 onstat -D 7-9 , 7-31, A-4 -d A-4 -F 6-11 -g ath 6-21, 9-13 -g cac stmt 9-4 -g glo 2-24 -g iof 7-31 -g ioq 6-18 , 6-23 -g lap 3-16 -g lsc 1-17 -g nsc 9-24 -g nss 9-24 -g ntu 9-23 -g ppf 6-22, 7-31 -g smb 10-27 -g sub_option A-8 -i A-8 -l 6-24 -p 6-5
-r 2-4 , A-8 ontape -s 4-16
P page outs 1-34 paging 1-34 Parallel Data Query (PDQ) 7-9 Parallel Database Query 8-3 pfread utility 1-20 poll thread 9-12 prepmem 9-30 Priority aging 1-29 Processor affinity 1-30
R RAID 1-25 level 1 1-27 level 10 1-27 level 5 1-27 RAW permanent table 3-25 Resolution 5-6
S Sbspace 10-21 definition of 10-16 metadata 10-21 oncheck -pe 10-24 onstat -d 10-23 smart LO page 10-15 sbspaces SBSPACENAME 10-20 turning logging on 10-20 seek time 1-13 Shared memory message portion 1-33 resident portion 1-33 virtual portion 1-33 shared memory connection 9-16 shared memory segment 2-7 sid 2-8 Smart blob 10-21 Smart blobspace 10-21 Smart large object 10-21 Smart LO 10-21 SMI 2-5 Spin locks 1-31 sqexplain.out 8-7 SQL LOAD 3-6 SQL Statement Cache 9-3
SQL statements CREATE CLUSTER INDEX 4-14 FRAGMENT BY EXPRESSION 7-14 SET EXPLAIN 7-9 sqlhosts file 9-13 STANDARD permanent table 3-25 Stripe 1-25 swapping 1-34 sysdistrib table 5-3 sysmaster database 2-5 sysmaster.sql file 2-5 System monitoring interface (SMI) A-10
T tenktup1 3-4 tenktup2 3-4 thrashing 1-35
U Unix utilities iostat 2-14 netstat 9-22 ps 2-13 sar 2-10 sar -d 7-9 time 2-12 timex 2-12 vmstat 2-16 UPDATE STATISTICS Confidence level 5-8 data distribution 5-5 HIGH 5-9 LOW 5-9 MEDIUM 5-7, 5-9 PDQPRIORITY 5-14 sample size 5-10
V Vertical parallelism 4-18 VPCLASS 1-29
W working set 1-34 Write cache rate 6-6
Index-3
Index-4