An investigation of dataserver CPU & IO utilization to resolve performance bottlenecks
By Prasanna Kirtani, July 2009
Introduction

Any database management system, be it Sybase, Oracle, SQL Server or any other DBMS, relies on a finite pool of hardware and software components. These components make up the resource pool of the system. When I troubleshoot database performance, the components of interest to me are disk I/O, CPU and memory. Performance is a measure of how well the DBMS uses I/O, CPU and memory to process DDL and DML requests.

• Performance is typically measured in terms of throughput and response time.
• The aim of performance tuning is to optimize the utilization of CPU, disk I/O and memory.
• The goal of performance troubleshooting is to identify bottlenecks by examining one variable or component at a time.

We have to be aware that resolving one bottleneck can, and in many cases does, uncover previously hidden problems.
In this document, I would like to share my approach to troubleshooting performance issues on Sybase ASE.
Where Do We Start?

Unfortunately, this question comes up every time a performance problem is encountered. Many areas can be responsible for degraded performance on a DBMS, some of which lie outside the database server itself. For Sybase ASE, whenever poor server performance is reported, even before I accept that there is a genuine performance issue, I do the following:

1. Check connectivity to the dataserver
2. Check the dataserver errorlog for any unfamiliar messages
3. Log in to the dataserver and check for blocking, a log suspend condition and a filled-up tempdb
4. Run a simple select query in the master database

Once I have assured myself that the dataserver is functioning normally, I proceed with my performance investigation.
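The checks in steps 3 and 4 above can be run from any isql session. The statements below are a sketch rather than a definitive recipe; details such as the exact blocking column in sp_who output and the cmd text for suspended sessions vary slightly between ASE versions, so verify against your release:

```sql
-- Step 3: look for blocked processes (a non-zero "blk" value
-- in sp_who output identifies the spid holding the lock)
exec sp_who
go

-- Look for sessions suspended on a full transaction log
select spid, cmd, status
from master..sysprocesses
where cmd like "LOG SUSPEND%"
go

-- Check how full tempdb is
use tempdb
go
exec sp_spaceused
go

-- Step 4: a trivial sanity query against master
select @@servername, getdate()
go
```

If all of these come back clean and quickly, the server itself is alive and the investigation can move on to resource utilization.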
CPU, IO and Memory utilization

Each DBMS has its own flavor of commands that help isolate performance issues. There is no silver bullet that can magically pinpoint performance bottlenecks. What I have found most effective is to start by capturing CPU, IO and memory utilization within the dataserver. A few readings of these three basic variables can provide clues as to where to look next.

For Sybase ASE, I find the stored procedure "sp_monitor" most useful. sp_monitor is a well-known stored procedure, available on ASE for a very long time, that reports CPU busy and IO busy statistics accumulated since its previous execution. Sybase, like other DBMS vendors, now provides many more features that can help with performance troubleshooting, but I prefer sp_monitor for its simplicity and ease of use. It provides a first insight into what is going on in the dataserver as far as IO and CPU utilization are concerned.

Note: To collect information relating to memory, I use another procedure, "sp_sysmon". Specifically, I look at two values: "buffer hit ratio" and "spinlock contention". The buffer hit ratio reports the percentage of times a requested data page is found in the data cache; a buffer hit ratio of less than 97% could indicate memory starvation, and if sufficient memory is not allocated to the data caches, the first thing to do is to increase it. Spinlock contention reports the number of times an engine encountered spinlock contention on a cache and had to wait to acquire it. Splitting the data caches into cachelets (cache partitions) reduces spinlock contention in SMP environments.
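A typical sampling session looks something like the following. The interval argument to sp_sysmon and the cache_partition syntax for sp_cacheconfig are standard ASE usage, though the exact report section names differ between releases, so treat this as an illustrative sketch:

```sql
-- Take two readings a few minutes apart; each run of sp_monitor
-- reports cpu_busy, io_busy and idle since the previous run
exec sp_monitor
go
waitfor delay "00:02:00"
go
exec sp_monitor
go

-- Capture a 5-minute sp_sysmon report and review the
-- Data Cache Management section for the cache hit ratio
-- and spinlock contention figures
exec sp_sysmon "00:05:00"
go

-- If spinlock contention is high on an SMP box, split the
-- cache into cachelets (example: 4 partitions)
exec sp_cacheconfig "default data cache", "cache_partition=4"
go
```

Saving each sp_monitor reading with a timestamp makes it easy to spot whether the server is consistently busy or only spiking.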
Further investigation using sp_monitor

Moving on, readings of sp_monitor taken over a few minutes provide near-real-time CPU busy and IO busy statistics. Reviewing CPU and IO utilization on the dataserver gives us four combinations of values: the dataserver will be experiencing either low CPU/low IO utilization, high CPU/low IO utilization, low CPU/high IO utilization, or high CPU/high IO utilization. Let's review these four possibilities in a bit more detail.

Low CPU utilization (below 80%) / Low IO utilization (below 80%)

If the CPU and IO utilization reported by the dataserver is low, it is quite likely that Sybase is not the source of the poor performance. In such a case, I work with the application team and OS administrators to investigate components outside of the ASE realm. Some of the areas that merit investigation are:

• Client application and API issues
• Network issues
• Operating system performance (output of top, vmstat, iostat etc.)
• Problems with the Sybase interfaces file
• Problems with database connectivity tools
High CPU utilization (over 95%) / Low IO utilization (below 80%)

High CPU utilization is not in itself a bad thing. However, consistently high CPU utilization of over 95-98% indicates that the dataserver may be experiencing one or more of the following situations:

• A lot of queries performing aggregation operations, i.e. sum, avg, max, min etc. These operations are CPU intensive. A long-term solution is to implement incremental aggregation.
• The data is over-normalized. As a consequence, five or more tables must be joined to extract even simple result sets. This extent of normalization causes the workload to become memory bound, producing a lot of logical I/O that shows up as CPU cycles.
• The dataserver does not have sufficient capacity to process DML/DDL requests. The only alternative may be to increase the number of engines/CPUs or migrate to better hardware.
• Too many parallel queries running simultaneously, causing a large number of worker threads that overwhelm the dataserver engines.
• Improperly sized data cache buffer pools force the optimizer to choose the wrong caching strategy. This can result in data being swapped in and out of memory; the swapping shows up as CPU and IO cycles.
• Improperly set configuration parameters can drastically increase CPU utilization by wasting CPU cycles unnecessarily. An example is a high 'runnable process search count'.
Immediate steps can be taken to help decrease the CPU load. These include setting parallelism to 1, reducing the number of queries running simultaneously on the dataserver, increasing the number of online engines, ensuring that the dataserver is the only principal process on its host machine, and adjusting configuration parameters to reduce CPU overhead.
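A sketch of those immediate steps in ASE syntax is below. 'max parallel degree' and 'runnable process search count' are genuine sp_configure parameters; the value 3 for the search count is only an illustrative low setting, not a recommendation, so test any change against your own workload first:

```sql
-- Disable intra-query parallelism server-wide
exec sp_configure "max parallel degree", 1
go

-- Lower the spin count an idle engine uses while looking for
-- runnable tasks; high values burn CPU on lightly loaded hosts
exec sp_configure "runnable process search count", 3
go

-- Bring an additional engine online (ASE 12.5.0.3 and later)
exec sp_engine "online"
go
```

Both sp_configure parameters shown here are dynamic, so they take effect without a dataserver restart.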
Low CPU utilization (below 80%) / High IO utilization (over 95%)

High I/O utilization may result from the following:

• The data profile has changed, rendering the index statistics useless. This results in the query optimizer selecting improper indexes or running table scans, causing IO to spike.
• A lack of proper indexes causes table scans, especially when queries involve multiple table joins, which makes the I/O requirements blow out of proportion.
• DSS and OLTP applications are being hosted on the same dataserver. DSS applications typically need massive amounts of data pages to be read into memory, causing OLTP data to be swapped out by the LRU algorithm. This results in an IO-heavy dataserver.
• The data is over-normalized and hence requires more data pages to be read and written, increasing I/O utilization.
• A bad database design dictates that complex queries be written, and the optimizer chooses incorrect indexes or proper indexes do not exist, causing far more I/O to be performed.
• Again, improperly sized data buffer pools cause an incorrect caching strategy, resulting in data being swapped in and out of memory. For example, an undersized 16K buffer pool would cause the optimizer to choose it simply because it exists, in the hope of reducing I/O, but in reality this leads to high I/O due to pages being swapped in and out.
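To confirm stale statistics or bad index choices, I look at the plan and I/O counts for a suspect query and then refresh statistics on the most critical tables. The table name "orders" and the query below are placeholders for one of your own tables and queries:

```sql
-- Show the chosen query plan and the logical/physical reads
set showplan on
set statistics io on
go
select count(*) from orders where order_date >= "20090101"
go
set showplan off
set statistics io off
go

-- Refresh optimizer statistics on a critical table
update statistics orders
go
-- Force new query plans to pick up the refreshed statistics
exec sp_recompile orders
go
```

A query that showplan reports as a table scan, but that should be satisfied by an existing index, is a strong hint that statistics are stale or the index choice needs review.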
In the short term, as with CPU utilization, I/O utilization can be tamed by scaling down the number of processes running on the dataserver, running update statistics (if possible) on the most critical tables, resizing the data caches, increasing the network packet size, increasing the size of buffer pools, and so on.

High CPU utilization (over 95%) / High IO utilization (over 95%)

Good luck! Apply the advice from both of the sections above to provide relief in the short term.
Conclusion

To conclude our discussion: when we encounter performance degradation on ASE, there are a few things your DBA can do to provide immediate relief. However, there is no simple bag of tricks that can quickly and permanently fix performance issues. Application and physical database design decisions account for over 80% of your system's "tunable" performance. I once came across the comment "it is not possible to tune a bad design" in an article I read, and I find it to be very true. Project resources should be planned to include a thorough review of your database design for optimal performance. Good luck!