Relevance of High %iowait in Server Performance

High %iowait has historically indicated a problem in I/O performance. However, due to advances in CPU performance, high %iowait may be a misleading indicator, especially in random I/O workloads. It is misleading because %iowait measures CPU activity, not I/O. To be precise, %iowait measures the percentage of time the CPU is idle while waiting for an I/O to complete. As such, it is only indirectly related to I/O performance, which can lead to false conclusions: it is possible to have a healthy system with nearly 100% iowait, or a disk bottleneck with 0% iowait.

High %iowait is becoming more common as processor speeds increase. Gains in processor performance have significantly outpaced disk performance. While processor performance has doubled every 12 to 18 months, disk performance (in IOPS per disk) has remained relatively constant*. This imbalance has resulted in a trend toward higher %iowait on healthy systems.

(* IOPS depend on seek time, which hasn't improved at the rate of processor performance. The improvements in storage have been in areas such as areal density (MB/sq. in.) and rotational speed.)

The following example illustrates how faster CPUs can increase %iowait. Assume we upgrade a system with CPUs that are 4 times faster; all else remains unchanged. Before the upgrade, a transaction takes 60 ms, which includes 40 ms of CPU time plus 20 ms to perform an I/O, and our application performs one transaction after another in a serial stream.

Before CPU upgrade:
  CPU time = 40 ms
  IO time = 20 ms
  Total transaction time = CPU + IO = 40 + 20 = 60 ms
  %iowait = IO time / total time = 20/60 = 33%

After CPU upgrade:
  CPU time = 40 ms / 4 = 10 ms
  IO time = 20 ms
  Total transaction time = CPU + IO = 10 + 20 = 30 ms
  %iowait = 20/30 = 66%

In this example, transaction performance doubled, despite a 2x increase in %iowait. Here, the absolute value of %iowait is a misleading indicator of an I/O problem.
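The arithmetic above can be checked with a few lines of Python; the numbers are the ones from the example (40 ms CPU, 20 ms I/O, a 4x CPU upgrade, serial transactions):

```python
# Worked example from the text: a CPU upgrade makes CPU work 4x faster,
# I/O time is unchanged, and transactions run one after another serially.

def iowait_pct(cpu_ms, io_ms):
    """Percent of wall-clock time spent idle waiting on I/O (serial workload)."""
    return 100.0 * io_ms / (cpu_ms + io_ms)

before = iowait_pct(cpu_ms=40, io_ms=20)      # 20/60 -> ~33%
after = iowait_pct(cpu_ms=40 / 4, io_ms=20)   # 20/30 -> ~67%

print(f"before upgrade: {before:.0f}% iowait, 60 ms/transaction")
print(f"after upgrade:  {after:.0f}% iowait, 30 ms/transaction")
# Transaction time halved (60 ms -> 30 ms) even though %iowait roughly doubled.
```

The point the sketch makes is the article's: %iowait rose only because the CPU portion shrank, not because I/O got any slower.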
So, how do you identify an I/O problem if you can't rely on %iowait? The best way is to measure I/O response times using filemon. As a rule of thumb, read/write times should average 15-20 ms on non-cached disk subsystems. On cached disk subsystems, reads should average 5-20 ms, and writes should average 2-3 ms. Higher response times suggest the storage subsystem is overloaded. Here's an example of a 90-second filemon trace from an actual customer system that was heavily utilized. The filemon command was:

# filemon -o /tmp/filemon.out -O lv,pv -T 320000; sleep 90; trcstop

The output is in /tmp/filemon.out. From the Detailed Physical Volume Section:

VOLUME: /dev/hdisk60   description: EMC Symmetrix FCP Raid1
reads:                  9217      (0 errs)
  read sizes (blks):    avg 71.8     min 8       max 256
  read times (msec):    avg 61.515   min 0.003   max 1544.865
  read sequences:       6249
writes:                 10188     (0 errs)
  write sizes (blks):   avg 43.0     min 8       max 256
  write times (msec):   avg 40.651   min 0.011   max 1643.486   sdev 130.135
  write sequences:      6939
seeks:                  7023
  seek dist (blks):     init 0, avg 16784566.3   max 78792992   sdev 19185871.9
  seek dist (%tot blks):init 0.00000, avg 17.80295   sdev 20.34995
time to next req(msec): avg 54.050   min 0.006    max 2042.710
throughput:             1598.1 KB/sec
utilization:            0.73
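If you want to pull the key service times out of a filemon report programmatically, something like the following sketch works; the line layout the regexes assume is taken from this one example, and the thresholds are the cached-subsystem rules of thumb quoted earlier (reads 5-20 ms, writes 2-3 ms), so adjust both for your environment:

```python
import re

# Rule-of-thumb limits from the text for cached disk subsystems.
READ_LIMIT_MS = 20.0
WRITE_LIMIT_MS = 3.0

def avg_service_times(filemon_text):
    """Extract avg read/write times (msec) from a Detailed Physical Volume stanza.

    Assumes lines shaped like 'read times (msec):  avg 61.515 ...'
    as in the hdisk60 excerpt above.
    """
    read = re.search(r"read times \(msec\):\s+avg\s+([\d.]+)", filemon_text)
    write = re.search(r"write times \(msec\):\s+avg\s+([\d.]+)", filemon_text)
    return float(read.group(1)), float(write.group(1))

stanza = """\
  read times (msec):    avg 61.515   min 0.003   max 1544.865
  write times (msec):   avg 40.651   min 0.011   max 1643.486   sdev 130.135
"""

r, w = avg_service_times(stanza)
if r > READ_LIMIT_MS or w > WRITE_LIMIT_MS:
    print(f"possible disk bottleneck: reads {r} ms, writes {w} ms")
```

Run against the hdisk60 numbers, both averages exceed the limits, which is exactly the conclusion drawn below.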

The average read and write service times stand out: the average read time for hdisk60 was 61.515 ms, and the average write time was 40.651 ms. In this case, we have a disk bottleneck. High I/O service times are typically due to overloaded disks in the disk subsystem (i.e., we're sending more IOPS than the disks can handle), an overloaded processor in the disk subsystem, or bottlenecks or problems in the interconnect to the disk. Here are some alternatives to alleviate this problem:

  • Tune AIX: use asynchronous I/O, read larger blocks (vmtune maxpgahead), etc.
  • Reduce the number of I/Os to the disk subsystem: increase memory for data caching, or use a RAM filesystem.
  • Spread data over more physical disks.
  • Move I/O from overloaded disks to under-utilized disks by moving LVs with migratepv.
  • Tune the application/database to do less I/O.
  • Schedule lower-priority jobs to off-peak hours.
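To see why spreading data over more spindles helps, take the counts from the trace: hdisk60 served 9217 reads and 10188 writes in 90 seconds. A back-of-the-envelope sketch (the ~150 IOPS per-disk capacity is an assumed illustrative figure, not from the text; real capacity depends on the drives and the subsystem cache):

```python
import math

# hdisk60 handled 9217 reads + 10188 writes during the 90 s filemon trace.
total_ios = 9217 + 10188
trace_seconds = 90
iops = total_ios / trace_seconds     # offered load on this one hdisk, ~216 IOPS

# Assumed per-physical-disk capacity for random I/O (illustrative only).
DISK_CAPACITY_IOPS = 150

disks_needed = math.ceil(iops / DISK_CAPACITY_IOPS)
print(f"{iops:.1f} IOPS offered; spread over at least {disks_needed} disks")
```

Under that assumption, one disk's worth of capacity is already oversubscribed, which is why spreading the data (or moving LVs with migratepv) brings service times down.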

In conclusion, don't rely on %iowait to diagnose I/O bottlenecks. Use filemon, or open a "perfpmr" with IBM Support. But be open to the possibility that your system is operating "normally". After all, it's hard to make a disk go faster, but doubling the CPU speed lets you wait twice as fast!
