Hunting I/O Bottlenecks with iostat
http://www.linuxquestions.org/linux/articles/Jeremys_Magazine_Articles/Hunting_I_O_Bottlenecks_with_iostat

Tech Support
Written by Jeremy Garcia

The October 2004 “Tech Support” column showed you how to track down performance bottlenecks using vmstat. This month, let’s take a closer look at input/output (I/O) issues that you may have identified using vmstat.

The iostat command monitors system I/O device loading by observing the time devices are active in relation to their average transfer rates. You can use its reports to change your system configuration so that the I/O load is better balanced between physical disks, or to tell when you have reached the limits of your current disk subsystem.

Running iostat with no arguments generates a report that covers the period since the system was booted. You can provide two optional parameters to change this:

Code:
    $ iostat [ interval [ count ] ]

The interval parameter specifies the time, in seconds, between reports. The count parameter, which can be used only together with interval, sets how many reports are generated before iostat exits; if you give an interval without a count, iostat keeps reporting until you interrupt it. When you use these arguments, the first report still covers the time since the system was booted, while each subsequent report covers the period since the previous report.

By default, iostat generates two reports, one for CPU utilization and one for device utilization. You can use the -c option to get just the CPU report or the -d option to get just the device report.
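When you use an interval, the first report mixes in everything since boot, so for a current slowdown you will usually want to look only at the later reports. The Python sketch below is one way to handle that when post-processing captured output. It is an illustration, not part of iostat: the helper names build_iostat_cmd and split_reports are hypothetical, and split_reports assumes the default layout in which each report starts with an avg-cpu: line (pass marker="Device:" for output produced with -d).

```python
def build_iostat_cmd(interval=None, count=None, cpu_only=False, device_only=False):
    """Build an argv list for: iostat [-c] [-d] [ interval [ count ] ].

    Hypothetical helper; it only assembles arguments and does not run iostat.
    """
    if count is not None and interval is None:
        # The synopsis only allows count after interval.
        raise ValueError("count requires interval")
    cmd = ["iostat"]
    if cpu_only:
        cmd.append("-c")  # CPU report only
    if device_only:
        cmd.append("-d")  # device report only
    if interval is not None:
        cmd.append(str(interval))
        if count is not None:
            cmd.append(str(count))
    return cmd


def split_reports(text, marker="avg-cpu:"):
    """Split captured iostat output into one string per report.

    Assumes each report begins with a line starting with `marker`;
    anything before the first marker (such as a banner line) is skipped.
    """
    reports, current = [], None
    for line in text.splitlines():
        if line.startswith(marker):
            if current is not None:
                reports.append("\n".join(current).strip())
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        reports.append("\n".join(current).strip())
    return reports


if __name__ == "__main__":
    print(build_iostat_cmd(15, 12, device_only=True))
    # -> ['iostat', '-d', '15', '12']
```

You could pass the list to subprocess.run(cmd, capture_output=True, text=True) and keep split_reports(result.stdout)[1:] to set the since-boot report aside.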
Here is the default output from iostat:

Code:
    avg-cpu:  %user   %nice    %sys %iowait   %idle
               4.92    0.00    0.78    0.77   93.53

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               1.22         0.19        33.09     690042  121113184
    sda1              0.82         0.04        23.16     146122   84755712
    sda2              0.40         0.15         9.93     543450   36357472
    sdb               1.25         0.27        33.09     986982  121113184
    sdb1              0.84         0.04        23.16     160592   84755712
    sdb2              0.41         0.23         9.93     825960   36357472

The first three columns of the CPU report show the percentage of CPU utilization that occurred while executing at the user level (applications), at the user level with nice priority, and at the system level (kernel), respectively. The last two columns, %iowait and %idle, show the percentage of time the CPU was idle while the system had an outstanding disk I/O request, and the percentage of time it was idle without one.

The columns in the device report are transfers per second (tps), blocks read per second, blocks written per second, total blocks read, and total blocks written. You can use the -k option to express the last four columns in kilobytes instead of blocks, which makes the report a little more readable. You can get additional extended statistics with the -x option, but you must be running either a 2.6 kernel or a patched 2.4 kernel (such as the one shipped with Red Hat Enterprise Linux 3).

Now that you know what each column means, how do you use this information? First, make sure you run iostat while your system is running slowly or when you suspect a problem. Redirect the output of iostat to a file, setting the interval to about 15 seconds and the count to 12 to 16. This gives you a quick snapshot of what’s happening over three to four minutes. Don’t sample too often, though; iostat itself will start to contribute to the load and skew your numbers.

Now that you have the output, how do you interpret the numbers?
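Converting blocks to kilobytes is simple if you assume 512-byte blocks, which is what the iostat man page describes for 2.4 and later kernels. The following Python sketch parses a saved capture (such as the 15-second log suggested above), converts the Blk_ columns to kB_ columns, and averages a column per device. The function names are hypothetical, and the parser assumes the layout shown in the sample output: a Device: header line followed by one line per device.

```python
BLOCK_SIZE = 512  # bytes per block; assumes 512-byte sectors (2.4+ kernels)


def parse_device_reports(text):
    """Return one {device: {column: value}} dict per Device: section."""
    reports, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            header = fields[1:]  # column names come from the header line
            reports.append({})
        elif fields[0] == "avg-cpu:":
            header = None  # CPU section: ignored by this parser
        elif header and len(fields) == len(header) + 1:
            reports[-1][fields[0]] = dict(zip(header, map(float, fields[1:])))
    return reports


def blocks_to_kb(row, block_size=BLOCK_SIZE):
    """Rename Blk_* columns to kB_* and convert blocks to kilobytes."""
    converted = {}
    for name, value in row.items():
        if name.startswith("Blk_"):
            converted["kB_" + name[len("Blk_"):]] = value * block_size / 1024
        else:
            converted[name] = value
    return converted


def mean_per_device(reports, column):
    """Average `column` for each device across the given reports."""
    totals, counts = {}, {}
    for report in reports:
        for device, row in report.items():
            if column in row:
                totals[device] = totals.get(device, 0.0) + row[column]
                counts[device] = counts.get(device, 0) + 1
    return {device: totals[device] / counts[device] for device in totals}
```

For a log captured with iostat 15 12 > iostat.log (the file name is only an example), parse_device_reports(open("iostat.log").read())[1:] drops the since-boot report, and mean_per_device(..., "tps") gives each device’s average transfer rate over the run. Because the parser takes column names from the Device: header, it may also cope with -k or -x output, but it has only been written against the layout shown above.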
As you might guess, reading iostat output takes a bit of experience and an understanding of the principles behind the numbers. The first thing you should look at is %iowait. If the CPU spends a high percentage of its time idle while waiting on disk I/O, that’s a good indicator that you have an I/O bottleneck.

Moving on to the device section, you should be able to see easily how I/O is being distributed between disks. Is one disk doing a lot of work while another sits idle? If so, see whether you can move some of the activity from the busy disk to the idle one. If all of your disks are already heavily used, or you can’t distribute the load evenly among them, you need either to add disks (if you have the capacity) or to replace the current disks with ones that have faster spindle speeds, higher throughput, and lower seek times.

Once you are comfortable with iostat, you can use the -x option to get more detailed information, such as the average request size, the average wait time for requests, and the average service time for requests.

With a little work, iostat can help you identify I/O bottlenecks and point you toward potential solutions. The numbers may seem overwhelming at first, but with some patience, you’ll be using iostat productively in no time. Happy hunting.
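The two checks described above, a high %iowait and one disk carrying most of the transfers, can be sketched in Python as follows. The article gives no numeric limits, so IOWAIT_ALERT and IMBALANCE_SHARE are hypothetical thresholds you would tune for your own system, and the function names are illustrative. The caller passes the whole disks to compare (for example, sda and sdb rather than their partitions) so that partition rows are not counted twice.

```python
IOWAIT_ALERT = 20.0     # hypothetical %iowait threshold; tune for your system
IMBALANCE_SHARE = 0.75  # hypothetical: flag one disk doing >75% of transfers


def parse_cpu(report):
    """Return {column: value} from the avg-cpu: header and the line after it."""
    rows = [line.split() for line in report.splitlines() if line.strip()]
    for i, fields in enumerate(rows):
        if fields[0] == "avg-cpu:" and i + 1 < len(rows):
            return dict(zip(fields[1:], map(float, rows[i + 1])))
    return {}


def load_shares(tps, disks):
    """Fraction of the combined tps handled by each of the given disks."""
    total = sum(tps.get(disk, 0.0) for disk in disks)
    if total == 0:
        return {disk: 0.0 for disk in disks}
    return {disk: tps.get(disk, 0.0) / total for disk in disks}


def diagnose(cpu, tps, disks):
    """Return human-readable findings; an empty list means nothing stood out."""
    findings = []
    iowait = cpu.get("%iowait")
    if iowait is not None and iowait >= IOWAIT_ALERT:
        findings.append(f"high iowait: {iowait:.2f}%")
    if len(disks) > 1:
        shares = load_shares(tps, disks)
        busiest = max(shares, key=shares.get)
        if shares[busiest] > IMBALANCE_SHARE:
            findings.append(f"{busiest} handles {shares[busiest]:.0%} of transfers")
    return findings
```

With the sample output shown earlier, parse_cpu reports %iowait as 0.77 and the transfers are split almost evenly between sda and sdb, so diagnose returns no findings.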