Another counter to watch when the %Disk Time exceeds 60 percent to 80
percent is the Avg. Disk sec/Transfer. This counter measures the time in seconds
of the average disk transfer (i.e., the time the disk needs to service each
request). A disk can complete only so much work before its service begins to
degrade. When disk performance begins to degrade, the Avg. Disk sec/Transfer
increases dramatically. This increase affects NT's overall performance.
You will want to review the Disk Transfers/sec counter to determine the
amount of work a disk is completing. This counter measures the rate of read and
write operations (also known as the rate of input/output per second) on the
selected disk. The amount of work a disk can support depends on the disk
technology and the I/O workload the disk encounters. In my experience, an Ultra
Fast/Wide SCSI 7200rpm disk encountering a mixed I/O workload (random,
sequential, write, and read operations) supports approximately 50 to 100 disk
transfers per second before its performance degrades.
Monitoring the Avg. Disk sec/Transfer counter lets you observe this
performance degradation.
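The single-disk checks described above can be sketched as a small triage function. This is only a sketch: the function and parameter names are hypothetical, the counter values are assumed to have been collected already with Performance Monitor, and the default cutoffs (60 percent and 50 transfers per second) are simply the low ends of the ranges quoted above, which come from experience with one class of disk.

```python
def disk_warnings(pct_disk_time, transfers_per_sec,
                  busy_threshold=60.0, transfer_floor=50.0):
    """Return a list of warnings for one sample of a single disk.

    pct_disk_time      -- value of the %Disk Time counter
    transfers_per_sec  -- value of the Disk Transfers/sec counter
    busy_threshold     -- assumed cutoff (low end of the 60-80 percent range)
    transfer_floor     -- assumed cutoff (low end of the 50-100 range)
    """
    warnings = []
    if pct_disk_time > busy_threshold:
        warnings.append("%Disk Time is high; review Avg. Disk sec/Transfer")
    if transfers_per_sec >= transfer_floor:
        warnings.append("Disk Transfers/sec is in the range where "
                        "performance may begin to degrade")
    return warnings


# Example sample: a busy disk doing 75 transfers per second.
for message in disk_warnings(pct_disk_time=85.0, transfers_per_sec=75.0):
    print(message)
```

Adjust both cutoffs to match the disk technology and workload you are actually measuring.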
Detecting RAID Bottlenecks
RAID technology lets you group multiple hard disks and present them to NT as
one logical disk device. To detect a RAID bottleneck, you use the single-disk
bottleneck detection techniques I just described, but with a twist. The %Disk
Time counter uncovers problems that are brewing in any RAID device. When you're
attempting to detect a RAID bottleneck, RAID 0 (disk striping) is the
easiest RAID level to work with. RAID 0 takes advantage of all the disks in the
array equally. Thus, a three-disk RAID 0 array can support three times as much
workload (i.e., disk requests) and three times as many outstanding disk requests
(Avg. Disk Queue Length) as a one-disk configuration before becoming a
bottleneck.
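The RAID 0 scaling rule above is simple multiplication, which the following sketch makes explicit. The function name is hypothetical, and the per-disk figures in the example (75 transfers per second and a queue length of 2) are illustrative assumptions, not measurements; substitute the limits you have observed for your own disks.

```python
def raid0_limits(disks, per_disk_transfers_per_sec, per_disk_queue_length):
    """Scale single-disk limits linearly across a RAID 0 stripe set.

    Returns (transfers/sec, Avg. Disk Queue Length) that the array can
    sustain before it becomes a bottleneck, assuming every disk in the
    stripe set shares the workload equally.
    """
    return (disks * per_disk_transfers_per_sec,
            disks * per_disk_queue_length)


# Three-disk stripe set built from disks assumed to handle
# 75 transfers/sec and a queue length of 2 each.
max_transfers, max_queue = raid0_limits(3, 75, 2)
print(max_transfers, max_queue)  # 225 6
```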
In a RAID 1 mirror with two disks, the array writes every block to both
disks. To determine the workload each disk in a RAID 1 mirror must support (i.e.,
the number of transfers per second per disk), use the following equation: (disk
reads/sec + [2 * disk writes/sec])/(number of disks in the RAID array).
Today's RAID 1 mirrors use a two-disk configuration. Despite greater
availability, RAID 1 arrays support a slightly lower workload in a
write-intensive environment than systems with one hard disk. If your
Avg. Disk Queue Length divided by the number of disks in the array exceeds 2,
you have a serious bottleneck in a RAID 1 mirror.
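The RAID 1 equation and the queue-length rule above translate directly into code. In this sketch the function names are hypothetical and the sample counter values are made up for illustration.

```python
def raid1_per_disk_load(reads_per_sec, writes_per_sec, disks=2):
    """(disk reads/sec + [2 * disk writes/sec]) / number of disks."""
    return (reads_per_sec + 2 * writes_per_sec) / disks


def raid1_queue_bottleneck(avg_disk_queue_length, disks=2, limit=2.0):
    """True when Avg. Disk Queue Length per disk exceeds the limit of 2."""
    return avg_disk_queue_length / disks > limit


# Illustrative sample: 60 reads/sec and 40 writes/sec on a two-disk mirror.
print(raid1_per_disk_load(60, 40))    # 70.0 transfers/sec per disk
print(raid1_queue_bottleneck(5.0))    # True  (2.5 outstanding requests per disk)
print(raid1_queue_bottleneck(4.0))    # False (exactly 2 per disk)
```

Compare the per-disk result with the transfer rate a single disk of that type can sustain before its service time starts to climb.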
In read-intensive environments, a RAID 5 array (disk striping with parity)
behaves much like a RAID 0 stripe set. A RAID 5 array with five disks
supports almost five times as much workload and up to five times as many
outstanding disk requests as a one-disk system before becoming a bottleneck. To
calculate how many disk requests each disk in a RAID 5 array must service, use
the following formula: (disk reads/sec + [4 * disk writes/sec])/(number of disks
in the RAID array). A RAID 5 array's performance differs from a RAID 0
stripe set's performance because of the additional disk activity associated with
parity generation. In a RAID 5 array, parity information is spread across all
the disks in the array for fault tolerance. To update this parity
information, each RAID 5 write operation reads the old data block, reads the
old parity block, exclusive ORs (XORs) the old data, old parity, and new data
to produce the new parity, writes the data block, and writes the parity block.
Thus, each write
request in a RAID 5 array incurs four disk operations. This parity generation
slows write operations in RAID 5 environments compared with RAID 0. However,
this parity information lets you continue operations if one of the disks in the
RAID 5 array fails. You can replace the failed disk and reconstruct the failed
disk's data on the new disk using parity information from the other disks in the
array.
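The following sketch illustrates the RAID 5 behavior described above: the four disk operations behind one small write, the reconstruction of a lost block from parity, and the per-disk workload formula. It models blocks as short byte strings in memory; the function names and sample values are hypothetical, and a real controller works on full stripes with rotating parity rather than on this simplified layout.

```python
def xor_blocks(*blocks):
    """XOR any number of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def raid5_small_write(old_data, old_parity, new_data):
    """Model one RAID 5 write; return (new_parity, disk_operations).

    The four disk operations are: read old data, read old parity,
    write new data, write new parity.
    """
    new_parity = xor_blocks(old_parity, old_data, new_data)
    return new_parity, 4


def raid5_per_disk_load(reads_per_sec, writes_per_sec, disks):
    """(disk reads/sec + [4 * disk writes/sec]) / number of disks."""
    return (reads_per_sec + 4 * writes_per_sec) / disks


# Three data blocks plus one parity block.
d0, d1, d2 = b"\x0f\x10", b"\xf0\x01", b"\x33\x44"
parity = xor_blocks(d0, d1, d2)

# Overwrite d1, updating parity without touching d0 or d2.
new_d1 = b"\xaa\x55"
parity, ops = raid5_small_write(d1, parity, new_d1)

# If the disk holding new_d1 fails, the surviving blocks rebuild it.
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt == new_d1, ops)

# Illustrative workload: 300 reads/sec and 50 writes/sec on five disks.
print(raid5_per_disk_load(300, 50, 5))  # 100.0 transfers/sec per disk
```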
You can use hardware-based RAID solutions to avoid the performance pitfall
associated with generating this parity information. Hardware-based RAID
controllers generate parity information using their own CPU, not the system's
CPU. As a result, a system using a hardware-based RAID solution can handle more
disk I/O operations than a software-based solution. An additional benefit of
offloading parity generation to a hardware-based RAID solution is that you can
recover and use processing power elsewhere on your system that might otherwise
be wasted on disk I/O parity.
Sizing Additional Disk Capacity for RAID Arrays
If you evaluate your RAID array's performance and determine that the array
is causing a bottleneck in your system, you can intelligently size additional
storage capacity. Without the information that the Performance Monitor counters
provide, you can only guess how much disk space you need to add to improve
performance.
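One way to turn counter data into a sizing estimate is to invert the per-disk workload formulas given earlier. The sketch below is an assumption-laden illustration, not a vendor sizing method: the function name is hypothetical, the write penalty of 1 for RAID 0 is inferred from the statement that RAID 0 uses all disks equally, and the per-disk limit of 75 transfers per second is an illustrative midpoint that you should replace with a figure measured on your own disks.

```python
import math

# Disk operations generated by each write request.
# RAID 1 and RAID 5 values come from the formulas in the text;
# the RAID 0 value is an assumption (no mirroring or parity overhead).
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4}


def disks_needed(reads_per_sec, writes_per_sec, raid_level, per_disk_limit):
    """Estimate how many disks keep each disk at or below per_disk_limit.

    This ignores minimum disk counts for each RAID level, controller
    limits, and capacity requirements; it sizes for transfer rate only.
    """
    total_ops = reads_per_sec + WRITE_PENALTY[raid_level] * writes_per_sec
    return math.ceil(total_ops / per_disk_limit)


# Illustrative workload: 300 reads/sec and 50 writes/sec.
print(disks_needed(300, 50, "raid5", per_disk_limit=75))  # 7
print(disks_needed(300, 50, "raid0", per_disk_limit=75))  # 5
```

Because the estimate covers transfer rate only, check the result against your capacity and fault-tolerance requirements before buying disks.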
Adding a RAID-based disk subsystem to NT can improve a system's
performance, availability, and manageability. However, you need to consider
fault-tolerant support, cost, capacity, and performance when sizing RAID
subsystems.