White Paper by
The purpose of the RAID Initiative is to evaluate the effect of using different RAID levels with OnLine DSA 7.10 and to compare RAID performance to the performance of similar features offered in Informix DSA 7.10 and various operating systems. There are many uncertainties in the Informix community about how best to leverage the RAID technology available through our hardware partners and which RAID configurations are best for our customers. A sample of the questions I've seen recently from the field is:
The answer to these questions is almost invariably, "It depends". The proper disk configuration depends on how the data is to be used. A system that performs many simultaneous small indexed queries and frequently updates small quantities of data will cause very different disk contention problems than a system that requests large amounts of random data and updates data infrequently in batch. This white paper is intended to address many of the issues Informix engineers and customers face when they want to maximize the benefits of RAID in an Informix environment.
Why Use RAID?
Since disk I/O is the slowest part of a data management system, it may be the most important area to tune. In addition, since these disks usually store the only on-line copies of the data for these systems, their reliability requirements are more stringent than those of any other part of the system. The motivation behind using RAID technology is therefore two-fold:
In some cases, both can actually be achieved while in other cases, one can be achieved at the expense of the other.
Informix's OnLine Dynamic Scalable Architecture (DSA) uses advanced techniques to reduce the amount of disk I/O needed to perform database operations, but the fact remains that bringing data and index pages from disk into memory is one of the most frequent functions OnLine performs. RAID technology, when used correctly in combination with the Informix DSA, can significantly increase I/O performance and/or data reliability.
In 1988, David A. Patterson, Garth Gibson, and Randy H. Katz of the University of California at Berkeley published a paper entitled A Case for Redundant Arrays of Inexpensive Disks which outlined five disk array models, or RAID Levels. They labeled their models RAID Levels 1 through 5. Since inexpensive is a relative term, the industry has replaced the Inexpensive in RAID with Independent, because the disks comprising an array are independent units. The original RAID levels are:
Since the publication of the original Berkeley paper, a sixth RAID level has been described by the original authors. RAID Level 6 uses a second disk containing redundant information to provide protection against data loss due to double as well as single disk failures. Also described are combinations of RAID Levels now found in many commercial products. The most popular combination is RAID 10 which combines RAID Level 0 and RAID Level 1 in a single array that provides data reliability through RAID Level 1 and enhanced I/O performance through disk striping.
In addition, the term RAID Level 0 is often used to refer to disk striping because the data mapping is similar to that used in RAID implementations. Because there is no redundancy in disk striping, RAID Level 0 is not consistent with the RAID acronym, but the term is in common use and has been endorsed by the industry.
The most common RAID Levels implemented by our hardware partners are 0, 1, 3, 5, and 10.
The RAID Levels discussed previously are extensions of the disk array concept. A disk array is a collection of disks controlled by Array Management Software. The Array Management Software controls the operation of the disks and presents them as one or more virtual disks to the host operating environment. Figure 1 illustrates the image a disk array presents to the operating environment.
Figure 1. General Model of a Disk Array
A virtual disk is functionally equivalent to a physical disk in the view of an application (or Informix). Its cost, availability and performance may be quite different, however. It is important to restate that Informix only sees the virtual disks which the Array Management Software presents. Therefore, other than the performance implications of using different RAID Levels for different purposes, Informix does not require any special setup to use RAID devices.
A series of benchmark tests was run to compare the relative performance of the most common RAID levels. A variation of the Wisconsin Benchmark was used because of its simplicity and its ability to be easily tailored to meet the specific demands of testing different disk configurations. This testing is intended to compare the relative performance of various RAID configurations with Informix OnLine and to help determine the proper Informix configuration parameter values to use for the different configurations. It is NOT intended to provide benchmark numbers for Informix or the hardware platform used.
The hardware configuration consisted of a Data General CLARiiON Disk Array connected to a Data General AViiON 9500 SMP computer. The CLARiiON Disk Array is capable of configuring RAID Levels 0, 1, 3, 5, and 10 simultaneously.
Data General AViiON 9500
Data General CLARiiON Series 2000 Disk Array
The 1 GB disks connected to a single storage processor were used for all tests with the exception of the configuration which combines RAID Level 5 with fragmentation, which needed all 20 disks.
DG/UX 5.4 Rev 3.10
The tests are organized in 5 major categories:
For each category, a series of tests was run and the response times recorded. Following is an explanation of the benchmark scripts. See Appendix A for a complete listing of the scripts. The single disk configuration was included as the control test to compare the various RAID levels. Refer to the Benchmark Summary for this comparison.
Disk striping is a performance-oriented technology and provides no redundancy for the data stored on its member disks. The failure of a single member of a RAID Level 0 array is therefore equivalent to the failure of the entire array. This dramatically decreases the Mean Time Between Failures (MTBF) compared to a single disk.
Disk striping, often called RAID 0, can be accomplished by binding a group of disks together in a disk array or by using the disk striping feature offered in many operating systems. DG/UX has this feature and the disk striping tests therefore consisted of both disk array and operating system tests.
To explain the concept of disk striping, we must first define some terms. A chunk is the amount of contiguous virtual disk storage mapped to contiguous storage on a single member disk in a disk array. A chunk is usually a number of disk sectors. The set of chunks in corresponding positions on all members of an array is called a stripe. Figure 2 illustrates how chunks are mapped from virtual disks to member disks.
Figure 2. Mapping Chunks of Data from Virtual Disk to Member Disks
Some Array Management Software allows you to specify the chunk size or stripe size. The chunk size can be an important factor in determining the performance of the stripe.
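The cyclical chunk mapping of Figure 2 can be sketched as a short function. This is only an illustration of the mapping idea; the chunk size and member count below are example values, not the tested configurations.

```python
# Sketch: mapping a virtual-disk byte offset to a member disk and an
# offset within that member, following the cyclical mapping of Figure 2.
# Chunk size and member count are illustrative.

CHUNK_SIZE = 128 * 1024   # 128 Kbyte chunks
MEMBERS = 5               # disks in the array

def map_address(virtual_offset):
    """Return (member_disk, byte_offset_within_member)."""
    chunk_index = virtual_offset // CHUNK_SIZE
    member = chunk_index % MEMBERS         # chunks rotate across members
    stripe = chunk_index // MEMBERS        # completed stripes before this one
    return member, stripe * CHUNK_SIZE + virtual_offset % CHUNK_SIZE

print(map_address(0))                # (0, 0)       -- first chunk, disk 0
print(map_address(CHUNK_SIZE))       # (1, 0)       -- next chunk, disk 1
print(map_address(5 * CHUNK_SIZE))   # (0, 131072)  -- second stripe, disk 0
```

Consecutive chunks land on consecutive member disks, which is why a request much larger than the chunk size is serviced by several disks in parallel.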
High Data Transfer Capacity
For application requests which specify large amounts of data (64 Kbytes or more), a small chunk size compared to the request size will result in parallel I/O requests across the member disks. Ideally, the chunk size should be set so that the average I/O request is split across all members of the array. With Informix, you cannot configure the number of pages OnLine reads in a single I/O request. The maximum I/O request size that Informix will perform is port-dependent and is based on the MAXAIOSIZE kernel configuration parameter. In the absence of this parameter, 16 pages (32 Kbytes or 64 Kbytes, depending on page size) is the default. For ports with a page size of 4 Kbytes or a large maximum value for MAXAIOSIZE, it may be possible to parallelize member disk requests for some application requests such as sequential scans. For ports with a page size of 2 Kbytes or a small maximum MAXAIOSIZE, this is not a realistic goal.
High I/O Request Rate
Throughput-intensive applications do not usually require that a great deal of data be transferred, but do require a high rate of I/O request execution. Striped arrays of independently accessed disks can provide a very high throughput for these applications by automatically balancing the I/O load across the array's disks. Fortunately, Informix makes I/O requests asynchronously so that multiple requests may be outstanding at a single instant, taking advantage of this benefit of disk striping.
For applications that make large numbers of small I/O requests (less than 4 Kbytes), the data transfer time is a small part of overall execution time, so the increased software overhead and access time of splitting the I/O request add up to more than the time saved by parallel data transfer. In this case it would be advantageous not to split the I/O request. For I/O request-intensive applications, the chunk size should be set so that the average I/O request has a small probability of being split across multiple array members. For Informix ports with a page size of 2 Kbytes, a chunk size of 64 Kbytes or greater is adequate. For Informix ports with a page size of 4 Kbytes, a chunk size of 128 Kbytes or greater is adequate.
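The chunk-size guideline above can be checked with a small calculation. The sketch below counts how many member disks a request spans, using the 2 Kbyte page and 64 Kbyte chunk values discussed in the text and assuming page-aligned requests.

```python
# Sketch: counting how many chunks (hence member disks) an I/O request
# touches, to check the chunk-size guideline above. Page and chunk sizes
# are the values discussed in the text (2 Kbyte pages, 64 Kbyte chunks).

def members_touched(start, size, chunk_size):
    """Number of chunks a request of `size` bytes spans, starting at
    virtual-disk byte offset `start`."""
    first_chunk = start // chunk_size
    last_chunk = (start + size - 1) // chunk_size
    return last_chunk - first_chunk + 1

PAGE = 2 * 1024       # 2 Kbyte page
CHUNK = 64 * 1024     # recommended chunk size for 2 Kbyte pages

# No page-aligned single-page read splits across chunks at this chunk size:
splits = sum(members_touched(off, PAGE, CHUNK) > 1
             for off in range(0, CHUNK, PAGE))
print(splits, "of", CHUNK // PAGE, "page-aligned reads split")
```

With a chunk size equal to or below the page size, by contrast, nearly every request would straddle a chunk boundary and incur the splitting overhead.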
Logical Volume Managers
It is extremely important to maintain proper alignment when creating the raw disk partitions that will be used to create Informix chunks. This can be accomplished by making the volume size a multiple of the chunk size and by starting the volume at a disk address that is a multiple of the chunk size. Application performance may be significantly worse if you don't maintain proper alignment. The following is an example of a logical volume (called a virtual disk in DG/UX) that is not properly aligned, and one that is properly aligned.
Logical Volume Not Properly Aligned
Disk name          State  Reg?  Format  Total blocks  Free blocks
sd(ncsc(2,7),6,1)  avail  y     vdisks  9912320       0

Name                         Role  Address      Size
<Various System Partitions>        0            121
disk1                              121 (*)      9912183 (*)
<Various System Partitions>        9912304      16

(*) not a multiple of the 128KB stripe size
Properly Aligned Logical Volume
Disk name          State  Reg?  Format  Total blocks  Free blocks
sd(ncsc(2,7),6,1)  avail  y     vdisks  9912320       0

Name                           Role  Address      Size
<Various System Partitions>          0            121
<maybe unwritable free space>        121          7
disk1                                128 (*)      9912064 (*)
<maybe unwritable free space>        9912192      112
<Various System Partitions>          9912304      16

(*) a multiple of the 128KB stripe size
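A quick way to check the alignment rule is to verify that a partition's starting address and size are both multiples of the chunk size. The sketch below assumes the listing's block unit is 1 Kbyte, which is the unit that makes the example addresses consistent with a 128 KB stripe size.

```python
# Sketch: verifying logical volume alignment against the stripe size.
# Assumes 1 Kbyte disk blocks (the unit that makes the listing's numbers
# consistent with a 128 KB stripe size).

BLOCK = 1024                          # assumed bytes per block
STRIPE_BLOCKS = (128 * 1024) // BLOCK # 128 KB = 128 blocks

def aligned(address_blocks, size_blocks):
    """True if the partition starts and spans multiples of the stripe size."""
    return (address_blocks % STRIPE_BLOCKS == 0 and
            size_blocks % STRIPE_BLOCKS == 0)

print(aligned(121, 9912183))   # the misaligned disk1 above: False
print(aligned(128, 9912064))   # the aligned disk1 above: True
```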
Five RAID Level 0 configurations were evaluated:
No write cache was used for these configurations.
The DG/UX striping performed slightly better for the load tests and significantly worse for the application tests. The one major finding here is the significant difference between a 16 Kbyte stripe size and a 128 Kbyte stripe size: the update statistics, null scan, and hash join tests were 30%, 20%, and 20% faster, respectively, with a stripe size of 128 Kbytes than with a 16 Kbyte stripe size. You can also easily see the effect of properly aligning the logical volume: performance is up to 20% faster when logical volume alignment is maintained. The RAID Level 5 tests will show even more dramatic numbers regarding logical volume alignment.
Applications for RAID Level 0 Arrays
A RAID Level 0 array can be particularly useful for:
RAID Level 0 arrays are an excellent choice for temporary dbspaces. Since this data lives only for the duration of a transaction, high availability of the data is usually not an issue. If you decide to use RAID Level 0 for temporary dbspaces, make sure you create enough of them that subsequent transactions do not fail if one is down.
The primary benefit of disk mirroring is reliability. Depending on the application, performance may either be slightly better or slightly worse. Disk mirroring presents a very reliable single virtual disk whose capacity is equal to that of the smallest of its member disks, and whose performance is usually measurably better than that of a single disk for reads and slightly lower for writes. Disk mirroring can be implemented using a disk array subsystem, the operating system or Informix.
For request rate-intensive applications with a high percentage of reads in their I/O loads, disk mirroring can provide significant performance benefits. For read requests, the Array Management Software chooses which member disk should handle the request. Some implementations choose alternately or randomly among member disks for a simple form of load balancing. More sophisticated designs select the least-busy member to better balance the I/O load. Disk mirroring may also improve I/O performance for data transfer-intensive applications with a high percentage of reads if the Array Management Software is smart enough to split each read request so that both members participate in each read.
Disk Array vs. O/S vs. Informix
Informix uses a method called split reads, which reads a data page from either the primary chunk or the mirror chunk, depending on which half of the chunk includes the address of the data page. Chances are, the operating system's and/or the disk array subsystem's method of choosing which disk in a mirrored pair to read from is more efficient than Informix's. The relative performance of the disk subsystem versus the operating system depends on the sophistication of each one's algorithm for determining which disk to read.
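The split-read rule described above can be sketched in a few lines: pages in the first half of the chunk go to the primary, pages in the second half to the mirror. The chunk size in pages below is hypothetical, purely for illustration.

```python
# Minimal sketch of the split-read idea: a page is read from the primary
# chunk if its address falls in the first half of the chunk, and from the
# mirror chunk otherwise. The chunk size (in pages) is hypothetical.

PAGES_PER_CHUNK = 1000   # hypothetical chunk size, in pages

def pick_device(page_number):
    """Choose which side of the mirrored pair services a page read."""
    if page_number % PAGES_PER_CHUNK < PAGES_PER_CHUNK // 2:
        return "primary"
    return "mirror"

print(pick_device(10))    # primary
print(pick_device(900))   # mirror
```

Note that this is a static rule; as the text says, an array or operating system that tracks which member is least busy can balance the read load more effectively.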
Because the host operating environment only needs to perform a single write to the disk array subsystem, disk subsystems should out-perform both Informix and operating system mirroring. Both Informix and the operating system must perform two physical I/Os in order to mirror the data.
Therefore, the preferred order for implementing RAID Level 1 is:
1. Disk subsystem, if available, then
2. Operating system, then
3. Informix mirroring.
Three RAID Level 1 configurations were evaluated:
Applications for RAID Level 1 Arrays
A RAID Level 1 array is suitable for data whose reliability requirements are extremely high and for which the cost of storage is a secondary issue. Informix structures with such requirements include:
. root dbspace
RAID Level 5 is an independent access array that provides high availability at a fraction of the cost of disk mirroring and can increase performance for certain applications. Independent access arrays do not require that all member disks be accessed, and in particular, written, concurrently in the course of executing a single I/O request. Figure 3 illustrates a common mapping for a RAID Level 5 array, with parity distributed across all of the array's member disks.
Figure 3. Mapping for RAID Level 5 Array
In a RAID Level 5 array with N+1 members, each stripe of data has N data chunks and one parity chunk. RAID Level 5 parity is a bit-by-bit Exclusive OR of the corresponding data chunks from all of the data disks. The contents of any chunk of data on any one of the disks in the array can be regenerated from the contents of the corresponding chunks on the remaining disks in the array. In other words, any single disk in the array can fail without impairing the array's ability to deliver data to applications.
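The parity relationship can be demonstrated in a few lines of Python. This is only a model of the arithmetic, not of how Array Management Software is implemented; chunks are represented as equal-length byte strings.

```python
# Sketch: a stripe's parity chunk is the bit-by-bit Exclusive OR of its
# data chunks, so any one lost chunk can be rebuilt from the survivors.

from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # N data chunks of one stripe
parity = reduce(xor, data)                     # the stripe's parity chunk

# Suppose the disk holding data[2] fails: XOR the surviving data chunks
# with the parity chunk to regenerate the lost one.
survivors = data[:2] + data[3:]
rebuilt = reduce(xor, survivors + [parity])
print(rebuilt == data[2])   # True
```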
RAID Level 5 arrays have two unique I/O performance characteristics:
Each write request to a RAID level 5 array requires that both the target data and its corresponding parity be read and rewritten. Applications that include large numbers of writes will exhibit poorer response when they write to a RAID Level 5 virtual disk than when they write to individual disks.
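The read-modify-write sequence behind this penalty follows the identity new parity = old parity XOR old data XOR new data, which the following sketch checks directly (chunk contents are arbitrary example values):

```python
# Sketch of the small-write penalty: updating one data chunk needs four
# physical I/Os (read old data, read old parity, write new data, write
# new parity), because  new_parity = old_parity ^ old_data ^ new_data.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

other      = b"BBBB"               # the stripe's untouched data chunk
old_data   = b"AAAA"               # the chunk being overwritten
old_parity = xor(old_data, other)  # parity before the write

new_data   = b"CCCC"
new_parity = xor(xor(old_parity, old_data), new_data)

# Parity stays consistent without re-reading the untouched chunk:
print(new_parity == xor(new_data, other))   # True
```

The benefit of the identity is that the untouched chunks never have to be read; the cost is that every logical write still turns into two reads and two writes.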
For read-only or read-mostly application I/O loads, RAID Level 5 performance should be close to that of a RAID Level 0 array.
Because of the poor write performance associated with RAID Level 5 arrays, many hardware vendors have augmented RAID Level 5 technology with write cache. As you will see in the test results below, write cache makes a significant difference for Informix applications. The CLARiiON disk array is able to buffer the write requests from Informix and perform very efficient bulk writes to the member disks.
Four RAID Level 5 configurations were evaluated:
The RAID Level 5 Load Test shows the dramatic increase in I/O performance due to:
Load performance increased 20% by maintaining proper logical volume alignment and over 90% by using the CLARiiON's write cache! Load performance with the 16 Kbyte and 128 Kbyte stripe sizes was identical.
Application performance was also significantly affected by proper logical volume alignment. In particular, the update statistics, null scan and hash join tests performed an average of 25% better when alignment was maintained. It also appears that the write cache degrades the performance of read-only or read-mostly applications. Some application environments (like benchmarks) may actually allow you the flexibility to turn write cache ON during batch loading and turn it OFF during production or application testing. One last note is that, in general, a 16 Kbyte stripe size slightly outperformed a 128 Kbyte stripe size for the read-intensive applications.
Applications for RAID Level 5 Arrays
RAID Level 5 arrays perform best in applications with the following data characteristics:
Inquiry-type transaction processing is very well suited for RAID Level 5 arrays.
When RAID Level 5 technology is combined with cache to improve its write performance, the resulting arrays can be used in any applications where general purpose disks would be suitable.
Fragmentation tests are included in this initiative not to demonstrate the benefits of fragmentation itself. Fragmentation provides an intelligent method of distributing data in a manner that OnLine understands and can process intelligently. Fragmentation is not a substitute for RAID technology, and RAID technology is not a substitute for fragmentation; the two technologies, when combined and used correctly, complement each other. When combined incorrectly, however, they can work strongly against each other. The fragmentation tests are included in this initiative to demonstrate both outcomes.
Customers are faced with the same challenges when evaluating disk technologies whether they fragment their data or not. They want their I/O subsystem to provide:
consistent with the requirements of their applications at a reasonable cost. The various RAID levels can provide the same benefits to customers who fragment their data as to those who don't. RAID Levels 1, 3, and 5 all provide varying degrees of high availability and performance for fragmented data at different costs. As our customers' databases grow larger, this combination makes more and more sense. A 500-gigabyte database can have 100 five-gigabyte stripes across which to spread its table fragments.
The ideal situation is to have a mirrored pair dedicated to each fragment. This will provide the highest degree of availability with read performance slightly better than that of single disks and write performance slightly less. However, it is also the most expensive and many customers choose RAID 5 as an alternative.
With any form of disk striping, proper care must be taken to ensure that the benefits of fragmentation are not negated by the cyclical mapping of the stripe's data chunks to the member disks. To achieve this goal, multiple fragments of the same table should not reside on the same stripe. This rule is no different than with single disks. For load balancing, though, it may be advisable to place fragments from separate tables on the same stripe in an independent access array (RAID Level 0 or 5).
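The placement rule above can be sketched as a simple assignment function: no two fragments of the same table share a stripe, while fragments of different tables may, for load balancing. The table names, fragment counts, and stripe names below are hypothetical.

```python
# Sketch: assign table fragments to stripes so that fragments of the same
# table never share a stripe, but fragments of different tables may.
# All names and counts are hypothetical examples.

def place_fragments(tables, stripes):
    """tables: {table_name: fragment_count}. Returns {stripe: [(table, frag)]}."""
    placement = {s: [] for s in stripes}
    for table, nfrags in tables.items():
        if nfrags > len(stripes):
            raise ValueError(table + ": more fragments than stripes")
        # each table's fragments walk across distinct stripes
        for frag, stripe in zip(range(nfrags), stripes):
            placement[stripe].append((table, frag))
    return placement

layout = place_fragments({"orders": 3, "customers": 2},
                         ["stripe1", "stripe2", "stripe3"])
for stripe, frags in layout.items():
    print(stripe, frags)
```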
Three fragmented disk configurations were used to test fragmentation with RAID:
The RAID Level 5 fragments utilized the CLARiiON write cache. All tests used a round-robin fragmentation strategy.
The load tests all performed similarly with the RAID Level 1 configuration slightly higher than the single disk configuration and the RAID Level 5 configuration slightly lower. Once again, the write cache is a clear advantage with RAID Level 5.
The application test results were mixed. As expected, the RAID Level 1 configuration was consistently slightly better than the single disk configuration. The RAID Level 5 configuration, however, ranged from slightly better to considerably worse. The performance of a fragmented RAID Level 5 stripe is highly dependent on the Informix operation being performed. This is most likely due to the very small I/O requests Informix submits when fragmenting data round-robin. Subsequent versions of this document will investigate this theory further.
RAID technology, when used correctly, can provide significant benefits to Informix customers. When used incorrectly, it can present a serious I/O bottleneck that often requires a complete data rebuild to rectify. Knowing how to efficiently use RAID technology can also make or break a competitive benchmark. This paper provides some guidelines and benchmark data that can be used when defining a database architecture that includes RAID technology.
The graphs below summarize the best times for each of the RAID levels tested. Also included is a RAID Level 3 test for comparison.
Based on discussions in the previous sections, there should be nothing startling about the summary results. RAID Level 0 performs the best in both read- and write-intensive environments. RAID Level 1 performs slightly worse for writes and slightly better for reads. With the comparatively small I/Os requested by Informix, RAID Level 3 was not expected to perform well and therefore was not tested extensively. The summary graphs substantiate this expectation. Until Informix changes its maximum I/O request size, RAID Level 5 should outperform RAID Level 3. RAID Level 3 technology will become increasingly important as more multimedia applications appear. RAID Level 5 with write cache is a big win. The performance of RAID Level 5 without write cache may be a serious bottleneck for write-intensive applications. Fragmentation and RAID are definitely compatible and should be used whenever necessary.
The tests presented in this paper also magnify the benefits of a full-feature, flexible disk array. Because different functions of OnLine have different I/O requirements, a disk array that can support multiple RAID levels becomes particularly useful. For example, within the same disk array, we could configure separate RAID Level 1 pairs for the root dbspace, logical logs and physical logs, RAID Level 0 for the temporary dbspaces, RAID Level 5 for the production data and in the future, RAID Level 3 for large binary images. Appendix B lists many of the disk array offerings.