


RAID and Informix Databases

White Paper by
Ken Bendix 
Advanced Technology Group
June 1995


INTRODUCTION

The purpose of the RAID Initiative is to evaluate the effect of using different RAID levels with OnLine DSA 7.10 and to compare RAID performance with that of similar features offered in Informix DSA 7.10 and various operating systems. There is much uncertainty in the Informix community about how best to leverage the RAID technology available through our hardware partners and about which RAID configurations are best for our customers. Among the questions I've seen recently from the field are:

  •  "Does it make sense to use RAID if I am also using DSA fragmentation?"

  •  "I would like to mirror my data. Which is better, hardware mirroring, O/S mirroring or Informix mirroring?"

  •  "What is the difference between RAID 5 and RAID 3 and which performs better?"

The answer to these questions is almost invariably, "It depends". The proper disk configuration depends on how the data is to be used. A system that performs many simultaneous small, indexed queries and frequently updates small quantities of data will cause very different disk contention problems than a system that requests large amounts of random data and updates data infrequently in batch. This white paper is intended to address many of the issues Informix engineers and customers face when they want to maximize the benefits of RAID in an Informix environment.

Why Use RAID?

Since disk I/O is the slowest part of a data management system, it may be the most important area to tune. In addition, since these disks usually store the only on-line copies of the data for these systems, their reliability requirements are more stringent than those of any other part of the system. The motivation behind using RAID technology is therefore two-fold:

  •  to improve the I/O performance of the disk subsystem

  •  to provide reliable access to on-line data 

In some cases, both can actually be achieved while in other cases, one can be achieved at the expense of the other.

Informix's OnLine Dynamic Scalable Architecture (DSA) uses advanced techniques to reduce the amount of disk I/O needed to perform database operations, but the fact remains that bringing data and index pages from disk into memory is one of the most frequent operations OnLine performs. RAID technology, when used correctly in combination with the Informix DSA, can significantly increase I/O performance and/or data reliability.

RAID Overview

In 1988, David A. Patterson, Garth Gibson, and Randy H. Katz of the University of California at Berkeley published a paper entitled A Case for Redundant Arrays of Inexpensive Disks, which outlined five disk array models, or RAID Levels. They labeled their models RAID Levels 1 through 5. Since inexpensive is a relative term, the industry has replaced the Inexpensive in RAID with Independent, because the disks comprising an array are independent units. The original RAID levels are:

  •  RAID Level 1, also known as disk mirroring, protects against disk failure by replicating all data stored at least once. It offers extremely high data reliability at a relatively high cost. For some I/O-intensive applications, a RAID Level 1 array can improve performance significantly over a single disk.

  •  RAID Level 2 provides redundancy through Hamming Coding. Data and an error detection code are interleaved across several disks at the bit level. The correction code is the Hamming code used for error correction in RAMs. Because the Hamming code is used for both error detection and correction, RAID Level 2 does not make full use of the extensive error detection capabilities commonly built into disks. Properties of the Hamming code also restrict the configurations possible for RAID Level 2 arrays. Therefore, RAID Level 2 has not been widely implemented in commercially available products.

  •  RAID Level 3 uses a parity disk to store redundant information about the data on several data disks. RAID Level 3 relies on close coordination of member disk activities. RAID Level 3 is optimal for applications in which large blocks of sequential data must be transferred quickly but is not well suited for transaction processing.

  •  RAID Level 4 also uses a parity disk to store redundant information about the data on several data disks. The difference between RAID Level 3 and RAID Level 4 is that RAID Level 3 operates the array member disks in unison, while RAID Level 4 operates its disks independently. 

  •  RAID Level 5 uses storage capacity equivalent to that of one disk in an array to store the parity of the user data stored on the array's remaining disks. It differs from RAID Level 3 in that the array's disks operate independently of each other, and in that the redundant information is distributed across all disks in the array. RAID Level 5 offers data reliability approaching that of mirroring, with read performance benefits similar to those of striping. Unless a cache is used, there can be a substantial performance penalty compared to a single disk when data is written. RAID Level 5 is well suited for applications whose I/O loads consist predominantly of a large number of asynchronous read requests.

Since the publication of the original Berkeley paper, a sixth RAID level has been described by the original authors. RAID Level 6 uses a second disk containing redundant information to provide protection against data loss due to double as well as single disk failures. Combinations of RAID Levels, now found in many commercial products, have also been described. The most popular combination is RAID 10, which combines RAID Level 0 and RAID Level 1 in a single array that provides data reliability through RAID Level 1 and enhanced I/O performance through disk striping.

In addition, the term RAID Level 0 is often used to refer to disk striping because the data mapping is similar to that used in RAID implementations. Because there is no redundancy in disk striping, RAID Level 0 is not consistent with the RAID acronym, but the term is in common use and has been endorsed by the industry.

The most common RAID Levels implemented by our hardware partners are 0, 1, 3, 5, and 10.

The RAID Levels discussed previously are extensions of the disk array concept. A disk array is a collection of disks controlled by Array Management Software. The Array Management Software controls the operation of the disks and presents them as one or more virtual disks to the host operating environment. Figure 1 illustrates the image a disk array presents to the operating environment.

Figure 1. General Model of a Disk Array

A virtual disk is functionally equivalent to a physical disk in the view of an application (or Informix). Its cost, availability and performance may be quite different, however. It is important to restate that Informix only sees the virtual disks which the Array Management Software presents. Therefore, other than the performance implications of using different RAID Levels for different purposes, Informix does not require any special setup to use RAID devices.

Benchmark Overview

A series of benchmark tests was run to compare the relative performance of the most common RAID levels. A variation of the Wisconsin Benchmark was used because of its simplicity and its ability to be easily tailored to meet the specific demands of testing different disk configurations. This testing is intended to compare the relative performance of various RAID configurations with Informix OnLine and to help determine the proper Informix configuration parameter values to use for the different configurations. It is NOT intended to provide benchmark numbers for Informix or the hardware platform used. 

Hardware

The hardware configuration consisted of a Data General CLARiiON Disk Array connected to a Data General AViiON 9500 SMP computer. The CLARiiON Disk Array is capable of configuring RAID Levels 0, 1, 3, 5, and 10 simultaneously.

Data General AViiON 9500
8 CPUs 
1GB Main Memory
2 I/O Controllers
1GB Internal Disk 

Data General CLARiiON Series 2000 Disk Array 
10 x 2GB disks
10 x 1GB disks
2 Storage Processors
64MB Cache 

The 1GB disks connected to a single storage processor were used for all tests except the configuration combining RAID Level 5 with fragmentation, which required all 20 disks.

Software

DG/UX 5.4 Rev 3.10
Informix OnLine 7.10.UC2
Wisconsin Benchmark Scripts 

Benchmark Tests

The tests are organized in 5 major categories:

  •  Single Disk

  •  Striping

  •  Mirroring

  •  RAID 5

  •  Fragmentation

For each category, a series of tests was run and the response times recorded. Following is an explanation of the benchmark scripts. See Appendix A for a complete listing of the scripts. The single disk configuration was included as the control test to compare the various RAID levels. Refer to the Benchmark Summary for this comparison.

1) Load Script: load 1,000,000 rows into a single table
2) Update Statistics Script: run UPDATE STATISTICS MEDIUM for the table
3) Create Index Script: create one index on one column
4) Null Join Script: join non-indexed columns from the table and return 0 rows
5) Null Scan Script: scan the entire table and return 0 rows
6) Indexed Join Script: join indexed columns from the table and return 0 rows
7) Hash Join Script: straight hash join, return 0 rows
8) Hash Join Group By Script: hash join with aggregate and GROUP BY, return 0 rows

Note that the intent was to utilize the disk(s) as much as possible; therefore, the amount of data returned by the queries was kept insignificant so that result transfer would not cloud the response times.

The disk array was used to store the test data only. The root dbspace, all temp dbspaces and the index dbspace were on separate internal disks and were constant for all RAID configuration tests.
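
Appendix A lists the scripts themselves; a timing harness along the following lines could be used to run each script through dbaccess and record its elapsed wall-clock time (the script names, database name, and harness are invented here for illustration and are not the actual benchmark driver).

# Hypothetical timing harness for the benchmark scripts.  Each script is
# fed to dbaccess and its elapsed wall-clock time is recorded.

import subprocess
import time

SCRIPTS = ["load.sql", "upd_stats.sql", "create_index.sql", "null_join.sql",
           "null_scan.sql", "indexed_join.sql", "hash_join.sql",
           "hash_join_group.sql"]

def run_script(database, script):
    start = time.time()
    subprocess.run(["dbaccess", database, script], check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.time() - start

if __name__ == "__main__":
    for script in SCRIPTS:
        elapsed = run_script("wisc_db", script)
        print("%-22s %8.1f s" % (script, elapsed))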

Disk Striping

Disk striping is a performance-oriented technology and provides no redundancy for the data stored on its member disks. The failure of a single member of a RAID Level 0 array is therefore equivalent to the failure of the entire array. This dramatically decreases the Mean Time Between Failures (MTBF) compared to a single disk.

Disk striping, often called RAID 0, can be accomplished by binding a group of disks together in a disk array or by using the disk striping feature offered in many operating systems. DG/UX has this feature and the disk striping tests therefore consisted of both disk array and operating system tests.

To explain the concept of disk striping, we must first define some terms. A chunk is the amount of contiguous virtual disk storage mapped to contiguous storage on a single member disk in a disk array. A chunk is usually a number of disk sectors. The set of chunks in corresponding positions on all members of an array is called a stripe. Figure 2 illustrates how chunks are mapped from virtual disks to member disks.

Figure 2. Mapping Chunks of Data from Virtual Disk to Member Disks

Some Array Management Software allows you to specify the chunk size or stripe size. The chunk size can be an important factor in determining the performance of the stripe. 
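
To make the chunk-to-disk mapping concrete, the following is a minimal sketch (the function and member count are invented for illustration; real Array Management Software operates at the sector level) of how a virtual-disk chunk number maps to a member disk and stripe in a round-robin striped array.

# Illustrative sketch of RAID 0 chunk mapping; not any vendor's actual
# Array Management Software.

def map_chunk(virtual_chunk, members):
    """Return (member_disk, stripe_number) for a virtual-disk chunk."""
    member_disk = virtual_chunk % members      # chunks rotate across members
    stripe_number = virtual_chunk // members   # each full rotation is one stripe
    return member_disk, stripe_number

if __name__ == "__main__":
    # With 5 member disks, virtual chunks 0-9 map round-robin as in Figure 2:
    for chunk in range(10):
        disk, stripe = map_chunk(chunk, members=5)
        print("virtual chunk %2d -> member disk %d, stripe %d" % (chunk, disk, stripe))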

High Data Transfer Capacity

For application requests that specify large amounts of data (64 Kbytes or more), a small chunk size relative to the request size will result in parallel I/O requests across the member disks. Ideally, the chunk size should be set so that the average I/O request is split across all members of the array. With Informix, you cannot configure the number of pages OnLine reads in a single I/O request. The maximum I/O request size that Informix will issue is port dependent and is based on the MAXAIOSIZE kernel configuration parameter. In the absence of this parameter, 16 pages (32 or 64 Kbytes, depending on page size) is the default. For ports with a 4 Kbyte page size or a large maximum value for MAXAIOSIZE, it may be possible to parallelize member disk requests for some application requests such as sequential scans. For ports with a 2 Kbyte page size or a small maximum MAXAIOSIZE, this is not a realistic goal.

High I/O Request Rate

Throughput-intensive applications do not usually require that a great deal of data be transferred, but do require a high rate of I/O request execution. Striped arrays of independently accessed disks can provide a very high throughput for these applications by automatically balancing the I/O load across the array's disks. Fortunately, Informix makes I/O requests asynchronously so that multiple requests may be outstanding at a single instant, taking advantage of this benefit of disk striping. 

For applications that make large numbers of small I/O requests (less than 4 Kbytes), the data transfer time is a small part of the overall execution time, so the extra software and access time incurred by splitting the I/O request adds up to more than the time saved by parallel data transfer. In this case it is advantageous not to split the I/O request. For I/O request-intensive applications, the chunk size should be set so that the average I/O request has a small probability of being split across multiple array members. For Informix ports with a page size of 2 Kbytes, a chunk size of 64 Kbytes or greater is adequate; for Informix ports with a page size of 4 Kbytes, a chunk size of 128 Kbytes or greater is adequate.
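
As a rough illustration of these guidelines, the sketch below (byte-addressed offsets are assumed; this is arithmetic, not a benchmark measurement) counts how many member disks a single I/O request touches for a given chunk size. A 64 Kbyte request over 16 Kbyte chunks spreads across four members, while a 4 Kbyte request over 128 Kbyte chunks almost always stays on one.

# Hypothetical illustration: how many member disks does one I/O request
# touch for a given chunk size?  Offsets and sizes are in bytes.

def chunks_touched(offset, size, chunk_size):
    first_chunk = offset // chunk_size
    last_chunk = (offset + size - 1) // chunk_size
    return last_chunk - first_chunk + 1

if __name__ == "__main__":
    KB = 1024
    # Large sequential read: 64 Kbyte request over 16 Kbyte chunks -> 4 members.
    print(chunks_touched(offset=0, size=64 * KB, chunk_size=16 * KB))        # 4
    # Small indexed read: 4 Kbyte request over 128 Kbyte chunks -> 1 member.
    print(chunks_touched(offset=20 * KB, size=4 * KB, chunk_size=128 * KB))  # 1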

Logical Volume Managers

It is extremely important to maintain proper alignment when creating the raw disk partitions that will be used to create Informix chunks. This can be accomplished by making the volume size a multiple of the chunk size and by starting the volume at a disk address that is a multiple of the chunk size. Application performance may be significantly worse if you don't maintain proper alignment. The following is an example of a logical volume (called a virtual disk in DG/UX) that is not properly aligned, and one that is properly aligned.

Logical Volume Not Properly Aligned

Disk name            State   Reg?   Format   Total blocks   Free blocks
sd(ncsc(2,7),6,1)    avail   y      vdisks   9912320        0

   Name                            Role   Address    Size
   <Various System Partitions>            0          121
   disk1                                  121        9912183
   <Various System Partitions>            9912304    16

   (disk1's address of 121 and size of 9912183 are not multiples of the 128KB stripe size)

Properly Aligned Logical Volume

Disk name            State   Reg?   Format   Total blocks   Free blocks
sd(ncsc(2,7),6,1)    avail   y      vdisks   9912320        0

   Name                             Role   Address    Size
   <Various System Partitions>             0          121
   <maybe unwritable free space>           121        7
   disk1                                   128        9912064
   <maybe unwritable free space>           9912192    112
   <Various System Partitions>             9912304    16

   (disk1's address of 128 and size of 9912064 are multiples of the 128KB stripe size)
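
The arithmetic behind the properly aligned layout can be sketched as follows. This is a minimal illustration that assumes the alignment unit is 128 disk blocks, matching the 128KB stripe size in the listings above; it reproduces the aligned address and size shown for disk1.

# Minimal sketch of the alignment rule: round the partition start up and
# its size down to multiples of the chunk size (expressed in blocks).

def align_partition(start_block, avail_blocks, chunk_blocks):
    aligned_start = ((start_block + chunk_blocks - 1) // chunk_blocks) * chunk_blocks
    usable = avail_blocks - (aligned_start - start_block)
    aligned_size = (usable // chunk_blocks) * chunk_blocks
    return aligned_start, aligned_size

if __name__ == "__main__":
    # The raw space in the DG/UX example starts at block 121 with 9912183
    # blocks available; aligning yields the properly aligned disk1 above.
    print(align_partition(121, 9912183, chunk_blocks=128))   # (128, 9912064)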

Test Results

Five RAID Level 0 configurations were evaluated:

  • 16 Kbyte stripe size, no alignment

  •  16 Kbyte stripe size, logical volume aligned

  •  128 Kbyte stripe size, logical volume aligned

  •  128 Kbyte DG/UX stripe, logical volume aligned

  •  16 Kbyte DG/UX stripe, logical volume aligned

No write cache was used for these configurations.

The DG/UX striping performed slightly better for the load tests and significantly worse for the application tests. The one major finding here is the significant difference between a 16 Kbyte stripe size and a 128 Kbyte stripe size. In particular, the update statistics, null scan and hash join tests were 30%, 20% and 20% faster respectively using a stripe size of 128 Kbytes than using a 16 Kbyte stripe size. Also, you can easily see the effect of properly aligning the logical volume. Performance is up to 20% faster when logical volume alignment is used. The RAID 5 tests will show even more dramatic numbers regarding logical volume alignment.

Applications for RAID Level 0 Arrays

A RAID Level 0 array can be particularly useful for:

  •  storing program image libraries or run-time libraries for rapid loading

  •  storing large tables of read-only data for rapid application access

  •  collecting data from external sources at very high data transfer rates

RAID Level 0 Arrays are an excellent choice for temporary dbspaces. Since this data only lives for the duration of a transaction, high availability of the data is usually not an issue. If you decide to use RAID Level 0 for temporary dbspaces, make sure there is a sufficient number of them so that subsequent transactions do not fail if one is down.

Mirroring

The primary benefit of disk mirroring is reliability. Depending on the application, performance may either be slightly better or slightly worse. Disk mirroring presents a very reliable single virtual disk whose capacity is equal to that of the smallest of its member disks, and whose performance is usually measurably better than that of a single disk for reads and slightly lower for writes. Disk mirroring can be implemented using a disk array subsystem, the operating system or Informix.

Performance

For request rate-intensive applications with a high percentage of reads in their I/O loads, disk mirroring can provide significant performance benefits. For read requests, the Array Management Software chooses which member disk should handle the request. Some implementations choose alternately or randomly among member disks for a simple form of load balancing. More sophisticated designs select the least-busy member to better balance the I/O load. Disk mirroring may also improve I/O performance for data transfer-intensive applications with a high percentage of reads if the Array Management Software is smart enough to split each read request so that both members participate in each read.
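
As an illustration of the least-busy policy just described, here is a hypothetical helper (real arrays track outstanding I/Os in their own bookkeeping):

# Hypothetical sketch of the "least-busy member" read policy.

def choose_read_member(outstanding_ios):
    """Pick the mirror member with the fewest outstanding I/O requests."""
    return min(range(len(outstanding_ios)), key=lambda m: outstanding_ios[m])

if __name__ == "__main__":
    # Member 0 has 4 requests queued, member 1 has 1: read from member 1.
    print(choose_read_member([4, 1]))   # 1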

Disk Array vs. O/S vs. Informix

Informix uses a method called split reads, which reads a data page from either the primary chunk or the mirror chunk, depending on which half of the chunk contains the address of the data page. Chances are, the operating system's and/or the disk array subsystem's method of choosing which disk to read from in a mirrored pair is more efficient than Informix's. The relative performance of the disk subsystem versus the operating system depends on the sophistication of each one's algorithm for choosing which disk to read.
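
The following is a simplified sketch of the split-read idea as just described; it illustrates the scheme, not Informix's actual code, and the page numbers are invented.

# Simplified sketch of split reads: pages in the first half of a mirrored
# chunk are read from the primary chunk, pages in the second half from the
# mirror chunk.

def split_read_source(page_address, chunk_start, chunk_pages):
    midpoint = chunk_start + chunk_pages // 2
    return "primary" if page_address < midpoint else "mirror"

if __name__ == "__main__":
    # A 100000-page mirrored chunk starting at page 0:
    print(split_read_source(10000, 0, 100000))   # primary
    print(split_read_source(90000, 0, 100000))   # mirror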

Because the host operating environment only needs to perform a single write to the disk array subsystem, disk subsystems should out-perform both Informix and operating system mirroring. Both Informix and the operating system must perform two physical I/Os in order to mirror the data.

Therefore, the recommended order of preference for implementing RAID Level 1 is:

1. Disk Subsystem, if available, then
2. Operating System, if available, then
3. Informix

Test Results

Three RAID Level 1 configurations were evaluated:

  •  CLARiiON Mirroring
  •  DG/UX Mirroring
  •  Informix Mirroring

No write cache was used for these configurations.

Based on the discussion above, there is no surprise with the results below. The Informix and DG/UX mirroring were about 10% slower than the CLARiiON mirroring for loading. One interesting statistic, though, is the extremely good performance of the DG/UX mirroring for the read-intensive tests. DG/UX mirroring is approximately 10% faster than both the CLARiiON and Informix.

Applications for RAID Level 1 Arrays

A RAID Level 1 array is suitable for data whose reliability requirements are extremely high and for which the cost of storage is a secondary issue. Informix structures with such requirements include:

  •  root dbspace
  •  logical logs
  •  physical logs

RAID 5

RAID Level 5 is an independent access array that provides high availability at a fraction of the cost of disk mirroring and can increase performance for certain applications. Independent access arrays do not require that all member disks be accessed, and in particular, written, concurrently in the course of executing a single I/O request. Figure 3 illustrates a common mapping for a RAID Level 5 array, with parity distributed across all of the array's member disks. 

Figure 3. Mapping for RAID Level 5 Array

In a RAID Level 5 array with N+1 members, each stripe of data has N data chunks and one parity chunk. RAID Level 5 parity is the bit-by-bit Exclusive OR of the corresponding data chunks from all of the data disks. The contents of any chunk of data on any one of the disks in the array can be regenerated from the contents of the corresponding chunks on the remaining disks in the array. In other words, any single disk in the array can fail without impairing the array's ability to deliver data to applications.
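
The parity relationship can be demonstrated with a short sketch. Toy byte strings stand in for disk chunks; this is purely illustrative.

# RAID Level 5 parity illustration: the parity chunk is the bit-by-bit XOR
# of the data chunks, so any one lost chunk can be rebuilt from the rest.

def xor_chunks(chunks):
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # chunks on 4 data disks
    parity = xor_chunks(data)                     # chunk in the parity position

    # Disk 2 fails: rebuild its chunk from the surviving data and the parity.
    rebuilt = xor_chunks([data[0], data[1], data[3], parity])
    assert rebuilt == data[2]
    print("rebuilt chunk:", rebuilt)              # b'CCCC'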

Performance

RAID Level 5 arrays have two unique I/O performance characteristics:

  •  Because writes to the array consist of multiple member writes, performance is strongly dependent on the percentages of reads and writes in the I/O load.

  • Since their members operate independently, RAID Level 5 arrays are more suitable for I/O request-intensive applications than for data transfer-intensive ones.

Each write request to a RAID level 5 array requires that both the target data and its corresponding parity be read and rewritten. Applications that include large numbers of writes will exhibit poorer response when they write to a RAID Level 5 virtual disk than when they write to individual disks. 
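
The penalty follows from how parity must be maintained: the new parity is the old parity XOR the old data XOR the new data, which costs two member reads and two member writes for each logical write. Below is a minimal sketch of this read-modify-write cycle, again using toy byte strings.

# Sketch of the RAID Level 5 small-write penalty: updating one data chunk
# requires reading the old data and old parity, then writing the new data
# and new parity -- four member I/Os per logical write.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data, old_parity, new_data):
    # 2 reads (old data, old parity) + 2 writes (new data, new parity)
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    return new_data, new_parity

if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]                    # data chunks
    parity = bytes(a ^ b ^ c for a, b, c in zip(*stripe))   # initial parity
    stripe[1], parity = raid5_small_write(stripe[1], parity, b"XXXX")
    # The updated parity still equals the XOR of the whole stripe.
    assert parity == bytes(a ^ b ^ c for a, b, c in zip(*stripe))
    print("small write done: 2 reads + 2 writes on member disks")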

For read-only or read-mostly application I/O loads, RAID Level 5 performance should be close to that of a RAID Level 0 array.

Write Cache

Because of the poor write performance associated with RAID Level 5 arrays, many hardware vendors have augmented RAID Level 5 technology with write cache. As you will see in the test results below, write cache makes a significant difference for Informix applications. The CLARiiON disk array is able to buffer the write requests from Informix and perform very efficient bulk writes to the member disks. 

Test Results

Four RAID Level 5 configurations were evaluated:

  •  16 Kbyte stripe size with write cache.

  •  16 Kbyte stripe size with write cache, logical volume aligned.

  •  128 Kbyte stripe size, logical volume aligned, no write cache.

  •  128 Kbyte stripe size with write cache, logical volume aligned.

The RAID Level 5 Load Test shows the dramatic increase in I/O performance due to:

  •  Proper logical volume alignment

  •  Write Cache

Load performance increased 20% by maintaining proper logical volume alignment and over 90% by using the CLARiiON's write cache! Load performance of the 16 Kbyte and 128 Kbyte stripe sizes was identical.

Application performance was also significantly affected by proper logical volume alignment. In particular, the update statistics, null scan and hash join tests performed an average of 25% better when alignment was maintained. It also appears that the write cache degrades the performance of read-only or read-mostly applications. Some application environments (like benchmarks) may actually allow you the flexibility to turn write cache ON during batch loading and turn it OFF during production or application testing. One last note is that, in general, a 16 Kbyte stripe size slightly outperformed a 128 Kbyte stripe size for the read-intensive applications.

Applications for RAID Level 5 Arrays

RAID Level 5 arrays perform best in applications with the following data characteristics:

  • data whose enhanced availability is worth protecting, but for which the value of full disk mirroring is questionable

  •  high read request rates

  •  small percentage of writes in I/O load

Inquiry-type transaction processing is very well suited for RAID Level 5 arrays.

When RAID Level 5 technology is combined with cache to improve its write performance, the resulting arrays can be used in any applications where general purpose disks would be suitable.

Fragmentation

The fragmentation tests are included in this initiative not to demonstrate the benefits of fragmentation, but to show that fragmentation and RAID can be either complementary or conflicting technologies, depending on how they are combined. Fragmentation provides an intelligent method of distributing data in a manner that OnLine understands and can process intelligently. Fragmentation is not a substitute for RAID technology, and RAID technology is not a substitute for fragmentation; when combined and used correctly, the two can complement each other, but when combined incorrectly, they can work strongly against each other.

Customers are faced with the same challenges when evaluating disk technologies whether they fragment their data or not. They want their I/O subsystem to provide:

  •  I/O performance and 

  •  access to on-line data at levels of reliability 

consistent with the requirements of their applications at a reasonable cost. The various RAID levels can provide the same benefits to customers who fragment their data as to those who don't. RAID Levels 1, 3 and 5 all provide varying degrees of high availability and performance for fragmented data at different costs. As the size of our customers' databases becomes larger, this combination makes more and more sense. A 500 gigabyte database, for example, can be spread across 100 five-gigabyte stripes for its table fragments.

The ideal situation is to have a mirrored pair dedicated to each fragment. This will provide the highest degree of availability with read performance slightly better than that of single disks and write performance slightly less. However, it is also the most expensive and many customers choose RAID 5 as an alternative. 

With any form of disk striping, proper care must be taken to ensure that the benefits of fragmentation are not negated by the cyclical mapping of the stripe's data chunks to the member disks. To achieve this goal, multiple fragments of the same table should not reside on the same stripe. This rule is no different from the rule for single disks. For load balancing, though, it may be advisable to place fragments from separate tables on the same stripe in an independent access array (RAID 0 or 5).
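
As a minimal illustration of this placement rule (table, fragment, and stripe names are invented), the sketch below flags any stripe that holds more than one fragment of the same table while permitting fragments of different tables to share a stripe.

# Hypothetical check of the fragment placement rule above.

from collections import defaultdict

def check_placement(placement):
    """placement: list of (table, fragment_id, stripe) tuples."""
    seen = defaultdict(set)            # (table, stripe) -> fragment ids
    violations = []
    for table, frag, stripe in placement:
        if seen[(table, stripe)]:
            violations.append((table, stripe))
        seen[(table, stripe)].add(frag)
    return violations

if __name__ == "__main__":
    ok = [("orders", 1, "stripe_a"), ("orders", 2, "stripe_b"),
          ("items", 1, "stripe_a")]         # different tables may share a stripe
    bad = ok + [("orders", 3, "stripe_a")]  # two 'orders' fragments on one stripe
    print(check_placement(ok))              # []
    print(check_placement(bad))             # [('orders', 'stripe_a')]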

Test Results

Three fragmented disk configurations were used to test fragmentation with RAID:

  •  4 Single Disk Fragments

  •  4 RAID Level 1 Fragments

  •  4 RAID Level 5 Fragments

The RAID Level 5 fragments utilized the CLARiiON write cache. All tests used round-robin fragmentation strategy.

The load tests all performed similarly, with the RAID Level 1 configuration's times slightly higher than the single disk configuration's and the RAID Level 5 configuration's slightly lower. Once again, the write cache is a clear advantage with RAID Level 5.

The application test results were more erratic. As expected, the RAID Level 1 configuration was consistently slightly better than the single disk configuration. The RAID Level 5 configuration, however, ranged from slightly better to considerably worse. The performance of a fragmented RAID 5 stripe is highly dependent on the Informix application being performed. This is most likely due to the very small I/O requests Informix submits when fragmenting data round-robin. Subsequent versions of this document will investigate this theory further.

Summary

RAID technology, when used correctly, can provide significant benefits to Informix customers. When used incorrectly, it can present a serious I/O bottleneck that often requires a complete data rebuild to rectify. Knowing how to efficiently use RAID technology can also make or break a competitive benchmark. This paper provides some guidelines and benchmark data that can be used when defining a database architecture that includes RAID technology.

The graphs below summarize the best times for each of the RAID levels tested. Also included is a RAID Level 3 test for comparison.

Conclusions

Based on the discussions in the previous sections, there should be nothing startling about the summary results. RAID Level 0 performs the best in both read- and write-intensive environments. RAID Level 1 performs slightly worse for writes and slightly better for reads. With the comparatively small I/Os requested by Informix, RAID Level 3 was not expected to perform well and therefore was not tested extensively. The summary graphs substantiate this theory. Until Informix changes its maximum I/O request size, RAID Level 5 should outperform RAID Level 3. RAID Level 3 technology will become increasingly important as more multimedia applications appear. RAID Level 5 with write cache is a big win. The performance of RAID Level 5 without write cache may be a serious bottleneck for write-intensive applications. Fragmentation and RAID are definitely compatible and should be used whenever necessary.

The tests presented in this paper also magnify the benefits of a full-feature, flexible disk array. Because different functions of OnLine have different I/O requirements, a disk array that can support multiple RAID levels becomes particularly useful. For example, within the same disk array, we could configure separate RAID Level 1 pairs for the root dbspace, logical logs and physical logs, RAID Level 0 for the temporary dbspaces, RAID Level 5 for the production data and in the future, RAID Level 3 for large binary images. Appendix B lists many of the disk array offerings.

 

