You know an application has slowed since it went into production, but do you know why? Sure, the number of transactions has increased slightly and the tables hold more rows than before, but is that enough to account for the performance degradation? In many of these cases the real culprit is hidden: database disorganization. Database disorganization is scattered free space: a condition in which a database's logical and physical storage allocations contain many scattered areas of storage that are too small, not physically contiguous, or too disorganized to be used productively.
Disorganization is a key driver of database maintenance and tuning because it negatively affects both the logical and physical storage of a database. It hinders logical access paths, and it increases both physical seek time and the number of physical reads required to retrieve data. Disorganization can degrade database performance and, in some cases, lead to complete database failure. This article examines the causes of disorganization in INFORMIX Online databases and offers some reorganization solutions. While most of the research was conducted against Online Version 7, everything discussed in this article applies to Online Versions 5 through 8.
JUST THE FACTS
The building block for Online storage is the data page, usually 2 kilobytes or 4 kilobytes. Blank pages are created when an administrator creates a chunk. Each page has a header area, a slot table, and a data area for storing rows. The header is used for statistics storage and integrity validation. The slot table contains the rowids and pointers to the row pieces stored in the data area.
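The page layout described above can be sketched as a toy Python model. This is an illustration only, not Online's actual on-disk format: the class and field names are assumptions, and the real header holds more than a checksum and a slot count.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 2048  # a 2 KB page; Online also supports 4 KB pages on some platforms


@dataclass
class SlotEntry:
    """One slot-table entry: where a row piece starts and how long it is."""
    offset: int  # starting byte of the row piece within the data area
    length: int  # length of the row piece in bytes


@dataclass
class DataPage:
    # header area: statistics and integrity information (names illustrative)
    header: dict = field(default_factory=lambda: {"checksum": 0, "nslots": 0})
    # slot table: one SlotEntry per row piece stored on the page
    slot_table: list = field(default_factory=list)
    # data area for the row pieces themselves
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))

    def free_bytes(self) -> int:
        """Free byte count: page size minus bytes consumed by row pieces."""
        used = sum(s.length for s in self.slot_table)
        return PAGE_SIZE - used
```

The free byte count computed here is the figure Online compares against an incoming row's length to decide whether a page is "full," as described in the next paragraphs.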
Pages are allocated to objects in extents. An extent is a contiguous (sequential) set of pages. The number of pages in an extent is specified in the DDL when the object is created. When rows are first added to a table, the pages are filled sequentially. For tables with clustered indexes, the fill factor is honored and the appropriate amount of space is left free in each page.
To reduce the amount of physical I/O required, rows are not unnecessarily broken across pages. Any row shorter than a page is always stored in one piece. Online considers a page full when the free byte count is less than the number of bytes required to store a row. A row larger than a page is broken into pieces by Online. The pages such a row spans are called:
- the Home Page, which holds the first piece of the row;
- Big Remainder Pages, which hold the middle pieces of a row spanning more than two pages; and
- the Remainder Page, which holds the final piece.
When a row only uses two pages, the Home Page has a four-byte pointer to the Remainder Page. For more than two pages, the Home Page and each Big Remainder Page thereafter have forwarding pointers to the next data portion of the row. Please note that these pointers are at the page level, not the row level.
When a row is updated, Online returns it to its current location and keeps the same slot-table entry if the row size has not changed. If the row becomes smaller as a result of the modification, Online shrinks the slot-table entry to reflect the new row size. If the modified row no longer fits in its current location but there is room elsewhere on the page, the row is stored at another location on the same page and the slot-table entry is updated to reflect the new starting offset and row length.
When a modified row is shorter than one page but its current page cannot hold it, a four-byte forward pointer containing the new rowid is stored on the row's home page. The modified row retains its original rowid but is stored on a different page.
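The update outcomes just described can be condensed into one toy function. This is a deliberately simplified sketch under the assumptions that a page is just a list of slot lengths plus a free-byte counter; real Online pages track offsets, a free-space map, and the timestamp data in the header.

```python
def apply_update(page, slot_no, new_len, pages):
    """Return where a modified row ends up, per the rules described above.

    `page` is a dict with 'slots' (list of row-piece lengths) and 'free'
    (contiguous free bytes); `pages` is the list of other pages in the extent.
    A toy model with illustrative names, not Online's implementation.
    """
    old_len = page["slots"][slot_no]
    if new_len == old_len:
        return "in place, same slot-table entry"
    if new_len < old_len:
        page["slots"][slot_no] = new_len           # slot entry shrinks
        page["free"] += old_len - new_len          # gap left inside the page
        return "in place, slot-table entry resized"
    if new_len - old_len <= page["free"]:
        page["free"] -= new_len - old_len
        page["slots"][slot_no] = new_len           # moved within the same page
        return "same page, new offset in slot-table entry"
    # No room on the home page: relocate the row and leave a
    # four-byte forward pointer (with the new rowid) behind.
    pages.append({"slots": [new_len], "free": 0})
    page["slots"][slot_no] = 4                     # forward pointer in the slot
    return "forwarded to another page, rowid unchanged"
```

Each branch corresponds to one of the scenarios in the two paragraphs above; note that the last branch is what later causes the row migration discussed under "The performance curve."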
The performance curve
Several types of physical degradation can occur in a database to affect performance, including partial empty pages, unused extent space, extent interleaving, index node separation, and row migration. Let's examine these types of physical degradation and their effect on Online performance:
Partial Empty Pages
Many types of rows, particularly meta-data and summary rows, change size over time. This change occurs when variable-length fields, such as VARCHAR, are updated. As described in the previous section, three scenarios may apply here:
- the row shrinks, and the resized slot-table entry leaves free space inside the page;
- the row grows but still fits on the page, and is moved to another location on the same page; or
- the row grows too large for its page, and is forwarded to another page, leaving only a four-byte pointer behind.
In each case, pockets of free space are left scattered within the page.
Online is considerate of these partial empty pages and, at the time of writing, is the only open-systems database that performs page compression to reclaim wasted space within pages. To compress a page, Online copies the entire page into memory and shuffles the rows around the page to make all the free space contiguous. Unfortunately for end users, page compression is triggered when an UPDATE is requested on a row and there is not enough contiguous space (but enough total free space) for the updated row. This means the user must wait for the compression to occur.
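The in-memory shuffle that page compression performs can be sketched in a few lines. This is a minimal illustration, assuming a slot table represented as a mapping from slot number to (offset, length); it shows only the packing step, not the copy into memory or the timestamp bookkeeping.

```python
def compress_page(slots):
    """Pack row pieces together so all free space becomes contiguous.

    `slots` maps slot number -> (offset, length) within the data area.
    Returns a new mapping with every piece packed from offset 0 upward;
    slot numbers (and therefore rowids) are preserved.
    """
    packed = {}
    next_offset = 0
    # Keep pieces in their current physical order while closing the gaps.
    for slot_no, (offset, length) in sorted(slots.items(), key=lambda kv: kv[1][0]):
        packed[slot_no] = (next_offset, length)
        next_offset += length
    return packed
```

Because only offsets change and slot numbers survive, rowids remain valid after compression, which is why Online can do this transparently, if not cheaply, in the middle of an UPDATE.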
In indexes, both UPDATE and DELETE commands cause partial empty pages: UPDATEs when an indexed value changes, and DELETEs by leaving free space behind. In other databases, the accepted term for this is unbalanced indexes. An obvious example of an index that suffers from unbalancing is one on a date that changes. Many systems index summary columns that contain "date last updated," or some other value that keeps changing. From a physical storage point of view, these are the worst possible types of indexes.
As with data pages, index pages also undergo compression when Online wants to reclaim space. The criteria for index node compression are stricter, to keep compressions from occurring too frequently: an index node must have two or fewer items to be a candidate for compression.
Unused Extent Space
One of the more dynamic features of Online is its ability to free extents that are no longer used. In practice, this feature rarely gets a chance to assist the DBA, because every page in an extent must be emptied before the extent can be freed. This type of wasted space is very common in financial databases, where old transactions are deleted by month or year and new ones are added gradually. If the data could be kept within the smallest possible number of pages, it would be more closely packed, requiring less I/O to process requests against the table.
Another consideration when dealing with extents that contain a large amount of unused space is the impact of all the free pages on the free list. An excessively long free list can also increase the amount of I/O required for UPDATEs and raises the probability of page compression occurring.
Extent Interleaving
When multiple objects exist in the same dbspace, their extents may become physically interleaved. For the majority of Online sites, where dbspaces use raw devices, extent interleaving slows sequential activity within the dbspace. Sequential access to tables and indexes occurs with surprising frequency in most applications, so interleaving can pose a significant performance issue.
Index Node Separation
This form of physical degradation has the most severe effect on performance. It occurs in indexes and in tables with clustered indexes, and it also severely affects tables without a clustered index.
Sequential index scans are among the most common operations performed on a database. Every time a search is resolved through a range or inequality operator, such as greater than, less than, or not equal to (>, <, <>), an index B-tree is traversed sequentially. Many sort operations can also be satisfied by a sequential scan of an index, depending on what an object's statistics tell Online to do.
As indexes grow, the index node (a page of index entries) becomes full and has to split to accommodate additional values. When an index node splits, it links in a new page and moves half of the existing index entries from the original node to it. The new entry can then be inserted successfully into the original node.
Using an example from the Online Administrator's Guide, we can see this split occurring graphically. As shown in the diagram, Node 3 is full and a new data value (Groovy) is about to be inserted. Online takes node number 3, moves half of its contents to Node 6, and places the new value into the original node. In summary, when a leaf node fills up, Online adds another leaf node and splits the entries between the two. When a root node fills up, Online splits its contents into two branch nodes, retaining two pointers: one to the left branch and the other to the right. The pages split off in this way may not be physically adjacent on disk.
*** Diagram ***
Now, imagine that instead of page 6, the new node landed on page 12,000,000. Then imagine this splitting has occurred 100,000 times in your index. When index nodes are continually split, the logical order of the nodes bears no resemblance to their physical order on disk. This greatly increases the seek time required to traverse the index sequentially and has a major impact on performance.
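The Node 3 / Node 6 example above can be sketched as a standard textbook leaf split. Note one simplification: this version places the new key into whichever half keeps the entries sorted, which is the conventional B-tree rule; the exact placement Online uses may differ. The function names are illustrative.

```python
import bisect


def insert_with_split(node, key, capacity):
    """Insert `key` into a sorted leaf `node` of at most `capacity` entries.

    If the node is full, move half of its entries to a new node and return
    it (None when no split occurred). The new node is allocated wherever
    the next free page happens to be, so it need not be adjacent on disk.
    """
    if len(node) < capacity:
        bisect.insort(node, key)
        return None
    mid = len(node) // 2
    new_node = node[mid:]          # half the entries move to the new page
    del node[mid:]
    if key < new_node[0]:
        bisect.insort(node, key)   # key stays in the original node
    else:
        bisect.insort(new_node, key)
    return new_node
```

Repeating this insert thousands of times against a growing key range is exactly what leaves leaf pages logically chained but physically scattered, producing the seek-time penalty described above.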
Row Migration
As previously mentioned, when an UPDATE is performed on a row and the row will not fit back into its home page, it is relocated to a remainder page. This remainder page is the next available page in the current extent and is probably not physically adjacent to the row's home page.
Extend this example to the point where 30 percent of the rows have been relocated, and the seek time required to scan the table sequentially becomes an enormous performance problem.
Table scans are usually less common than index scans, but still occur. In applications where a large amount of ad hoc reporting is to be supported (such as data warehouses and decision support systems), large numbers of table scans are common.
Why would Online allow the physical structure to degrade?
There is a simple and prudent reason why this degradation is allowed: performance. If Online were to continually perform housekeeping during UPDATE or DELETE activity, the OLTP performance of the database would drop substantially. Some of our larger Online customers have noticed that when page compression occurs continually during UPDATE activity, end-user performance suffers. This type of housekeeping, however, keeps Online from suffering block fragmentation problems.
What are some realistic management techniques?
INFORMIX's performance guide provides recommendations for managing physical degradation. These recommendations involve different kinds of physical reorganization of the data, such as:
- unloading a table, dropping and recreating it, and reloading the data;
- creating or altering an index to cluster, which rewrites the table in index order; and
- dropping and recreating indexes to rebuild their B-trees.
Obviously, the above techniques are not a waste of time: Online DBAs have managed production databases for years by following them. In fact, many shops permanently maintain a clustered index on every table and run scripts that periodically drop and recreate the indexes (clustered indexes first, of course). These recommendations, however, do not give INFORMIX DBAs an efficient, safe, or practical method of reorganizing data to repair the damage inflicted by routine database operation.
CONCLUSION: How can we improve the situation?
Several factors are driving the need for reorganization capabilities beyond what INFORMIX offers.
The after-market tool vendors offer faster and more efficient alternatives. They have built tools with stringent methodologies that automate reorganization, ensure integrity, and fully resolve all of the forms of physical degradation discussed in this article. Designed for large, mission-critical databases, these products improve the efficiency of databases by restructuring data that has become disorganized through heavy use, without the time-consuming, error-prone procedure of rebuilding the entire database structure. They also improve the speed and efficiency of database functions by reorganizing databases, dbspaces, tables, and indexes; reduce the errors associated with manually recreating databases; and maximize data accessibility by minimizing the time during which data is unavailable during a reorganization.