


Maximizing Performance of INFORMIX Online Databases

Through Reorganization

Sean Kennedy,
Product Manager,
PLATINUM technology, inc.
 


You know an application slows down after it has been in production for a while, but do you know why? Sure, the number of transactions has increased and there are more rows in the tables than before, but is that enough to account for the performance degradation? In many cases the real reason is hidden: database disorganization. Database disorganization is scattered free space -- a condition in which a database's logical and physical storage allocations contain many areas of storage that are too small, not physically contiguous, or too scattered to be used productively.

Disorganization is a central concern of database maintenance and tuning because it affects both the logical and the physical storage of a database. It degrades logical access paths, and it increases physical seek time as well as the number of physical reads required to retrieve data. Disorganization can degrade database performance and, in some cases, lead to complete database failure. This article examines the causes of disorganization in INFORMIX Online databases and offers some reorganization solutions. While most of the research was conducted against Online Version 7, everything discussed in this article applies to Online Versions 5 through 8.


JUST THE FACTS

The building block of Online storage is the data page, usually 2 kilobytes or 4 kilobytes in size. Blank pages are created when an administrator creates a chunk. Each page has a header area, a slot table, and a portion for storing data. The header area holds statistics and is used for integrity validation. The slot table contains the rowids and pointers to the row pieces stored in the data area.

Pages are allocated to objects in extents. An extent is a contiguous (sequential) set of pages. The number of pages in an extent is determined by the DDL used to create the object. When rows are initially added to a table, the pages are filled sequentially. For tables with clustered indexes, the fill factor is monitored and the appropriate amount of space is left free in each page.
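
Extent sizes are declared, in kilobytes, in the DDL itself. Here is a minimal sketch; the table, column, and dbspace names are hypothetical:

    { A minimal sketch of extent sizing in the DDL; sizes are in
      kilobytes, and all names here are hypothetical. }
    CREATE TABLE customer
    (
        cust_id    INTEGER,
        cust_name  CHAR(40)
    )
    IN datadbs1
    EXTENT SIZE 512    -- first extent: 512 KB
    NEXT SIZE 128;     -- each subsequent extent: 128 KB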

To reduce the amount of physical I/O required, rows are not unnecessarily broken across pages. Any row shorter than the size of a page is always stored in one piece. Online considers a page full when the free byte count is less than the number of bytes required to store a row. A row that is larger than a page is broken into pieces by Online. The terms used for the pages this row spans are: 

  • Home Page: Where the first row piece is located
  • Big Remainder Page: Wholly used page containing a piece of the row
  • Remainder Page: Where the last row piece is located

When a row only uses two pages, the Home Page has a four-byte pointer to the Remainder Page. For more than two pages, the Home Page and each Big Remainder Page thereafter have forwarding pointers to the next data portion of the row. Please note that these pointers are at the page level, not the row level.

When a row is updated, Online returns it to its current location and keeps the same slot table entry if the row size has not changed. If the row becomes smaller as a result of the modification, Online adjusts the slot table entry to reflect the new row size. If the modified row no longer fits in its original position, it is stored in another location on the same page and the slot table entry is changed to reflect the new starting offset and row length.

When a modified row is still shorter than one page but the current page cannot hold it, a four-byte forward pointer containing the new rowid is stored in the row's home page. The modified row keeps its original rowid but is stored on a different page.
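
Forward pointers and relocated rows can be observed directly with the page-dump options of the oncheck utility (the equivalent utility under Online 5.x is tbcheck). A sketch, with hypothetical database and table names:

    # Print the rows of a table as stored on its pages
    # (use tbcheck instead of oncheck under Online 5.x).
    oncheck -pd stores:customer

    # Dump the raw data pages, including page headers and slot tables.
    oncheck -pD stores:customer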

The Performance Curve

Several types of physical degradation can occur in a database to affect performance, including partial empty pages, unused extent space, extent interleaving, index node separation, and row migration. Let's examine these types of physical degradation and their effect on Online performance:

Partial Empty Pages

Many types of rows, particularly metadata and summary rows, change size over time. This happens when variable-length columns, such as VARCHAR, are updated. As mentioned in the previous section, three different scenarios may apply here (a sketch of all three follows the list):

  • A row is shortened: Occurs when an UPDATE shortens a row. The row does not move, but the slot table is updated to reflect the new, shorter length. The space freed at the end of the row is usually too small to hold a complete new row, and it becomes usable only through page compression.
  • A row is lengthened but fits in its home page: Occurs when an UPDATE lengthens a row and there is enough contiguous free space within the current page. The row is written to a section of contiguous free space where it fits, and the slot table is updated to reflect its new position. In many cases the old position of the row (now free) will not be reused until page compression is performed.
  • A row is lengthened but does not fit in its home page: Occurs when an UPDATE is performed on a row whose page is too full to hold the updated version. Online moves the entire row to a new page and places a four-byte pointer in the home page where the original row was located. This pointer contains the "new" rowid, which allows Online to locate the page to which the row has been relocated. Every time a user accesses this row, Online reads the home page, retrieves the pointer, and then retrieves the data page. The space freed in the home page is available for new rows, but will probably require page compression to be useful.
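
A minimal sketch of the three scenarios against a hypothetical table with a variable-length column:

    { Hypothetical table; each UPDATE below illustrates one scenario. }
    CREATE TABLE notes
    (
        note_id  SERIAL,
        body     VARCHAR(255)
    );

    -- Row shortened: the slot table entry shrinks, the row stays put.
    UPDATE notes SET body = 'ok' WHERE note_id = 1;

    -- Row lengthened, still fits: the row moves within its home page.
    UPDATE notes SET body = 'a somewhat longer remark' WHERE note_id = 2;

    -- Row lengthened, no longer fits: the row is relocated and a
    -- four-byte forward pointer is left in the home page.
    UPDATE notes SET body = 'a remark too long for the free space remaining in the page' WHERE note_id = 3;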

Online does look after these partial empty pages and, at present, is the only open-systems database that performs page compression to reclaim wasted space within pages. To perform page compression, Online copies the entire page into memory and shuffles the rows around to make all the free space contiguous. Unfortunately for end users, page compression is triggered when an UPDATE is requested on a row and there is not enough contiguous space (but enough total free space) for the updated row. This means the user must wait for the compression to occur.

In indexes, both UPDATE and DELETE commands cause partial empty pages. UPDATEs do so when an indexed value changes, and DELETEs leave free space behind. In other databases, the accepted term for this is unbalanced indexes. An obvious example is an index on a date that changes: many systems index summary columns containing a "date last updated" or some other value that keeps changing. From a physical storage point of view, these are the worst possible types of indexes.

As with data pages, index pages also undergo compression when Online wants to reclaim space. The criterion for index node compression is stricter (to keep compressions from occurring too often): an index node must have two or fewer items to be a candidate for compression.

Unused Extent Space

One of the more dynamic features of Online is the ability to free extents that are no longer used. In practice, this feature rarely gets the chance to assist the DBA, because everything must be removed from an entire extent before it can be freed. This type of wasted space is very common in financial databases, where old transactions are deleted by month or year and new ones are added gradually. If the data could be kept within the smallest possible number of pages, it would be more closely packed, requiring less I/O to process requests against the table.
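
Tables that have accumulated many extents can be spotted in the system catalog; the nextns column of systables holds the extent count, refreshed when UPDATE STATISTICS is run. A sketch:

    -- List user tables by extent count; tabid > 99 skips the
    -- system catalog tables. Run UPDATE STATISTICS first so the
    -- counts are current.
    SELECT tabname, nextns
      FROM systables
     WHERE tabid > 99
     ORDER BY nextns DESC;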

Another consideration when dealing with extents that contain a large amount of unused space is the impact all the free pages have on the free list. An excessively long free list can also increase the amount of I/O required for UPDATEs and raises the probability of page compression occurring.

Extent Interleaving

When multiple objects exist in the same dbspace, their extents may become physically interleaved. At the majority of Online sites, where dbspaces reside on raw devices, extent interleaving slows sequential activity within the dbspace. Sequential access to tables and indexes occurs with surprising frequency in most applications, so interleaving can pose a significant performance issue.
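
Interleaving can be seen in the dbspace usage report, which lists the extents of each dbspace in physical disk order (tbcheck -pe under Online 5.x):

    # Extents listed in physical order; adjacent extents owned by
    # different tables indicate interleaving.
    oncheck -pe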

Index Node Separation

This form of physical degradation has the most severe effect on performance. It occurs both in indexes and in tables with clustered indexes, and its effects are felt even by tables without a clustered index.

Sequential index scans are among the most common activities performed on a database. Every time a search is initiated through a range operator such as greater than, less than, or not equal to (>, <, <>), an index B-tree is traversed sequentially. Many sort operations can also be satisfied by a sequential scan of an index, depending on what an object's statistics tell Online to do.
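
Whether a particular query is satisfied by such an index traversal can be confirmed from the optimizer's explain output. A minimal sketch; the table and column names are hypothetical:

    -- Writes the chosen query plan to sqexplain.out.
    SET EXPLAIN ON;

    -- A range predicate like this typically drives a sequential
    -- traversal of the index leaf pages.
    SELECT order_num, order_date
      FROM orders
     WHERE order_date > '01/01/1996';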

 As indexes grow, the index node (a page of index entries) becomes full and has to split to accommodate additional values. When an index node splits, it links in a new page and moves half of the existing index entries from the original node to it. The new entry can then be inserted successfully into the original node.

Using an example from the Online Administrator's Guide, we can see this split graphically. As shown in the diagram below, Node 3 is full and a new data value (Groovy) is about to be inserted. Online takes node number 3, moves half of its contents to Node 6, and places the new value into the original node. In summary, when a leaf node gets full, Online adds another leaf node and splits the entries between the two. When a root node gets full, Online splits its contents into two branch nodes, retaining two pointers -- one to the left branch and the other to the right. Pages split off in this way may not be physically adjacent on disk.

*** Diagram: splitting a full index node (Node 3) into Nodes 3 and 6 ***

 Now, imagine that instead of page 6, it was page 12,000,000. In addition, imagine this splitting has occurred 100,000 times in your index. When old index nodes get continually split, the sequential order of the nodes is nothing like their order on disk. This greatly increases the amount of seek time required to sequentially traverse the index and will have a major impact on performance.

Row Migration

As previously mentioned, when an UPDATE is performed on a row and it no longer fits in its home page, the row is relocated to a remainder page. This remainder page is the next available page in the current extent and is probably not physically adjacent to the home page.

Extend this example to the point where 30 percent of the rows have been relocated, and the seek time required to scan the table sequentially becomes an enormous performance problem.
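
The degree of row migration in a table can be estimated from the TBLspace usage report, which breaks the table's page count down by type, including home data pages and remainder pages. A sketch, with hypothetical names:

    # Page-type breakdown for one table; a high proportion of
    # remainder pages suggests heavy row migration.
    oncheck -pT stores:customer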

Table scans are usually less common than index scans, but still occur. In applications where a large amount of ad hoc reporting is to be supported (such as data warehouses and decision support systems), large numbers of table scans are common.

Why would Online allow the physical structure to degrade?

There is a simple and prudent reason why this degradation is allowed: performance. If Online were to perform housekeeping continually during UPDATE or DELETE activity, the OLTP performance of the database would drop substantially. Some of our larger Online customers have noticed that when page compression occurs continually (during UPDATE activity), end-user performance suffers. This housekeeping, however, is what keeps Online from suffering block fragmentation problems.

What are some realistic management techniques?

INFORMIX's performance guide recommends several techniques for managing physical degradation. Each performs a different kind of physical reorganization of the data; they are outlined below:

ALTER INDEX TO CLUSTER: Clustering an index rebuilds the table, with the rows in index order, in a different location within the dbspace. This cluster order is not maintained by future updates, so the operation must be repeated periodically. For it to work, all other indexes must be dropped, the single index clustered, and the remaining indexes recreated. The command requires sufficient space within the dbspace to hold a copy of the original table. As the table grows and the process is repeated, new extents cannot be allocated contiguously, because the dbspace contains holes left behind by the prior rebuilds. In reality, this solution requires a large amount of additional space to be maintained in the dbspace, and it makes extent interleaving worse over time.
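
A sketch of the sequence, with hypothetical table and index names:

    -- Drop secondary indexes, cluster on one index, then rebuild.
    DROP INDEX ix_cust_name;
    ALTER INDEX ix_cust_id TO CLUSTER;   -- rewrites the table in key order
    CREATE INDEX ix_cust_name ON customer (cust_name);

    -- To re-sort the table later, release and reapply the cluster
    -- attribute:
    ALTER INDEX ix_cust_id TO NOT CLUSTER;
    ALTER INDEX ix_cust_id TO CLUSTER;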

ALTER FRAGMENT: Modifying the fragmentation scheme makes INFORMIX rebuild the table, much as ALTER INDEX TO CLUSTER does. For large tables this is generally not a viable alternative, as too much free space is required and performance suffers while a temporary fragmentation scheme is in force.
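
For example (table and dbspace names hypothetical):

    -- Re-initializing the storage scheme rebuilds the table from
    -- scratch in the named dbspace.
    ALTER FRAGMENT ON TABLE customer INIT IN datadbs2;

    -- Or rebuild under a temporary multi-dbspace scheme:
    ALTER FRAGMENT ON TABLE customer
        INIT FRAGMENT BY ROUND ROBIN IN datadbs1, datadbs2;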

UNLOAD / RELOAD DATA & DDL: You can unload the table, drop and re-create it, then reload the data. To complete this in a realistic time frame, all indexes must be dropped beforehand and recreated once the reload is finished. In addition, because the object is dropped, constraints, stored procedures, and other dependent constructs must be managed as well. This is a manual process that requires maintaining the DDL for the table, sequencing the commands correctly, and controlling user access throughout. One wrong step and data or application logic contained in the database may be corrupted or lost. Even then, if more than one table exists in a dbspace, the DBA may still end up with extent interleaving when reorganizing one object at a time.
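
A skeleton of the process as run from DB-Access; every name and path here is hypothetical, and in practice the saved DDL must also recreate constraints, triggers, grants, and any other dependent objects:

    -- 1. Unload the data to a flat file.
    UNLOAD TO '/tmp/customer.unl' DELIMITER '|'
        SELECT * FROM customer;

    -- 2. Drop the table and recreate it (from saved DDL), without
    --    indexes, so the reload runs quickly.
    DROP TABLE customer;
    CREATE TABLE customer
    (
        cust_id    INTEGER,
        cust_name  CHAR(40)
    ) IN datadbs1 EXTENT SIZE 512 NEXT SIZE 128;

    -- 3. Reload, rebuild indexes (clustered first), refresh statistics.
    LOAD FROM '/tmp/customer.unl' DELIMITER '|' INSERT INTO customer;
    CREATE CLUSTER INDEX ix_cust_id ON customer (cust_id);
    CREATE INDEX ix_cust_name ON customer (cust_name);
    UPDATE STATISTICS FOR TABLE customer;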

Obviously the above techniques are not a total waste of time. Online DBAs have managed to survive for years with production databases by following these techniques. In fact, many shops permanently maintain a clustered index on every table and have scripts that periodically drop and recreate the indexes (clustered indexes first, of course). These recommendations, however, do not provide INFORMIX DBAs with an efficient, safe or practical method of reorganizing data to repair the damage inflicted by routine database operation.

CONCLUSION: How can we improve the situation?

Several factors are driving the need for reorganization capabilities beyond what INFORMIX offers. These include:

  • The growth of data volumes to sizes "unmanageable" with INFORMIX methods
  • Large-scale application vendors, such as SAP, adopting Online as their database of choice
  • Companies requiring 24x7 uptime from their applications
  • Mature databases that have operated for long periods without any maintenance for fragmentation
  • DBAs being given more responsibility and consequently having less time to spend writing and maintaining their own scripts (management calls this focusing on core competencies)

The after-market tool vendors offer faster and more efficient alternatives. They have built tools with stringent methodologies that automate the work, ensure integrity, and fully resolve every form of physical degradation mentioned in this article. Designed for large, mission-critical databases, these products improve the efficiency of databases by restructuring data that has become disorganized through heavy use, without the time-consuming, error-prone procedure of rebuilding the entire database structure. They also improve the speed and efficiency of database functions by reorganizing databases, dbspaces, tables, and indexes; reduce the errors associated with manually recreating databases; and maximize data accessibility by minimizing the time during which data is unavailable during a reorganization.

 

