by Ron M. Flannery email@example.com
This article contains the following information:
This article describes strategies that help you tune and monitor your Informix database servers. In it, you will learn the day-to-day tasks that can help keep your systems healthy for a long time. In the event that problems do occur, it will help you understand how to solve them. This chapter will help you learn the following:
Disk configuration can often be a tough balance in Informix Dynamic Server. A prevailing goal is to make the most efficient use of your disks while not filling them up. Disk layout should be planned carefully, especially when you are first creating the Informix instance. Obviously, it isn't possible to completely predict the usage of your system, but it's important to put the most you can into the planning of disk space. If you need to change something, you can add chunks later. Following is the output of onstat -d, which shows the current dbspaces and chunks in your Informix instance:
The output is divided into two parts: dbspace information and chunk information. In a nutshell, a dbspace is a collection of chunks. Each chunk represents all or part of a physical disk drive. The dbspaces are displayed in the top part of the output followed by the chunks that map to them. The "number" field in the dbspaces section is the dbspace number, which maps to the "dbs" field in the chunks portion of the screen. For this discussion, we will only look at the chunk usage.
The columns that are of particular interest are "size" and "free." These columns define in pages the size of the chunk and how much of it remains free. The page size can be found by using the command onstat -b. It is very important to monitor the amount of available pages. This is particularly true in the root dbspace: if it fills, your whole instance can be forced off line.
To watch the activity for certain chunks, use the command onstat -g iof. The following is the output of this command:
If you notice that one of the chunks has a high number of reads or writes when compared to other chunks, it could be causing performance problems. In this case you can consider redistributing the data on that chunk.
The virtual processors (VPs) help Informix process things efficiently. VPs are "mini CPUs" that are actually operating system processes. These VPs handle user "threads" and provide the multi-threading capabilities of Informix. It's important to have enough VPs and have them properly configured.
The command onstat -g glo provides information about all the current virtual processors as well as statistics for each class of virtual processor (CPU, for example). You can use the results of this command to search for certain classes of VPs that might have an inordinate amount of activity, causing you to add more VPs.
The command onstat -g ioq can help you determine if you need to add more AIO VPs and has output like the following:
If this command shows an I/O queue that continues to grow, you might need to add more AIO VPs.
The amount of memory used by Informix can vary, depending on the activity of your system. Memory is initially allocated as the amount of KB given by the onconfig parameter SHMVIRTSIZE and can be dynamically allocated by Informix Dynamic ServerT in KB chunks of SHMADD. Informix Dynamic Server will abort if the amount of memory reaches the amount in SHMTOTAL. To pay attention to how much memory Informix has allocated, simply use the command onstat -, which quickly displays the total amount used by Informix Dynamic Server. For information on specific memory segments that were allocated, use the command onstat -g seg, which displays information as shown in this sample output:
COMMAND: onstat -g seg
When Informix is initially brought on line, there is one virtual memory segment, which is equal to the size of SHMVIRTSIZE. If more segments are added, you will see them in the output above, with a class of "V." Alternately, you can watch the message log for messages that mention "dynamically allocated new shared memory segment (size xx)." To do this, use onstat -m (to see last 20 entries in message log) or look through the whole message log by using a patter matching command (grep on UNIX).
In a DSS environment, you can also set the variable DS_TOTAL_MEMORY. This value is typically set to a certain percentage of SHMTOTAL. The percentage depends on how many DSS versus OLTP queries are performed. The DS_TOTAL_MEMORY usage is monitored through onstat -g mgm.
Performance problems can sometimes be totally related to what the users are doing. If random queries are allowed on a system, or there are too many queries, system performance can be greatly reduced. To find out which users are using the Informix instance, use the command onstat -u, which displays output that looks like this:
A quick glance at this output can tell you who has the highest number of reads and writes. On larger print outs, it might be more difficult to monitor, however. The last line of the output can help you decide if the number of users is the problem.
To follow through on a particular query, you can use the session ID ("sessid") that is displayed here. Executing the command onstat -g ses sesid displays information about that particular session, including the query and more detailed information. For example, the following shows part of the output of the command onstat -g ses 49:
There are lots of useful fields in the output of this command, including the query, memory, user, and other information. You can obtain a smaller piece of the same information by using the command onstat -g sql sesid. To get information for a particular thread within the session, issue the command onstat -g tpf tid where tid is the value shown in the output of the above command.
You can also use onstat -g wai to find threads that are waiting to be processed by Informix. Using onstat -g rea shows threads that are ready to be executed.
Another way to monitor performance and health of your systems is through monitoring tools. These tools are designed to monitor things like CPU usage, memory usage, and possible problems. Some tools provide support for watching performance within the database, including query monitoring, dbspace usage, and more. Choosing an automated tool like this can save a lot of the manual steps that I previously mentioned. These tools are made to automate these tasks and can greatly simplify the life of an administrator.
There are many options in these tools and more becoming available. Some of the tools provided by Informix include onperf, oncockpit, and Informix Enterprise Command Center (IECC). IECC is a GUI-based tool that greatly simplifies administrative and monitoring operations. In addition, there are several third-party tools, like Compuware's ECO Tools and BMC's Patrol.
The sysmaster database is created in shared memory when you start Informix Dynamic Server. It provides a good deal of the information provided by the "on" commands. Using sysmaster allows you to build your own queries and programs that do monitoring. For example, you can create a program that will monitor much of the information provided by the onstat -p (system profile) command. This will let you create your own alarm conditions and process them accordingly. For a full description of how to use sysmaster, see the article "SMI and the sysmaster Database" by Lester Knutsen that appeared in Tech Notes Volume 8, Issue 4, 1998 (posted on TechInfo Center).
Of course, performance problems can occur on many different levels. If you are experiencing extremely slow keyboard response time, for example, the problem could reside in the network. Likewise, if your Informix instance is starting to have problems accessing some of the disk drives, you might be having hardware errors. Finally, if the system is strapped for memory or CPU, you might look to other things that are running on the operating system. Remember that Informix is sharing disk, memory, and CPU with the operating system, which needs its own resources. Keep in mind, though, that they aren't available on all UNIX systems. If they aren't available, find out which utilities your system offers. You can get information you need by using the UNIX man command.
These commands can give you information that you can use in conjunction with the information you get from Informix. Between the two, you should have a good idea if the problem is an Informix configuration issue or just a lack of resources (like CPU).
Most operating systems have their own log files. These files can show errors that could later affect Informix (I/O errors, for example). They should also be regularly watched.
Don't forget that user applications might be causing the problems. By following some of the listed methods to monitor user sessions, you can trace the original SQL that was being executed by the user. If you find users that are doing certain queries (sequential scan, for example), you might want to suggest indexes or changes to their applications.
Some ongoing maintenance commands need to be performed, but not as often as many of the commands we've already discussed. These commands might best be executed in a batch job that runs on a regular schedule. This next section desribes some of these commands.
The strategy for updating your statistics must be chosen carefully. The "Informix Performance Guide" explains the proper strategy.
Checking your Tables and Indexes
Note the "n" supplied for every command. This instructs oncheck to ignore any questions about trying to fix the data if an error is found. If you find that you need to correct errors, you can run oncheck again and manually answer or run the command with "y" instead of "n" at the end.
Checking System Information
Each database has a set of system catalogs that contains information about its tables, indexes, and other items in the database. These catalogs are tables that begin with "sys". It is crucial that these catalogs have the proper information; if damaged, they can make an entire database inaccessible. To check the catalogs, run the SQL command "update statistics" on each database in the instance and then run the command oncheck -cc.
Reviewing Table Structure: Extents and Otherwise
Tables with too many extents can have a negative impact on Informix response times since data is scattered in many different parts of the disk. A good ballpark maximum number of extents is 10. You can prevent this problem by creating extents that are large enough to hold your data. The extent can be created with the table or added later with the create table or alter table statements. To monitor the amount of extents, try running the command oncheck -pe, which will display output similar to the following:
If you see a table that shows many times in this output, you might consider backing it up, dropping, re-creating, and re-loading into a table with the proper initial and next extent sizes.
The oncheck -pt command will show summarized information of all the tables. This information can tell you how many extents each table has, like in the following example:
Note that "number of extents" is seven, still an acceptable number.
The last way to find out about all the extent sizes is to query the sysmaster database.
You've now learned numerous ways to monitor your servers, watching performance and preventing problems. But sometimes trouble is going to come no matter what you do. It's inevitable: things go wrong. Here we discuss how to address and correct some common problems.
There are many errors that will cause Informix to immediately go off line, possibly causing damage to your data and databases. Possible problems include:
As an administrator, you need to be prepared to handle any of these errors. Rule number one of an administrator is:
A good number of system crashes cause no damage to the data and are recovered through Informix's powerful recovery system.
When your Informix instance goes off line, you should first check the message log for what happened. To display the last 20 lines of the message log, type onstat -m. If you need to see more of the message log, you can use a text editor or other command.
Tip: The messages in the message log are listed in the Informix Dynamic Server Administrator's guide.
Many details about the error are included in the message log. For example, the mention of an "I/O error" is a good indication of a problem with hardware.
A common message for Informix crashes indicates an "assertion failure." An assertion failure is simply a message from Informix stating that it could not perform a necessary operation and needed to shut down. These errors often require a simple shutdown and startup of the instance, but must still be investigated.
Starting with version 7.30, Informix attempts to capture assertion failures and kill just that Informix session, automatically saving the necessary debug information. This should prevent Informix from crashing during assertion failures. The failure will of course be documented in the message log for that Informix instance.
An assertion failure will almost always create one or more files that give complete information about the error. The most common file is af.nnn where nnn is a unique hex number. Other possible files are a shared memory file (shm.nnn), a core dump of VP processes (gcore.nnn), and a regular core dump. The data in these files might mean nothing to you, but they are very important to Informix technical support.
If you set the environment variable AFDEBUG before you initialize Informix, the engine will suspend processing-rather than crash-during many errors. This is useful because it allows you to run certain diagnostic commands like oncheck and onstat before bringing the engine down. Here is an example of how to do this when initializing Informix:
Informix gives a lot of information during an assertion failure, often including a possible solution to the problem ("Action: Run 'oncheck -cDI 6449916'"). Be sure to carefully analyze this information to find out what caused the problem, and if appropriate, try the corrective action. When an assertion failure occurs, you should:
In most cases, the above three steps will be sufficient in bringing Informix back on line and you won't have to take it any further. If not, you'll need to apply some real administrator's skills (or call Informix!).
After a while, Informix administrators develop a sixth sense about problems. Through experience, you'll be able to either recognize errors or intuitively know how to fix them. These are the kind of things that are hard to teach in a book; they come naturally. But it is a way of thinking.
Really consider what happened and what could have caused it. Logically think through the messages and conditions of the system and try to eliminate the obvious. Consider what has happened with your system in the past. Run various Informix commands to try to trace problems. Remember that some crashes are caused by problems with the current Informix release and your applications. Again, don't panic and logically walk through the problem.
Following is an interesting anecdote from Clem Akins, Informix Technical Support Engineer. Notice how Clem did follow the logical steps but had to really improvise to find the problem:
"The error was from an Assert Fail due to a memory Segmentation Violation. The engine was given an address that was out of bounds, and it crashed. After one week of all-day-long, on-site effort we proved that the fault was with a part of the hard drive that was used for temp storage. Informix would write one address there, and retrieve another. The internal bounds checking is good enough to find the fault and bail out before corrupting any data. As the customer was a financial institution, they appreciated that, even after the hard time they gave us at first. We ran oncheck as it says to do, but found no errors. We examined the stack trace, and compared it to the optimizer source code that was running when it crashed. We sent the core dumps and .af files to Informix Advanced Support for further analysis. We reproduced the error using the stores7 database (though we had to make some of the tables several times bigger to see the error). Advanced Support finally built a machine exactly like the customer's, ran the exact query on the same data, and had no errors. Given that, we gave some heavy thought and long, thorough analysis and found the bad disk."
That story demonstrates a time when the obvious didn't solve the problem and it took real improvisation. If something just doesn't make sense, start thinking about other things that could have caused the problem.
Informix provides excellent technical support. In some instances, either you have exhausted all of your ideas or the problem can't be fixed by you. This is when you should call technical support. Be sure to have your information together before you call. Again, take a look at the views of Clem Akins.
"The most important thing to do is to take the view of the support engineer. This view includes things like:
'Can I reproduce the problem at will?
Have I thought about why this problem occurs?
What are the possible causes?
Can I isolate them and test them individually?
What has changed that caused the problem? (System, network, database, application changes?)
Is the test case the simplest possible that will reproduce the problem? Does it reproduce on the stores7 database?
Does it reproduce with only simple SQL instead of requiring complex custom code or applications or tables?
Do I have a dial-up agreement in place with Informix Support? (Gee, I wish I had thought of that during business hours...) What is my support contract number (or product serial number) and what kind of support do I have? Is all of my support information ready to provide to Informix?
When I talk to the support engineer will I sound like an IT professional who has taken all the reasonable steps to solve the problem? Even better, like a person who also has to support code and who thinks about the poor soul who is trying to help me from the other end of a phone!"
Certain errors will be just plain unrecoverable. In these times, you might see the dreaded "restore from an archive" error. At this point, the administrator will look like a hero if he had implemented a strong backup and recovery plan (which of course we all do!).
There are two kinds of restores-cold and warm. The type you choose depends on your error. Refer to the Informix documentation for the backup strategy you are using (onbar, ontape, onarchive, etc.).
The amount of data you restore will depend on your levels of backup. Generally speaking, the process is to:
If you are not using logging, you can only go as far as step 2. The point of restoration depends on your backup schedule. If you need to get as close as possible to the point-in-time of your database, you'll need to have implemented logged databases and properly backed up all the logical logs.
This chapter described the ongoing process of maintaining your Informix servers. You learned several commands that can be done on an ongoing basis to watch your servers for possible problems or bottlenecks. And you learned the best method for contacting Informix support during down times.
This article is an excerpt from a chapter in the upcoming Informix Handbook due out later this year. The Informix Handbook promises to be a comprehensive, must-have reference guide to Informix products. The book provides substantial detail and reference information on multiple products and subjects. The following are some related chapters from the book:
For more information, visit the Web site at www.informixhandbook.com.
About the Author
Ron M. Flannery is the founder and president of One Point Solutions, Inc., a Web- and technology-based company in Michigan. He also serves as the president of the Michigan Informix User Group (www.miug.com), is a member of the IIUG board of directors, and is the principal author of the Informix Handbook from Informix Press. Ron may be contacted by email at firstname.lastname@example.org.
Украинская баннерная сеть