


Database Application Performance: Elements of Performance

by Jacques Roy jroy@informix.com
Tech Notes 1998, Volume 8, Issue 4


Overview

Introduction

According to Moore's law, computer performance doubles every 18 months. We have seen PCs go from an Intel 8088 at 4.77MHz with a maximum of 640KB of memory to current machines with up to four 450MHz Pentium II processors and up to 4GB of memory. In other classes of computers, we find machines with up to 64 processors and 64GB of memory. Machines are also available in clusters to provide massively parallel processing (MPP). With all this power, why worry about database application performance?

Database applications are more complex, and the amount of data has grown over the years. We are already dealing with multi-terabyte databases. When can we expect petabyte (1000TB) databases? It probably won't surprise you to find that a few already exist. At the same time, the processing is becoming more complex and the number of users keeps increasing.

This document provides an overview of the factors that impact performance and describes guidelines that will help in application development and tuning. It covers hardware, operating system, database engine, application development, and database access.

Hardware Components

Hardware is the first element of performance. All else being equal, performance can be improved, if only marginally, by throwing hardware at a problem. In some cases it is the most cost-effective solution. Then the question is: what hardware should be added? Let's look at the major hardware components of a system, using the Sun Enterprise Server 10000 as an example.

  • CPU: The central processing unit is what executes program instructions. Since the late '80s, UNIX vendors have supported symmetrical multiprocessing (SMP). The scalability of these systems has increased to support a large number of processors. SMP means that a machine contains several CPUs that are all equal in functionality: any CPU can handle any task in the system, including interrupts and I/O.

    High-performance CPUs include on-chip memory and often use a secondary cache. For example, each UltraSPARC processor in the Sun ES10000 has 16KB of instruction cache and 16KB of data cache. It also has a 4MB secondary cache.

    Why have a cache and so much of it? Let's again consider the Sun ES10000. The processor speed is listed at 336MHz and it uses a 128-bit (16-byte) interface. For each CPU cycle, it can get 16 bytes:

      336 x 1,000,000 cycles/sec x 16 bytes/cycle = 5,376,000,000 bytes/sec, or about 5.4GB/sec

    So, at peak performance, one UltraSPARC processor can go through about 5.4GB per second. Memory caches provide faster access to instructions and data in hopes of keeping the CPU busy.

  • Memory: Memory sub-systems use the concept of interleaving to increase data transfer performance. In the Sun ES10000, a memory module achieves transfer rates of 1.3GB per second.
  • System bus: The system bus is the critical component of the I/O sub-system that allows communication between the I/O controllers and the system. The standard Sun SBus provides a data throughput rate of 100MB per second.
  • Controllers: Several types of controllers are available. They are used to access disk drives, networks, tape drives, and so forth. The standard SCSI controller can accommodate the performance of the SCSI bus at 20MB/sec. Sun has come out with controllers for disk arrays based on fiber optic technology that are rated at 100MB/sec. This matches the speed of the system bus.
  • Disks: The standard disk drive used in high-performance systems is the SCSI drive. You can calculate the transfer rate of a disk by looking at its capacity, rotation speed, latency, seek time, track-to-track seek, etc. The real throughput is a function of all these characteristics and the size of the transfer requested. If you ask for disk transfers of a few disk blocks scattered all over the disk, you will incur a high percentage of seek time and latency. However, if you are transferring a large number of contiguous disk blocks, most of the time will be spent transferring data.

    The average seek time, average latency, and the track-to-track seek times are measured in milliseconds. A top-of-the-line drive may have an average seek time of 5.4 msec. During that time, the UltraSPARC processor can execute about 7 million instructions. In an effort to reduce the impact of this overhead, most disk manufacturers include memory cache in their disk drives.

    The previous paragraph explains why it is always recommended to allocate contiguous space and avoid disk fragmentation. So what transfer rate should we expect from a disk drive?

    Some disk drives have peak transfer rates of up to 80MB/sec. Of course this is a "best case" scenario. The performance will vary greatly depending on the type of I/O being done and the impact of latency and seek time. In a "worst case" scenario, we could require the average seek time and average latency for each page, which would reduce our performance to around 240KB/sec (2KB every 8.4 msec). This is a huge range. The real performance will have to be evaluated by benchmarking the application.
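
    To see how the transfer size drives the effective rate, here is a minimal C sketch of the arithmetic. The 8.4 msec positioning overhead is the average seek plus latency figure used above; the 80MB/sec peak rate is the same "best case" number, and both are assumptions about one particular drive:

      #include <stdio.h>

      int main(void) {
        double overhead = 0.0084;        /* avg seek + avg latency, seconds */
        double peak = 80.0 * 1000000.0;  /* peak transfer rate, bytes/sec */
        long size;

        /* effective rate = bytes moved / (positioning time + transfer time) */
        for (size = 2048; size <= 2097152; size *= 4) {
          double t = overhead + (double)size / peak;
          printf("%8ld bytes/transfer: %9.0f KB/sec\n", size, size / t / 1024.0);
        }
        return 0;
      }

    With one 2KB page per positioning, this reproduces the roughly 240KB/sec figure quoted above; at 2MB per contiguous transfer, the same drive moves data more than 200 times faster.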


        • Figure 1: Some Disk Drive Characteristics.1

    1 Data extracted from http://www.seagate.com/disc/cheetah.shtml

  • Network: The most popular network is Ethernet. The usual transfer rate is 10Mbit/sec. Considering the standard overhead of control bits and data used for error detection, we can expect a transfer rate of, at most, 1MB/sec. However, new network technology is rapidly appearing and provides transfer rates of 100Mbit/sec (10MB/sec).

    The Ethernet protocol defines a format for sending information. Each packet of information has a maximum size of 1526 bytes. This includes 26 bytes of control information.2 The size of the data transfer will have a major impact on the effective transfer rate; for example, the transfer of 20 bytes of data will still incur 26 bytes of control data, for more than 50 percent overhead.

    2 Douglas Comer, Internetworking with TCP/IP, pp. 18-19.
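
    The overhead arithmetic is easy to generalize. Here is a small C sketch, assuming each transfer fits in a single packet:

      #include <stdio.h>

      int main(void) {
        int sizes[] = { 20, 100, 500, 1500 };  /* data bytes per packet */
        int i;

        /* 26 bytes of control information per packet, as above */
        for (i = 0; i < 4; i++)
          printf("%4d data bytes: %4.1f%% overhead\n",
                 sizes[i], 26.0 * 100.0 / (sizes[i] + 26));
        return 0;
      }

    Filling packets close to their 1500-byte data capacity keeps the control overhead under 2 percent.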

  • Modems: Modems are available at various speeds. The most popular are 14.4Kbits/sec, 28.8Kbits/sec, and 56Kbits/sec. Because we've been dealing mostly with MB/sec up to now, let's convert the previous ratings. Allowing roughly 10 transmitted bits per byte on a serial line, they translate to around 0.0014MB/sec, 0.0028MB/sec, and 0.0056MB/sec respectively.
  • Interactive users: Finally, we arrive at the user. Depending on the application, a user could have a relatively high receiving rate but, in general, should be limited to short bursts. However, the transmission rate is generally very low. People think, issue commands, and analyze the result. Let's consider a data entry clerk who types 120 words per minute. If we assume an average word of 8 letters, we get a rate of 960 bytes/minute, or 16 bytes/sec. This translates to 0.000016MB/sec.

This constitutes a basic set of numbers that allows some generalizations on the performance of the system. This has to be used carefully, because it does not take into consideration any of the interactions between components. For example, a Sun ES10000 can contain up to 16 UltraSPARC daughterboards. Each board includes four processors, one or two memory modules, two SBuses, and a number of controllers with devices attached to them. In addition to all the interactions on a daughterboard, we have to consider the communication between daughterboards and the nature of the processing done on the machine.

Instead of getting lost in the complexity of modern computers, let's consider some simple guidelines that could help the database environment. Some may be outside of our control, but should still be kept in mind.

Hardware: Putting It All Together


      • Figure 2: System Components' Peak Performance

Looking at the numbers in Figure 2, some people may get the impression that systems are not well balanced. Large benchmarks have demonstrated that systems scale well, which shows that no component is an obvious bottleneck. This is explained by a few simple points:

  • CPUs have a cache containing data and instructions that may be reused several times before the CPU requests something that is not already stored in the cache.
  • Components can be added to scale performance. Depending on the processing environment, an optimal ratio of CPUs, memory, controllers, and disk drives can be achieved.
  • Disk drives include a local cache used to reduce the impact of latency and seek time. Furthermore, if the data written is read back, the cache may provide it without accessing the drive.

From the numbers in Figure 2, we can extract the following guidelines:

  • Keep the CPU busy.
    The CPU is the key to all processing. If it becomes idle, the processing cycles are lost. The CPU will become idle if the cache does not contain the data it requires. The cache will have to wait if the memory does not contain the proper information. The memory will then wait for I/O completion. We will see some ways to influence this later.
  • Balance your I/O.
    The most common performance problem is when a large portion of the I/O goes to one disk drive while other disks are idle. A variation on this is if all the I/O goes through the same controller. This can be caused by database tables that are not distributed properly, or because all the system I/O is done on one drive. To solve these hot spots in the database, analyze your I/O and distribute your tables properly. This can include moving the logical logs to different dbspaces. If the hot spot occurs within one table, take advantage of the Informix table partitioning so the table spans several disk drives. You can also improve the partitioning by using hardware disk striping. On the system side, the hot spots will most likely occur on the system disk. Consider moving the swap area and the temp directory (/tmp) to other drives.
  • Avoid disk fragmentation.
    We saw earlier that fragmented I/O can have a dramatic impact on disk throughput. Make sure you allocate large chunks, and try to allocate your tables contiguously.
  • Limit your network traffic.
    Because the network has a relatively low bandwidth, increased traffic can generate collisions that force re-transmission which, in turn, adds more overhead to the network communication. Keep this in mind when developing applications. The network is a valuable resource: avoid transferring large amounts of data when the database server can perform the computation and only return a small result.

These general guidelines take different forms at other levels of the computing environment. It always comes back to a balance of activities in your system.

Operating System

The main goal of an operating system is to optimize the use of the hardware resources. This simply stated goal involves very sophisticated decisions. Let's consider again the Sun ES10000. A fully configured machine contains 16 system boards communicating together through a crossbar interconnect. Each system board contains four UltraSPARC CPUs with their secondary cache, up to 4GB of memory, two SBuses, and a number of controllers.


      • Figure 3: Block Diagram of an OS Kernel.3

3 Modified version from Maurice J. Bach, The Design of the UNIX Operating System, p. 20

Figure 3 shows the major components of an operating system. A user is an executing program called a process. It interacts with the system through system calls. The Solaris operating system, for example, provides about 185 system calls. Through system calls, a process can access devices, file systems, and communicate with other local or remote processes.

The operating system must optimize the use of memory to accommodate as many processes as possible. Memory is used by the operating system and its system processes, device drivers, the file system, user processes, etc.

Let's take a high-level look at the major components of a system:

  • Process
    A process is an executing program that sees memory as a contiguous addressable space. In 32-bit systems, a process can address 4GB of memory. The operating system divides the process address space into pages. At any one time, a process requires a minimum number of pages to be able to execute. This is referred to as working set size.

    A process goes through several states of execution: it can be running, ready to run, or waiting for an event (sleeping). This allows the operating system to decide how to schedule processes for execution.

  • Thread
    Today, most UNIX implementations include the concept of "thread of execution" within a process. The simplest model is where a process includes one thread of execution. It is possible to start an arbitrary number of threads of execution within a process. When using a native thread implementation, a single process can execute on multiple CPUs simultaneously.

    Since a thread executes in the context of a process, it uses fewer resources, and the communication between threads is simpler. Furthermore, thread creation is significantly less expensive than process creation.

  • Process control and scheduling
    The operating system schedules processes for execution when a specified time period has elapsed, when a system interrupt occurs, or when the currently running process executes a system call. If the process is still ready to run, the scheduler will decide if it should continue its execution, or if another process should take its place.

    In large systems, the process scheduler must also consider where a process ran last. This way, it is possible that the process memory pages will already be available in the processor cache. The processor can then be productive right away.

  • Memory management
    When the demand on physical memory becomes too great, the operating system takes entire working sets of processes and swaps them to disk to accommodate memory demands. It may reach a point where the memory demand is so high that the operating system spends a large portion of the CPU time administering the system. This is often reflected by a high percentage of system time reported by system administration tools.
  • File system
    Most UNIX file systems have their origin in the BSD Fast File System. This file system structure implemented several improvements over the original UNIX file system. These improvements include an increased block size and the creation of cylinder groups, where file descriptors (inodes) are closer to the data they describe.

    The simple act of doubling the block size from its original 512 bytes doubled performance. Current file systems often use a default block size of 8KB. Some UNIX vendors provide file systems with a tunable block size.

    You can find detailed information on file systems in the reference manuals listed at the end of this article. For our purpose, two internal structures are of particular interest: the inode and the directory.

    An inode describes a specific file. It contains information including the owner, group, file type, access permission, access time, number of links, and file size. It also includes a set of 15 pointers that identify the location of the data. Figure 4 illustrates the structure of an inode, including its data pointers. The first 12 data pointers reference data blocks directly. They are identified in Figure 4 as "Direct0" to "Direct11." The following three pointers, PTR 1, PTR 2, and PTR 3, are pointers to index blocks. Each of these pointers adds a level of indirection. PTR 3 points to an index block that contains pointers to index blocks. As Figure 4 shows, PTR 3 points to 3 levels of index blocks.

    To be able to read a file, the operating system needs to have the inode, the required index block, and finally the data block. The first time a file is read, there may be one disk access to get the inode, other disk accesses to get an index block, and additional accesses to get the needed data blocks. The file system optimizes disk access by caching the disk blocks in memory. When a process requests a data page from a frequently accessed file, the data cache may be able to return it without any disk I/O.
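
    To get a feel for how far these pointers reach, here is a minimal C sketch of the arithmetic. The 8KB block size matches the default mentioned above; the 4-byte block pointer is an assumption (the pointer size varies by file system):

      #include <stdio.h>

      int main(void) {
        double block = 8192.0;      /* assumed block size, bytes */
        double ptrs = block / 4.0;  /* pointers per index block: 2048 */

        printf("direct (12 blocks): %6.0f KB\n", 12 * block / 1024);
        printf("single indirect:    %6.0f MB\n", ptrs * block / 1048576);
        printf("double indirect:    %6.0f GB\n",
               ptrs * ptrs * block / 1073741824.0);
        printf("triple indirect:    %6.0f TB\n",
               ptrs * ptrs * ptrs * block / 1099511627776.0);
        return 0;
      }

    Under these assumptions, the 12 direct pointers cover 96KB, so small files never pay for an index block access, while PTR 3 alone reaches 64TB at the cost of three extra levels of indirection.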

    The inode describes a file, but the description does not include a name. A directory file provides a mapping between names and inodes. This scheme allows the possibility to refer to an inode through several names. Directories also provide the hierarchical structure of a file system.


        • Figure 4: I-node data pointers.

    Figure 5 illustrates the format of a directory. It consists of a list of variable length entries aligned on 4-byte boundaries. A sequential search is executed to find a file. The average length of a search will increase with the number of entries.


        • Figure 5: Directory entries.
  • Network access
    UNIX systems provide access to the network through library functions. The two standard libraries are sockets and TLI (Transport Layer Interface). A user program will establish a network connection by issuing calls to the network library, which, in turn, will issue system calls. Once the connection is established, either library calls or system calls can be used to transfer data over the network.

    Let's consider a TCP/IP connection. The TCP protocol (Transmission Control Protocol) prepares data for transmission and adds 24 bytes of control information before passing it to the IP layer. The IP (Internet Protocol) also adds 24 bytes of control information and passes it to the physical layer. The physical layer needs to break down the data it receives into packets that include 32 bytes of control information.

    If a TCP packet fits into a physical frame, we incur an 80-byte overhead in our communication. Otherwise, the TCP, or more precisely, the IP packet, is divided into a number of physical packets. The maximum size of a physical packet varies with the network interface used. The Ethernet format defines a maximum size of 1526 bytes including the control information. The FDDI format defines a maximum size of 4500 bytes.


      • Figure 6: TCP/IP Layering Model
  • System Calls
    Processes request services from the operating system through system calls. This effectively gives additional privileges to a user, so that he or she can access system resources. Because all of these resources are shared between a number of users, additional checks must be made to ensure the proper execution of the system call. Otherwise, the entire user community could be disrupted.

    Issuing a system call is a relatively expensive operation. Consider what must happen when a system call is executed:

    • The hardware is switched into kernel mode.
    • The process context is saved. This includes the program counter, processor status, and the processor registers, among other things.
    • The number of parameters is validated.
    • The parameters' locations are checked, and the parameters are then copied into kernel space.
    • The system prepares for the possibility of an interrupted system call.
    • The system call operation is executed.
    • The return value is copied to the proper location.
    • The processor restores the user process context.
    • The hardware switches back to user mode.

Some system calls do more processing than others. Consider the following statement:

fd = open("/usr/informix/online.log", O_RDONLY);

    This system call parses the file path, starting at "/". It opens the root directory and finds the entry for "usr". The "usr" directory is then opened and scanned for "informix", which is opened in turn. Finally, the "informix" directory gives us the inode number for "online.log". All these operations involve disk I/O and buffer cache operations.

Operating System: Putting It All Together

We see that the same concerns apply at the operating system level as at the hardware level. The operating system gives the CPUs to a different process each time a process has to wait for an event. This way the operating system keeps the CPUs as busy as possible. Some adjustments can be made to the operating system so the scheduling method better fits the execution environment. For example, a machine that is a dedicated database server should not be set up for interactive users.

The file system tries to optimize performance by caching the disk information. Furthermore, it uses a relatively large I/O size. This is in line with what we saw in the hardware section.

From the information discovered in the operating system section, we can create the following guidelines:

  • Limit system call usage.
    As we saw earlier, a system call requires quite a bit of processing even before it does anything. There are situations where this can easily be reduced: a system call can be taken out of a loop and still provide the same functionality. Consider the following code:

      while (0 < read(fd, buf, 1) ) {
        ...
      }

    This loop issues one system call per iteration. This may be because your application logic needs to process one character at a time. If this is not a hard requirement, you can buffer your reads by using the standard file access functions and still process one character at a time (here fp is a FILE * obtained from fopen(), and c must be an int to hold EOF):

      while ( EOF != (c = fgetc(fp)) ) {
        ...
      }

    You can also change the definition of a function that opens a file on every call so that it is passed an already open file descriptor instead. Instead of looping on a system call to check on some status, it may be better to block on an event. Finally, some work could be done in user space instead of always executing a system call. Threads can provide major benefits in that area compared with cooperating processes.

  • Localize your code and data.
    A process needs a certain amount of memory to execute. This memory includes instruction and data pages. The minimum amount of memory required by a process is called its working set size. You should put functions that are often used together close to each other so they will be in contiguous pages, if not in the same page. This also applies to data pages.
  • Buffer your I/O.
    Take a hint from the file system and try to buffer your I/O. For example, instead of writing one character at a time, accumulate the data until a certain size or condition occurs and then execute the I/O. This is done automatically for you by some programming libraries like the fputc() function in "C".
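
    As an illustration, here is a minimal sketch of the technique (error handling on write() is omitted for brevity); the standard I/O library does essentially this for you behind fputc():

      #include <unistd.h>

      static char buf[8192];   /* one file-system-sized block */
      static int  used = 0;

      /* accumulate one byte; issue a single write() per 8KB */
      void buffered_putc(int fd, char c) {
        buf[used++] = c;
        if (used == sizeof(buf)) {
          write(fd, buf, used);
          used = 0;
        }
      }

      /* call once at the end to push out the partial buffer */
      void buffered_flush(int fd) {
        if (used > 0) {
          write(fd, buf, used);
          used = 0;
        }
      }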
  • Limit network activities.
    Networks are getting faster and faster; however, the other components are also getting faster, which keeps the performance ratio virtually intact. The number of users on the network exacerbates the contention problem. For large organizations, a small utilization difference per user can make a big difference. Consider that if the network utilization goes down 10 percent on a saturated network that has 1000 users, it means that an additional 100 users could use the network.

If you have a performance problem, you must first find out where the bottleneck is, and then review the situation to decide whether you should add hardware (and if so, which hardware and where), or whether you should review the software implementation.

You should also be familiar with the capabilities provided by your operating system, specifically in the area of tuning parameters, scheduling, and thread support.

Informix Dynamic Server

 


      • Figure 7: Simplified Server Architecture view.

Informix Dynamic Server is a multithreaded database server. It implements its threading model using a concept of virtual processors (VPs). The server architecture is depicted in Figure 7.

Virtual processors are divided into specialized classes:

PIO, LIO: these virtual processors handle the physical and logical logs, respectively.

NET: The NET virtual processors handle network connections. The database server can be configured with multiple NET virtual processors and handle several separate connection services. An alternative is to use the CPU virtual processors to handle network connections.

AIO: In the case where the disk storage is not on raw devices, or when a specific platform does not support kernel asynchronous I/O, AIO virtual processors are used to optimize disk access.

CPU: The CPU virtual processor is the heart of the system. It manages the user session threads and all of the processing threads. It also takes care of disk I/O when the platform supports kernel asynchronous I/O. It can also be used to manage network connections.

EVP: The Extended Virtual Processor is a variation of the CPU VP that can be used in the Universal Data Option of Informix Dynamic Server. It is used to execute user-defined routines that are "misbehaved." Refer to the Universal Data Option documentation for more information.


      • Figure 8: Example of query plan.

This database server architecture provides "fan-in" of user connections into a few processes. It also provides "fan-out" of queries. Incoming queries are divided into multiple threads of execution. The Informix Dynamic Server architecture can execute all threads in parallel, taking advantage of all the computer resources, including all CPUs and all I/O devices provided. Figure 8 shows a query plan example where the query is divided into several operations that execute in separate threads.

Query Processing

The query execution is impacted by several factors:

  • Partitioning: Parallelism depends on data partitioning. The engine will start as many scan threads as there are dbspaces involved in the execution. The result of the scan operation is passed through exchanges to other threads that execute operations including joins, group, sort, and merge.
  • Indexes: Indexes can be used to reduce the I/O required to perform the query. The basic indexing scheme is based on b+ trees.
  • Statistics: Informix Dynamic Server keeps, among other things, statistics on the amount of data and its distribution in tables. Statistics are vital in the search for the best query plan.
  • Optimizer: The optimizer uses an exhaustive search algorithm looking at all possible ways to process the query. During query optimization, the optimizer may rewrite the query, which may generate query plans that were not obvious when looking at the original text of the query. The cost of each plan is established by getting the cost of each operation, and by getting an estimate of the amount of data that will be processed.

Informix Dynamic Server: Putting It All Together

We saw that Informix Dynamic Server tries to optimize CPU usage and reduce the I/O processing. There are several tuning parameters that relate to CPU VPs, memory allocation, and network connections, among other things. The database administrator, with the help of the application system architect, can still do more to improve performance.

You should fragment tables on as many dbspaces as appropriate. You should also analyze the queries that have the most impact on your application to determine the fragmentation strategy.
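
For example, a round-robin fragmentation spreads rows evenly across dbspaces. Here is a minimal sketch; the table and dbspace names are hypothetical:

    CREATE TABLE orders (
      order_num  INTEGER,
      order_date DATE,
      amount     MONEY
    ) FRAGMENT BY ROUND ROBIN IN dbsp1, dbsp2, dbsp3;

A FRAGMENT BY EXPRESSION strategy can instead place rows based on column values, which allows the server to skip fragments that a query cannot touch.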

Indexes can help performance, but too many indexes can reduce insert and update performance. Some indexes are obvious. For others, you have to look at the query plans to see if some indexes are ever used. In the case where the query plan shows table scans, you may be able to create an index that will speed up the processing. Keep in mind that a table scan is not necessarily a bad thing.
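
The query plan can be obtained with the SET EXPLAIN statement. For example (the customer table here is hypothetical):

    SET EXPLAIN ON;
    SELECT lname FROM customer WHERE cus_num = 1001;

The chosen plan, including index usage and cost estimates, is appended to a file named sqexplain.out in the current directory.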


      • Table 1: Some onconfig server parameters.

When it comes to finding the best way to process a query, the most important thing for the optimizer is to have updated statistics. This should be done any time a significant number of rows are added or modified and after indexes are added. Informix recommends the following UPDATE STATISTICS methods:

  • Medium distributions only: for each table or the entire database.
  • High: for all columns that head an index.
  • Low: for all columns in a multi-column index.
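
Translated into SQL, the sequence looks like the following sketch (the table and column names are hypothetical; cus_num heads an index, and the multi-column index is on cus_num and lname):

    UPDATE STATISTICS MEDIUM DISTRIBUTIONS ONLY;
    UPDATE STATISTICS HIGH FOR TABLE customer (cus_num);
    UPDATE STATISTICS LOW FOR TABLE customer (cus_num, lname);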

Application Development

Applications are becoming more complex. We now have to choose between two-tier, three-tier, N-tier applications, multithreading, component architecture (including CORBA and DCOM), etc. These decisions have a huge impact on application performance. These topics are too large to be covered here but, hopefully, the basic concepts discussed in this article will help you improve application performance.

Algorithms

When it comes down to it, the way you decide to solve a problem has the biggest impact on performance. Many books are available on algorithms that show how to represent some types of data and how to perform sorts and searches, etc. Finding the right algorithm starts with defining the right problem.

There are some very good examples of applying algorithms and finding the right problem in Programming Pearls, listed in the reference section. In chapter 5 of his book, author Jon Bentley describes a case study where a different approach provided a 400 times performance improvement.

Of course, you can't always expect dramatic performance improvement. The key is to try to come up with several ways to solve a problem, and evaluate the solutions before implementing one. When using Informix Dynamic Server with Universal Data Option, many new possibilities open up. By adding business logic in the server, you can take advantage of the database engine's sophisticated algorithms and parallelism.

This can be as simple as providing the proper comparison operators or date arithmetic. You can provide functions that will be used for grouping, allowing the server to do the aggregation instead of pulling all the information into the application program and performing the aggregation there. You can find more information on how to take advantage of Universal Data Option features in the "Best Practices" book listed in the reference section.

Using the Compiler

It is sometimes easy to forget the simplest things. I have seen production code that was compiled for debugging. This probably happened because the development team had forgotten to change the compiler options. The difference is small: instead of "-O" they kept "-g".

What is the impact of the compilation options? They affect the total number of instructions and the number of instructions needed to set up a function call. Let's look at a simple example using the following program:

    #include <stdio.h>

    int fn1(int i) {
      int j, k;

      k = 0;
      for (j = 0; j < i; j++) {
        k += j;
      }
      return(k);
    }

    int main(int argc, char **argv) {
      int ret;

      ret = fn1(1000000);
      printf("ret = %d\n", ret);
      return(0);
    }

Keep in mind that this program is an example. The result of fn1(1000000) overflows and does not give the proper result.

We can compile this program with or without optimization and with options for profiling and debugging. The following table reports some statistics on the assembler file generated by compilation on a specific platform. These results will vary depending on the platform and the compiler used.


      • Table 2: Assembler File Statistics.

The debugging option and the optimization option used to be mutually exclusive. This is why you may still find makefiles that do not include optimization when compiling for debugging.

Most "C" compilers provide an option to generate assembler code (-S). This is what was used to determine the number of instructions generated with each set of compiler options. The assembler source generated also included comments and assembler directives that are not included in the count. The sizes for object file size and executable file size are in bytes.

The execution times were obtained using the timex(1) command. The elapsed time represents the time it took to complete the execution. This is measured in a "stop watch" manner: if the system had been busy, it would have also included any time spent in a queue waiting to execute. The user and system times represent the CPU cycles spent in user space and the cycles spent in kernel mode.

Most of the execution time is spent in the loop of function fn1(). All of the execution times are within one second, which could be dismissed as insignificant. But when we look at ratios between executions, we see that the standard compilation takes almost three times as long in user time (0.11 sec) as the optimized version (0.04 sec). In other words, optimization cut the user time by about 64 percent!

Profiling

To improve performance in your application, you first need to find out where the time is spent. You can compile your program so it generates information on function calls, and how much time was spent in each function. Here is an example using the program listed above. You first need to compile the program with the profiling option:

    cc -p -o prg1 prg1.c

Then you execute the program normally. A file named mon.out is generated. You can get the profiling statistics by executing the prof command:

    prof -V prg1

    %Time  Seconds  Cumsecs  #Calls  msec/call  Name
    100.0     0.11     0.11       1       110.  fn1
      0.0     0.00     0.11       1         0.  main

Now we know that the time is spent in fn1(), so we can see if it can be optimized. You may have realized that this is really the classic problem of adding all the numbers between 1 and N. The loop can be replaced with the closed form (note that i * (i - 1) is always even, so it is the product that must be divided by 2):

    k = ( i * ( i - 1 ) ) / 2;

This will make the fn1() execution time insignificant for any number. Of course, there is still the problem of the overflow.
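
With the closed form, fn1() reduces to the following sketch, which keeps the original int interface and therefore the original overflow behavior:

    int fn1(int i) {
      /* sum of 0 .. i-1 in closed form; still overflows for large i */
      return ( ( i * ( i - 1 ) ) / 2 );
    }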

Here are a few simple suggestions to improve performance:

  • Move code out of a loop: make sure you don't have extra code in a loop. Sometimes some code can be moved out without impacting the result.
  • Merge loops: if you have two loops that operate on the same range, you may be able to combine them into a single loop.
  • Reorder tests: a complex test will terminate as soon as it can determine the result. For "or" conditions, the first true value determines the result. For "and" conditions, the first false value determines the result. Based on your knowledge of the data, you can reorder the tests so the result can be found as soon as possible.
  • Use sentinel values: a sentinel is a value that will ensure that a condition will eventually be met. Assuming that you have a text string and you need to split it into words, you could use a condition like:

      for ( ; pos < maxlen && string[pos] != ' '; pos++ ) ;

    This code looks for spaces as word separators. It also tests for the end of the string. You can replace the null character at the end of the string with a space, saving one comparison per loop. The statement would become:

      for ( ; string[pos] != ' '; pos++ ) ;

    You just have to make sure you put back the null character once you are done.

  • Table lookup: when doing a table lookup, you may want to order it with the most common pick first, so that most of your searches will terminate after one comparison. In other situations, you may want to order them alphabetically and do a binary search. For a table of 100 elements of equal probability, you'll use, at most, seven comparisons to find your answer, instead of averaging 50 with a sequential search.
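
    Here is a minimal C sketch of the binary search variant (the table is assumed to be sorted alphabetically):

      #include <string.h>

      /* returns the index of key in tab[0..n-1], or -1 if absent */
      int lookup(const char *key, const char *tab[], int n) {
        int lo = 0, hi = n - 1;

        while (lo <= hi) {
          int mid = (lo + hi) / 2;
          int cmp = strcmp(key, tab[mid]);
          if (cmp == 0)
            return mid;
          else if (cmp < 0)
            hi = mid - 1;   /* key sorts before tab[mid] */
          else
            lo = mid + 1;   /* key sorts after tab[mid] */
        }
        return -1;
      }

    Each iteration halves the remaining range, hence the at-most-seven comparisons for a 100-entry table.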

Database Access

Adding database access to an application raises a number of issues that relate to the communication with the database server. This section discusses some of the most important points to consider.

Database Connections

Establishing a connection with the database server is an expensive operation. You can look at it as a login procedure where some validation needs to be done and execution threads need to be created. You should try to keep your connection for the duration of the application. If you need to change connections, you should still preserve your other connections for reuse. This is done in different ways depending on the client interface used. Using INFORMIX-ESQL/C, you can name your connections:

    EXEC SQL CONNECT TO 'database' AS 'connection1';

This way you can refer to your connections by name. Assuming that you opened two connections named "connection1" and "connection2", you can set "connection1" as the current one with:

    EXEC SQL SET CONNECTION 'connection1';

In a multithreaded environment, connections can move from thread to thread. A connection must be set "dormant" before it can be used by another thread. To make "connection1" available to other threads, you can execute the following statement:

    EXEC SQL SET CONNECTION 'connection1' DORMANT;

If "connection1" is the current connection, it can be done with:

    EXEC SQL SET CONNECTION CURRENT DORMANT;

The connection names can be passed through host variables. This makes it easier to manage a pool of connections. Refer to the Informix documentation for more information.
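
As an illustration, here is a minimal ESQL/C sketch of switching among pooled connections by name (the naming convention and the variable n are hypothetical):

    EXEC SQL BEGIN DECLARE SECTION;
    char conn_name[20];
    EXEC SQL END DECLARE SECTION;

    /* pick a connection from the pool by number */
    sprintf(conn_name, "connection%d", n);
    EXEC SQL SET CONNECTION :conn_name;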

Prepared Statements

Before an SQL statement can be executed, it must first be parsed, analyzed, and optimized. This represents a significant overhead for statements that are executed frequently. A statement can be prepared once for multiple executions:

    EXEC SQL PREPARE stmt1 FROM
    "UPDATE customer SET lname=? WHERE cus_num=?";

The statement is then executed by passing arguments to take the place of the question marks:

    EXEC SQL EXECUTE stmt1 USING :lname, :cnum;

By preparing SQL statements once for multiple executions, you can achieve a significant performance improvement.
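
The host variables used above would be declared to ESQL/C in a declare section; a minimal sketch (the lengths are assumptions):

    EXEC SQL BEGIN DECLARE SECTION;
    char lname[32];
    long cnum;
    EXEC SQL END DECLARE SECTION;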

Insert Cursors

When you have a large number of rows to insert, you can optimize the communication with the database server by buffering the rows. The general idea is to open an insert cursor and use the PUT command to insert the row. Here is some pseudo-code that illustrates the process:

    EXEC SQL PREPARE stmt FROM
      "INSERT INTO customer VALUES(?,?,?,?)";
    EXEC SQL DECLARE cur CURSOR FOR stmt;
    EXEC SQL OPEN cur;
    for (....) {
      ....
      EXEC SQL PUT cur FROM :val1, :val2, :val3, :val4;
    }
    EXEC SQL CLOSE cur;
    EXEC SQL FREE cur;
    EXEC SQL FREE stmt;

It is also possible to tune the buffer size. This could provide additional performance benefits. Please see the INFORMIX-SQL reference manual for information on the FET_BUF_SIZE environment variable. The default can also be set in the sqlhosts file.

Server-Side Programming

Informix Dynamic Server with Universal Data Option provides a new set of possibilities. According to Michael Stonebraker ("Betting on ORDBMS," Byte Magazine, April 1998), we can now choose between thick client, thick middleware, and thick database. In fact, it could be a mix of the three, depending on the processing requirements.

Universal Data Option can be part of the application instead of being a rigid persistence mechanism. Most businesses have particular ways to manipulate their business data. This often requires custom application development, with all the additional functionality required for reporting, printing, etc. This particular way to manipulate data can be included in the server to provide the proper answer. This way, common off-the-shelf products can be used to provide the user interface and additional functionality.

The benefits extend to performance. The full power of sophisticated algorithms and parallelism in the engine can be tapped by adding simple functions that will provide a better way to manipulate a specific type of data. Here is a simple example. Some Departments of Motor Vehicles store a license number that also includes the county of registration and the type of vehicle. To find out how many vehicles of each type have been registered for the month of September, you would need to pull all the records into a custom application and count them. With Informix Dynamic Server with Universal Data Option, you can add a function that will extract the type of vehicle from the license number. Answering the question becomes a simple SQL statement like:

    SELECT vehicleType(license), COUNT(*)
    FROM vehicleTable WHERE MONTH(date) = 9
    GROUP BY 1;

The vehicleType() function is very simple. It takes a license number as input and returns the type of vehicle it represents. A custom application would have required this function and additional code to process each row coming from the database. This represents more complexity than just adding the function in the server. Furthermore, the cost of transferring data to the application is significantly greater than doing the processing in the server. The result of the function vehicleType() could also be indexed. The index would be used, if it made sense, and performance would increase accordingly.
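
As an illustration only, suppose the type code occupies the first two characters of the license number. The logic of the routine could then be as simple as the following C sketch (the encoding and the names are hypothetical; a real user-defined routine would use the DataBlade API types, be compiled into a shared library, and be registered with a CREATE FUNCTION ... EXTERNAL NAME ... LANGUAGE C statement):

    /* hypothetical encoding: the type code is the first 2 characters */
    void vehicleType(const char *license, char *type) {
      type[0] = license[0];
      type[1] = license[1];
      type[2] = '\0';
    }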

A lot more can be done with Universal Data Option. You can create new types, new operators, and even new aggregation functions. This is discussed in greater detail in the book "Informix Dynamic Server with Universal Data Option: Best Practices" listed in the reference section.

Conclusion

Performance is a complicated subject, and estimating performance is even more complex. This article shows that processing costs increase dramatically with the distance from the processor: cache, memory, disk, and network. Furthermore, since the CPU is orders of magnitude faster than I/O devices, we should always try to optimize the I/O by balancing the requests over as many devices as makes sense for a specific application.

Operating systems try to optimize resource utilization. Each request made by application programs incurs a significant overhead. It is sometimes possible to reduce the number of requests by grouping them together. This frees precious CPU cycles for more productive work.

Informix Dynamic Server is a sophisticated server that can be tuned to use as many resources as you see fit. Make sure it is tuned properly. Distribute your I/O on as many devices as possible, and don't forget to update the statistics so the optimizer can pick the right query plan.

The application code has the most impact on performance. The key is how you solve the problem. Some solutions improve performance by orders of magnitude, others only incrementally. Some programming techniques will improve performance with minimal effort. This includes optimized compilation and simple programming techniques like binary searches.

By keeping these simple concepts in mind during the design and implementation phases, it is possible to have applications that provide adequate performance at delivery.

References

  • Jon Bentley, Programming Pearls, ISBN 0-201-10331-1, Addison-Wesley, 1986
  • Jon Bentley, More Programming Pearls, ISBN 0-201-11889-0, Addison-Wesley, 1988
  • Maurice J. Bach, The Design of the UNIX Operating System, ISBN 0-13-201799-5, Prentice-Hall, 1986
  • Samuel J. Leffler et al., The Design and Implementation of the 4.3BSD UNIX Operating System, ISBN 0-201-06196-1, Addison-Wesley, 1989
  • Andrew S. Tanenbaum, Operating Systems Design and Implementation, ISBN 0-13-637406-9, Prentice-Hall, 1987
  • Bil Lewis and Daniel J. Berg, Threads Primer, ISBN 0-13-443698-9, SunSoft Press, 1996
  • Douglas Comer, Internetworking with TCP/IP, ISBN 0-13-470154-2, Prentice-Hall, 1988
  • Jacques Roy, Threaded Application Development Using Informix-Client SDK: A Practical Guide, Informix Tech Notes Volume 8, Issue 3, 1998
  • Informix Press, Informix Dynamic Server with Universal Data Option: Best Practices, ISBN 0-13-911074-7, 1998

About the Author

Jacques Roy, a frequent Tech Notes contributor, is a member of the Informix Advanced Technology Group in Denver, Colorado. He can be contacted via email at jroy@informix.com.

 
