DB2 Trace Commands and Facilites

DB2 has extensive tracing facilities that generate trace records called IFCIDs. IFCIDs are produced by the DB2 Instrumentation Facility. These trace records describe internal events within DB2 at various levels of granularity. The Instrumentation Facility Interface allows application programs to retrieve IFCIDs from DB2. IFCIDs have a complex record format, and require sophisticated programming techniques for interpretation. The DB2 START TRACE command controls the production and disposition of IFCIDs. IFCIDs are useful for analysis and planning purposes. IFCIDs can be written to SMF, GTF, or OP buffer destinations. They can be processed by vendor report packages, or by locally developed SAS programs. Depending on the type of trace and the classes of information collected, DB2 traces can introduce overhead that ranges from inconsequential to devastating.

Introduction

Significant internal events within DB2 are represented by trace data created by DB2 resource managers and collected by the DB2 Instrumentation Facility. This facility is itself a fully integrated DB2 resource manager responsible for the collection and distribution of trace data. The Instrumentation Facility organizes event data into logical records called IFCIDs. These IFCIDs are recorded as variable-length, self-defining physical records that can be processed by QSAM or BSAM.

The DB2 Instrumentation Facility(IFI) is a software interface through which external programs may request and obtain IFCIDs from DB2. Application monitor programs may connect to DB2 via the batch Call Attach Facility. Once such a connection (thread) is established, DB2 traces can be started and stopped under program control.

While a DB2 trace is active, an application can collect IFCIDs from DB2 by issuing READA and READS IFI calls. Each kind of call acquires it’s own kind of data. READS calls collect snapshots of work in progress, and statistical data about the system. READA calls collect comprehensive low level detail IFCIDs. Â

IFI call types

To use the READA call, the application’s start trace command must specify a special parameter area address, and the size of a destination buffer that DB2 uses to temporarily hold the trace data until the application calls for it. The parameter area contains a percent value and the address of an event control block (ECB) in the applications addressable storage upon which the application will eventually wait. The destination buffer is called an OP buffer, and is allocated in the extended common storage area. When the OP buffer becomes full to the specified percentage, DB2 will POST the ECB, thereby notifying the application that there is data to be retrieved.Â

The application responds to the POST by first clearing the ECB and then issuing a READA IFI call. When the application makes it’s READA call, it must supply the address of a local buffer to receive the collected data. The Instrumentation Facility will move the IFCIDs from the OP buffer to the local buffer specified in the READA call. When control is returned to the application the READA IFCIDs are available for application processing. The READA caller is not suspended while DB2 collects the requested data, but is suspended while the collected IFCIDs are moved to its local buffer.Â

There are eight OP buffers, (OP 1 through OP8), with the maximum total allocation of OP buffer space of 4 Megabytes. The start trace command specifies the IFCID destination either by number (i.e. OPI), or generically with the OPx designation. OPx will use the next available OP number. OP buffers are filled by a wrap-around technique. If the application does not call for the data soon enough, DB2 may overwrite the OP buffer. In this case a console message is issued by the Instrumentation Facility Resource manager indicating that data has been lost.Â

The space allocated for the OP buffer is obtained under the TCB of the application that issues the start trace command. When the space for the buffers is accounted for, it will appear to belong to the application that started the trace, not to DB2.Â

OP buffer space is not released when the trace that caused it’s allocation ends. DB2 will reuse that space for subsequent trace requests if possible. If a second trace asks for a buffer larger than that used by the first trace, DB2 will allocate the larger buffer size, ignoring the buffer used by the first trace. This process will continue up to the four megabyte limit on OP buffer space, at which point DB2 will refuse to start subsequent traces.Â

The OPx destination specification should be used with caution. If two distinct applications issue identical start trace commands using OPx, the second trace will not be started. DB2 will issue a message stating that the trace is already started. This may cause the application issuing the start trace command to fail.Â

The READS call is a synchronous request for data. A caller supplies a buffer to hold requested data, and is suspended until the IFI has moved the required IFCIDs into the buffer. No OP buffer is associated with a READS call.Â

Some IFCID details

There are more than 190 different IFCIDs. Some internal events are of significant duration, and require the production of a pair of IFCIDs, one representing the beginning of the event, the other the end. Such pairs are useful for the computation of elapsed times. Less lengthy events are represented by a single IFCID.Â

While the physical components of an IFCID are various and sundry, for practical purposes there are only two parts to an IFCID: the product section and the data section.Â

The product section identifies that which is being traced. The start trace command allows the specification of special „headers“ that will be included by IFI when the IFCID is produced. These headers contain data on cpu time consumption, correlation information for identification purposes, and information about the use of Distributed Data Facilities (DDF).

The correlation header contains fields which identify both the consumer and the provider of resources. Some of these fields are the Resource Manager ID of the resource provider, an Agent Control Element address of the consumer of resources, and the Authorization ID of the user associated with the resource consumption.

Cumulative CPU consumption of an execution unit (TCB or SRB) is recorded in the CPU headers contained within the Product section of the IFCID if so requested on the START TRACE command.

The purpose of the data section is to record specific information key to understanding the internal DB2 operation represented by the IFCID. Some data section fields are characterized as „serviceability“ fields by IBM, and available to the user at his own risk. No further information about such fields can be expected from IBM. These fields appear to contain DB2 internal debugging data, and are typically only used under the direction of IBM. Most IFCIDs have data fields that describe events which are pertinent to the performance tuning and problem analysis processes.Â

IFCID physical layout

The first physical part of an IFCID is the Writer Header. Each IFCID is preceded by a header that is placed on the IFCID by the mechanism that wrote the record. Records directed to GTF have a GTF header. Records directed to SMF have SMF headers. IFI bound records are prefixed by a simple 4 byte Monitor header. The TRACE header is produced for GLOBAL trace IFCIDs.

The second physical part of the IFCID is the self-defining section. This part consists of a number (determined by the contents of the product section) of 8 byte subsections, each containing halfword length and count fields, and a fullword offset field. These subsections are used to locate the remaining sections of the IFCID.

The first subsection in the self-defining section locates the third part of the IFCID, the Product section. The product section contains a field that counts how many actual subsections in the self-defining section are valid. The number of sub-sections is a function of and is fixed for a given IFCID. However, not all of the subsections will contain information. If you process IFCIDs directly, you must first find the product section to determine how much of the self-defining section is valid.

The next subsection of the self defining section, and as many following subsections identified as valid in the product section, will point to the fourth part(s) of the IFCID, the data sections.

Although this structure is complex, most IFCIDs have a single data section. Exceptions to this seem to be confined to IFCIDs that describe internal DB2 reference data such as DBID/OBID translation records, and LOG dataset mapping. When multiple data sections do appear, they are a linear concatenation of sections described by the associated self-defining subsection.Â

More IFCID details

Some IFCIDs have individual significance. The IFCID 031 for example represents the EDM Pool Full condition. When space is exhausted in the Environmental Descriptor Management storage pool, this IFCID will be produced by the Data manager to record the occurrence, provided some trace was started that specified that this IFCID be collected.

Another example is the IFCID 024, Utility change of phase. IFCIDs such as these are useful for identifying events that may be problems, or as in the case of the IFCID 024, just noteworthy.

Each SQL statement has a beginning IFCID (59,60,61,… ) and an end IFCID (058) that can be used to calculate the duration and cpu resource consumption of the SQL statement. Further the 058 record carries the SQLCA with the result of the SQL call.

CREATE THREAD and TERMINATE THREAD elapsed times can be identified by the 072-073 and 074-075 pairs. In fact, even events such as storage requests can produce IFCIDs if requested to do so by a TRACE command. Care must be exercised so that too many IFCIDs are not generated by the trace request. Collecting storage request IFCIDs may be necessary sometimes, but should done judiciously, since the overhead of such traces is enormous.

Some IFCIDs have no data sections at all. They are markers in time. IFCID 074 (Begin Terminate Thread) has no data field, neither do the 042-043 begin-end Checkpoint IFCIDs.

The remainder of the IFCIDs contain reference data about the DB2 environment. IFCIDs 104, 105, 106, and 107, as well as IFCIDs 001, 002, and 003 contain environmental and/or summary data. These IFCIDs can give a system wide view of DB2.Â

Start trace command overview

The start trace command allows the specification of SMF and/or GTF as destinations for IFCIDs in addition to the OP destinations previously discussed. This command is fully documented in the DB2 Command and Utility Reference (SC26-4378). The command is complex and requires careful consideration.

The command specifies the type of trace that is to be started. Each type (PERFM, GLOBAL, ACCTG, STAT, AUDIT, MON) is associated with a certain set of IFCIDs. The sets overlap among the trace types. Depending on the circumstances, different types of traces may be started.Â

Types of DB2 traces

MONITOR Trace classes 1, 2 and 3 must be active if application programs are to collect IFCIDs via the Instrumentation Facility Interface. The Monitor Trace produces many IFCIDs that are also produced by other types of traces. MONITOR traces are low internal overhead traces. Most of the data is not collected until an application issues a READS IFI call.

The GLOBAL trace is used by IBM for debugging DB2 itself. The DB2 Diagnostics Guide identifies the types of IFCIDs collected by the trace class. Trace class is a parameter on the START TRACE DB2 command that is a short hand way of specifying sets of IFCIDs. Typically, the GLOBAL trace output is directed to an SRV or RES destination. IBM may direct you to start the GLOBAL trace in response to the reporting of a DB2 problem. You will be given explicit instructions when this is required. By its nature, GLOBAL trace can introduce significant overhead into the DB2 system.

The ACCOUNTING trace is used to associate resource consumption with a consumer (thread). Many levels of trace granularity are available according to the accounting classes specified on the START TRACE command. The Thread Accounting record (IFCID 003) is built at thread completion, and is a summary of resources used by the thread. Accounting trace IFCIDs can be directed to SMF for processing by post processor accounting systems. ACCOUNTING trace produces few records, and introduces a small internal overhead to DB2.

STATISTICS trace output (IFCIDs 001 and 002) is useful for trend analysis and capacity planning considerations. These IFCIDs can be directed to SMF. During DB2 installation, a DSNZPARM specification is made on panel DSNTIPN STATISTICS TIME field) that determines how frequently the system statistics trace records are produced. Since STATISTICS IFCIDs are only produced periodically, they are not a continuous source of internal overhead.

The AUDIT trace is used to supervise the access to objects by users. Audit IFCIDs are often sent to SMF. IFCIDs produced by the AUDIT trace contain detailed information about the use of DB2 objects, and are used by security analysis software packages for system administration purposes. AUDIT traces monitor specific thread related events that may not repeat very often during the life of a given thread. However, since the AUDIT trace tracks all threads in the system, there may be a moderate overhead introduced by the AUDIT trace.

PERFORMANCE trace can capture more kinds of IFCIDs than any other type of DB2 trace. The collection granularity varies from coarse to extremely fine. Because the potential output from the Performance trace is so great, IBM recommends that the IFCIDs be directed to GTF.

The performance trace can be tailored to monitor very specific events, or can be unrestricted for a system-wide view. This practice requires careful specification of the information to be collected.

Application monitors retrieve performance trace data by the READA IFI call described earlier. If an unrestricted trace is started it will generate a large volume of IFCIDs and DB2 will POST the monitor application’s ECB frequently. If the application cannot process the IFCIDs from a POST before DB2 POSTS the ECB again, DB2 may overwrite the OP buffer and data will be lost. The percent value given on the start trace command should be sized so that delays in application dispatching will not cause data to be lost. For example, if 50% is specified, DB2 will still have half the OP buffer empty when it POSTs the application. This should give the application some time before overwrite occurs.Â

The performance trace data can be used to establish application performance benchmarks. It is customary to use traces for problem resolution. However, the existence of begin-end pairs for important DB2 events creates opportunities to measure internal times of SQL statements and to develop application performance profiles. While gross thread resource consumption is available from historical accounting data, it is not possible from such data to determine the exact amount of resource consumption attributed to each SQL statement. Highly granular performance trace data supports resource consumption quantification by internal event. When a plan is noticed be deviating from it’s historical profile, the trace information could be used to identify the nature of the deviation.Â

Of course the basic use of the trace is problem determination. IFCIDs are written that identify resource contentions. Excessive resource consumption can be isolated to particular units of work. Performance trace may introduce significant overhead into the DB2 system. Often quoted estimates from IBM suggest from 20 to 200 percent increase in internal overhead due to performance trace. Because of the potentially damaging impact of performance traces they should be considered as a „court of last resort“ for the problem determination effort in a performance-critical production system. Performance trace requests for such systems need to be restricted to cases where the cost of not solving the problem is greater than the overhead introduced by running a performance trace. Of course the overhead is introduced only for the interval that the trace is running. Well designed traces of limited duration can be more useful than any other performance analysis techniques.Â

Additional command parameters

The DEST parameter determines where the IFCIDs that are generated by the START TRACE command are to be placed. If OPx (or OPn) is specified, the BUFSIZE parameter tells DB2 how much space it must reserve in ECSA to hold the IFCIDs for this trace. Monitor applications usually start some of the DB2 traces automatically during their initialization phase and direct them to OPx or OPn. If you use such applications, you must not attempt manual control of the traces used by these applications. Doing so might cause their abnormal termination. DB2 supports a display trace command that lets you associate an active DB2 trace with an authorization id so you can avoid accidental interference.Â

If GTF is specified, DB2 will span IFCIDs that are longer than 256 bytes. To avoid lost data, GTF must be started prior to the issuance of the start trace command, and must be stopped after the DB2 trace is stopped. If you do not use monitor applications, you may direct accounting and audit trace output to SMF. These IFCIDs can be processed by any SMF post-processing technique. If SMF is selected as the destination, adequate SMF buffers must be available to avoid lost data. Some types of DB2 traces (performance trace) produce tremendous volumes of IFCIDs in very short lengths of time, and the impact of such large quantities of records on SMF should not be underestimated.Â

The PLAN and AUTHID parameters identify what is to be traced. Specifying these parameters limits the amount of data generated by DB2 for this trace request. While this seems to be a good thing, it will not be so good if the trace must be rerun because it contains insufficient information. Up to eight PLANS and AUTHIDs may be specified on the start trace command. The CLASS, RMID, and IFCID parameters control what information is to be collected. Class and RMID are short-hand ways to specify sets of IFCIDS. Using the IFCID numbers themselves requires the user to be familiar with the contents of the IFCIDs by number. Probably the best way to specify is by CLASS. There are fewer classes (16) than IFCIDs (190+) or RMIDs (20) so the classes may be easier to remember. Specifying RMID might seem to be a good way to specify IFCIDS, but doing so might exclude IFCIDs that may be needed.

For example, you may want 031 EDM pool full records, so you specify RMID 14 (Data manager) to get them. Unfortunately the IFCID 105 DBID/OBID translation record needed to understand the 031 will not be collected, since it is produced by RMID 10 (Buffer manager). On the other hand, if you had specified Performance class 10, both IFCIDs would have been produced. Because of many such cases, the best specification appears to be performance class. Note that a given IFCID may be produced by more than one type of trace. For example, the DSNZPARM map (IFCID 106) is produced by both the accounting trace and the performance trace. Within a given type of trace, an IFCID may be produced by more than one trace class. Within the performance trace, IFCID 106 is produced by all classes from 1 through 14. Therefore there is no unique association of IFCID to type of trace, or class of trace. The only one to one relationship is between IFCID and RMID. An IFCID is produced by one and only one RMID.Â

The TDATA parameter controls the types of headers that are included in the Product section of the IFCID. CORR, CPU and DIST are recommended to obtain the most descriptive information available. If the DB2 is networked with other logical units, a LOCATION name may be given to limit traced events to those with location affinity to the specified name.Â

A method for obtaining IFCIDs from DB2

Â The following sample procedure assumes that GTF has been specified on the START TRACE command, and the appropriate decisions about the other command parameters have been made.

First GTF itself must be started. GTF allows a TIME parameter to be specified on it’s startupÂ procedure. This parameter causes GTF to place a timestamp on the records sent to it by DB2. Depending on the post processing analysis method, this parameter may or may not be specified. If the resulting GTF records are to be read into DB2PM, TIME=NO is recommended. If the IFCIDs are to be processed by a GPAR application, TIME=YES is recommended. For full flexibility, perhaps TIME=YES should always be specified. Since it is not required by DB2PM, it will be ignored, yet the time stamp will be there if an alternative post processor needs it. Once GTF is active, the START TRACE DB2 command must be issued. This order is important so that confusing DB2 „lost data“ messages are not generated. The unit of work must then be executed. While the unit of work (thread) is active, all IFCIDs specified by the start trace command, either explicitly or implicitly, will be collected by the IFI and presented to GTF. When the thread has terminated, or when the experiment has reached it’s logical conclusion, the STOP TRACE DB2 command must be issued. Although there are many forms for this command, the most simple requires only the Trace Number as an operand.

When any trace is started by DB2, it is assigned a trace number (TNO). Any subsequent reference to that trace, such as a MODIFY TRACE (or STOP TRACE) command needs only to specify the TNO for DB2 to associate the command with the object trace. Finally the STOP command must be issued to GTF. After GTF has stopped recording, the output GTF dataset is available for post production analysis by any available means. Because the volume of IFCIDs is often great, vendor products and locally developed application programs are usually given the task of trace data analysis. Report control language requests must be specified so that the program packages will produce the required reports. If the START TRACE request did not ask DB2 to include the correct mix of IFCIDs, the report generation phase may be disappointing. In that case, it may be necessary to repeat the trace procedure with different START TRACE parameters.Â

Understanding raw trace output

If no programming support is available, the analyst might print the reports directly via some MVS utility, and analyze the IFCIDs visually. This is a challenging activity at best, and impossible at worst. The structural complexity of IFCIDs makes field identification from its hexadecimal representation difficult. If the purpose of the trace is problem determination, time may be critical. On that basis, many installations invest in vendor programs for trace interpretation. You must be careful when doing manual analysis. For example, a popular technique restricts the trace output by PLAN name on the START TRACE command. Multiple threads might be concurrently executing the plan. In that case, all threads using that plan will have IFCIDs recorded. The ACE token (Agent Control Element) is an internal DB2 address that can be used externally as an association vehicle among IFCIDs. If PLAN is specified, the analyst must include a check of each IFCID’s ACE token value to insure that the IFCID associations are made to the correct thread.

One approach to organizing the trace output is to separate the IFCIDs into lists based on the value of ACE token contained in the Product Section Standard Header. Then for each list, identify the begin-end IFCIDs pairs associated with important events. If the CPU headers are included on the start trace command, wall clock and cpu time duration’s can be calculated from header information. For a given ACE, some IFCID pairs will occur within a larger context. For example, you may find a 059 Begin SELECT IFCID in the record stream. You know that a 058 will be produced to record the end of the SELECT statement processing, however you also note that the IFCID immediately following the 059 is an 006 Begin Read I/O, which has a 007 End Read I/O as it’s terminal record. Therefore, you conclude that any 006-007 pairs encountered before the 058 are all associated with the SELECT statement, and all resource consumption identified with these IFCIDs should be attributed to the SELECT SQL statement. In this way, the cost of each SQL statement can be evaluated in terms of the internal operations invoked within DB2 on behalf of the SQL statement. The cost of processing large (very general) traces must be balanced against the loss of time and resources needed to rerun the trace experiment. If the trace request is too specific, it may be useless. If the trace is too general, it may be intractable. Quite clearly, the selection of START TRACE command parameters is not a trivial exercise. Experience suggests that specifying trace collection criteria via the CLASS parameter is the wisest course. That is not to say that an experienced analyst might not be able to use highly customized sets of IFCIDs for meaningful analysis. However until that experience is gathered, CLASS specification should be used.Â

Conclusion

DB2 has more comprehensive tracing facilities than almost any other IBM system software. The choices for information collection available to the analyst are many. Accounting and Statistics traces have low Internal overhead and are recommended for continuous use. Injudicious use of highly granular performance traces can introduce enormous overhead into the system with little benefit. Any use of the Performance trace should be carefully evaluated, and a Test Plan should be developed to insure that the facility is optimally used.

7. April 2012 / DB2 for z/OS / Tags: DB2 Trac facilities, DB2 trace comands, GTF, IFCID, SMF

Comments

Comments are closed.