Infrastructure Tracing


Infrastructure Tracing is an invaluable tool for determining what is going on in a running system.    There are two types of tracing, high level from libInf.so and Utility tracing from libUt.so (used for stand-alone programs).    I will describe high level first.    Trace levels and masks may be changed while a system is running via the infCtl tool.    Trace messages may be created in 'C' or Perl code in order to track or trace a program's execution and data flowing through it.    The amount of trace information collected may be dynamically adjusted.    Trace calls (trc) are placed in the 'C' or Perl source code when it is written.    Tracing is enabled by both the system layer name and a trace level.    The Layer name, for example "APP1", is a mask bit which is defined in trc.h and compared on each trace call.    Since the masks are bits, multiple layers may be traced at the same time.    The trace level is an integer from 1 to 255 and is used within the layer to determine the granularity of information to be captured.    The intended way to use trace level is to use 5 at the entry point of a high level function and at it's exit point.    Higher trace levels should be used within the function and in low level or utility functions.    In this way the higher the trace level (number) the more information is collected.    This mask / level is passed to the trace call as the first variable and constructed at compile time by adding the trace layer mask to the trace level number (ie. TRC_APP1 + 5).

The environment vaiable "TRACECTL" either in the Unix environment or in the confuguration file msy be used to enable tracing at program startup. The format for TRACECTL is "TRACECTL APP1+IO,55" which enables tracing in both the application layer and the socket IO layer and sets the level to 55. The infrastructure control program (infCtl) may also be used to change the trace level or mask while a process is running.

Error messages may be placed in an error log using the trace error call (trcErr). Errors may be passed to a remote daemon by defining a TCP address with the logical name "errLog" within a process' configuration file.

Since both trace and error logs use the same binary format, they must be read / examined using the trace extract command (extr). If extr is called with no options it will open the "trcLog" file in the current directory. A fully qualified path/file name may be passed on the command line. If an 'e' is found on the command line the local file 'errLog' is opend and examined. Command line options are:
                         
[e] From current error log
[xxx.xxx] From specified file
[trcLog.2] From old trcLog.2 file
[-b hh:mm:ss.subsec]] Begining time
[-e hh:mm:ss.subsec]] Ending time
[-E]] Errors only
[-m] Explicate Message Objects
[-p proc1[,proc2]] Only rcds from named process(es)
[-s srchString[,cnt]] Display only lines containing "srchString" in text messages.    If "cnt" is present display the following "cnt" lines.
[-t] Truncate all STR and BIN message text and fields.
[-u] Unsorted display (don't sort output).
[-v] Verbose


The trace file ("trcLog") may be created in two ways, directly by the executable (if no trace daemon is defined) or by a remote trace daemon. The selection is accomplished by defining the TCP port named "trcLog" using the following format: "TCP trcLog hostName PortNumber" at the beginning of the configuration file for the target process. The trace daemon must be running on the indicated host and configured to listen on the configured port. If the process stays up beyond midnight the file will be automatically rotated (this applies to both application and trcd).

trc( int Mask/Level, char* "format string", ... )
The basic trace call. The format string and any following argumants are as specified in printf.

trcErr( char* "formatString", ... )
The trace error call will place the formatted message in the error log stream.

Message Object Tracing:
Message objects may be traced by flagging the message at it's source (called Source tracing) which will cause each process which receives this message to place it into it's associated trace stream. Message objects may also be traced by setting TRC_IN or TRC_OUT in a process' environment. This will cause the process to trace all in-bound messages or out-bound messages (or both) no matter what the source of the message object.

The extr command will by default display the time stamp and sender of a message object. Extr can optionally explicate the entire message, dsplaying all records and fields. The explicated display will will be indented and will contain record headers followed by a line for each field. Field lines contain the field name, its data type, and the actual value stored. Binary field contents are displayed using a hex dump format with 16 bytes per line and their possible ascii equivalents.

Example of extr output:
09:05:40.595 tt2:2 -- msg_rcv: begin, TimOut: 10000
09:05:40.595 tt2:2 -- msg_rcv: Loop begin, myTid 2
09:05:40.595 tt2:2 -- msg_rcvL: thd 2 about to wait timO 10000
09:05:40.595 tt2:1 -- msg_rcvL: return, mutex wake
09:05:40.595 tt2:1 -- msg_rcv: Loop begin, myTid 1
09:05:40.595 tt2:1 -- msg_rcv: returning msg
09:05:40.595 tt2:1 -- msgRcv: OK, hndl:1, mp: 0X084FB4B8
09:05:40.595 tt2:1 -- _msgGetFld: Nam:myTid Typ:1, bufSiz:4
09:05:40.595 tt2:1 -- thd: Msg Rcv'd OK, cnt 1
09:05:40.595 tt2:1 -- msgFree: hndl: 1
09:05:40.595 tt2 -- Message, Rcvd From: tt1@Paul
09:05:40.595 tt2 -- msg_rcvL: OK
09:05:40.595 tt2 -- msg_rcv: OK 84FB4B8
09:05:40.595 tt2 -- msg_rcv: passing msg to thd 2

This is from a process using configuration file tt2.cfg and running multiple threads.    Here, trc statements from two threads and the main line are seen.    The colon folowed by the logical thread ID indicates which thread called trc.    The format is the timestatmp (down to the millisecond), the process name, and the message text.    Note the function name inserted at the beginning of the text area.

Example of extr with -m explicating a traced message:
11:37:06.146 tt2:1 -- _msgPutFld: hndl: 1, Nam: Data
11:37:06.146 tt2:1 -- _msgPutFldi: hndl: 1, Nam: myTid
11:37:06.146 tt2:1 -- thd: Sending msg 1
11:37:06.146 tt2:1 -- Message, Sent To: tt1@Paul
MsgObj: Flags: 0, Len: 328, Rcds: 1
   Rcd: 1, flds: 8
         type   INT [3]
         code   INT [3]
         status   INT [0]
         reqId   INT [0]
         msgTid   INT [1]
         questor   STR [tt2@Paul]
         Data   STR [how now brindle bovine]
         myTid   INT [1]
11:37:06.146 tt2:1 -- msgSnd: hndl: 1
11:37:06.146 tt2:1 -- msg_rcv: begin, TimOut: 10000
11:37:06.146 tt2:1 -- msg_rcv: Loop begin, myTid 1
11:37:06.146 tt2:1 -- msg_rcvL: thd 1 about to wait timO 10000
11:37:06.149 tt2 -- launchThd: seq:2
11:37:06.149 tt2 -- launchThd: pthread_create
11:37:06.149 tt2 -- launchThd: end seq:2
11:37:06.149 tt2 -- msg_rcv: begin, TimOut: 10


Low Level or Utility Tracing

The utility library of the infrastructure supports not only the top infrastructure but also stand-alone processes.    It has a built-in trace facility similar to the top layer and supports utTrc(), utTrcErr(), and utTrcFatal().    Arguments for these are the same as their high-level counterparts, but their invocation does not require the layer as part of the 1st utTrc argument, only it's level.   

The low level trace functions are:
utTrc( int level, char *fmt, ... )
     The basic trace call which only traces if the process's trace level is equal to or smaller than this
     statement's level field.    The format "fmt" and following elipsis ":..."
     operate exactly as they do in sprintf.
utTrcErr( char *fmt, ... )
     Output an error to the trace log ignoring tracelevel.    The format "fmt" and following elipsis ":..."
     operate exactly as they do in sprintf.
utTrcFatal( char *fmt, ... )    The format "fmt" and following elipsis ":..." operate exactly as they do
     in sprintf.
utTrcDmp( int level, uchar *loc, int len, char *fmt, ... )
     If the process's trace levle is less than or equal to the "level" argument, output a hex dump of the area
     starting at "loc" and continuing for "len" bytes.    The format "fmt" and following elipsis ":..." operate
     exactly as they do in sprintf.    The resulting formatted messages is placed at the beginning of hex dump.
utTrcClos()
     Close the trace file.
In all cases above except utTrcClose, the function is a macro which calls the real function and includes the function name so that it appears in the trace log.    The Trace log is in the process's current directory and is named "trc.txt".    Tracing may be invoked two ways: on the command line using "-T Level" or by using environment variable "TRC_ENA=level".    In both cases the level is an integer value between 1 and 255 and is compared to the level value in utTrc, and utTrcDmp, so that tracing does not occur if the process'es level is below the function call's level.    Again this works just like high level tracing, the proces's level determines the amount of trace information.