How to Make an Application Easy to Diagnose
Cary Millsap Hotsos Enterprises, Ltd. Revised 18 February 2005
Diagnostic data collection can be the most difficult step in any system performance improvement project. But the collection of good diagnostic data doesn't need to be difficult. The determining feature is the application itself. The key to making application performance easy to diagnose lies in the hands of the application developer. This session shows how a few simple development decisions can make application performance easy to diagnose.
1 The Need for (Measuring) Speed You’ve probably heard the old saying, “If you can’t measure it, you can’t manage it.” (Peter Drucker said that.) It’s deeply true about software. The performance of a piece of software is a measure of its speed. Unless you can measure accurately how fast a program runs—that is, how long it takes—then you can’t measure the performance of that program. If you can’t measure response time, then you can’t manage performance. Software developers use profilers and tracers to determine how long their code runs and to determine exactly why their code runs for as long as it does. Tools like GNU gprof and strace are excellent application development tools, but the scope of what they can do for you is incomplete. You can improve your software considerably by building some additional instrumentation into your own code. With just a few good design decisions, you can make your code far easier to diagnose and repair. The result is better, faster software that’s easier to fix and that yields happier customers and lower support costs.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
1
The code instrumentation idea is not new. For example the Application Response Measurement (ARM) is a technical standard with the following history: The Application Response Measurement (ARM) working group of the Computer Measurement Group (CMG) published its ARM version 1.0 API specification in June 1996. It then issued its ARM version 2.0 API specification in November 1997. Then, in March 1998, it submitted the ARM version 2 API to The Open Group, which approved it for publication as an Open Group Technical Standard. (Source: http://www.opengroup.org/onlinepubs/009619299/toc.pdf)
Even before that, in 1992, Oracle Corporation introduced response time instrumentation into version 7 of the Oracle Database kernel. I work exclusively in the Oracle software industry segment, and my experience with application instrumentation is rooted deeply in how Oracle instruments its kernel. Not everybody knows it yet, but with Oracle’s instrumentation, a good analyst can see pretty much exactly how an Oracle kernel program spends its time.1 But to use the feature, you have to collect the trace data. The design of the application making Oracle database calls largely determines how easy it is to collect Oracle trace files.
2 Requirements I’ll begin with a few rules for trace data that I think may ultimately constitute a set of “immutable requirements.” You should regard this list as an early work in progress, which I expect to evolve as my company takes on application instrumentation projects. Exhibit 1. Application trace data requirements 1.
Trace data must be written to a file.
2.
The application user gets to decide where this file should be written and what its name shall be.
3.
The application user gets to decide whether to run a program with tracing turned on, or with tracing turned off.
4.
The file has a version number in it and whatever additional information is required (such as a field key) so that the application user (and his profiler software) can understand how to interpret the particular version of the data
1
See Millsap and Holt’s Optimizing Oracle Performance (O’Reilly & Associates, 2003) for details.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
2
he’s looking at. This allows the format of trace data to improve over time without breaking older profilers. 5.
The application vendor must fully support the application’s trace data. The vendor must fully document the format of the trace file and the meaning of its content.
6.
A trace file “event” line maps to a logical unit of work—usually a subroutine. The unit of work must be small enough that the reader of the trace data doesn’t require more detail about the unit of work than is rendered in the trace file. The unit of work must be large enough to minimize the measurement intrusion effect of the instrumentation.
7.
The collection of “event” lines in the trace file must provide complete coverage of the application’s code path.
8.
Each “event” line must have a timestamp. The trace documentation must explain to what event that timestamp refers. (Typically, it’s the time of the event’s conclusion.)
9.
If the trace file’s timestamp values aren’t human-readable, then the trace file must provide information that allows for easy conversion of timestamps into human-readable wall-clock values.
10. Each “event” line must show an elapsed time consumption. 11. Each “event” line must show resource consumption for both kernel mode and user mode CPU usage. 12. Each “event” line must show the name of the “event,” its call stack depth, and the name of its caller. 13. Each “event” line must have the provision for displaying context-sensitive values about the instrumented event. 14. The application must flush trace lines to the trace file as events complete. If the application can buffer its trace emissions, then there must exist a userselectable option to produce un-buffered output. 15. Trace data should be reasonably conservative about space consumption (and the time it takes to write the trace data). For example, a single key defining the meaning of delimiter-separated fields is more efficient than using a name=value style syntax for every field throughout the trace file. 16. Every time a business-level task begins or ends, the application must emit information to the trace file to signify the business task boundary. 17. If a business task is permitted to execute piecewise across two or more OS processes, then the trace data must contain markers sufficient to assemble the relevant fragments of trace data into one contiguous time-sequential description of the task’s response time.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
3
18. Each tier must be instrumented so that a user can compute end-to-end response time for the measured task. 19. The application instrumentation must be minimally invasive upon the response time of the application. 20. The application instrumentation must be minimally invasive upon the author of the application code. Exhibit 1 is a pretty strict-looking list, but I’ll show you a simple implementation that goes a tremendous distance in meeting all of these requirements.
3 Implementation I think this paper would fall far short of a useful target if I didn’t also provide specific ideas for how to meet the recommendations in the prior section. In the remainder of this paper, I focus on two aspects of application instrumentation: The 3GL code that you write The Oracle Database calls that you make in your 3GL code
3.1 Instrumenting Your 3GL Code As is the case with many things I design, this paper began with a prototype, my experience with which strongly influenced its own specification. In other words, I began with a general idea, and I wrote a program to implement that idea. Using that program bore new refinements of the original idea, which I implemented. Using those refinements bore new ideas, which bore new refinements. The requirements list described earlier is the result of just a few iterations of experience with my prototype, the code for which I describe later in this paper. In the following section, I describe one trace data format that meets the requirements outlined in the prior section. In the section following that, I describe some source code that emits that trace data format. 3.1.1
The trace file
I have learned an immense amount about Oracle trace data and how to get the most out of it, since I began focusing on this subject full-time in late 1999. Oracle is the world leader in the domain of RDBMS instrumentation. Nobody else even comes close. Unfortunately, though, the Oracle trace file format is difficult to use without software assistance. In fact, I make my living by teaching people and writing software to execute the very, very complicated process of converting raw Oracle trace data into usable information about performance. In spite of the complexity, the Oracle tracing feature is an indispensable tool for the performance analyst.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
4
Without it, you are unable to measure important details about program performance. But a few changes to the design of how Oracle tracing works would make Oracle trace data immensely easier for analysts to use. In the process of studying Oracle trace data, I’ve learned how to implement a tracing protocol that is both extremely informative and minimally invasive to your developers and your users. There are often many implementations that meet a given standard. The requirements shown above are no exception. For example, an obvious early choice in the design of a trace file format is the question whether to record the data in XML or some other format. Each choice has its own advantages: XML XML is an excellent structure for trace files, because there are so many tools for manipulating XML data. There are excellent XML schema design tools (I use XMLSpy from Altova), and it’s very easy to write an XML parser using libraries that are feely available for just about any programming language you can think of. Other formats One downside of XML is its storage requirement. It takes a lot more disk space (and read/write latencies) to store a million records that look like this: <event t="1107275841.885155" e="1.726098" usr="1.510000" sys="0.000000" dep="1" caller="dad" callee="randomer" p1="TX" p2="4" p3=""/>
…than to store a million records that look like this: 1107275841.885155
1.726098
1.510000
0.000000
1
dad
randomer
TX
4
I love XML. But the route I’ve chosen for now is a compact non-XML format that conserves storage and read/write latencies. However, recognizing the value of XML—especially for data sharing and easy processing with XSLT—I’ve designed the trace data format so that it would be easy to transform the raw trace data to an XML format. One advantage of this approach is that it relieves the application being measured of the complexities of XML output emission. The raw-trace-to-XML transformation process can (and should) take place on a system separate from the computer whose job is to provide the best possible service to your application end users. Exhibit 2 shows the format that I’ve chosen for my raw application trace data. The narrative following the exhibit explains. Exhibit 2. My raw application trace data format $ p.pl version=1.1 key=time ela
usr
sys
dep
caller
callee
p1
1107275831.899634=2005/02/01/10:37:11.899634 1107275831.899634 0.000472 0.000000 0.000000 1107275833.456488 1.556308 1.520000 0.000000 1107275833.456690 1.556574 1.520000 0.000000 1107275833.673210 0.216378 0.000000 0.000000
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
p2 0 1 0 0
p3
<> open trace STDOUT dad randomer TX 4 <> dad 1 <> sleeper 0.202584
5
1107275835.307857 1.634442 1.500000 0.000000 1107275836.901840 1.593879 1.510000 0.000000 1107275836.902033 3.228636 3.010000 0.000000 1107275837.073212 0.171043 0.000000 0.000000 1107275838.611881 1.538461 1.510000 0.000000 1107275840.158874 1.546890 1.500000 0.000000 1107275841.885155 1.726098 1.510000 0.000000 1107275841.885349 4.811947 4.520000 0.000000 1107275842.493227 0.607736 0.000000 0.000000 1107275845.500822=2005/02/01/10:37:25.500822 1107275845.500822 3.007385 1.510000 0.000000 1107275848.437560=2005/02/01/10:37:28.437560 1107275848.437560 2.936309 1.500000 0.000000 1107275851.376666=2005/02/01/10:37:31.376666 1107275851.376666 2.938823 1.480000 0.000000 1107275854.309974=2005/02/01/10:37:34.309974 1107275854.309974 2.933028 1.510000 0.000000 1107275857.229353=2005/02/01/10:37:37.229353 1107275857.229353 2.919083 1.500000 0.000000 1107275857.229638 14.736218 7.520000 0.000000 1107275858.220988 0.991273 0.000000 0.000000 1107275861.141928=2005/02/01/10:37:41.141928 1107275861.141928 2.920723 1.510000 0.000000 1107275864.081417=2005/02/01/10:37:44.081417 1107275864.081417 2.939066 1.520000 0.000000 1107275864.081698 5.860510 3.030000 0.000000 1107275865.087643 1.005867 0.000000 0.000000 1107275868.253205=2005/02/01/10:37:48.253205 1107275868.253205 3.165458 0.000000 0.000000 1107275868.253484 0.004535 0.000000 0.020000 $
1 1 0 0 1 1 1 0 0
dad dad <> <> dad dad dad <> <>
randomer TX 4 randomer TX 4 dad 2 sleeper 0.163678 randomer TX 4 randomer TX 4 randomer TX 4 dad 3 sleeper 0.599779
1
dad
randomer
TX
4
1
dad
randomer
TX
4
1
dad
randomer
TX
4
1
dad
randomer
TX
4
1 dad randomer TX 4 0 <> dad 5 0 <> sleeper 0.902717 1
dad
randomer
TX
4
1 0 0
dad randomer TX 4 <> dad 2 <> sleeper 0.911588
0 0
<> <>
sleeper 3.14159 write to trace STDOUT
The first lines of the trace file meet requirements 4 and 15. version=1.1 key=time ela
usr
sys
dep
caller
callee
p1
p2
p3
These lines instruct readers (or parsers) of the trace data about the file format and specific meaning of the data they’re about to read. A parser for version v trace data should be coded in such a manner that if it reads a file format version number greater than v, then the parser informs its user that the parser is not compatible with the trace data. For example, a version 1.1 parser should die without attempting to parse version 1.2 trace data, but a version 1.2 parser may contain backward-compatibility code path that allows it to parse version 1.1 data. The key line is a space-saving (and latency-saving) device. Rather than put the key on each line (like Oracle does with its c=v1,e= v2,p= v3,… format), the key here is rendered on a single line telling readers and parsers how to interpret the data it will get later on in the file’s body. This particular file has data in the form of a time column, followed by two blank spaces, followed by an ela column, followed by two blank spaces, and so on. As you’ll see later on, the user gets to choose the delimiter string. Here, I’ve set the delimiter to two blank spaces to make this paper easier for you to read, but normally I set it to a single tab character.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
6
There are probably loads of other lines that should be included in the preamble to the real trace data. For example, the Oracle Database kernel emits the name of the trace file itself (in case it’s moved), the Oracle version number, the installed Oracle Database options, information about the underlying host, and so forth. The next line meets requirement 9: 1107275831.899634=2005/02/01/10:37:11.899634
It equates the space-saving internal-format timestamp on the left to the humanreadable wall clock time on the right. This makes it easy for a reader or parser to convert timestamps to wall clock values and vice versa. The real guts of the content follow: 1107275831.899634 1107275833.456488 …
0.000472 1.556308
0.000000 1.520000
0.000000 0.000000
0 1
<> open trace STDOUT dad randomer TX 4
Each of these lines fits the format described in the key line near the beginning of the file. The second line above means this: •
The time of at which the event ended is 1107275833.456488 seconds since the Epoch.
•
The elapsed duration of the event is 1.556308 seconds.
•
The user-mode CPU consumption of the event is 1.52 seconds.
•
The kernel-mode CPU consumption of the event is 0.0 seconds.
•
The call stack depth of the event is 1.
•
The event’s caller is named dad.
•
The event itself is named randomer.
•
The first context sensitive data value is TX.
•
The second context sensitive data value is 4.
•
The third context sensitive data value is the empty string.
The remaining lines are just more of the same. Taken all together, the lines shown in Exhibit 2 describe the detailed decomposition of the 36.353850-second application execution (which you can verify by subtracting the first timestamp in the trace file from the last timestamp in the trace file). In trace data sequences like the following, you can plainly see the parent-child relationships between program subroutine calls: 1107275837.073212 1107275838.611881 1107275840.158874 1107275841.885155
0.171043 1.538461 1.546890 1.726098
0.000000 1.510000 1.500000 1.510000
0.000000 0.000000 0.000000 0.000000
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
0 1 1 1
<> sleeper 0.163678 dad randomer TX 4 dad randomer TX 4 dad randomer TX 4
7
1107275841.885349
4.811947
4.520000
0.000000
0
<>
dad
3
In this snippet, you can tell (more importantly, the application user can tell) exactly how the call stack is progressing during the application run. Specifically, you can see that the following code path was executed: sleeper finished, consuming 0.171043 seconds. dad started running (you can tell because dad is the caller of randomer). randomer finished (inside dad), consuming 1.538461 seconds. A second execution of randomer finished (inside dad): 1.54689 seconds. A third execution of randomer finished (inside dad): 1.726098 seconds. dad finished, consuming 4.52 seconds. You have everything you need in the trace data to see clearly that the execution time of dad includes the time that dad consumed running three executions of randomer. That is, dad spent only 0.000498 seconds executing code path outside the subroutines it called (4.811947 – (1.538461 + 1.546890 + 1.726098)). From the information shown here, you (or a simple profiler tool) could easily create a flat profile by subroutine or a call graph to explain in summary where every bit of your time went while this application was running. (See my article called “Profiling Oracle: how it works” for more details.) The caller id is a big improvement over Oracle trace data. For the most part, you can derive such relationships from Oracle trace data, but you have to guess sometimes who the caller is, because Oracle trace files don’t reveal this information explicitly. 3.1.2
The application
Now let’s look at the code that produced this trace data. I’ve written the example in the Perl programming language out of convenience; I write a lot of Perl, so it’s an environment in which I’m comfortable. Perl offers features that lend themselves nicely to efficient application instrumentation, but I expect that any other programming environment to which professional application developers would willingly submit themselves will offer similar features. Later in this paper, I’ll explain the code. You’re free to implement it if you wish, or something like it in any another language that you like. If you don’t know Perl, don’t panic. I’ll walk you through it. The code is shown in Exhibit 3. Exhibit 3. Source code for an instrumented application #!/usr/bin/perl use use use use
strict; warnings; Time::HiRes qw(time sleep); IO::Handle;
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
8
use Date::Format qw(time2str); # manifest constants that should be input parameters my $KSDDDT_TIMEOUT = 2; # seconds (Oracle uses 10) my $TRACE_AUTOFLUSH = 1; # flush output immediately instead of buffering my $TRACE_FILENAME = "STDOUT"; # global variables my $TRACE_VERSION my $DELIM my $NULL_PARENT # call ids use constant use constant use constant use constant use constant
= "1.1"; = " "; = "<>";
OPEN_TRACE WRITE_TRACE RANDOMER DAD SLEEPER
=> => => => =>
# version of this trace file format # trace line field delimeter # show this when there's no parent
"open trace"; "write to trace"; "randomer"; "dad"; "sleeper";
sub ksdddt($) { # just like Oracle's dbms_system.ksdddt my ($t) = @_; my $usec = $1 if sprintf("%.6f", $t) =~ /\d+\.([0-9]+)/; printf TRACE "%.6f=%s.%06d\n", $t, time2str("%Y/%m/%d/%T", $t), $usec; } { my $time_of_recent_write = 0; # always call ksdddt at trace file open my $trace_fmt; # format of trace file output my ($trace_e, $trace_s, $trace_u) = (0, 0, 0); # cumulative trace writing durations sub write_trace($$$$$$$$$$) { my ($e0, $u0, $s0) = (time(), times()); my ($t, $e, $u, $s, $dep, $caller, $callee, $p1, $p2, $p3) = @_; ksdddt($t) if $t - $time_of_recent_write >= $KSDDDT_TIMEOUT; $time_of_recent_write = $t; printf TRACE "$trace_fmt\n", $t, $e, $u, $s, $dep, $caller, $callee, $p1, $p2, $p3; my ($e1, $u1, $s1) = (time(), times()); $trace_e += $e1 - $e0; $trace_u += $u1 - $u0; $trace_s += $s1 - $s0; } sub open_trace() { my ($e0, $u0, $s0) = (time(), times()); open TRACE, ">&$TRACE_FILENAME" or die "can't open '$TRACE_FILENAME'"; autoflush TRACE $TRACE_AUTOFLUSH; printf TRACE "version=%s\n", $TRACE_VERSION; # following trick makes it easy to maintain field-format relationships my @trcfields = qw( time %.6f ela %.6f usr %.6f sys %.6f dep %d caller %s
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
9
callee p1 p2 p3
%s %s %s %s
); my (@fields, @formats) = (); for (my $i = 0; $i < $#trcfields; ) { push @fields, $trcfields[$i++]; # fields are even-numbered elements push @formats, $trcfields[$i++]; # formats are odd-numbered elements } printf TRACE "key=%s\n\n", join($DELIM, @fields); $trace_fmt = join($DELIM, @formats); my ($e1, $u1, $s1) = (time(), times()); my $e = $e1 - $e0; my $s = $s1 - $s0; my $u = $u1 - $u0; write_trace( $e1, $e, $s, $u, dep(), parent(), OPEN_TRACE, "$TRACE_FILENAME", "", "" ); } sub close_trace() { write_trace( time(), $trace_e, $trace_s, $trace_u, dep(), parent(), WRITE_TRACE, "$TRACE_FILENAME", "", "" ); close TRACE or die "can't open '$TRACE_FILENAME'"; } } { my my my my
@call_stack @e_stack @u_stack @s_stack
= = = =
(); (); (); ();
# # # #
stack stack stack stack
of of of of
call ids elapsed durations user mode CPU consumptions kernel mode CPU consumptions
sub begin_call($) { my ($callee) = @_; push @call_stack, $callee; my ($e, $u, $s) = (time(), times()); push @e_stack, $e; push @u_stack, $u; push @s_stack, $s; } sub end_call($$$) { my ($p1, $p2, $p3) = @_; my $callee = pop @call_stack; my $dep = scalar(@call_stack); my $caller = $call_stack[-1] || $NULL_PARENT; my ($e1, $u1, $s1) = (time(), times()); my $e = $e1 - pop @e_stack; my $u = $u1 - pop @u_stack; my $s = $s1 - pop @s_stack; write_trace( $e1, $e, $u, $s, $dep, $caller, $callee,
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
10
$p1, $p2, $p3 ); } sub dep() { return scalar(@call_stack); } sub parent() { return $call_stack[-2] || $NULL_PARENT; } } sub randomer() { begin_call(RANDOMER); # do the work here my $x = 3.14159; for (1..1000000) { $x *= 1 + rand if $_ % 2 == 0; $x /= 1 + rand if $_ % 2 == 1; } end_call("TX", "4", ""); } sub dad() { begin_call(DAD); # do the work here my $i = int(rand 5) + 1; randomer() for 1..$i; end_call($i, "", ""); } sub sleeper($) { my ($call) = @_; begin_call(SLEEPER); # do the work here my $r; if ($call eq "big sleep") { $r = 3.14159; sleep($r); } else { $r = sprintf("%.6f", rand); sleep($r); } end_call($r, "", ""); } open_trace(); for (1..10) { if ($_ % 2) { dad(); } else { sleeper("little sleep"); } } sleeper("big sleep"); close_trace; __END__
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
11
The application, of course, is total nonsense. I’ve invented three subroutines that don’t do anything useful except consume time and demonstrate how the instrumentation works. I’ll begin the walkthrough with the definition of the Perl subroutine called ksdddt. The name is no coincidence: dbms_system.ksdddt is the name of the Oracle stored procedure that writes a timestamp into a database trace file. My code here prints a timestamp that the subroutine receives as an argument to the trace file in the format timestamp=wall-time. The assignment to the variable called $usec just splits off the microseconds portion of the timestamp because the Perl time2str function only works to a 1-second resolution. The next block of code surrounded by braces is the trace file manipulation section. The block contains five private “static” variables that will keep their values for the duration of the application program, but which will be invisible to any subroutine outside the block. The $time_of_recent_write variable is kept current within the write_trace subroutine. If the current time minus the time of the most recent write exceeds the value of $KSDDDT_TIMEOUT, then the write_trace subroutine emits a timestamp by calling ksdddt. This is exactly how the Oracle kernel appears to work, except I’ve used a timeout value of 2 seconds where Oracle appears to use a 10-second timeout. Timings throughout this Perl program are acquired partly from a redefinition of the Perl time function available in the module called Time::HiRes. The standard Perl time function returns a time-since-Epoch value with one-second resolution, but the time function from Time::HiRes returns a time-since-Epoch value with one-microsecond resolution. The times function returns a two-element list containing the user-mode and kernel-mode CPU time consumptions for the process making the call. So the following statement assigns three values—the current time, the user-mode CPU consumption, and the kernel-mode CPU consumption—to the variables named on the left-hand side of the assignment operator: my ($e, $u, $s) = (time(), times());
Notice that all the trace file manipulation subroutines in the block pay careful attention to how long they themselves take, and they emit trace data to account for the time they’re consuming as well. The write_trace subroutine tallies how long it spends writing to the trace file, and the close_trace subroutine prints this tally to the file before it closes that file. The rest of the code is pretty simple except for perhaps the trick that I’ve used to make it easier for me to change the format of the trace output without having to change a lot of lines of code. The next brace-enclosed block of code contains the definitions of begin_call and end_call, along with their associated static variables and a couple of subroutines that provide controlled access to those static variables. The begin_call subroutine is what the application calls at the beginning of an instrumented event, and the end_call subroutine is what the application calls at the end of an instrumented event (requirement 6).
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
12
The encapsulation of the appropriate timer and trace writing calls into these subroutines is what makes the instrumentation minimally invasive into the application developer’s world (requirement 20). Literally the only thing the application developer has to do is include some code at the top of his program (for the most part, the two code blocks I’ve described previously), and write code using the following template: sub my_subroutine(…) { begin_call(MY_SUBROUTINE_NAME); # do the subroutine's work here end_call($p1, $p2, $p3); } open_trace; begin_call(MY_PROGRAM_NAME); # do the main program's work here end_call($p1, $p2, $p3); close_trace;
3.2 Instrumenting Your Oracle Calls Instrumenting your Oracle Database calls is really the easy part because most of the decisions have been made for you. There’s really only one database call tracing mechanism in the Oracle kernel; the obvious key is knowing how to turn it on and off (requirements 2 and 3). But there’s a more subtle point that many people miss: When you have a whole pile of Oracle trace data, how will you identify which files (or parts of files) represent the business task you’re trying to diagnose (requirement 17)?
A couple of well-informed decisions during application design can make Oracle trace data tremendously more useful to your users, your support staff, and your developers. 3.2.1
Turning on the trace
The first thing you need to know is how to make your application turn on the Oracle trace. Exhibit 4 shows how an application can manipulate its Oracle tracing attributes in Oracle version 10, and Exhibit 5 shows how an application can accomplish the same task in versions prior to 10. Exhibit 4. Tracing an Oracle Database 10g application $ sqlplus … SQL> show parameter… NAME --------------------------timed_statistics max_dump_file_size user_dump_dest …
TYPE ----------boolean string string
VALUE ------------------------------TRUE UNLIMITED /u01/app/oracle/admin/v10/udump
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
13
SQL> exec dbms_monitor.session_trace_enable(null,null,true,true); SQL> select 'hello' from dual; SQL> exec dbms_monitor.session_trace_disable(null,null); … $ ls -lt $UDUMP total 248 -rw-r----- 1 oracle …
oinstall
2396 Jan 17 11:18 v10_ora_5286.trc
Exhibit 5. Tracing an Oracle Database application prior to 10g $ sqlplus … SQL> show parameter… NAME TYPE VALUE --------------------------- ----------- ------------------------------timed_statistics boolean TRUE max_dump_file_size string UNLIMITED user_dump_dest string c:\Oracle\admin\v92\udump … SQL> exec dbms_support.start_trace(true,true); SQL> select 'hello' from dual; SQL> exec dbms_support.stop_trace; … $ ls –lt %udump% total 44 -rw-rw-rw- 1 user …
group
4083 Jan 17 02:29 v92_ora_2272.trc
I won’t belabor the details of turning on the tracing mechanism here. See the references listed at the end of the paper for further information. 3.2.2
Identifying the traced business task
It’s easy to turn on extended SQL trace. If nothing else, you can turn it on for an entire Oracle instance from SQL*Plus with an alter system command. You shouldn’t, but you could. If you were to do that, you would arrive immediately at the doorstep of a subtle and more complicated problem: which parts of which trace files should you be paying attention to? The answer to this question lies within the design of the application. Not the Oracle Database kernel, mind you, but the application that you’ve written, which calls the Oracle Database kernel: It is the application’s responsibility to identify to the Oracle Database kernel what business task is being traced.
The means for doing this has been around for more than a decade. The standard Oracle package called dbms_application_info provides “set and get” methods for manipulating attributes within v$session. The less-well-advertised benefit of using dbms_application_info is that it posts extra information to your trace file
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
14
(or files) that enable you to extract just the stuff you need to analyze a task’s performance. Here’s an example SQL*Plus session that shows how it works. Of course, you’re probably not going to use SQL*Plus as an application development tool, but this gives you an idea of the database calls that an Oracle-based application needs to make: exec dbms_monitor.session_trace_enable(null,null,true,true); exec dbms_application_info.set_module('demo','greeting'); select 'hello world' from dual; exec dbms_application_info.set_module('demo','real business'); select count(*) from dba_objects where owner='SYSTEM'; disconnect;
Exhibit 6 shows the Oracle trace file for this session. Exhibit 6. Oracle trace data for a 10g session that uses dbms_applciation_info /u01/app/oracle/admin/test/udump/test_ora_25390.trc Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /u01/app/oracle/product/10.1.0/Db_1 System name: Linux Node name: hotsos01.internal.hotsos.com Release: 2.4.21-20.ELsmp Version: #1 SMP Wed Aug 18 20:46:40 EDT 2004 Machine: i686 Instance name: test Redo thread mounted by this instance: 1 Oracle process number: 12 Unix process pid: 25390, image:
[email protected] (TNS V1-V3) *** ACTION NAME:() 2005-02-03 15:23:47.147 *** MODULE NAME:(SQL*Plus) 2005-02-03 15:23:47.147 *** SERVICE NAME:(SYS$USERS) 2005-02-03 15:23:47.147 *** SESSION ID:(151.4201) 2005-02-03 15:23:47.147 ===================== PARSING IN CURSOR #2 len=68 dep=0 uid=5 oct=47 lid=5 tim=1081509596823356 hv=3695927 928 ad='82ef303c' BEGIN dbms_monitor.session_trace_enable(null,null,true,true); END; END OF STMT EXEC #2:c=0,e=34542,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=1081509596823348 WAIT #2: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0 WAIT #2: nam='SQL*Net message from client' ela= 409 p1=1650815232 p2=1 p3=0 ===================== PARSING IN CURSOR #1 len=65 dep=0 uid=5 oct=47 lid=5 tim=1081509596864064 hv=5195029 30 ad='7ea9771c' BEGIN dbms_application_info.set_module('demo','greeting'); END; END OF STMT PARSE #1:c=10000,e=356,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1081509596864056 BINDS #1: *** ACTION NAME:(greeting) 2005-02-03 15:23:47.189 *** MODULE NAME:(demo) 2005-02-03 15:23:47.189 EXEC #1:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=1081509596864278 WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0 WAIT #1: nam='SQL*Net message from client' ela= 383 p1=1650815232 p2=1 p3=0 =====================
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
15
PARSING IN CURSOR #2 len=30 dep=0 uid=5 oct=3 lid=5 tim=1081509596866978 hv=11586221 43 ad='80ca41f0' select 'hello world' from dual END OF STMT PARSE #2:c=0,e=1975,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1081509596866971 BINDS #2: EXEC #2:c=0,e=63,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1081509596867169 WAIT #2: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0 FETCH #2:c=0,e=13,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=1081509596867263 WAIT #2: nam='SQL*Net message from client' ela= 298 p1=1650815232 p2=1 p3=0 FETCH #2:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1081509596867707 WAIT #2: nam='SQL*Net message to client' ela= 2 p1=1650815232 p2=1 p3=0 WAIT #2: nam='SQL*Net message from client' ela= 380 p1=1650815232 p2=1 p3=0 STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='FAST DUAL (cr=0 pr=0 pw=0 time=4 us)' ===================== PARSING IN CURSOR #1 len=70 dep=0 uid=5 oct=47 lid=5 tim=1081509596868572 hv=2802810 738 ad='82cea70c' BEGIN dbms_application_info.set_module('demo','real business'); END; END OF STMT PARSE #1:c=0,e=202,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1081509596868566 BINDS #1: *** ACTION NAME:(real business) 2005-02-03 15:23:47.193 *** MODULE NAME:(demo) 2005-02-03 15:23:47.193 EXEC #1:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=1081509596868780 WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0 WAIT #1: nam='SQL*Net message from client' ela= 357 p1=1650815232 p2=1 p3=0 ===================== PARSING IN CURSOR #1 len=37 dep=1 uid=0 oct=3 lid=0 tim=1081509596870131 hv=13986105 40 ad='8421e30c' select text from view$ where rowid=:1 END OF STMT PARSE #1:c=0,e=153,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1081509596870123 BINDS #1: bind 0: dty=11 mxl=16(16) mal=00 scl=00 pre=00 oacflg=18 oacfl2=0001 size=16 offset =0 bfp=b75dc22c bln=16 avl=16 flg=05 value=000016EB.0001.0001 EXEC #1:c=0,e=150,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1081509596870450 FETCH #1:c=0,e=62,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,tim=1081509596870553 STAT #1 id=1 cnt=0 pid=0 pos=1 obj=62 op='TABLE ACCESS BY USER ROWID VIEW$ (cr=0 pr= 0 pw=0 time=0 us)' ===================== PARSING IN CURSOR #2 len=53 dep=0 uid=5 oct=3 lid=5 tim=1081509596893149 hv=19947518 57 ad='83144730' select count(*) from dba_objects where owner='SYSTEM' END OF STMT PARSE #2:c=10000,e=23685,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=1081509596893143 BINDS #2: EXEC #2:c=0,e=126,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1081509596893432 WAIT #2: nam='SQL*Net message to client' ela= 2 p1=1650815232 p2=1 p3=0 FETCH #2:c=10000,e=18100,p=0,cr=1005,cu=0,mis=0,r=1,dep=0,og=1,tim=1081509596911618 WAIT #2: nam='SQL*Net message from client' ela= 172 p1=1650815232 p2=1 p3=0 FETCH #2:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1081509596912028 WAIT #2: nam='SQL*Net message to client' ela= 2 p1=1650815232 p2=1 p3=0 WAIT #2: nam='SQL*Net message from client' ela= 336 p1=1650815232 p2=1 p3=0 XCTEND rlbk=0, rd_only=1 STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=1005 pr=0 pw=0 time=1813 0 us)' STAT #2 id=2 cnt=447 pid=1 pos=1 obj=0 op='VIEW (cr=1005 pr=0 pw=0 time=6602 us)'
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
16
STAT #2 id=3 cnt=447 pid=2 pos=1 obj=0 op='UNION-ALL (cr=1005 pr=0 pw=0 time=5704 u s)' STAT #2 id=4 cnt=447 pid=3 pos=1 obj=0 op='FILTER (cr=1002 pr=0 pw=0 time=3912 us)' STAT #2 id=5 cnt=475 pid=4 pos=1 obj=0 op='NESTED LOOPS (cr=602 pr=0 pw=0 time=2679 us)' STAT #2 id=6 cnt=1 pid=5 pos=1 obj=22 op='TABLE ACCESS BY INDEX ROWID OBJ#(22) (cr=2 pr=0 pw=0 time=78 us)' STAT #2 id=7 cnt=1 pid=6 pos=1 obj=44 op='INDEX UNIQUE SCAN OBJ#(44) (cr=1 pr=0 pw=0 time=47 us)' STAT #2 id=8 cnt=475 pid=5 pos=2 obj=18 op='TABLE ACCESS FULL OBJ#(18) (cr=600 pr=0 pw=0 time=1651 us)' STAT #2 id=9 cnt=175 pid=4 pos=2 obj=19 op='TABLE ACCESS BY INDEX ROWID OBJ#(19) (cr =400 pr=0 pw=0 time=3674 us)' STAT #2 id=10 cnt=199 pid=9 pos=1 obj=39 op='INDEX UNIQUE SCAN OBJ#(39) (cr=201 pr=0 pw=0 time=1607 us)' STAT #2 id=11 cnt=0 pid=3 pos=2 obj=0 op='NESTED LOOPS (cr=3 pr=0 pw=0 time=41 us)' STAT #2 id=12 cnt=1 pid=11 pos=1 obj=22 op='TABLE ACCESS BY INDEX ROWID OBJ#(22) (cr =2 pr=0 pw=0 time=20 us)' STAT #2 id=13 cnt=1 pid=12 pos=1 obj=44 op='INDEX UNIQUE SCAN OBJ#(44) (cr=1 pr=0 pw =0 time=9 us)' STAT #2 id=14 cnt=0 pid=11 pos=2 obj=106 op='INDEX RANGE SCAN OBJ#(106) (cr=1 pr=0 p w=0 time=15 us)'
The three highlighted sections show the result of using dbms_application_info. For example, if a user wanted to analyze the performance of only the action demo/greeting (expressed in module/action format), he could do so by considering only the Oracle trace data following the second highlighted section and preceding the third highlighted section. The dbms_application_info package is what you need to meet requirements 16, 17, and 18. However, there’s one problem with using dbms_application_info. To execute it, your application has to make extra database calls. The calls themselves are pretty cheap (a few hundred microseconds apiece, which you can confirm if you know how to read the trace data in Exhibit 6). The expensive part is the network latency required to push the database calls around between the application program (the client in the Oracle client-server architecture) and the Oracle Database (called the server). In the example shown above, the SQL*Net message from client latencies that account for network round trip latencies between database calls are small, because the client and server processes were on the same server hosts. But imagine the application’s behavior if the two processes had been connected by a WAN. Suddenly, the introduction of extra parse and execute calls to execute dbms_application_info.set_module doesn’t sound so great anymore. Thankfully, Oracle Corporation has provided an elegant solution by enhancing the Oracle Call Interface (OCI) for version 10. The OCI is the API through which Oracle-based applications request services from the Oracle Database, like parsing, executing, fetching, and committing. The version 10 OCI provides a new set of attributes that an application can set before making application calls to the database.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
17
You can see the difference by comparing the old code path to the new code path, as shown here: /* Old, less scalable code path, using dbms_application_info */ Parse set_module('demo','greeting'). Execute set_module('demo','greeting'). Do the application’s real work. /* New, more scalable code path, using the new OCI attributes. */ Set the OCI_ATTR_MODULE attribute to demo. Set the OCI_ATTR_ACTION attribute to greeting. Do the application’s real work. The old code path has to make database calls to get its dbms_application_info work done before it can move on to its real work. The new code path simply sets two attribute values locally in the client process’s 3GL code, and then subsequent calls to the Oracle Database are “tagged” with the module and action attributes. This new OCI attributes should help Oracle perform even better against requirements 19 and 20. I expect—but I haven’t tested it yet—that using the new OCI attributes will cause the Oracle Database kernel to emit the appropriate ACTION NAME and MODULE NAME information into the appropriate places in its trace file (or files). It will take a while before the various Oracle language interfaces catch up to the new OCI definition. The Oracle JDBC driver will probably be among the first to incorporate the new feature (perhaps past tense is already appropriate for this sentence). The Microsoft ODBC driver will probably be among the last. I of course hope that the Perl DBI will use the feature sooner rather than later.
4 Conclusion You can’t manage what you can’t measure. For many applications, this means that you can’t manage performance. It can be easy to measure application performance if the application participates in the measurement. In this paper, I set forth a draft list of requirements that application performance instrumentation must meet. I showed an example of a 3GL program that meets most of these requirements with very little extra work for the developer and for the computer running the application. Finally, I showed how to instrument calls to the Oracle Database in such a manner that a user can get an end-to-end decomposition of a task’s total time consumption. By attacking the measurement problem intelligently in the application 3GL code, the application performance measurement problem goes away, yielding faster software that’s cheaper to diagnose and repair, which is good for customers and vendors alike.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
18
Further Reading Finnigan, P. 2004. “How to set trace for others’ sessions, for your own session, and at instance level.” http://www.petefinnigan.com Millsap, C. 2004. “How to activate extended SQL trace.” http://www.hotsos.com Millsap, C.; Holt, J. 2003. Optimizing Oracle Performance. Sebastopol CA: O’Reilly & Associates The Open Group. 1988. ARM 2.0 Technical Standard. http://www.opengroup.org/tech/management/arm/ A description of the “Application Response Measurement (ARM) API,” an application measurement system implemented in C and Java.
Acknowledgments I’m indebted to Juan Loaiza, the original architect of the timing instrumentation in the Oracle Database, to Matt Seaton for the lead on the Application Response Measurement project, to Karen Morton and Jeff Holt for helping me every day when I get stuck, to Gary Goodman for the stimulus to pursue the application instrumentation avenue of study, and to Larry Klein for his cooperation in the application instrumentation project at Hotsos.
About the Author Cary Millsap is a software developer, and instructor for Hotsos Enterprises, Ltd, a company that he co-founded in 1999. He is the author of Optimizing Oracle Performance, with Jeff Holt, a work for which the authors received the Oracle Magazine Authors of the Year award for 2004.
About Hotsos Hotsos Enterprises, Ltd. is a Dallas-based company with more than 600 customers and clients worldwide that use the company’s software products, classroom and on-site education, and consulting services to improve the performance of their Oracle systems. Hotsos’s revolutionary methods and tools help database administrators and application developers solve performance problems faster and more permanently than ever before. For more information on Hotsos products and solutions, visit www.hotsos.com or call +1.817.488.6200.
Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved.
19