DIAGNOSING J2EE PERFORMANCE PROBLEMS THROUGHOUT THE APPLICATION LIFECYCLE
Table of Contents
Challenges for J2EE Applications and Performance
Typical J2EE Application Performance Problems
Alternative Technologies for Capturing J2EE Performance Data
Diagnosing Performance Problems with Mercury for J2EE
Summary
ABSTRACT
Many large-scale and complex enterprise applications are now built and deployed using the J2EE architecture. However, many of these applications suffer from poor performance and scalability because the focus of the development process is on functionality, while performance and scalability are dealt with as an afterthought.
This paper will present techniques for delivering high performance applications to production, managing and measuring the performance of applications, and diagnosing the toughest J2EE problems throughout the entire application lifecycle. The paper will examine the various types of performance issues that need to be dealt with at each stage of the lifecycle and what different diagnostic tools and techniques can best resolve them.
CHALLENGES FOR J2EE APPLICATIONS AND PERFORMANCE
Today’s enterprises are choosing to build business applications that leverage the power, portability, and rapid development of Java technology. J2EE offers many advantages to developers, but introduces new challenges for the development, performance diagnosis, tuning, deployment, and management of applications. Successful delivery and management of typically complex J2EE-based applications requires evaluating performance throughout the entire application lifecycle.
An application may perform well in the development and QA environment, but fail to scale or may exhibit transient performance problems in production. It is important to understand the impact of the infrastructure in which the application runs and the behavior of the many application components as they interact under load. From the diagnostic perspective, it is important to be able to isolate the problem by tier of the application architecture, by application component, and to have progressive drill-down visibility into J2EE performance problems, the J2EE environment, and into the actual code, with sufficient detail to determine the root cause of the problems.
There are additional factors that can increase the difficulties of application delivery and management. The deployment lifecycle for many Web-facing J2EE applications is compressed, due to increased pressures for quick time to market. Boundaries between development, QA, deployment, and production stages and IT groups are blurred. Centralized IT organizations may be managing hundreds of applications, with little depth of each. IT staff skills for J2EE may not be developed enough.
Many applications have not been sufficiently architected for performance and scalability, with thorough consideration of design and usage patterns, and adequate attention to planning and testing performance against well defined service objectives. J2EE scalability capabilities, while extensive, do not substitute for such efforts. Later in the lifecycle, applications may be pushed into production to meet deadlines, with insufficient validation of performance or scalability and inadequate tools to see into application internal behavior, making it difficult and expensive to fix problems. Getting to the root of performance problems in the complex, distributed, and dynamic J2EE environment is truly a challenge.
Performance Evaluation and Diagnostic Needs Throughout the Application Lifecycle
During the application lifecycle, there are many IT stakeholders in application performance. They include application architects, developers, load testers and engineers in QA, application support staff, and site operations in production. They share common requirements for performance diagnostics, but also have specific roles and needs unique to their environments.
• In the design and specification phases, a topic outside the scope of this diagnostic paper, performance and scalability issues need to be considered, incorporated, and specific objectives set.
• In the development and unit-testing phase, profiling tools can be useful to validate performance, along with functionality. Developers should test the performance of components against identified latency and system resource utilization criteria.
• During quality assurance cycles, load testing typically follows integrated functional and regression testing. A complete application, including all interfaces with external systems, should be fully load tested prior to software release. Objectives include scalability and capacity estimation under load that realistically represents expected live usage, along with visibility into the internal performance behavior of the application and actionable data on bottlenecks. This should include transaction breakdown of latencies for each J2EE tier and method, along with additional specific root cause diagnostic information. Profiling tools used at the developer desktop cannot be used in QA during load testing for performance diagnosis, due to the high overhead they impose. J2EE diagnostic tools used at this stage need to be designed for load and should be integrated with load testing tools to boost productivity and testing effectiveness.
• During staging or pre-production, deployment teams should determine the performance and scalability of the application in the specific IT environment and configuration that will serve live users, verify projected peak load capability, and tune the application and infrastructure to meet its designed transaction throughput and response time service levels. Load testing and performance diagnostic tools that can optimize both application and infrastructure, and drill down to solve problems, are essential during this stage.
• In live production, operations and application managers need the ability to continuously monitor the health of the entire system, measure the performance of the application, and, if something starts to go wrong, be able to quickly alert, triage, isolate, and pinpoint the root cause of the problem. Needs include real-time monitors that provide drill-down visibility into the application, and in-depth, offline analytic tools for application support to dig into deep details of performance problems.
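As a rough illustration of the unit-testing advice above (testing a component against an identified latency criterion), the following Java sketch compares measured latencies with a percentile budget. The class name `LatencyBudget`, the nearest-rank percentile rule, and the 95th-percentile/50 ms budget are assumptions made for the example; they are not taken from the paper or from any Mercury product.

```java
import java.util.Arrays;

/** Hypothetical helper: compares measured component latencies with a latency budget. */
final class LatencyBudget {

    /** Nearest-rank percentile of the samples; percent must be between 1 and 100. */
    static long percentile(long[] samples, int percent) {
        if (samples.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and 1 <= percent <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100; // integer ceiling, 1-based rank
        return sorted[rank - 1];
    }

    /** True when the chosen percentile of the samples is within the budget. */
    static boolean meetsBudget(long[] samplesMillis, int percent, long budgetMillis) {
        return percentile(samplesMillis, percent) <= budgetMillis;
    }

    /** Times the given number of calls and returns each latency in milliseconds. */
    static long[] measure(Runnable call, int runs) {
        long[] latencies = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            call.run();
            latencies[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return latencies;
    }

    public static void main(String[] args) {
        // Stand-in for a real component call; the budget values are illustrative only.
        long[] latencies = measure(() -> Math.sqrt(12345.678), 1000);
        System.out.println("p95 within 50 ms: " + meetsBudget(latencies, 95, 50));
    }
}
```

A percentile is used rather than an average so that a small number of slow invocations still shows up in the result; this is a desktop-level check and says nothing about behavior under load.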
It is sometimes difficult to reproduce tough production problems in staging or QA test beds. A toolset that provides drill-down from a broad user and business transaction view all the way to detailed internal diagnostics can be essential for eliminating persistent or recurring performance problems. The toolset should provide a combination of agentless infrastructure monitoring, low-overhead J2EE monitoring agents, and the ability to narrowly target problem areas for deep measurements. Such toolsets should be able to diagnose the toughest problems, such as intermittent slow methods, full transaction tracing including arguments, memory leaks, synchronization, and deadlock issues.
Throughout the entire application lifecycle, J2EE-specific visibility and diagnostic capability should be integrated with and complementary to multi-platform, multi-protocol tools for functional testing, load testing, and application performance management. Common tools and measurements help facilitate communication, while each team member needs specific features and capabilities suited to their role.
TYPICAL J2EE APPLICATION PERFORMANCE PROBLEMS
There are a wide variety of problems that can surface during the application lifecycle. For J2EE Web applications in production, user experience of performance is affected by many external network infrastructure factors that are independent of application behavior. External monitors can improve isolation of problems, assisting triage. From the J2EE application diagnostic perspective, it is essential to be able to capture and correlate specific external parameters, such as HTTP arguments, that can drive performance problems in a J2EE method or sequence of transactions. Specific latencies and parameter captures are also frequently needed to identify problems at J2EE interfaces to external systems, such as backend databases, legacy systems, and packaged software.
Within the J2EE environment, some of the most common problems include:
1. Code problems:
   a. Slow methods
      • Consistently slow methods
      • Intermittently slow methods, related to specific user/data values driving problematic application behavior
   b. Synchronization problems, including both under synchronization and over synchronization for locks and threads
   c. Memory problems, including memory thrashing and memory leaks
   d. Coding practices, such as using exceptions as a means to transfer control in the application
2. Application Server configuration problems:
   a. JDBC Connection pool size
   b. JVM Heap size
   c. Thread pool sizes
3. Architecture and design problems, with a wide range of issues, such as:
   a. Data marshalling problems resulting from filtering at the wrong tier
   b. Single-threading resulting from inadequate synchronization design in custom code
An effective diagnostic toolset must provide capabilities and techniques to be able to isolate and identify the root cause of each of these common problems, regardless of when they emerge, from development through production. Capturing J2EE performance data sufficient for solving this range of problems is a significant technical challenge.
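Some of the configuration and synchronization symptoms listed above (heap sizing, thread counts, deadlocked threads) can be observed from inside a JVM. The sketch below uses the standard `java.lang.management` platform MXBeans, which assumes a JVM newer than those of the J2EE era this paper describes (the package arrived with Java 5, and `findDeadlockedThreads` with Java 6). The class `JvmSnapshot` and its fields are hypothetical names for illustration, and this is not how any Mercury tool collects its data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

/** Hypothetical point-in-time view of heap and thread state for the current JVM. */
final class JvmSnapshot {
    final long heapUsedBytes;
    final long heapMaxBytes;      // -1 when the JVM reports no defined maximum
    final int liveThreads;
    final int deadlockedThreads;

    private JvmSnapshot(long used, long max, int live, int deadlocked) {
        this.heapUsedBytes = used;
        this.heapMaxBytes = max;
        this.liveThreads = live;
        this.deadlockedThreads = deadlocked;
    }

    /** Reads the platform MXBeans once and records the values. */
    static JvmSnapshot capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when no deadlock exists
        return new JvmSnapshot(heap.getUsed(), heap.getMax(), threads.getThreadCount(),
                deadlocked == null ? 0 : deadlocked.length);
    }

    /** Fraction of the maximum heap in use, or -1 if the maximum is undefined. */
    double heapUtilization() {
        return heapMaxBytes > 0 ? (double) heapUsedBytes / heapMaxBytes : -1.0;
    }

    public static void main(String[] args) {
        JvmSnapshot s = capture();
        System.out.println("heap used bytes:    " + s.heapUsedBytes);
        System.out.println("heap utilization:   " + s.heapUtilization());
        System.out.println("live threads:       " + s.liveThreads);
        System.out.println("deadlocked threads: " + s.deadlockedThreads);
    }
}
```

A snapshot like this is coarse-grained: it can indicate that a heap is nearly full or that threads are deadlocked, but not which method or data value is responsible.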
ALTERNATIVE TECHNOLOGIES FOR CAPTURING J2EE PERFORMANCE DATA
There are three types of measurement interfaces for capturing J2EE application performance data. The first is the JMX interface provided by the application server vendors, exposing fixed, fairly coarse-grained performance counters from the EJB Container. While useful for aggregated, high-level performance monitoring at very low overhead, diagnostic capability is quite limited. The second is a set of highly detailed, dynamic data exposed by the JVM provider through JVMPI and JVMDI interfaces. These interfaces, used by code profilers, provide excellent diagnostic detail to developers, including specific instance transaction tracing, but at a heavy overhead price that prevents their use in load testing or production. The third is byte-code instrumentation, used by vendors of diagnostic tools for use in later stages of the lifecycle.
Figure 1. J2EE Application Visibility: Alternatives for Capturing Performance Data. [Figure labels: JMX, simple monitors (low overhead, fixed, coarse grained; monitoring focus: first-level diagnostics); JVMPI, profilers (detailed view, high overhead, unsuitable under load); byte-code instrumentation of application events with sampling, aggregation, and Total Trace (deep diagnostics at low overheads).]
Within byte-code instrumentation, several techniques can be used to manage the balance of diagnostic detail versus overhead. Sampling a proportion of events from an event stream is a common technique, which effectively reduces overhead but loses the ability to reliably capture a number of problems. If a problem occurs in an event that isn’t sampled, or is due to a sequence of events, sampling misses the problem. Intermittent or sequence-related problems are particularly intractable with this technique.
Aggregation captures and combines a sequence of events into a single recorded data value, such as an average. This technique reflects the impact of all common events well, but loses granularity for diagnosing problems related to specific data values and individual transaction events, such as a specific method invocation with a long latency.
Total Trace refers to a unique technology to capture and record data for each event executed within instrumented code. It provides the most granular detail for diagnosing problems, such as a method that is only slow when processing a specific data value, a synchronization problem when transactions contend for a resource, or memory problems, such as thrashing or leaks. Total Trace includes automated techniques to narrow the scope of instrumented code, to manage overhead.
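The trade-off among sampling, aggregation, and recording every event can be seen on a toy event stream. The Java sketch below is an illustration only: the class `CaptureStrategies`, the ten invented latency values, and the one-in-five sampling rate are assumptions, and recording every value in an array merely stands in for the idea of a complete trace, not for Mercury's Total Trace implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy comparison of three ways to record method latencies (values in milliseconds). */
final class CaptureStrategies {

    /** Sampling: keep only every n-th event (positions n, 2n, ... counting from 1). */
    static List<Long> sampleEveryNth(long[] latencies, int n) {
        List<Long> kept = new ArrayList<>();
        for (int i = n - 1; i < latencies.length; i += n) {
            kept.add(latencies[i]);
        }
        return kept;
    }

    /** Aggregation: collapse all events into a single average value. */
    static double average(long[] latencies) {
        long sum = 0;
        for (long v : latencies) {
            sum += v;
        }
        return (double) sum / latencies.length;
    }

    /** Full record: with every event kept, the slow invocation can be located exactly. */
    static int indexOfSlowest(long[] fullTrace) {
        int slowest = 0;
        for (int i = 1; i < fullTrace.length; i++) {
            if (fullTrace[i] > fullTrace[slowest]) {
                slowest = i;
            }
        }
        return slowest;
    }

    /** Invented stream: nine fast invocations and one slow one at index 6. */
    static long[] exampleLatencies() {
        return new long[] {10, 11, 9, 10, 12, 10, 900, 10, 11, 10};
    }

    public static void main(String[] args) {
        long[] events = exampleLatencies();
        System.out.println("sampled 1-in-5: " + sampleEveryNth(events, 5));
        System.out.println("average:        " + average(events));
        System.out.println("slowest index:  " + indexOfSlowest(events));
    }
}
```

With this data the one-in-five sample never sees the 900 ms invocation, the average is raised by it but cannot say which invocation was responsible, and only the full record identifies the event at index 6.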
DIAGNOSING PERFORMANCE PROBLEMS WITH MERCURY FOR J2EE
Mercury for J2EE provides solutions for functional testing, performance testing, monitoring, and diagnosis of J2EE applications. It combines Mercury Interactive’s industry leading application delivery and management solutions, including LoadRunner™ and Topaz™, with diagnostics designed specifically for the J2EE environment. The solutions provide a common, consistent foundation of shared assets, metrics, scripts, and monitors, as well as specific solutions for the differing requirements of application delivery and application management.
Mercury for J2EE uses a combination of JMX, aggregation, and unique Total Trace technology, which can capture data to diagnose all of the types of problems described previously. It is the only solution that can capture, under load, necessary and sufficient information to get to the root cause for the full range of J2EE performance problems. Total Trace allows capturing of every event (including arguments) at low overheads and is distinguished from other transaction tracing techniques, which use either a threshold for collecting only slow events from the event stream or a ‘tag and follow’ mechanism. The latter techniques can show slow method instances, but cannot show details of synchronization problems or complete information for diagnosing arbitrary memory allocation problems.
Mercury Interactive provides a combination and choice of instrumentation techniques, to provide both breadth of diagnostic information at low overhead and unmatched depth, with manageable overhead. J2EE-specific solutions in the Mercury suite include J2EE Transaction Breakdown, Topaz™ for J2EE, and J2EE Deep Diagnostics. Each one complements the others in providing functionality best suited to the performance needs of IT users through the lifecycle. Together, they provide the broadest and deepest performance testing, diagnosis, and management capability available for J2EE applications. Though these products are most effective when used together throughout the lifecycle, they are also available individually.
Figure 2: Mercury for J2EE combines J2EE-specific diagnostics with application delivery and management solutions. [Figure labels: Delivery (Performance Assurance Platform: LoadRunner, Quick Test Professional, Test Director); Management (Mercury Interactive Management: Monitoring, Problem Identification: Topaz for J2EE); Mercury Interactive Foundation (Common Diagnostics Platform: Transaction Breakdown, Deep Diagnostics; Shared Scripts, J2EE Probe Technology).]
J2EE Transaction Breakdown for LoadRunner
The J2EE Transaction Breakdown module is fully integrated with LoadRunner to expose and diagnose the most common J2EE performance problems under load testing. The same J2EE transaction breakdown capability is included in Topaz for J2EE, described in the following section.
J2EE Transaction Breakdown first correlates end-user problems to the Web transaction tier, for fast isolation and resolution, then pinpoints the J2EE problems. J2EE Transaction Breakdown analyzes the TCP/IP stack to break down a transaction’s end-to-end response time to determine whether poor transaction time is caused by external network issues or by a specific server tier. It then guides users through an intuitive top-down analysis process across multiple application layers, tracing business performance problems from the end user all the way to the problematic component. It shows latencies for JSP/servlet, EJBs, classes, methods, and JDBC connections, isolating and identifying slow components. Aggregated data capture is appropriate for monitoring continuously through a load test run.
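The idea of breaking an end-to-end response time into per-tier contributions can be sketched in a few lines of Java. The tier names, the millisecond values, and the class `TierBreakdown` below are invented for illustration; the sketch simply works on timings that are already known and does not reproduce how J2EE Transaction Breakdown derives them from the TCP/IP stack.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-tier breakdown of one transaction's response time (milliseconds). */
final class TierBreakdown {

    /** Sum of all tier times, taken here as the end-to-end response time. */
    static long total(Map<String, Long> tierMillis) {
        long sum = 0;
        for (long v : tierMillis.values()) {
            sum += v;
        }
        return sum;
    }

    /** Name of the tier with the largest time, or null for an empty map. */
    static String slowestTier(Map<String, Long> tierMillis) {
        String slowest = null;
        long worst = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : tierMillis.entrySet()) {
            if (e.getValue() > worst) {
                worst = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    /** Fraction of the end-to-end time spent in the given tier. */
    static double share(Map<String, Long> tierMillis, String tier) {
        return (double) tierMillis.get(tier) / total(tierMillis);
    }

    /** Invented sample: a 1000 ms transaction dominated by the application server. */
    static Map<String, Long> example() {
        Map<String, Long> tiers = new LinkedHashMap<>();
        tiers.put("network", 120L);
        tiers.put("web server", 80L);
        tiers.put("app server", 650L);
        tiers.put("database", 150L);
        return tiers;
    }

    public static void main(String[] args) {
        Map<String, Long> tiers = example();
        String slowest = slowestTier(tiers);
        System.out.println("slowest tier: " + slowest + ", share: " + share(tiers, slowest));
    }
}
```

Ranking tiers by their share of the total is the first triage step; drilling into the dominant tier (for example, individual EJB methods or JDBC calls) would repeat the same breakdown one level down.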
Figure 3. J2EE Transaction Breakdown provides response time statistics for all transactions. [Figure labels: End-to-End Transaction Response Time across client, Web server, application server, and database; Web Page Breakdown (DNS lookup, time to connect, time to first buffer, network time, download time, SSL handshake, FTP authentication, client time, error time); Web Server Time (servlet, method); App Server Time (EJB, method, JNDI lookup); Database Time (JDBC, connect, execute, SQL query).]
Topaz for J2EE: Application Management and Diagnostics in Production
Topaz for J2EE addresses the needs of both operations and applications support teams by combining real-time performance and availability monitoring along with J2EE-specific diagnostics.
Topaz for J2EE is designed specifically for business processes and their J2EE components with visibility across user, transaction, application component, and system tiers.
The solution provides the industry’s broadest monitoring capabilities, including server, application, and system monitoring, real-time rapid triage functionality, and proactive alerting. It allows 24x7 monitoring of the entire application infrastructure, including the J2EE environment, from a single Web-based console. Topaz for J2EE also provides trending and correlation analysis across all tiers, enabling continuous improvements in application performance and availability. Agentless monitoring for most infrastructure components can optimize the total cost of systems management. In addition, it can facilitate capacity planning by ensuring that purchases of hardware and software are optimized.
Topaz for J2EE provides correlation of end-user performance with root cause in the infrastructure and application layers. Its J2EE-specific diagnostics identify problems involving EJB components, methods, and JDBC calls with SQL statements. J2EE diagnostics integrated with Topaz for J2EE provides the same transaction breakdown capabilities as LoadRunner, using aggregated data capture, while optimized for low-overhead, real-time monitoring, triage, and problem isolation, along with configuration tuning information such as heap utilization.
J2EE Deep Diagnostics
J2EE Transaction Breakdown helps solve most of the configuration and slow method problems that occur in load testing and live production. But a significant number of the more complicated J2EE performance problems will require deeper diagnostics. Complex problems, such as finding the cause of thread deadlocks or objects that aren’t deallocated and removed by Java Garbage Collection, are responsible for the majority of the time and cost spent in diagnosing the root cause of performance problems. J2EE Deep Diagnostics uses Mercury Interactive’s Total Trace data capture, which captures every event in a selected area of the application with low overhead, by providing carefully tuned, predefined sets of byte-code instrumentation and a GUI for custom selection and automatic application of narrowly targeted instrumentation.
It is not used for 24x7 production monitoring or continuous capture in load testing runs, but is an essential complement for focused, short-term, deep data capture when a tough problem is encountered. In production, a Topaz alert can trigger a deep diagnostic data capture for offline analysis. This stops the common frustration of persistently recurring production performance problems that cannot be reproduced in a test bed. The same J2EE Deep Diagnostics works with LoadRunner when performance problems under load need better characterization for effective bug definition and rapid fixes.
J2EE Deep Diagnostics provides multi-layer correlation of HTTP, servlets, JSPs, EJBs/objects, methods, and SQL calls. It correlates these internal component measurements with JMX and OS metrics to show problem details. It delivers graphical views of transaction traces, call chains, memory leaks and thrashing problems, synchronization details of locks held and threads blocked, latency charts, and source code views of problem methods. J2EE Deep Diagnostics captures details of specific instances of method invocations, including parameters passed and individual latencies, to pinpoint the source of intermittent problems caused by specific user data values or usage patterns.
An example of the root cause analysis capability is illustrated in Figure 4. This screen is showing memory problems with the Live Object Distribution report, selectively captured for any Java objects (not just Java collections, as provided by another vendor’s tool). It shows three methods that are probably leaking and one that is thrashing. Additional drill-down screens show the lines of code where objects are being allocated.
Figure 4. J2EE Deep Diagnostics root cause analysis for memory problems.
For further examples and details on how synchronization problems, intermittent slow methods, and other J2EE performance problems are solved using the Mercury for J2EE solution, contact your Mercury Interactive representative.
J2EE Deep Diagnostics provides the level of information expected from a code profiler, collected selectively from an application under heavy load that makes code profilers unusable. This results in unique application visibility for diagnosing the toughest J2EE performance problems in load testing, staging, and production.
SUMMARY
With Mercury for J2EE, enterprises can improve the readiness of their J2EE applications before they are placed into production, and dramatically accelerate problem identification, isolation, and resolution. Mercury for J2EE allows companies to reduce their costs of operations by providing a business process view into the application and by isolating and resolving problems before business users and the bottom line are impacted. The solutions can also help companies optimize their existing infrastructure and avoid unnecessary hardware or software expenditures. The choice of the appropriate application delivery and management tools is critical for guaranteeing application quality and ensuring that service-level agreements can be met.
Mercury for J2EE is the industry’s first and most complete application delivery and management suite for the J2EE ecosystem that improves the quality, performance, and scalability of J2EE applications across the entire application delivery and management lifecycle.
Mercury Interactive: The Global Leader in BTO Software
Mercury Interactive is the global leader in business technology optimization (BTO). Our Optane suite of enterprise testing, production tuning, and performance management solutions enables companies to unlock the value of their IT investments by optimizing business and technology performance to meet business requirements. With Mercury Interactive, customers can measure the quality of their IT-enabled business processes, maximize technology and business performance at every stage of the application lifecycle, and manage their IT operations for continuous optimization throughout the lifecycle. Our leading-edge BTO solutions, complemented by technologies and services from our global business partners, are used by over 30,000 customers, including 75% of the Fortune 500, to improve quality, reduce costs, and align IT with business goals.
Mercury Interactive, Topaz, TotalTrace, and LoadRunner are registered trademarks or trademarks of Mercury Interactive Corporation or its wholly-owned subsidiaries, Freshwater Software, Inc. and Mercury Interactive (Israel) Ltd. in the United States and/or other countries. All other product and company names are used herein for identification purposes only, and may be trademarks of their respective companies. © 2003 Mercury Interactive Corporation. Patents pending. All rights reserved. WP-0???-0903
MERCURY INTERACTIVE CORPORATE HEADQUARTERS 1325 Borregas Avenue, Sunnyvale, CA 94089 U.S.A. Phone: 408-822-5200 or 800-837-8911 www.mercuryinteractive.com