What you should know after this talk
You should understand at a high level how the Java Virtual Machine (JVM) operates underneath your application server and how the JVMs differ across platforms.
You should understand what the IBM J9 JVM is, what new features it provides to the runtime, and when to use those features.
You will understand how different garbage collection schemes work and how to use them to affect your application's response times.
You will know and understand some critical JVM tuning parameters that affect the runtime performance of the JVM.
Lastly, you will get to know which debugging tools are available for the JVM and understand when and where to use them most effectively.
JVM Basics Highest Level Overview
Java is a Write Once Run Anywhere (WORA), third-generation, object-oriented programming language that is executed on a virtual machine.
The Java Virtual Machine (JVM) runs applications written in Java after the Java code has been compiled to bytecode by javac.
The JVM, in conjunction with other components, optimizes your compiled Java code to try to make it as fast as native code.
The JVM performs automatic memory management (garbage collection) to ensure that system-wide memory leaks do not occur and to simplify development, since developers do not have to manage memory explicitly.
There are multiple implementations of the JVM, all of which “should” execute any application written for the Java specification level that the JVM was developed for.
JVM Basics Which JVM do I have?
The platforms that WebSphere Application Server runs on have, in some cases, different JVM implementations.
The IBM J9 JVM is the runtime environment on the following operating systems or platforms:
- AIX, Windows, Linux (x86), Linux (PPC), iSeries, zSeries
The Sun JVM is the runtime environment on all platforms running the Solaris Operating System.
The HP JVM (which is a very simple Sun JVM port) is the runtime environment on all platforms running the HP-UX Operating System.
JVM Basics The Overall Java Application Stack
The JVM is built using OO design: building-block components provide higher-level function for simplified end-user development and runtime.
The JVM’s core runtimes are developed in C or C++ and execute a large majority of function in native code.
- Garbage collector, MMU, JIT, etc.
- IO subroutines, OS calls
The J2SE/J2EE APIs all exist at the Java code layer.
- Make data structures available
- Give users access to needed function
- Allow black-box interactions with the system
[Figure: the Java application stack. Java application code makes Java calls into the J2SE layer (which uses 1 of many possible configurations) and reaches native code through JNI. Pluggable components that dynamically load into the virtual machine: JVM Profiler, Debugger, Realtime Profiler, JIT. Virtual machine components: thread model, class loader, JNI, interpreter, exception handler, JCL natives, garbage collector. Beneath the virtual machine: the Port Library (file IO, sockets, memory allocation, etc.), then the operating system. Native applications make OS-specific calls and calls to C libraries.]
JVM Basics Class loader basics
Uses a parent-delegation model to provide additional security and to allow users to implement their own class loaders.
- Loads classes in a hierarchical manner by recursively delegating to the current class loader's parent
What does the class loader do?
- Loading – obtains the bytecode from the Java class file
- Verification – validates that the code inside the class file does not attempt any illegal operations in the JVM and that the class file is well formed and not corrupt
- Preparation – allocates and initializes storage space for static class fields; also creates method tables and object templates
- Initialization – executes the class's initialization method and initializes static fields if your Java class has them
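Parent delegation can be observed with a few lines of Java. The sketch below is illustrative only: `TracingLoader` and `tryLoad` are hypothetical names, and the code is written for a current JDK rather than the 1.4.2/5.0 levels discussed in this talk. It relies on the standard behaviour of `ClassLoader.loadClass`, which asks the parent first and calls `findClass` only when the parent chain fails.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader that records which names its parents could not resolve. */
final class TracingLoader extends ClassLoader {
    /** Names that reached findClass, i.e. were not resolved by any parent. */
    final List<String> askedToFind = new ArrayList<>();

    TracingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // loadClass() only calls this after the parent chain has failed.
        askedToFind.add(name);
        throw new ClassNotFoundException(name);
    }

    /** Loads a class by name, returning null instead of throwing when it is missing. */
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.tryLoad("java.lang.String")); // resolved by a parent
        System.out.println(loader.tryLoad("no.such.Type"));     // falls through to findClass
        System.out.println(loader.askedToFind);
    }
}
```

A real custom loader would read bytecode in `findClass` and pass it to `defineClass`; this one only records the call so the delegation order is visible.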
JVM Basics JIT Basics
The just-in-time compiler (JIT) is not really part of the JVM but is essential for a high-performing Java application.
- Java is Write Once Run Anywhere, so it is interpreted by nature; without the JIT it could not compete with native code applications
The JIT works by compiling bytecode loaded by the class loader when it is accessed by an application.
- Because different platforms have different JITs, there is no standard rule for when a method is compiled
- As your code calls methods, the JIT determines how frequently specific methods are called and quickly compiles those touched often to optimize performance
IBM Toronto Laboratory has a long history (30+ years) of expertise in programming language compilation and optimization technologies.
- C, C++, Fortran, XML Parser, HPJ (statically compiled Java)
- Language-independent, interprocedural optimizers
- Parallelization technology
- Low-level compiler backends: optimizers, linkers, and code generators
- Dynamic compilation: Java just-in-time (JIT) compilers
Close relationships with:
- Research: productizing innovative ideas and experimental technologies
- Hardware: understanding how to achieve the best possible performance with the underlying system and processor
- IBM Middleware: performance analysis
Deep IP portfolio
- Java JIT group alone filed 14 U.S. patents in 2004, 6 in 2005
Overview of IBM’s J9 JVM What is the J9 JVM?
Sun IP-free, but compliant with Java 2 (1.3) (J2ME) and J2SE (1.4.2, 5.0)
Highly configurable class library implementation
Multi-platform
- PowerPC, IA32, x86-64, and 390 (Linux or z/OS)
- More applications than the above outside of the middleware space
Flexible and sophisticated technology oriented to:
- Performance (throughput and application startup)
- Scalability
- Reliability and Serviceability (RAS)
Overview of IBM’s J9 JVM Scalability
Garbage collector enhancements
- Incorporates generational garbage collection for the first time
Fine-grained locking of VM data structures
Asynchronous compilation
- Compilation of Java methods proceeds on a background thread
  • Other application threads do not have to wait to execute the method
- Improves startup time of heavily multithreaded applications on SMPs
Compile-time optimizations to remove contention
- Escape analysis, lock coarsening, …
- Architectural support to limit its effect
Superior JIT (just-in-time) compiler
- Multiple optimization methods, from application profiling to more intelligent and better code optimization algorithms
Overview of IBM’s J9 JVM Key Highlights for WAS
Superior Java application execution performance
- Just-In-Time (JIT) compiler technology
  • Far improved over JDK 1.4.2 and Sun’s JIT
  • Maximized performance with minimized runtime overhead
    – Multiple optimization levels, multiple recompilations of the same method, many new optimizations
    – Dynamic observation of the execution of code via profiling to aggressively improve hot code
    – Interpreter profiling to adapt compilation to compiled methods for block reordering, loop unrolling, etc.
Memory Management / Garbage Collection Overview
Garbage Collection (GC): the main cause of memory-related performance bottlenecks in Java.
Two things to look at in GC: frequency and duration
- Frequency depends on the heap size and allocation rate
- Duration depends on the heap size and Java heap footprint
GC algorithm: it is critical to understand how it works so that tuning can be done more intelligently.
How do you eliminate GC bottlenecks?
- Minimize the use of objects by following good programming practices
- Set your heap size properly; memory-tune your JVM
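The dependence of GC frequency on heap size and allocation rate can be made concrete with a first-order estimate. The sketch below assumes a flat heap where a collection is triggered only when the free space left by the previous one is used up; the class name, method name, and the example figures are illustrative, not measurements.

```java
/** First-order estimate of the time between collections (illustrative only). */
final class GcIntervalEstimate {

    /**
     * Seconds until the space left free by the last collection is consumed,
     * assuming a constant allocation rate and no other GC triggers.
     */
    static double secondsBetweenCollections(double freeAfterGcMb, double allocationMbPerSec) {
        if (allocationMbPerSec <= 0) {
            throw new IllegalArgumentException("allocation rate must be positive");
        }
        return freeAfterGcMb / allocationMbPerSec;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 600 MB free after a GC, application allocates 50 MB/s.
        System.out.println(secondsBetweenCollections(600, 50) + " s between collections");
    }
}
```

Under this model, doubling the free heap or halving the allocation rate doubles the interval between collections.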
[Figure: Typical JVM Architecture. The JVM sits on the operating system through an OS interface.]
Memory Management / Garbage Collection Total Application Memory Footprint
Java Heap
Application Footprint: application classes and objects, application resources, etc.
• Application Dependent
  - number of classes
  - dynacache
  - security
  - resources
  - HTTP session size
  - WLM
WAS Footprint
• Fixed Cost
  - WebSphere Runtime
  - XML Parser
  - ORB, JCE Security
  - JMX
  - Classloaders, etc.
• User-Controlled
  - Thread pool
  - Connection pool
  - Monitoring/Logging
  - EJB cache (growable?)
  - Other cache (prepared statement, security, etc.)
JVM Footprint: native implementation (e.g. C++) data structures, code, runtime artifacts, OS interface
Memory Management / Garbage Collection What factors affect memory performance the most
Memory management – how efficiently does the system manage memory?
Total available memory – is there enough memory to satisfy every request for memory?
Allocation rate – how often does the application request memory?
Object size – how big are these objects?
Object lifetime – how long do these objects stay reserved by the application?
Memory Management / Garbage Collection Parallel VS Concurrent Collectors
Parallel Collectors – two or more threads run at the same time to perform garbage collection
- Still uses the “stop-the-world” model, but instead of only one GC thread there are helper threads as well
Concurrent Collectors – collector threads are triggered to run while applications are running
- Avoids long “stop-the-world” pauses for most of the collection, but application threads can be asked to perform garbage collection work once in a while
Memory Management / Garbage Collection What garbage collection algorithms are available on my JDK?
IBM J9 JDK Platforms
- Memory management is configurable using four different policies with varying characteristics
  1. Optimize for Throughput – flat heap collector focused on maximum throughput
  2. Optimize for Pause Time – flat heap collector with concurrent mark and sweep to minimize GC pause time
  3. Subpool – a flat heap technique to help increase performance on multiprocessor systems, commonly with more than 8 processors. Available on IBM pSeries™ and zSeries™
  4. Generational Concurrent – divides the heap into “nursery” and “tenured” segments, providing fast collection for short-lived objects. Can provide maximum throughput with minimal pause times
Sun/HP JDK 5.0 Platforms
- Garbage collector is always generational, but the implementation is chosen out of the box based on the class of system
  1. Serial – collects objects one at a time in both new and old generations
  2. Throughput – uses a parallel model for collecting objects in the new generation
  3. Concurrent – uses parallel collection in the new generation and concurrent collection in the old
Memory Management / Garbage Collection How the IBM Mark and Sweep Garbage Collector Works
[Figure: heap layout for the IBM mark and sweep collector. Labels: Thread A and Thread B (each with a stack and a thread local heap), global heap with heap lock, used heap, wilderness, garbage collector, system heap (JDK 1.4.2).]
Memory Management / Garbage Collection How the IBM J9 Generational and Sun/HP Garbage Collectors Work
JVM Heap
Nursery/Young Generation
- IBM J9: -Xmn (-Xmns/-Xmnx)
- Sun: -XX:NewSize=nn -XX:MaxNewSize=nn -Xmn<size>
Old Generation
- IBM J9: -Xmo (-Xmos/-Xmox)
- Sun: -XX:NewRatio=n
Permanent Space
- Sun JVM only: -XX:MaxPermSize=nn
• Minor Collection – takes place only in the young generation, normally done through direct copying → very efficient
• Major Collection – takes place in the old generation and uses the normal mark and sweep algorithm
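How a running JVM has actually divided its heap can be inspected through the standard `java.lang.management` API (available since Java 5). The sketch below lists the heap memory pools; `HeapPools` and `describeHeapPools` are illustrative names, and the pool names printed depend on the JVM vendor and the collector in use, so none are assumed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the heap memory pools the running JVM reports (illustrative sketch). */
final class HeapPools {

    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as code caches
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            lines.add(String.format("%s: used=%d KB, committed=%d KB, max=%s",
                    pool.getName(),
                    usage.getUsed() / 1024,
                    usage.getCommitted() / 1024,
                    max < 0 ? "undefined" : (max / 1024) + " KB"));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```

Running it with different nursery or new-generation settings is one way to confirm that the options above took effect.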
Java objects in the heap are in most cases movable
- In other words, they are not tied to a single location in memory
Some objects in the heap, however, cannot be moved, either permanently or temporarily
- Known as “pinned objects”
What does J9 do to prevent fragmentation?
- With the addition of new garbage collection strategies and a new runtime memory management unit, pinned objects can be moved during compaction and are accounted for much better in JDK 5.0, nearly eliminating the fragmentation problem seen in JDK 1.4.2
Tuning the JVM properly is a process that takes time and must be tailored to your application.
- HOWEVER, you can typically get 80% of the maximum performance with 20% of the work by making good choices on a few key settings
- To truly extract maximum performance from your application, you must know your application's memory allocation and runtime needs
The JVM must be tuned in two iterative steps over a testing cycle
- Step 1: Heap size tuning
- Step 2: Applying runtime optimization
- Applying these two steps repeatedly will lead you to a JVM tuned for your application
Runtime Performance Tuning Key Parameters
The key setting for the IBM JVM, which affects performance most for all Java applications and should get you near 80% of your maximum performance if set correctly, is:
- Heap size (-Xms / -Xmx)
- Ensure that you set your minimum and maximum to values that are under your physical memory limit but allow a substantially large interval between GCs
  • Typical lower bound on the interval between GCs is 10 sec
  • Typical upper bound on the duration of a GC is 1-2 sec
For the Sun/HP JVM, a lot more work is required to get optimal performance than just tuning the heap size, as you need to tune the garbage collector and runtime as well
- A JVM setting introduced in JDK 1.4.1 has shown promise for Sun in automatically tuning the rest of the heap settings for your machine
  • -XX:+AggressiveHeap is issued at the command line and makes decisions on GC algorithms, young/old generation spaces, and other resources to use
- One must also pass the -server parameter to the Sun/HP JVMs to get them to run in their highest-performing mode
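The heap bounds the JVM is really running with can be read back from inside the application, which is a quick check that -Xms / -Xmx were picked up. A minimal sketch using the standard `Runtime` API follows; the class and method names are illustrative.

```java
/** Reads back the heap bounds of the running JVM (illustrative sketch). */
final class HeapLimits {
    private static final long MB = 1024L * 1024L;

    /** Upper bound the heap may grow to; reflects -Xmx when that option is set. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently committed by the JVM; starts near -Xms and may grow toward the maximum. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap       = " + maxHeapMb() + " MB");
        System.out.println("committed heap = " + committedHeapMb() + " MB");
    }
}
```

For example, launching with `java -Xms512m -Xmx1024m HeapLimits` should report values close to those settings; the exact figures can differ slightly because some collectors reserve part of the heap.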
Runtime Performance Tuning What GC Policy should I choose for the J9 JVM?
I want my application to run to completion as quickly as possible.
- Use -Xgcpolicy:optthruput
My application requires good response time to unpredictable events.
- Use -Xgcpolicy:optavgpause
My application has a high allocation and death rate.
- Use -Xgcpolicy:gencon
My application is running on big metal and has high allocation rates on many threads.
- Use -Xgcpolicy:subpool
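The decision list above can be captured as a small lookup, for example in a launch script generator. This is only a sketch: the `Workload` names are invented labels for the four situations on the slide, and the flags are the ones listed there.

```java
/** Maps the four workload descriptions to the J9 GC policy flag named for each (illustrative). */
final class J9GcPolicyGuide {

    /** Hypothetical labels for the four situations described on the slide. */
    enum Workload {
        RUN_TO_COMPLETION_FAST,
        RESPONSIVE_TO_UNPREDICTABLE_EVENTS,
        HIGH_ALLOCATION_AND_DEATH_RATE,
        BIG_METAL_MANY_ALLOCATING_THREADS
    }

    static String flagFor(Workload workload) {
        switch (workload) {
            case RUN_TO_COMPLETION_FAST:
                return "-Xgcpolicy:optthruput";
            case RESPONSIVE_TO_UNPREDICTABLE_EVENTS:
                return "-Xgcpolicy:optavgpause";
            case HIGH_ALLOCATION_AND_DEATH_RATE:
                return "-Xgcpolicy:gencon";
            case BIG_METAL_MANY_ALLOCATING_THREADS:
                return "-Xgcpolicy:subpool";
            default:
                throw new IllegalArgumentException("unknown workload: " + workload);
        }
    }

    public static void main(String[] args) {
        for (Workload w : Workload.values()) {
            System.out.println(w + " -> " + flagFor(w));
        }
    }
}
```

Treat the result as a starting point for testing rather than a final choice; the measurements on the following slide show that the best policy varies by application.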
Runtime Performance Tuning Real world examples WebSphere 6.1 - Trade 6
Some WebSphere applications perform better with Generational; however, some applications degrade in performance.
[Figure: bar chart for WebSphere 6.1 Trade 6 by GC policy, vertical axis 0 to 120; of the bar labels only optthruput survives.]
The customer may still be interested in generational if it delivers lower GC pause times.
Numbers are approximate and only intended to show the general behaviour seen when running Trade6 compared to SPECjAppServer.
Runtime Performance Tuning Other IBM JVM Tuning Parameters
-Xgcthreads - sets the number of GC threads (default is n-1 for n processors)
-Xnoclassgc - turns off class garbage collection (default is false)
-Xnocompactgc - turns off compaction, which can lead to fragmentation (default is false)
-Xoss<size> - sets the max Java stack size of any thread
-Xss<size> - sets the max native stack size of any thread
-Xlp - enables large page support on supported operating systems
-Xdisableexplicitgc - turns System.gc() calls into no-ops
-Xifa: - enables the Java code to run on z/OS zAAP processors
-Xmaxe / -Xmine - sets the maximum or minimum expansion unit during allocation
Runtime Performance Tuning What GC Policy should I choose for the Sun JVM?
I want my application to run concurrently with a lot of other JVMs (hoteling).
- Use the default serial collector, as its GC algorithm is single threaded
I need my application to perform well on a large number of processors.
- Use -XX:+UseParallelGC
I need my application to return near-constant response times on machines that have a large number of processors.
- Use -XX:+UseConcMarkSweepGC
I need my application to return near-constant response times on machines that have a small number of processors.
- Use -XX:+UseTrainGC
Runtime Performance Tuning Other Sun/HP JVM Tuning Parameters
-Xincgc - incremental GC, uses the Train algorithm (default is disabled)
-Xnoclassgc - disables class garbage collection
-Xss - sets the stack size of each thread (512K)
-XX:+DisableExplicitGC - no System.gc() will be executed
-XX:TargetSurvivorRatio - sets the threshold in survivor space for promotion to kick in
-XX:+UseAdaptiveSizePolicy - JVM determines good sizes for Eden and survivor spaces
-XX:+UseISM, -XX:+UseMPSS (used only for Solaris 9) - allows for bigger pages (4MB)
-XX:+AggressiveHeap - maximizes heap size and algorithms for speed
-Xoptgc - optimizes GC in the young generation (HP only)
Runtime Performance Tuning How to tune a generational GC setup – General
We need to consider the respective sizes of the nursery and the tenured space. Two approaches:
- Dynamic
  • Specify the minimum and maximum heap size (e.g. -Xms512m -Xmx1024m) and, in the Sun JDK case, -XX:+AggressiveHeap
  • The JVM will dynamically size the nursery and tenured space
  • May not give optimal performance
  • Could be good for low response times
- Fixed
  • Be more specific about the nursery and/or tenured space sizes
  • Recommended approach for performance-sensitive, server-side applications
Runtime Performance Tuning How to tune a generational GC setup – Setting the tenured/old space
The tenured space must be large enough to hold all persistent data of the application.
Too small a tenured space will cause excessive GC or even out-of-memory conditions.
For a typical WebSphere Application Server application this is ~100-400 MB.
One way to determine the tenured space size is to look at the amount of free heap that exists after each GC in default mode
- %free heap x total heap size
Analyse verbosegc to understand how frequently the tenured space gets collected.
- An optimal generational application will never have a collection in the tenured space
- In the lab some WAS applications collect every ~15 min
Runtime Performance Tuning How to tune a generational GC setup – Setting the nursery/new generation space
Large nursery → “good for throughput”
Small nursery → “good for low pause times”
Good WebSphere performance (throughput) requires a reasonably large nursery.
• A good starting point would be 512 megabytes
• Move up or down to determine the optimal value
  – Measure throughput and/or response times
Analyse verbosegc to understand the frequency and length of scavenges.
Tuning Heap size options
Fix both nursery and tenured space
- -XmnAm: nursery fixed at A MB
- -XmoBm: tenured fixed at B MB
Allow them to expand/contract
- -XmnsAm -XmnxBm: nursery can range from A MB to B MB
- -XmosBm -XmoxCm: tenured can range from B MB to C MB
Runtime Performance Tuning Process for tuning heap settings
1. Start: set your performance requirements
2. Give your best estimate for mx and ms
3. Stress test your application
4. Analyze GC behavior
5. GC profile is good?
   - Yes: done
   - No: profile objects if needed, adjust mx and/or ms and possibly switch GC policy, then stress test again
Tips
• Run your application, analyze heap usage, and determine the steady state. Set your ms to the steady state.
• Make sure your heap never pages. Monitor your paging activities.
• A rule of thumb is to keep 30% of your heap free most of the time.
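The 30% rule of thumb is easy to check programmatically. The sketch below computes the free fraction of the heap and compares it with 0.30; the class and method names are illustrative, and the live reading in `main` is a single sample, whereas the rule refers to behaviour at steady state.

```java
/** Checks the "keep 30% of your heap free" rule of thumb (illustrative sketch). */
final class HeapHeadroom {

    /** Fraction of the heap that is free, between 0.0 and 1.0. */
    static double freeFraction(long usedBytes, long heapBytes) {
        if (heapBytes <= 0 || usedBytes < 0 || usedBytes > heapBytes) {
            throw new IllegalArgumentException("need 0 <= used <= heap and heap > 0");
        }
        return (double) (heapBytes - usedBytes) / heapBytes;
    }

    static boolean meetsRuleOfThumb(long usedBytes, long heapBytes) {
        return freeFraction(usedBytes, heapBytes) >= 0.30;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long heap = rt.totalMemory();
        // Clamp in case the two readings are taken around a heap resize.
        long used = Math.min(heap, Math.max(0L, heap - rt.freeMemory()));
        System.out.printf("free fraction now: %.2f, rule met: %b%n",
                freeFraction(used, heap), meetsRuleOfThumb(used, heap));
    }
}
```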
Runtime Performance Tuning Process for other runtime tuning settings
1. Start: set your performance requirements
2. Set your baseline tuning parameters
3. Stress test your application
4. Analyze runtime behavior
5. Is throughput as expected?
   - Yes: done
   - No: remove the tuning parameter if it had a negative effect, apply a new tuning parameter to the runtime, then stress test again
Tips
• Measure throughput during the steady state of the benchmark to ensure consistent results.
• Some tuning parameters will affect performance negatively, as they might be targeted at an application with different runtime characteristics than yours.
• Add only one tuning parameter at a time to measure its impact alone.
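Comparing one tuning parameter at a time against a baseline comes down to two small calculations: a steady-state mean that skips the warm-up samples, and the relative change between runs. The sketch below shows both; the names, the sample values, and the idea of discarding a fixed number of leading samples are assumptions for illustration.

```java
/** Steady-state throughput comparison between a baseline and a candidate run (illustrative). */
final class SteadyStateThroughput {

    /** Mean requests/sec over the samples that follow the warm-up window. */
    static double mean(double[] requestsPerSec, int warmupSamples) {
        if (warmupSamples < 0 || warmupSamples >= requestsPerSec.length) {
            throw new IllegalArgumentException("warm-up must leave at least one sample");
        }
        double sum = 0;
        for (int i = warmupSamples; i < requestsPerSec.length; i++) {
            sum += requestsPerSec[i];
        }
        return sum / (requestsPerSec.length - warmupSamples);
    }

    /** Relative change of the candidate against the baseline, e.g. 0.25 for +25%. */
    static double relativeChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Hypothetical samples: the first two are warm-up and are discarded.
        double baseline = mean(new double[] {10, 50, 100, 100, 100}, 2);
        double candidate = mean(new double[] {12, 60, 125, 125, 125}, 2);
        System.out.println("relative change: " + relativeChange(baseline, candidate));
    }
}
```

A negative result is the signal to remove the parameter before trying the next one.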
Your most indispensable tool → directly from the JVM runtime
Enabled by issuing -verbose:gc on the java command line
Pros
- Can give a lot of detailed low-level information for serious debugging, enough for initial investigation
- Readily available, and it is free
Cons
- You have to restart your server → not suitable for production environments
- Does not give object-level information for further analysis
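Where a restart to add -verbose:gc is not possible, coarse GC counters can still be read from inside the JVM through the standard `java.lang.management` API (Java 5 and later). This is a much thinner view than verbose GC output and is offered only as a complementary sketch; the class and method names are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads cumulative GC counts and times from the running JVM (illustrative sketch). */
final class GcStats {

    /** Total number of collections reported by all collectors so far. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when undefined
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```

Sampling these counters at two points in time gives an average GC frequency and duration for that interval, which can be compared with the 10 sec and 1-2 sec guidelines given earlier.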
Debugging Tools IBM JDK Debugging/Analysis Tools
Thread dumps
- Available on all JVMs by issuing kill -3 <pid> on the command line, where <pid> is your server's process id
- In essence a snapshot in time of what your system is executing. Used to debug and find where threads are spending time or are hung in your system
Heap dump
- Can be enabled to occur with a thread dump by setting the following JVM properties
  • Click on Application Server -> server1 -> Process definition -> custom properties ->
  • Enter Name = IBM_HEAPDUMP
  • Value = true
  • Enter Name = IBM_JAVA_HEAPDUMP_TEXT (this enables generating the heap dump in text format, which can be analyzed using HeapRoots)
  • Value = true
- Can be analyzed using HeapRoots at http://www.alphaworks.ibm.com/tech/heaproots
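A similar snapshot of thread stacks can be taken from inside the application with `Thread.getAllStackTraces()` (Java 5 and later). The sketch below is not a replacement for the JVM's own dump, which contains lock and native detail this does not; `ThreadSnapshot` and `dump` are illustrative names.

```java
import java.util.Map;

/** In-process snapshot of all live thread stacks (illustrative sketch). */
final class ThreadSnapshot {

    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two or three snapshots a few seconds apart and comparing them is a common way to spot threads that are stuck in the same frame.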
Debugging Tools IBM JDK Debugging/Analysis Tools
Class loader runtime diagnostics
- -verbose:class – gives you information about which classes are loaded
- -Dibm.cl.verbose= – gives you specific information about how a class name you define is attempted to be loaded
Runtime performance analysis
- A variety of third-party tools will hook up to the IBM JVM to provide runtime-level profiling
  • JProbe, JProfiler, etc.
- HPROF is built into the JDK as a profiler; it is limited in function but still good for debugging simple unit test case performance issues
A few VERY useful URLs
http://www-106.ibm.com/developerworks/java/jdk/diagnosis/
- Contains all the diagnostic guides for our JVMs
- PDF on GC and memory usage
http://java.sun.com/docs/performance
- Contains a large amount of documentation and tuning for the Sun JVM
- Reference to all Sun JVM flags as well as an explanation of them
http://www.hp.com/products1/unix/java/infolibrary/index.html
- Wealth of information on tuning and configuring the HP-UX JVM
Thank you Any questions ?
Backup and Extras
JVM Basics The high level JVM Building Blocks – Part 1
IBM JVM
Core Interface – Encapsulates all interactions with user, external programs and operating environment
Execution Management – Provides process control and management
- Threading engine resides here
Execution Engine – Provides all methods of executing Java bytecode, both compiled and interpreted
JVM Basics The high level JVM Building Blocks – Part 2
IBM JVM
Diagnostics – Encapsulates all debug and diagnostic services in the JVM
- Tracing, FFDC, RAS, Debug APIs
Class Loader – Provides support for loading and unloading of Java binaries
- Performs loading, validation, initialization, and implements methods for reflection APIs
Data Conversion – Supports
[Figure labels: Core Interface (CI), Execution Management (XM), Execution Engine (XE), Diagnostics (DG), Class Loader (CL), Data Conversion]
JVM Basics The high level JVM Building Blocks – Part 3
IBM JVM
Lock – Provides locking and synchronization services
Storage – Encompasses all support for storage services the JVM needs
- Heap management, and allocation strategies
HPI – A set of well defined functions that provide low level facilities and services in a platform neutral way.
- This interface is defined by
[Figure labels: Core Interface (CI), Execution Management (XM), Execution Engine (XE), Diagnostics (DG), Class Loader (CL), Data Conversion, Lock (LK), Storage (ST), HPI]
Memory Management / Garbage Collection How the Sun/HP Garbage Collector Works – Part 2 -XX:SurvivorRatio=nn