Performance Testing: Step on It!
by Nadine Pelicaen
Introduction

The quality attribute ‘performance’ is often overlooked in the design phase of a system. The result is that response time for both the client and the server falls below user expectations. In addition, system resources become a major problem. It is critical to know whether your multi-client system is performing within well-defined standards under varying loads, and how the system holds up when running under extreme conditions. This paper describes the different types of performance testing and the role they play in risk management. It also gives some hints and tips on how certain performance testing techniques can be incorporated in the overall test approach to detect the bottlenecks in your system. The paper also discusses how to choose the right tool for performance test automation, for web sites as well as for client/server systems with a multi-tier architecture. Performance testing helps you to determine whether response times and throughput live up to expectations, and lets you detect bottlenecks in the system so it can keep pace with the user. To clarify these aspects, we use the 7 W’s: Why, Who, What, Way, When, With and Where.
Why? – The Role of Performance Testing in the Overall Business Process
General Business Benefits

Performance testing should be part of the overall business process because its benefits can be significant. Why should you do performance testing? First, your end users expect an error-free and fast-operating application. Especially in the e-business world, the organization has to reach the required Quality of Service. Second, incorporating performance testing in the test process guarantees the stability of the system when projected business-growth scenarios are realized. When the traffic grows, the budgets will have to be adjusted based on calculations made during the performance test.

Risk Management

Performance testing is also a critical part of risk management, especially when deploying mission-critical software with significant associated revenue. First, the costs associated with the downtime of the software application can be tremendous. Next to this, the potential loss of future business has to be taken into account: when a user is unable to get the expected service efficiently, he’ll take his business elsewhere. Second, performance testing has proven its use in the deployment of web sites. If a web site goes down, it’s a public event. In other words, the company’s image is at stake!
Third, testing should not be a one-time effort just prior to deployment. If major design flaws are discovered late in the development process, this may take development back to square one, with heavy fixing costs and a delayed deployment as a consequence. All the above arguments have to be considered when calculating the Return on Investment. Since the investment associated with performance testing is high, you only want to go through with it if the cost of possible failure of your application is higher than the estimated cost of the test.

Service Level Agreements

In a world in which business based on ASP models is becoming increasingly popular, many companies are bound by Service Level Agreements (SLAs). By setting SLAs, you guarantee that certain actions can be completed within a specific time frame. As such, it is crucial for the provider to get an idea of the performance and availability he can manage. The provider has to avoid violated tolerance levels and complaining customers.
Who? – The People Involved

To successfully manage the risk using automated performance testing, an interdisciplinary project group is required:

• Project Leader: He/she is essential to coordinate all the efforts, gather people and resources, and keep track of the budget.
• IT: This is essential because you want to test the complete architecture (hardware, OS, software, middleware, database, network, etc.) to detect all possible bottlenecks. The system administrator is also required for backup/restore of databases and other files that will be messed up during test execution.
• DBA: Whenever a database is involved, you will need the DBA to tune the settings on the database side and check the implications on performance.
• Developers: People who are involved in the development of all tiers are needed (client, business logic server, database server, middleware, etc.).
• Tool Specialist: Since it is likely you will be using an automated test tool, you need someone who knows this tool in detail; accurate configuration is essential and creating good, maintainable scripts isn’t easy.
• Test Expert: You need the creator of the test plan and the associated test designs for the performance quality attribute.
• Analysts: Analysts are needed to interpret the graphs, reports and all kinds of statistical data that you retrieve at the end.
• Configuration Manager: Close collaboration in this field is vital because you want to know that you are testing the version that will go into production. Moreover, you might want to put all your test-related data in a version tracking system in view of benchmarking.
• Marketing Manager: Input from the marketing department is crucial since you want to simulate expected usage patterns and decide on the quantity of virtual users. The marketing department might also give you information concerning predictions of future use, upcoming marketing campaigns, and similar developments.
• Real Users: They can be of great help when scripting their behavior using automated test tools.
What? – The Measurements

In order to arrive at the metrics that we want to obtain through performance measurements, we first have to get some terminology straight. There are different types of performance testing, each with its own specific goals and interests.

Types of Performance Testing

Performance testing is a loose term, and the different types are distinguished based on the objective of the test. The most commonly used definitions are represented in Table 1.

Table 1 - Types of performance testing

Load Testing: Load testing is applying a particular load to the system to observe the software’s behavior and its ability to perform as required. It is useful to provide insight into how the application behaves under expected peak conditions and to get an idea of the average response time your users will get in the real world.

Stress Testing: Stress testing refers to running the application under extreme conditions to detect the system’s breaking point. This kind of testing is used to check deterioration effects, like connectivity problems and diminished shared resources.

Scalability Testing: Scalability testing is based on applying an increased load to determine whether the application scales gracefully and to define the maximum traffic that the system can reliably handle. This test type is used to determine the usage-level limitations, from the angle of capacity planning.

Measurements and Metrics

The following is a non-exhaustive list of possible measures and metrics:
• Throughput (expressed in requests processed per unit of time)
• Average response time
• Number of concurrent users when degradation starts (a slowdown is noticed) (Figure 1)
• Number of concurrent users when degradation is complete (the system fails)
• Errors reported (number + type + rate)
• For web sites:
  • Number of hits
  • Number of failed hits
  • Number of page views
  • Page download speed
Figure 1. Graph showing response time under load [7]
The following things are harder to express in absolute terms but should also be checked:
• Accessibility of the application (e.g. can you still access the menus?)
• Session independence (users should not compromise each other’s work)

You can also check the following resource information, if applicable:
• Memory usage
• Average CPU load
• Number of database connections
• Connection time
• Network saturation
• Disk utilization
• Bandwidth information (average Kbytes received and sent per time unit)
• DNS lookup time
• HTTP server connection time

You should decide which measurements are relevant and try to get them in sufficient detail. Next to this, you need to register a baseline, so that you can compare performance results before and after an alteration of the software or a system component.
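To make a few of these metrics concrete, the minimal sketch below (Python, with an invented record format and sample data, not tied to any particular tool) shows how throughput, average response time and error rate might be derived from the raw request timings that a load test typically produces.

```python
# Minimal sketch: deriving basic performance metrics from raw request records.
# The record format (start, duration, ok) is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class RequestRecord:
    start: float      # seconds since the start of the test
    duration: float   # response time in seconds
    ok: bool          # False if the request returned an error

def summarize(records):
    """Return throughput (req/s), average response time (s) and error rate."""
    if not records:
        return {"throughput": 0.0, "avg_response_time": 0.0, "error_rate": 0.0}
    elapsed = max(r.start + r.duration for r in records) - min(r.start for r in records)
    throughput = len(records) / elapsed if elapsed > 0 else float("inf")
    avg_response = sum(r.duration for r in records) / len(records)
    error_rate = sum(1 for r in records if not r.ok) / len(records)
    return {"throughput": throughput,
            "avg_response_time": avg_response,
            "error_rate": error_rate}

# Example with made-up numbers:
sample = [RequestRecord(0.0, 0.8, True), RequestRecord(0.5, 1.2, True),
          RequestRecord(1.0, 2.4, False)]
print(summarize(sample))
```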
Way? – Approach for Web Sites and Client/Server

Most of the things said in this article are valid for the testing of web sites and standard multi-tier applications. Both types of architecture are to be tested for scalability, and the capacity constraints have to be defined. However, there are significant differences, which impact the way you should handle performance testing.
Web site performance engineering has to take into account some extra technical issues:
• When a security layer is added to transactions, this impacts the response times.
• To have representative numbers, measurements have to be taken at different times during the day and from different remote locations.
• The delivery of dynamic pages is an order of magnitude slower than the delivery of static pages.

Some other, non-technical factors also play a role in web site performance testing:
• An unpredictable audience with diverse needs.
• Higher demands, since the user will only wait a number of seconds before leaving.
• A higher risk because of the high visibility when something goes wrong.
• Unpredictable transaction volumes, which result in less control over the application.

Enterprise client/server systems, on the other hand, can suffer from performance problems because overhead is created when hopping from one architectural layer to another. Distributed processing is complex and creates overhead because of the delays between processes running on different machines.

When? – Incorporation in the Overall Test Strategy

An overall structured testing approach, with integrated performance validation, is advised to identify problems.

Requirements

You should already be aware of performance issues when gathering the user requirements, so explicit requirements on this subject have to be stated. They have to be related to the measurements that will be taken later on: number of users, throughput, response time, etc.
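As an illustration of how such requirements can be made explicit and checkable, the sketch below encodes a few hypothetical targets (all figures are invented for the example) and verifies measured values against them.

```python
# Hypothetical performance requirements expressed as measurable thresholds.
# The figures are examples only; real targets come from the requirements phase.
REQUIREMENTS = {
    "avg_response_time": 2.0,   # seconds, at the expected peak load
    "throughput": 50.0,         # requests per second, minimum
    "error_rate": 0.01,         # at most 1% failed requests
    "concurrent_users": 200,    # load level at which the above must hold
}

def check_requirements(measured, requirements=REQUIREMENTS):
    """Compare measured values against the stated requirements."""
    failures = []
    if measured["avg_response_time"] > requirements["avg_response_time"]:
        failures.append("average response time too high")
    if measured["throughput"] < requirements["throughput"]:
        failures.append("throughput too low")
    if measured["error_rate"] > requirements["error_rate"]:
        failures.append("error rate too high")
    return failures   # an empty list means the requirements are met

print(check_requirements({"avg_response_time": 1.6,
                          "throughput": 62.0,
                          "error_rate": 0.004}))
```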
Design

The end result is often an application of poor performance quality, because the quality attribute ‘performance’ was overlooked during the design phase. This is the phase in which you can prevent key bottlenecks from coming to life and save a lot of time when the application is about to be deployed. Scalability is an issue that has to be addressed here.

Development

In all stages of development, performance engineering should be applied, along with ongoing performance analysis and tuning. Allocating enough time to start performance testing early in the development process is the only way to avoid scalability surprises at the end.

Test Preparation

During test preparation, a comprehensive test strategy is advised, in which performance testing is integrated in a coherent, future-proof test plan. If you do it by trial and error, you might as well not do it. The largest part of the scalability testing will be done in the system test phase. It is better, however, if you can start earlier on a prototype, in order to determine the viability of the system that will be built and to try out some automated test scripts. The test strategy should not solely be focused on the application under test. The test should be designed in such a way that the involved hardware components are stressed as well. This means testing the machines (processors, memory, disks, etc.) and the network (routers, firewalls, switches, etc.).
Test Execution

Test execution needs an iterative approach: found issues have to be fixed, and the tests run again to check whether the solution works and to hunt for new bottlenecks. Some system tuning or a small source code optimization can have a serious impact on the measurements. Iterations will be necessary to obtain something that corresponds to the scalability requirements. In this phase, some benchmarks are established that can be used in the future to check that changes in the application or the environment do not result in degradation.

Maintenance

The existing tests should be rerun regularly, especially after architectural modifications or the addition of new features. Therefore, it is essential to have four things:
• A set of automated regression tests, so that repeatability can be guaranteed.
• The test data that goes with it.
• A test environment, whether it is the same one that was used to do the performance testing before deployment, or the production environment itself.
• Benchmarks obtained in past runs, to compare your measurements against.
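A minimal sketch of such a benchmark comparison is shown below; the metric names, the JSON file layout and the 10% tolerance are assumptions chosen for illustration, not a prescription.

```python
# Minimal sketch: comparing a new test run against a stored benchmark.
# The JSON layout and the 10% tolerance are assumptions for illustration.
import json

TOLERANCE = 0.10   # flag metrics that degrade by more than 10%

def load_benchmark(path):
    """Read a benchmark saved from a past run, e.g. {"avg_response_time": 1.4, "throughput": 55.0}."""
    with open(path) as f:
        return json.load(f)

def find_regressions(benchmark, current, tolerance=TOLERANCE):
    """Return the metrics that degraded beyond the tolerance."""
    regressions = {}
    # Higher is worse for response time; lower is worse for throughput.
    if current["avg_response_time"] > benchmark["avg_response_time"] * (1 + tolerance):
        regressions["avg_response_time"] = (benchmark["avg_response_time"],
                                            current["avg_response_time"])
    if current["throughput"] < benchmark["throughput"] * (1 - tolerance):
        regressions["throughput"] = (benchmark["throughput"], current["throughput"])
    return regressions

# Example with invented numbers (no benchmark file needed):
print(find_regressions({"avg_response_time": 1.4, "throughput": 55.0},
                       {"avg_response_time": 1.9, "throughput": 52.0}))
```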
With? – The Tools

The Basics

Most load testing tools are built around the paradigm of generating virtual users that imitate the business processes performed by real users. There is a central point of control (the master controller) and several distributed workstations (agents) that drive a large number of virtual client applications. To represent realistic scenarios, the agents are capable of running a number of scripts (that you have to create) that are parameterized to test with different sets of data.

Tool Selection

Selecting an appropriate tool is not an easy task.
Figure 2. Automated performance test of Web sites [6]
Here is what has to be kept in mind:
• The abilities of the tool under investigation must be evaluated against the needs of your software application and the system architecture. Check for (in)compatibilities with the technology and the platforms used, but also for the complexity of installation and use.
• In some cases, a set of tools will be required, especially when working with a complex architecture. One of the tools must be able to emulate real transactions.
• Make sure that the tool does not become the bottleneck! When the total number of virtual users becomes too big, the master controller might have difficulty coping with all the data and become the bottleneck itself.
• The cost does not only come from the tool licenses, but also from the training costs, the time spent on writing decent scripts, the test platforms and the debugging time needed to resolve the bottlenecks, not to mention the people who may be blocked in their jobs while the performance tests are being run.
• Check whether the virtual users will consume the same amount of processor time and memory as the application (e.g. a browser) that is being mimicked.
• Check the amount of intrusiveness of the tool (for instance the overhead, i.e. the effect of running the tool itself on the reported response times).
• What facilities does the tool provide for the analysis phase? Does it provide detailed and reliable results in a straightforward way? Does it produce real-time graphs and reports enabling you to easily detect the bottlenecks?
• When selecting a tool for Web site performance testing (Figure 2), the tool should be able to emulate browser features, e.g. optional caching and the possibility to select the type of browser.

The conclusion can be that there is no commercial tool available for your operating system, your protocols, or whatever it is that makes your architecture special and outside the market targeted by the tool vendors. In that case, you will have to develop custom code and scripts yourself, if this is technically feasible.
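If you do end up writing custom code, the sketch below suggests the basic shape a home-grown load driver might take, using only the Python standard library: a number of threads act as virtual users and time their requests against a placeholder URL. The URL, user count and request count are invented for the example; a real driver would also need ramp-up control, a coordinated start and proper result reporting.

```python
# Minimal sketch of a home-grown load driver: each thread acts as one virtual
# user and records the response time of every request it sends.
# TARGET_URL, VIRTUAL_USERS and REQUESTS_PER_USER are placeholders.
import threading, time, urllib.request

TARGET_URL = "http://localhost:8080/"   # hypothetical application under test
VIRTUAL_USERS = 10
REQUESTS_PER_USER = 20

results = []                  # (duration_seconds, ok) tuples
results_lock = threading.Lock()

def virtual_user():
    for _ in range(REQUESTS_PER_USER):
        start = time.time()
        ok = True
        try:
            urllib.request.urlopen(TARGET_URL, timeout=30).read()
        except Exception:
            ok = False
        with results_lock:
            results.append((time.time() - start, ok))
        time.sleep(1.0)       # simple fixed think time between requests

threads = [threading.Thread(target=virtual_user) for _ in range(VIRTUAL_USERS)]
for t in threads: t.start()
for t in threads: t.join()

durations = [d for d, ok in results if ok]
if durations:
    print("average response time: %.3f s" % (sum(durations) / len(durations)))
print("errors:", sum(1 for _, ok in results if not ok))
```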
Scripting

The scripts should represent the real world as closely as possible (a minimal sketch of such a parameterized script follows the Analysis subsection below):
• The recording of the fundamentals of the script is done by manually carrying out the desired scenario. The recorded script is then modified by parameterization and by adding, for example, timers, synchronization points, parsing functions and error handling.
• Delays between executed transactions should be realistic (except in the case of stress testing, when the delays are removed). These delays represent the time the user needs in his decision-making process.
• Scripts must include all the possible transactions that users can initiate. For web site visitors, different patterns can be seen: people that browse, buy, or register. But also for enterprise client/server systems, categorization is necessary: people that take care of the input, that retrieve the existing information, that start batch jobs, etc.
• Scripts should be repeatable and scalable.
• The test data that is used as input for the scripted scenarios should be a mix of small and large volumes. For maintenance purposes, it is best to import this test data from an external source, like a spreadsheet.
• Up front, unambiguous pass/fail criteria must be defined.

Analysis

At the end of the test, measurements are consolidated and prepared for analysis. This includes comparing the measurements to the baseline measurements gathered during the benchmarking phase. Graphs and reports will have to be interpreted, as well as error and log files. The main part, however, is to pinpoint the bottlenecks and possible scalability issues and to fine-tune your system to cope with them. It is crucial to alter only one thing at a time; otherwise you can’t know which alteration impacted the performance.
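The sketch below illustrates the scripting principles above in plain Python: test data is imported from an external file, think times are randomized, and an unambiguous pass/fail criterion is applied per transaction. The file name, column names and the 3-second limit are invented for the example; a commercial tool would generate comparable logic in its own scripting language.

```python
# Minimal sketch of a parameterized scenario script: test data comes from an
# external CSV file, think times are randomized, and a pass/fail criterion is
# applied per transaction. File name, columns and the limit are hypothetical.
import csv, random, time

THINK_TIME_RANGE = (2.0, 8.0)   # seconds a user 'thinks' between transactions
MAX_RESPONSE_TIME = 3.0         # pass/fail criterion, defined up front

def run_transaction(name, data_row):
    """Placeholder for the recorded transaction (login, search, order, ...)."""
    start = time.time()
    # ... replay the recorded requests here, substituting values from data_row ...
    time.sleep(0.1)             # stand-in for the real work
    return time.time() - start

def run_scenario(data_file="testdata.csv"):
    with open(data_file, newline="") as f:
        for row in csv.DictReader(f):        # one virtual-user data set per row
            duration = run_transaction("place_order", row)
            verdict = "PASS" if duration <= MAX_RESPONSE_TIME else "FAIL"
            print(f"place_order with {row.get('customer')}: {duration:.2f}s {verdict}")
            time.sleep(random.uniform(*THINK_TIME_RANGE))   # realistic user delay

# run_scenario()   # requires the hypothetical testdata.csv with a 'customer' column
```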
Where? – The Test Environment

Having the right test environment is a fundamental (but also expensive) asset for the success of your performance testing project. The main criterion is that it should portray the real production environment. Often, emulating the production environment as closely as possible will have to suffice. One solution, for instance, is to use a scaled-down version of the production system (e.g. 5 instead of 20 Web servers) and extrapolate the results. Using the real production environment is not a good solution. First, you can’t afford the downtime caused by breaking the system. Second, you want to play around with the environment to see the impact of certain modifications on the measurements. Third, you don’t want to share your environment with others. Not only might they influence your results, but most likely they will be blocked because of your experiments.
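As a rough illustration of extrapolating from such a scaled-down environment, the sketch below applies a simple linear model with an assumed efficiency factor; real scaling behavior is rarely linear, so any estimate like this has to be validated against measurements.

```python
# Rough sketch of extrapolating from a scaled-down test environment to the
# full production configuration. The linear model and the efficiency factor
# are simplifying assumptions; actual scaling behavior must be verified.
def extrapolate_capacity(measured_throughput, test_servers, prod_servers,
                         scaling_efficiency=0.8):
    """Estimate production throughput from a scaled-down measurement."""
    return measured_throughput * (prod_servers / test_servers) * scaling_efficiency

# e.g. 5 test Web servers handled 120 requests/s; estimate for 20 servers:
print(extrapolate_capacity(120.0, test_servers=5, prod_servers=20))
```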
Conclusion

Performance testing appears to be a combination of science and art, and it can be crucial for your business, since poor application performance might result in revenue losses and lost opportunities. The right combination of a tool, a test environment, a good test strategy and the right people is needed to accurately predict system behavior and performance.
References
[1] Getting Started with Rational Suite PerformanceStudio – Rational manual
[2] Ensuring the Performance of Hosted E-business Applications – white paper, Mercury Interactive website
[3] Automated Performance Testing – the Importance of Planning – David Goldberg, ImagoQA, article in Professional Tester, March 2001
[4] The Science and Art of Web Site Load Testing – Alberto Savoia – presentation, International Conference on Software Testing Analysis and Review, 2000, Orlando
[5] Design for Scalability – IBM High Volume Web Site Team – December 1999
[6] Minimize Risk with Proactive Performance Testing – Bill Jaeger (E-business Advisor magazine): http://www.advisor.com/Articles.nsf/aidp/JAEGB01
[7] Load Testing to Predict Web Performance – white paper, Mercury Interactive: http://www-heva.mercuryinteractive.com/resources/library/whitepapers/load_testing/
About the Author: Nadine Pelicaen is a Test Consultant in structured software testing at ps_testware. She has a university degree in informatics and more than 12 years of experience as a software developer, QA engineer and manager of a software department. Nadine has experience in Test Assessments and the organization of Acceptance Testing of large software applications.