Agile Testing
Monday, February 28, 2005


Performance vs. load vs. stress testing

Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms interchangeably, but they have in fact quite different meanings. This post is a quick review of these concepts, based on my own experience, but also using definitions from testing literature -- in particular: "Testing computer software" by Kaner et al, "Software testing techniques" by Loveland et al, and "Testing applications on the Web" by Nguyen et al.

Update July 7th, 2005: From the referrer logs I see that this post comes up fairly often in Google searches. I'm updating it with a link to a later post I wrote called 'More on performance vs. load testing'.

Performance testing

The goal of performance testing is not to find bugs, but to eliminate bottlenecks and establish a baseline for future regression testing. To conduct performance testing is to engage in a carefully controlled process of measurement and analysis. Ideally, the software under test is already stable enough so that this process can proceed smoothly.

A clearly defined set of expectations is essential for meaningful performance testing. If you don't know where you want to go in terms of the performance of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?). For example, for a Web application, you need to know at least two things:

expected load in terms of concurrent users or HTTP connections

acceptable response time
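A minimal sketch of turning those two expectations into a pass/fail check, assuming you already have the measured response times (in seconds) from a load run. The `PerformanceGoal` class, its field names, the choice of the 90th percentile and the example numbers are all illustrative assumptions, not part of any particular tool.

```python
import math
from dataclasses import dataclass


@dataclass
class PerformanceGoal:
    """Hypothetical goal: the two things you need to know up front."""
    expected_users: int          # expected load: concurrent users/HTTP connections
    max_response_time_s: float   # acceptable response time, in seconds
    percentile: float = 90.0     # judge the tail, not just the average (assumption)


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile of `samples`, with pct in (0, 100]."""
    if not samples:
        raise ValueError("need at least one response-time sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_goal(goal, concurrent_users, response_times):
    """True if the run reached the expected load and stayed within the response-time limit."""
    if concurrent_users < goal.expected_users:
        return False  # the run never reached the expected load, so it proves nothing
    observed = nearest_rank_percentile(response_times, goal.percentile)
    return observed <= goal.max_response_time_s


goal = PerformanceGoal(expected_users=200, max_response_time_s=2.0)
print(meets_goal(goal, 200, [0.4, 0.9, 1.2, 1.8, 2.5]))  # False: the 90th percentile is 2.5 s
```

Using a percentile rather than the mean is a design choice: a handful of very slow requests can hide behind a good average.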

Once you know where you want to be, you can start on your way there by constantly increasing the load on the system while looking for bottlenecks. Returning to the example of a Web application, these bottlenecks can exist at multiple levels, and to pinpoint them you can use a variety of tools:

at the application level, developers can use profilers to spot inefficiencies in their code (for example poor search algorithms)

at the database level, developers and DBAs can use database-specific profilers and query optimizers

at the operating system level, system engineers can use utilities such as top, vmstat, iostat (on Unix-type systems) and PerfMon (on Windows) to monitor hardware resources such as CPU, memory, swap, disk I/O; specialized kernel monitoring software can also be used

at the network level, network engineers can use packet sniffers such as tcpdump, network protocol analyzers such as ethereal, and various utilities such as netstat, MRTG, ntop, mii-tool
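When you ramp up the load, a bottleneck usually shows up as throughput that stops growing (or even drops) although you keep adding users. A minimal sketch for spotting that point in ramp-up results, assuming you have recorded (concurrent users, requests per second) pairs; the function name and the 0.5 efficiency threshold are arbitrary assumptions to tune. It only tells you where to start looking; the profilers and OS/network utilities listed above tell you why.

```python
def find_saturation_point(ramp_results, min_efficiency=0.5):
    """Return the first load level at which throughput stopped scaling, or None.

    ramp_results: (concurrent_users, throughput_rps) pairs, sorted by users.
    min_efficiency: relative throughput growth divided by relative load growth;
    below this value we call the system saturated (0.5 is an arbitrary choice).
    """
    for (users0, tput0), (users1, tput1) in zip(ramp_results, ramp_results[1:]):
        if users0 <= 0 or tput0 <= 0 or users1 <= users0:
            continue  # skip steps where the ratios below are undefined
        load_growth = (users1 - users0) / users0
        tput_growth = (tput1 - tput0) / tput0
        if tput_growth / load_growth < min_efficiency:
            return users1
    return None


# Made-up ramp-up data: throughput tracks the load until about 30 users, then flattens.
ramp = [(10, 95), (20, 190), (30, 280), (40, 300), (50, 298)]
print(find_saturation_point(ramp))  # 40
```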

From a testing point of view, the activities described above all take a white-box approach, where the system is inspected and monitored "from the inside out" and from a variety of angles. Measurements are taken and analyzed, and as a result, tuning is done.

However, testers also take a black-box approach in running the load tests against the system under test. For a Web application, testers will use tools that simulate concurrent users/HTTP connections and measure response times. Some lightweight open source tools I've used in the past for this purpose are ab, siege and httperf. A more heavyweight tool I haven't used yet is OpenSTA. I also haven't used The Grinder yet, but it is high on my TODO list.

When the results of the load test indicate that performance of the system does not meet its expected goals, it is time for tuning, starting with the application and the database. You want to make sure your code runs as efficiently as possible and your database is optimized on a given OS/hardware configuration. In this context, TDD practitioners will find a framework such as Mike Clark's jUnitPerf very useful; it enhances existing unit test code with load test and timed test functionality. Once a particular function or method has been profiled and tuned, developers can then wrap its unit tests in jUnitPerf and ensure that it meets performance requirements of load and timing. Mike Clark calls this "continuous performance testing". I should also mention that I've done an initial port of jUnitPerf to Python -- I called it pyUnitPerf.

If, after tuning the application and the database, the system still doesn't meet its expected goals in terms of performance, a wide array of tuning procedures is available at all the levels discussed before. Here are some examples of things you can do to enhance the performance of a Web application outside of the application code per se:

Use Web cache mechanisms, such as the one provided by Squid

Publish highly-requested Web pages statically, so that they don't hit the database

Scale the Web server farm horizontally via load balancing

Scale the database servers horizontally and split them into read/write servers and read-only servers, then load balance the read-only servers

Scale the Web and database servers vertically, by adding more hardware resources (CPU, RAM, disks)

Increase the available network bandwidth
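The "continuous performance testing" idea described earlier, wrapping the unit test of an already-tuned function with a time limit and a concurrent load, can be approximated with nothing but Python's standard `unittest` and `concurrent.futures`. This is not the jUnitPerf or pyUnitPerf API, just a minimal sketch of the same idea; `search_catalog`, the 0.5-second budget and the 20-user load are hypothetical stand-ins.

```python
import time
import unittest
from concurrent.futures import ThreadPoolExecutor


def search_catalog(term):
    """Hypothetical function under test; stands in for real, already-profiled code."""
    catalog = [f"item-{i}" for i in range(10_000)]
    return [name for name in catalog if term in name]


class SearchPerformanceTest(unittest.TestCase):
    MAX_SECONDS = 0.5      # assumed timing requirement for one call
    CONCURRENT_USERS = 20  # assumed load requirement

    def test_single_call_is_fast_enough(self):
        # Timed test: the functional check must also finish within the time budget.
        start = time.perf_counter()
        result = search_catalog("item-99")
        elapsed = time.perf_counter() - start
        self.assertIn("item-99", result)
        self.assertLess(elapsed, self.MAX_SECONDS)

    def test_under_concurrent_load(self):
        # Load test: run the same check from several threads at once and
        # require every call to succeed within the same budget.
        def timed_call(_user_id):
            start = time.perf_counter()
            result = search_catalog("item-99")
            return "item-99" in result, time.perf_counter() - start

        with ThreadPoolExecutor(max_workers=self.CONCURRENT_USERS) as pool:
            outcomes = list(pool.map(timed_call, range(self.CONCURRENT_USERS)))

        for ok, elapsed in outcomes:
            self.assertTrue(ok)
            self.assertLess(elapsed, self.MAX_SECONDS)


# Run with: python -m unittest <module_name>
```

Note that CPython threads do not run CPU-bound Python code in parallel, so the concurrent test mostly measures behavior under contention; code that waits on I/O (database, network) benefits far more from threads.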

Performance tuning can sometimes be more art than science, due to the sheer complexity of the systems involved in a modern Web application. Care must be taken to modify one variable at a time and redo the measurements; otherwise multiple changes can have subtle interactions that are hard to quantify and repeat.

In a standard test environment such as a test lab, it will not always be possible to replicate the production server configuration. In such cases, a staging environment is used which is a subset of the production environment. The expected performance of the system needs to be scaled down accordingly.

The cycle "run load test->measure performance->tune system" is repeated until the system under test achieves the expected levels of performance. At this point, testers have a baseline for how the system behaves under normal conditions. This baseline can then be used in regression tests to gauge how well a new version of the software performs.

Another common goal of performance testing is to establish benchmark numbers for the system under test. There are many industry-standard benchmarks such as the ones published by TPC, and many hardware/software vendors will fine-tune their systems in such ways as to obtain a high ranking in the TPC top-tens. It is common knowledge that one needs to be wary of any performance claims that do not include a detailed specification of all the hardware and software configurations that were used in that particular test.

Load testing

We have already seen load testing as part of the process of performance testing and tuning. In that context, it meant constantly increasing the load on the system via automated tools. For a Web application, the load is defined in terms of concurrent users or HTTP connections.
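To make the black-box side concrete, here is a minimal, standard-library-only sketch of what tools such as ab or siege do: N simulated users issue requests concurrently while response times and errors are recorded. `run_load` and its parameters are hypothetical names; `request_fn` would typically wrap an HTTP request against your own test URL, and a dummy function is used here so the sketch runs anywhere. Real tools generate far more load per client machine than Python threads can.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(request_fn, concurrent_users, requests_per_user):
    """Simulate `concurrent_users` users, each calling `request_fn` `requests_per_user` times.

    Returns throughput (successful requests/s), mean and max response time (s),
    and the number of failed requests.
    """
    def user_session(_user_id):
        timings, errors = [], 0
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                request_fn()
            except Exception:
                errors += 1  # record the failure and keep this user going
                continue
            timings.append(time.perf_counter() - start)
        return timings, errors

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        sessions = list(pool.map(user_session, range(concurrent_users)))
    wall_time = time.perf_counter() - wall_start

    timings = [t for session_timings, _ in sessions for t in session_timings]
    errors = sum(session_errors for _, session_errors in sessions)
    return {
        "throughput_rps": len(timings) / wall_time,
        "mean_response_s": statistics.mean(timings) if timings else None,
        "max_response_s": max(timings) if timings else None,
        "errors": errors,
    }


# Stand-in request that takes about 10 ms. Against a real server, request_fn could be
# something like: lambda: urllib.request.urlopen(TEST_URL).read()
# where TEST_URL is a hypothetical URL on your own test system.
print(run_load(lambda: time.sleep(0.01), concurrent_users=10, requests_per_user=5))
```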

In the testing literature, the term "load testing" is usually defined as the process of exercising the system under test by feeding it the largest tasks it can operate with. Load testing is sometimes called volume testing, or longevity/endurance testing. Examples of volume testing:

testing a word processor by editing a very large document

testing a printer by sending it a very large job

testing a mail server with thousands of users' mailboxes

a specific case of volume testing is zero-volume testing, where the system is fed empty tasks
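Volume and zero-volume tests translate naturally into ordinary unit tests that push the largest and the emptiest inputs through the same code path. A minimal sketch, where `word_count` is a hypothetical stand-in for the component under test and the input size is an arbitrary assumption; real volume tests would target the system's documented limits.

```python
import unittest


def word_count(text):
    """Hypothetical component under test: count whitespace-separated words."""
    return len(text.split())


class VolumeTests(unittest.TestCase):
    def test_zero_volume(self):
        # Zero-volume testing: an empty task must be handled cleanly, not crash.
        self.assertEqual(word_count(""), 0)

    def test_whitespace_only(self):
        # A near-empty task is another useful edge case.
        self.assertEqual(word_count("   \n\t "), 0)

    def test_large_volume(self):
        # Volume testing: a large "document" (about 3 MB here; scale this up
        # toward the real limits of the system under test).
        words = 500_000
        document = "lorem " * words
        self.assertEqual(word_count(document), words)


# Run with: python -m unittest <module_name>
```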

Examples of longevity/endurance testing:

testing a client-server application by running the client in a loop against the server over an extended period of time
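An endurance run is essentially the client loop above plus periodic measurements, and memory that keeps growing over the run is the classic sign of a leak. A minimal sketch using Python's `tracemalloc`, which only sees Python-level allocations in the process running the loop; for a real server you would watch the server's memory with OS tools (top, vmstat, PerfMon) over hours rather than a fixed iteration count. `endurance_run`, `looks_like_leak`, the simulated clients and the 100,000-byte threshold are hypothetical.

```python
import tracemalloc


def endurance_run(client_call, iterations, sample_every=100):
    """Call `client_call` repeatedly, sampling traced Python memory every `sample_every` calls.

    Returns a list of samples: bytes currently allocated, as reported by tracemalloc.
    """
    tracemalloc.start()
    samples = []
    try:
        for i in range(1, iterations + 1):
            client_call()
            if i % sample_every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


def looks_like_leak(samples, max_growth_bytes=100_000):
    """Crude heuristic: traced memory grew by more than `max_growth_bytes` over the run."""
    return len(samples) >= 2 and samples[-1] - samples[0] > max_growth_bytes


# Simulated clients (hypothetical): one keeps every response around, one doesn't.
_kept_responses = []


def leaky_client():
    _kept_responses.append(bytearray(1_000))


def clean_client():
    bytearray(1_000)


print(looks_like_leak(endurance_run(leaky_client, 1_000)))  # True
print(looks_like_leak(endurance_run(clean_client, 1_000)))  # False
```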

Goals of load testing:

expose bugs that do not surface in cursory testing, such as memory management bugs, memory leaks, buffer overflows, etc.

ensure that the application meets the performance baseline established during performance testing. This is done by running regression tests against the application at a specified maximum load.
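A minimal sketch of that regression check: compare a new version's measurements, taken at the same specified maximum load, against the stored baseline and flag anything that got worse by more than an allowed tolerance. The metric names, the assumption that lower is better for every metric, and the 10% tolerance are illustrative only.

```python
def compare_to_baseline(baseline, current, tolerance=0.10):
    """Return {metric: (baseline_value, current_value)} for metrics that regressed.

    Assumes lower is better for every metric (response times, error rates);
    a metric missing from `current` is reported as a regression too.
    """
    regressions = {}
    for metric, base_value in baseline.items():
        value = current.get(metric)
        if value is None or value > base_value * (1 + tolerance):
            regressions[metric] = (base_value, value)
    return regressions


# Made-up numbers for illustration.
baseline = {"p90_response_s": 1.20, "mean_response_s": 0.80, "error_rate": 0.001}
release_2 = {"p90_response_s": 1.25, "mean_response_s": 0.95, "error_rate": 0.001}
print(compare_to_baseline(baseline, release_2))  # {'mean_response_s': (0.8, 0.95)}
```

Keeping the load level, workload and hardware identical between runs is what makes such a comparison meaningful.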

Although performance testing and load testing can seem similar, their goals are different. On the one hand, performance testing uses load testing techniques and tools for measurement and benchmarking purposes, at various load levels. On the other hand, load testing operates at a predefined load level, usually the highest load that the system can accept while still functioning properly. Note that load testing does not aim to break the system by overwhelming it, but instead tries to keep the system constantly humming like a well-oiled machine.

In the context of load testing, I want to emphasize the extreme importance of having large datasets available for testing. In my experience, many important bugs simply do not surface unless you deal with very large entities such as thousands of users in repositories such as LDAP/NIS/Active Directory, thousands of mail server mailboxes, multi-gigabyte tables in databases, deep file/directory hierarchies on file systems, etc. Testers obviously need automated tools to generate these large datasets, but fortunately any scripting language worth its salt will do the job.

Stress testing

Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose behind this madness is to make sure that the system fails and recovers gracefully -- this quality is known as recoverability. Where performance testing demands a controlled environment and repeatable measurements, stress testing joyfully induces chaos and unpredictability. To return to the example of a Web application, here are some ways in which stress can be applied to the system:

double the baseline number for concurrent users/HTTP connections

randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example)

take the database offline, then restart it

rebuild a RAID array while the system is running

run processes that consume resources (CPU, memory, disk, network) on the Web and database servers
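For the last item in that list, a resource hog can be as simple as a script started on the Web or database server during a test run. A minimal, hypothetical sketch (`hog` is not an existing tool) that holds a chunk of memory and keeps one CPU core busy for a fixed time; it assumes 4 KiB memory pages when touching the buffer. Run several copies as separate processes to load more cores, since CPython threads won't spread CPU-bound work across cores, and only ever run it on test servers.

```python
import time


def hog(memory_mb, cpu_seconds):
    """Hold `memory_mb` MiB of RAM and keep one CPU core busy for `cpu_seconds` seconds.

    Returns the number of busy-loop iterations, just so the work is observable.
    """
    ballast = bytearray(memory_mb * 1024 * 1024)
    for offset in range(0, len(ballast), 4096):
        ballast[offset] = 1  # write to every (assumed 4 KiB) page so the memory is really committed
    iterations = 0
    deadline = time.monotonic() + cpu_seconds
    while time.monotonic() < deadline:
        iterations += 1  # pure busy work; the ballast stays allocated until we return
    return iterations
```

For example, `hog(512, 300)` would hold 512 MiB and one busy core for five minutes while you watch how the Web and database servers cope.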

I'm sure devious testers can enhance this list with their favorite ways of breaking systems. However, stress testing does not break the system purely for the pleasure of breaking it; instead, it allows testers to observe how the system reacts to failure. Does it save its state or does it crash suddenly? Does it just hang and freeze or does it fail gracefully? On restart, is it able to recover from the last good state? Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes? Is the security of the system compromised because of unexpected failures? And the list goes on.

Conclusion

I am aware that I have only scratched the surface in terms of issues, tools and techniques that deserve to be mentioned in the context of performance, load and stress testing. I personally find the topic of performance testing and tuning particularly rich and interesting, and I intend to post more articles on this subject in the future.

posted by Grig Gheorghiu at 7:33 AM


Thank you for this nice overview. I would have liked JMeter to be mentioned along with OpenSTA and The Grinder.

Thanks for sharing this information on all three aspects of testing: performance, load and stress testing.

You can look at Load Testing Terminology by Scott Stirling.

I still don't see a clear-cut difference between performance and load testing; they look more like synonyms to me, with perhaps a small difference in the goals of testing.

Hi, I'm still not clear about the difference between performance and load testing. What parameters should we consider in each type of testing? Can we run a single user/single iteration for performance testing? Is it necessary to increase the load? Can we run the performance scripts for a fixed time?

Senthil, I usually start my performance testing with a single client machine and I constantly increase the number of simulated concurrent users -- for example, I start with 10 concurrent connections to the server and I increase that number to 100 in increments of 10. At some point, either the client or the server will become the bottleneck. If the server is capable of serving a large number of concurrent users, then the client machine will become the bottleneck. At that point, you need to add more clients to your test and continue to increase the load on the server. As for the main difference between performance and load testing: you do performance testing in order to find any bottlenecks in your application code and eliminate them. When you can't possibly optimize your application anymore, you start doing load testing, which tells you for example how many Web servers you need behind a load balancer, how many database servers you need, etc. -- all this so that you can sustain some pre-defined load that your customers/users require.
By Grig Gheorghiu, at 6:56 AM
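The ramp-up described in the reply above (10 to 100 concurrent connections in steps of 10) can be scripted so that it stops as soon as the server starts failing. A minimal sketch; `ramp_up` is a hypothetical helper, `measure` is a hypothetical callback that runs one load step at the given number of users (with ab, siege, httperf or your own client) and returns throughput and error rate, and the 1% error ceiling is an assumption.

```python
def ramp_up(measure, start=10, stop=100, step=10, max_error_rate=0.01):
    """Increase the load step by step and return the per-step results.

    measure(users) -> (throughput_rps, error_rate) for one load step.
    Stops early once error_rate exceeds max_error_rate.
    """
    results = []
    for users in range(start, stop + 1, step):
        throughput, error_rate = measure(users)
        results.append((users, throughput, error_rate))
        if error_rate > max_error_rate:
            break  # past the point where the system still functions properly
    return results


# Fake measurement for illustration: throughput flattens at 400 rps and errors
# appear above 60 users (made-up numbers, not real results).
def fake_measure(users):
    throughput = min(users * 10, 400)
    error_rate = 0.0 if users <= 60 else 0.05
    return throughput, error_rate


print(ramp_up(fake_measure)[-1])  # (70, 400, 0.05)
```

If throughput flattens while the server's CPU, memory and I/O still have headroom, the client machine itself may be the bottleneck, which is the point at which the reply suggests adding more client machines.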

Hi Grig Gheorghiu, thanks for your explanation. But I have one more question for you. Let me explain how I am doing the performance/load (I don't know what to call it) testing. We are doing performance testing of a Web application from a client machine. Steps:
1) First I prepare the test case scenarios based on the needs of the customer (for example some searches, reports and some navigation).
2) I record the steps using OpenSTA.
3) I develop the test task and run the individual tests from 1 virtual user up to a maximum of 100 users.
4) Before reaching 100 virtual users, I get errors in the results and the graph starts decreasing.
5) Based on this graph and data, I find the bottleneck for that scenario.
6) We usually run each test for a constant period, like 10 or 15 minutes.
7) How can I distinguish between a client and a server bottleneck here?
8) Am I doing the testing the right way or not? We are calling this performance testing only.
9) How do I do a load test here?
Sorry for asking so many questions. Thanks and regards, Senthil
By senthil, at 11:24 PM


As I said before, it's sometimes hard to distinguish between performance and load testing. Many people use them interchangeably and I can't really blame them. One of the points I was trying to make in my blog entry was that by performance testing people generally mean tuning your application so that you eliminate bottlenecks in your code. In your example, you would need to profile your Web app while you are sending requests from your client. You will then be able to see what functions/methods in your code are the hot spots (i.e. where most of the time is spent). At that point, you can try optimizing your code, or maybe your database access, if the code involves calls to the database. If there's no possible way to further optimize your code/database logic, then you need to look at scaling the application either vertically (by throwing more hardware at the server), or horizontally (for ex. by adding more Web servers behind a load balancer). Load testing is what you're doing after you've eliminated the bottlenecks in your code. You want to know at that point what the maximum capability of your application is. So I think in your case, since it doesn't seem like you're doing any profiling/benchmarking within the application, we can say you're doing load testing. Your methodology is sound, but at the same time you need to monitor the server via OS-specific utilities (vmstat, top, iostat or Windows Perfmon) and also, as I said, you need to monitor your app via a profiler. As to how to see whether the client or the server is the bottleneck, you can try running another network-bound process on the client against a different server. If the client can sustain the second process, it means the first server was the bottleneck. If the resources on the client are exhausted, it means the client is the bottleneck. For more info on client-related bottlenecks, see
By Grig Gheorghiu, at 10:24 AM

Hi Grig Gheorghiu,

Thanks for your clear explanation. Our work actually starts with discussions with the technical team to identify the right scenarios for performance testing. After getting the results, we do the analysis using the HTTP data list in an Excel sheet. For CPU monitoring we are using a tool called "PerfView", where we can get all server-related parameter values (CPU time, network, I/O usage, etc.). We get both app server and DB server details in that tool. My question is whether running for a constant period is right or not. I feel this time is needed to scale up the virtual users on the server, but I am not sure how to decide between running for 10 minutes or 15 minutes. Will it make any difference to the results? We are taking the average time of each URL in the Excel sheet, from 1 user up to the maximum number of users. Clarifying the right test duration would put my mind at ease. I am also looking at your other guidance points, which will definitely improve the quality and usefulness of our testing. Thanks and regards, Senthil
By senthil, at 11:55 PM

Good concept on load and performance testing. Now, can we say that load testing is something where we test the systems to check how they work under a specified load, and performance testing is testing an application with a slow ramp-up to see the response times under the load? Correct me if I am wrong. The terms performance, volume, load, spike and stress testing always confuse me. Suresh Chatakondu
By Anonymous, at 1:49 PM

Suresh, I think you're right in your characterization. I'd also add that for performance testing, you need to be profiling your app and monitoring your servers while you're increasing the load on the system. I tried to explain these terms in a better way in another blog entry:
By Grig Gheorghiu, at 1:52 PM

Hi, I am still confused about performance testing, load testing and volume testing. Can you please give one example of each type of testing? Thanks, S Rajkumar
By Anonymous, at 3:34 AM

Another cool tool for doing performance testing is the Eclipse project called Hyades. With it you can easily test Web applications. I really like using it; it's simpler and more specific than OpenSTA.

Is there any open source tool for performance testing that supports IE 6.0? I heard that OpenSTA only supports 5.5.

How do "concurrent users" as set in a load test relate to real traffic numbers? E.g. if I want my site to cope with a peak traffic of 4000 hits/hour, how many concurrent hits should it be able to cope with? By my reckoning there is only one answer: it has to cope with 4000 concurrent users, in case those 4000 hits/hour all come at the same time. But then that's ridiculous!
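One standard way to relate an hourly traffic figure to concurrency is Little's Law: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends in the system. A minimal sketch of the arithmetic, where the function name, the 2-second response time and the 30-second user think time are assumed figures. Real traffic is bursty, so you would size for the peak arrival rate (e.g. the busiest minute) rather than an hourly average.

```python
def concurrency_from_rate(hits_per_hour, avg_response_time_s, think_time_s=0.0):
    """Little's Law, L = lambda * W: average number of requests (or users) in the system.

    With think_time_s=0 this is the number of requests in flight; adding the
    per-user think time gives the number of concurrently active simulated users.
    """
    arrival_rate_per_s = hits_per_hour / 3600
    return arrival_rate_per_s * (avg_response_time_s + think_time_s)


print(round(concurrency_from_rate(4000, 2.0), 2))                     # 2.22
print(round(concurrency_from_rate(4000, 2.0, think_time_s=30.0), 2))  # 35.56
```

So 4000 hits/hour, evenly spread, means only about two requests in flight at any moment; the number only approaches 4000 if all those hits really arrive together.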

The explanation was great and up to the mark. I would like to know how to identify metrics (like bottlenecks and the performance of the application).

I am a graduate student and my project is about Web application testing tools. Can anybody tell me whether Hyades creates a model of each Web site and generates the test cases from that model, or whether it replays action sequences directly from Web logs or mimics recorded user actions?

1. Regarding "double the baseline number for concurrent users/HTTP connections" and "randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example)": how do these justify that you are putting stress on the system? As the word itself says, stress testing should put stress on the system, whereas these steps measure RECOVERY TIME, i.e. how much time the system takes to return to stable operation. "take the database offline, then restart it": again, this is a recovery step. "run processes that consume resources (CPU, memory, disk, network) on the Web and database servers": this step is correct, but how does it justify the statement where you said "take away resources ..."? I am confused about these statements. Please clarify.

I've been a performance and functional testing consultant for about 12 years now. Here is the problem: many people forget that performance and stress are not interchangeable. I don't think Grig could've been any clearer. The tools I use are LoadRunner, WinRunner, QuickTest Pro, Scapa, and others. Right now I am on a project in Woking, UK, where the new system architecture is being tested for performance, load and resilience. During this round of testing we are loading the system with 5000 concurrent users performing normal user actions on the site. The Web servers, app servers and DB performance levels are constantly monitored. Now, do you call this a performance test, a load test or a stress test? This scenario is a performance test. The system is being evaluated for its performance under a constant load of users. They want to see if the proposed new architecture is sufficient to support this load of users. However, this can be classified as a load test as well, simply because there is a sustained load of users (clients) hitting the system over a period of time. Another scenario is this: the system being tested is a new system architecture, and it is tested with an increasing number of users to gauge the performance of each component compared with the load provided to the system. This is a load test, because there is no defined target and the only goal is to see where the breaking point of the system is. This set of tests is not a stress test, because no part of the system is in distress while the test is conducted. Is everyone still unclear? Michael Sky

From what I understand, in the context of a Web site, performance testing aims to answer (among others) the question: how many concurrent users can my site handle while still providing an average response time of under 1 second to the user? Stress testing aims to answer (among others) the question: what would happen if the database server crashes when hundreds of users are transacting online? Am I thinking right? ;-)
By Siva, at 4:51 PM

Has anyone come across a client-server testing tool, I mean one that is not for Web applications? A tool that could test an application developed in Java. Mainly I require that whatever test scenarios are developed should be shown graphically, and that the results should be measurable. I would be glad if anyone can provide details of such a tool.

To replicate production in test, just limit your resources in proportion to your users. So if you want to test for 4000 simultaneous users, you look at your production machine and halve or quarter it (i.e. if production has a 2.7 GHz box with 2 GB of RAM, in test use a 1.3 GHz machine with 1 GB of RAM, then use 2000 users, and so on).

This seems focused on web testing. Any suggestions for tools to load test Windows thick-client applications?

Daryl -- sorry, can't help you very much there, since I haven't used thick Windows clients in a while.... One approach would be to roll your own scripts that simulate key presses, mouse events, etc. Check out the tools (admittedly all Python-based) in the "GUI Testing" section of the Python Testing Tools Taxonomy page I'm maintaining (I split the URL in multiple lines, please join them together): wiki/PythonTestingToolsTaxonomy
HTH, Grig
By Grig Gheorghiu, at 7:26 AM

Hello Grig, your blog is very informative. Thanks a lot for providing such useful information. But I still have a doubt: are load testing and volume testing the same? Is there any specific difference between them? I do not know whether it is true, but I read somewhere that in volume testing we test large sets of data using one user, whereas in load testing large sets of data are tested using multiple concurrent users. Please advise. Waiting for your response.
By Anonymous, at 2:30 AM

Anonymous -- I don't think there is much difference between volume and load testing. I would say though that load testing is usually applied to the system as a whole, in order to see what its behavior is under constant load. Volume testing is applied to specific sub-systems, to see how they react in the presence of high volumes of data. But as with performance and load testing, these terms are sometimes interchanged. As far as I'm concerned, a sound testing strategy should incorporate these types of testing, paying less attention to their names and more attention to actually running them.
By Grig Gheorghiu, at 10:18 AM
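Since the post stresses that testers need scripts to generate large datasets for this kind of volume testing, here is a minimal sketch of one such generator for the "deep file/directory hierarchies" case. `build_tree`, its naming scheme and the default sizes are hypothetical. It uses an explicit stack instead of recursion so very deep trees don't hit Python's recursion limit, although the operating system's path-length limit will still cap the depth (which is itself a useful thing to test).

```python
import tempfile
from pathlib import Path


def build_tree(root, depth, fanout, files_per_dir, file_size=1024):
    """Create a directory tree under `root` for volume testing.

    Every directory (root included) gets `files_per_dir` files of `file_size` bytes;
    directories above `depth` get `fanout` subdirectories each.
    Returns (subdirectories_created, files_created).
    """
    payload = b"x" * file_size
    dirs = files = 0
    stack = [(Path(root), 0)]  # explicit stack instead of recursion: deep trees are fine
    while stack:
        current, level = stack.pop()
        current.mkdir(parents=True, exist_ok=True)
        for i in range(files_per_dir):
            (current / f"file_{i:04d}.dat").write_bytes(payload)
            files += 1
        if level < depth:
            for j in range(fanout):
                stack.append((current / f"dir_{j:03d}", level + 1))
                dirs += 1
    return dirs, files


with tempfile.TemporaryDirectory() as tmp:
    print(build_tree(Path(tmp) / "data", depth=3, fanout=2, files_per_dir=1))  # (14, 15)
```

The same pattern (a small loop writing many generated records) works for the other large entities mentioned in the post, such as user entries or mailboxes, with the output format adapted to the target system.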

I really like your concise and clear explanation of the differences between the performance, load and stress testing approaches. I translated your article into Italian to explain the difference between these tests to non-technical users. Because I see that these definitions are absent from it.wikipedia, I would like to insert my translation of your article there. I've seen you use a CC license, but I'm not sure about its limitations; can you give me your permission to publish it on Wikipedia?

Giovanni -- sure, you have my permission to put your translation on Wikipedia. Thanks! Grig

By Grig Gheorghiu, at 10:40 AM

Hey Grig, like most of the readers (I think), I still don't see a clear line in the sand between the three methodologies. What I think is clear is that, depending on the situation, you have to mix and match to get where you want. If my understanding is correct, we have three main metrics here:
. nominal performance
. throughput
. resilience
that can in turn be mapped to:
. performance testing (assessing a baseline)
. load testing (how many clients before performance degrades or, as customers would ask, how many boxes do I need for my X customers)
. stress testing (what are the parameters that need to be monitored to make sure the service remains available)
With that said, I would like to come back to one of your first sentences: "Ideally, the software under test is already stable..." Performance is a very interesting metric to have even as the project is just starting. It allows you to pinpoint, as features get committed, which ones have a steep price in terms of performance. Or, when the team focuses on performance improvements throughout the project, it allows you to keep chaos under control by quantifying improvements over the baseline or over the previous measurement. Thank you for an interesting post that fosters and focuses the interest of the community on one important aspect of testing.

Arnaud -- thanks for the thoughtful comments, your taxonomy makes sense. I agree that doing performance testing early in the game has lots of benefits, but in my experience very few people/teams do this. Grig
By Grig Gheorghiu, at 9:19 AM

Great article! I tried OpenSTA and Grinder, but found there was a lot of scripting and debugging to get my tests to run properly as we have a very dynamic site. I eventually came upon OpenLoad, which was much easier to use and relatively inexpensive.

A good overview. As far as I know, performance testing usually evaluates three things: 1) throughput, 2) response time and 3) latency. Thanks & regards, Murali

I don't know why people are confused between performance and stress. Performance is when you verify the performance of an application for a set of users, i.e. say that when 2000 users access the system, the system serves them in 5 seconds. Stress is this: the same system is designed for 2000 users, but there can be times in the business calendar when there will be 3000 users. As the application is only designed for 2000 users, it can be slower than under regular load; maybe the system will take 9 seconds to serve a page. Now if there are still more users, say 1000 more, so that the total load is 4000, the system can run out of resources and may crash. Hope you got it.
By Srinidhi, at 9:20 AM

The companies where I worked as a performance engineer have been doing the following performance tests:
Baseline/Benchmark testing
Load/Stress load testing

In general:

Baseline/Benchmark testing: these need to be executed from build to build and from release to release under the same hardware settings, with the same test tools, the same workload profiles and the same running time (10-30 minutes), collecting the same required performance metrics, mostly measuring the product's peak-time metrics. The purpose is to see if the results meet the product release requirements (such as Web page latency < 3 sec, concurrent connections/users > 1k or 5k, %CPU usage < 85%), and then to compare the results between releases, build numbers, and different vendors' similar products. Most of the time, a company will use these tests to collect the best numbers to advertise its product; the Sales and Marketing departments are more interested in those results. In real life, people may never experience such good numbers, just like the mileage figures declared by automobile companies.

Load/Stress load testing: we may use the same setup as for the benchmarks, but mostly we use more complicated workload profiles, and the test settings may be modified over time, as long as the tests are designed to exercise most of the product's functionality. These tests run with a less intensive load: say the benchmark shows the product can handle 2000 concurrent connections with latency under 3-5 sec; the stress load with a more complicated workload may only run 50-70% of those 2000 connections. But the load duration ranges from overnight (about 12 hours) to 72 hours or even weeks. Under the load, lots of functional tests may be performed. This kind of test is closer to simulating real-life scenarios.

In both of the above tests, we collect the product's system resource usage using PerfMon on Windows and vmstat/mpstat/iostat/netstat/lsof/ps/sar on Unix/Linux. %CPU, context switches, memory, I/O wait/queue, and the number of established/time-wait sockets are counters that help identify the product's bottlenecks and defects. It seems that different companies, and even different projects within the same company, may have different processes and methodologies for their performance tests. Performance release requirements and goals are very important as a starting point.
By Anonymous, at 11:45 PM

Thanks for the good explanation. I would like to add one more thing. Take the example of testing a chair. If the chair is designed for a weight of 100 kg and I weigh 70 kg, that is normal testing. If I weigh 100 kg, that is load testing. If I weigh 120 kg, that is stress testing.

Hi Neelkanth, thanks for the clear comments on stress and load testing. But what about performance testing? Thanks, Rajiv

By Rajiv Walia, at 10:56 PM

Can we do performance testing by using 1) load testing and 2) stress testing? Right?

Simha -- check out iperf at
Grig
By Grig Gheorghiu, at 7:34 AM

Let me give you a scenario to help explain my question. There is an intranet Web application that is being accessed from different locations. When the users at the Head Office (HO) access this application they have no problem; however, the users at the remote locations experience delayed responses. The hardware at all the locations, including the HO, is the same. How do I determine whether the problem is caused by slow network access or by slow CPU processing? I have tried traceroute, and it indicates a major time lag between the hops for the problematic locations, whereas the HO traceroute is good. I have also tried accessing a static page from all the locations, and the response time for this page is very long from the problematic locations. Can I conclude from this that the problem is with the network, or is there anything else I can try? Very importantly, how can I clearly eliminate the possibility of a slow CPU response?
By Gokul Nadar, at 11:32 PM

Gokul -- I think there's a simple test you can run to eliminate the possibility of a CPU bottleneck. Run some utility on the server running the Web intranet application. If the server is Unix/Linux-based, run top. If the server is Windows-based, run Performance Monitor. Access your application from the HO and note the CPU utilization. Access your app from a remote office and also note your CPU utilization. My guess is that the 2 numbers will be almost identical. This means the bottleneck is not CPU-bound, but, as you noted from the traceroutes, network-bound. I'd also run iperf between the clients and the server -- it will tell you what kind of network throughput you have on those links, and it will be another argument for the network bottleneck. Iperf is available here:

