An Introduction to Performance Testing
By Leslie Segal
As with regression testing, it seems that everyone today is looking for a performance testing tool that will take care of all of their performance testing needs. As it turns out (as with most testing), the tool is the least important piece of the effort: performance testing is 90% analysis and 10% tool implementation. However, unlike most other testing, performance testing is nearly impossible to perform without a tool.
Almost all of the performance testing tools on the market today are based upon the concept of simulating the traffic from real users with virtual users. You know that you need a performance test tool, but how many virtual users do you really need to test the performance of your application? What defines a virtual user? What is the correlation of real users to virtual users? If I can write a test script in which one virtual user logs onto my system as 100 different users, what does a virtual user really mean? It is important to have the answers to these questions before you begin performance testing; the purpose of this article is to help you understand why.
Why should you even care about what a virtual user is? For one thing, almost all of the performance tools have a pricing structure based upon the number of virtual users that you license. The cost of a license for 250 virtual users is significantly less than one for 2,500 virtual users. So, if you don’t really need 2,500 virtual users, why spend the additional money on both the tool and all of the hardware that you will need to support running the additional virtual users? (Expect to need about one PC for each 200-250 virtual users.)
Some companies assume that the relationship of real users to virtual users is 1:1. They look at the cost of the tools and then decide not to do any performance testing, because they believe that the risk of not testing will cost them less than buying the tool. This can prove to be a very costly mistake, as many e-business Web sites have found out (painfully so).
In addition to saving money, understanding your real users and their correlation to virtual users will allow you to generate more accurate scripts and, in turn, more precise results. One of the pitfalls of performance testing is that, just as with functional test automation, you can create really bad scripts. However, in the case of performance testing, you may think (and have run scripts that show) that your Web site can support 1,000 concurrent users, when in reality, ten users executing a particular function may cripple your site.
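As a rough sketch of the hardware side of that cost, assuming the figure of 200-250 virtual users per PC quoted above holds for your tool and your scripts, the number of load-generating PCs can be estimated by dividing and rounding up. The function name is hypothetical.

```python
import math

def load_generators_needed(virtual_users: int, vusers_per_pc: int) -> int:
    """Number of load-generating PCs, rounded up: a partial PC is still a whole PC."""
    return math.ceil(virtual_users / vusers_per_pc)

# Rule of thumb from the text: one PC per 200-250 virtual users.
for vusers in (250, 500, 2500):
    fewest = load_generators_needed(vusers, 250)  # optimistic capacity per PC
    most = load_generators_needed(vusers, 200)    # conservative capacity per PC
    print(f"{vusers} virtual users: {fewest}-{most} PCs")
```

For 500 virtual users this gives 2-3 PCs; for 2,500 it gives 10-13, on top of the larger license.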
Creating and executing scripts using performance test tools is extremely easy to do; mastering the basic record-and-playback concepts takes about half a day at most. The hard part is figuring out what to script, and then what to do with the copious data that will be generated, in order to reach the right conclusions about the true performance of your system.
Performance Testing Overview
First, let’s take a look at performance testing in general. Functional testing is a fairly easy concept to grasp, and the results are fairly concrete. Most functional tests pass or fail, as the expected result is very black and white. Performance testing, on the other hand, is neither concrete nor black and white. What does ‘passing’ mean for a performance test? Does it mean that with 300 simultaneous users hitting a site, the test passes if the average response time is less than ten seconds? What if the maximum response time is 90 seconds and the standard deviation is 35 seconds? Does the test still pass? For some, passing means that with n concurrent users on their site, it doesn’t crash. The (first) hard part in embarking upon performance testing is to define what you want to measure, and then what criteria you will use to determine whether or not the test passes.
The performance of your system should be measured in three areas:
Response Time – how long does it take the system to respond to a request for a piece of information;
Throughput – how much activity can the system support, often measured in transactions or hits/second; and
Round Trip Time – how long does the entire user-requested transaction take, including connection and processing time.
We sat down with one client to discuss exactly what their criteria were; in performance testing their site, what was it
Figure 1: Expected Performance Response
take you to get through the intersection where the accident occurred) is increasing, but you aren’t even moving. You are not doing anything except standing still. It is not until three of the cars in front of you have moved that you can even think about moving. At this point, your travel time is extremely long, and you start to crawl through the intersection. This also explains why, when you are in car number 50, by the time you pass through the intersection you have no idea what happened or why you were stuck in traffic for so long to begin with.
So what does being stuck in traffic have to do with the performance of your application? Plenty. They are both governed by basic queuing theory. Your application takes requests; if the requests come in faster than they can be processed, the application will start to queue them up. That’s when the properties of queues and queuing theory take over to explain the behavior (or non-behavior) of your system. The principles of queuing theory are beyond the scope of this article.
Understanding Your System and Creating a Transaction/User Profile
So what does all this have to do with virtual users? A lot. Before you can begin to create scripts to be executed by the virtual users, you need a basic understanding of your system. If you understand the hardware architecture of your system (NT server, UNIX server, load balancing, firewalls, separate database server, number of CPUs and memory, etc.) and its implementation (use of proxies, indexing in the database, load balancing algorithms, etc.), then you can assess where probable bottlenecks will occur and start your performance testing in these areas. For example, if you know that your application will be asked to perform a lot of queries, but you don’t have a lot of memory in your application server, the first scripts that you write may just be queries to the database. On the other hand, if you know that certain processes in your application are CPU intensive, you may want to create scripts that kick off these processes and then try to perform other tasks while the CPU is busy.
Figure 2: Actual Performance Response Approaching a Bottleneck
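The queuing behavior behind a curve like this can be illustrated with the simplest textbook queue, a single-server M/M/1 model. The model is our choice for illustration, not the article’s, and a real system is a network of many interacting queues. With mean service time S and arrival rate λ, utilization is U = λS and the mean response time is R = S / (1 - U): nearly flat at light load, then climbing steeply as arrivals approach the service rate. The 20 ms service time below is a hypothetical value.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time (seconds) of an M/M/1 queue: R = S / (1 - U), where U = arrival_rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        # Requests arrive faster than they can be served: the queue grows without bound.
        return float("inf")
    return service_time / (1.0 - utilization)

SERVICE_TIME = 0.020  # hypothetical: 20 ms per request, so capacity is 50 requests/second

for rate in (10, 25, 40, 45, 48, 49.5, 50):
    r = mm1_response_time(rate, SERVICE_TIME)
    print(f"{rate:>5} req/s -> {r:.3f} s")
```

The point is the shape rather than the numbers. Going from 40 to 48 requests/second, a 20% increase in load, multiplies the mean response time by five (0.1 s to 0.5 s).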
Before you create the scripts to test your performance, you also need to understand how your users will actually use the system. What are the typical scenarios users perform? If you already have a live Web site, a tool like WebTrends® will provide you with useful user profile information. If you aren’t sure how users will use your application, then you need to make some guesses based upon their behaviors in similar environments. John Musa has written many articles and books on the subject of creating operational profiles, which are extremely useful in helping to analyze the probable transactions to your application. If you are also trying to determine the maximum load that your Web site will need to support, it is often useful to visit your competitors’ Web sites. They will often boast of their traffic (especially if they are selling advertising on their Web site). We have found Web sites that boast that they received 200 million hits in the last 30 days. That gives you a good starting place from which to measure. Assuming a linear distribution of those 200 million hits (note that a linear distribution is extremely unlikely, but it’s a start), 200,000,000 hits / 30 days = 6,666,667 hits/day = 277,778 hits/hour = 4,630 hits/minute = about 77 hits/second. While the traffic to some Web sites may indeed be linear over a particular period of time (for example, 8AM-8PM), a Poisson distribution of arrivals (with exponentially distributed times between hits) may be more realistic.
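The conversion above is easy to script so you can plug in your own figures. The sketch below reproduces the linear (evenly spread) average. Then, as an added assumption the article does not make, it treats hits as Poisson arrivals at that average rate to estimate the number of hits in one second that is exceeded only 1% of the time. Real traffic also has daily peaks that a single Poisson rate ignores, so this figure likely understates the true peak. Function names are hypothetical.

```python
import math

def average_rate(hits: float, days: float) -> dict:
    """Spread `hits` evenly over `days` (the 'linear distribution' in the text)."""
    per_day = hits / days
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "per_minute": per_day / (24 * 60),
        "per_second": per_day / (24 * 60 * 60),
    }

def poisson_quantile(rate: float, q: float) -> int:
    """Smallest k with P(X <= k) >= q for X ~ Poisson(rate), summing the pmf term by term."""
    k = 0
    term = math.exp(-rate)  # P(X = 0)
    cumulative = term
    while cumulative < q:
        k += 1
        term *= rate / k    # P(X = k) = P(X = k - 1) * rate / k
        cumulative += term
    return k

rates = average_rate(200_000_000, 30)
for unit, value in rates.items():
    print(f"{unit}: {value:,.0f}")

lam = rates["per_second"]
print(f"hits in one second exceeded only 1% of the time: {poisson_quantile(lam, 0.99)}")
```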
Creating the scripts involves defining a scenario, or transaction, that your user will execute. A scenario may consist of one or more transactions, and a transaction may invoke one or more hits. For example, one transaction may be defined as loading the home page. Actually executing this transaction may generate 1-50 hits. (We have seen home pages that load so many GIF files that they literally account for 75 hits.) Next, you need to assess how many times this scenario will be executed over a defined period of time.
Consider one client’s system with 10,000 users. Once users log on, a 2K “chunk” is sent every five minutes. They also receive a 10K “chunk” every ten minutes. Determining the transaction profile (assuming linear distributions):
One transaction every 5 minutes x 10,000 users = 10,000 transactions / 5 minutes = 2,000 transactions/minute = 33.33 transactions/second
One transaction every 10 minutes x 10,000 users = 10,000 transactions / 10 minutes = 1,000 transactions/minute = 16.67 transactions/second
Total traffic per second = 33.33 + 16.67 = 50 transactions (hits)/second
If the response time for each transaction is 10 seconds, then each virtual user can complete at most one transaction every 10 seconds, so we need 500 virtual users to generate a load of 50 transactions/second (50 transactions/second x 10 seconds = 500 transactions in progress at any moment, one per virtual user). Buying a 500 virtual user license was much more palatable than a 10,000 virtual user license, and certainly a better option than not testing at all.
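The same profile arithmetic can be scripted. The virtual-user figure is an application of Little’s law: the number of transactions in progress equals throughput multiplied by the time each transaction occupies a user. The sketch assumes, as the calculation above does, that each virtual user starts its next transaction as soon as the previous one completes. The optional think-time parameter is our addition for scripts that pause between transactions. Function names are hypothetical, and exact fractions are used so that 33.33 + 16.67 comes out to exactly 50.

```python
import math
from fractions import Fraction

def transactions_per_second(users: int, interval_minutes: int) -> Fraction:
    """Load from `users`, each issuing one transaction every `interval_minutes`,
    assuming arrivals are spread evenly (a 'linear distribution')."""
    return Fraction(users, interval_minutes * 60)

def virtual_users_needed(tps: Fraction, response_time_s: float, think_time_s: float = 0.0) -> int:
    """Little's law: concurrent users = throughput x time each user spends per transaction.
    Rounded up, since a fraction of a virtual user cannot be licensed."""
    time_per_transaction = Fraction(response_time_s) + Fraction(think_time_s)
    return math.ceil(tps * time_per_transaction)

# Profile from the example: 10,000 users, one transaction every 5 minutes plus one every 10 minutes.
load = transactions_per_second(10_000, 5) + transactions_per_second(10_000, 10)
print(f"total load: {float(load):.2f} transactions/second")  # 50.00
print(virtual_users_needed(load, 10))                         # 500, as in the text
print(virtual_users_needed(load, 10, think_time_s=5))         # 750 with a hypothetical 5 s pause
```

The think-time line shows why script design matters: the same 50 transactions/second needs 750 virtual users if each one pauses 5 seconds between transactions.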
Data: The End Result – What Does it Mean?
Once you’ve determined the profiles of your real users, created your virtual user scripts, and run them, you will be the proud owner of profuse amounts of data with a variety of interpretations and many questions about whether the tests passed or failed. One of the more common metrics used in measuring performance is the average response time. The average response time is fine, unless you are the one user who pulled the average response time higher. The average is the sum of all of the response times divided by the number of values. The average response times for the following two sets of data are essentially the same, about five seconds (5.0 and 5.5 seconds, respectively):
Data Set #1: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
Data Set #2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Figure 4: Is Average Good Enough?
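A few lines of code make the point concrete by summarizing both data sets with the statistics discussed here: average, maximum, and standard deviation. Using the population standard deviation (`statistics.pstdev`) rather than the sample version is our choice; either shows the same contrast.

```python
import statistics

data_sets = {
    "Data Set #1": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "Data Set #2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}

def summarize(times: list[float]) -> dict:
    """Average, worst case, and spread of a list of response times (seconds)."""
    return {
        "average": statistics.mean(times),
        "maximum": max(times),
        "std_dev": statistics.pstdev(times),  # population standard deviation
    }

for name, times in data_sets.items():
    s = summarize(times)
    print(f"{name}: average {s['average']:.1f} s, maximum {s['maximum']} s, std dev {s['std_dev']:.2f} s")
```

The averages differ by only half a second, but the maximum (10 s versus 5 s) and the standard deviation (about 2.87 s versus 0) reveal the user who is having a much worse experience.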
If you are the user in data set 2 who always takes ten seconds, you don’t really care that the average response is five seconds. (See Figure 4.) If the maximum and minimum response times deviate from the average response time, then it is also important to graph and analyze the maximum response times along with the averages, or to plot the standard deviations of the responses. (See Figure 5.) This graph also helps visualize the pain that the user with the maximum response time is experiencing. The maximum response time is also going to be closer to the potential bottleneck in your system.
Figure 5: Average Response Time Plotted with Standard Deviation (response time in seconds)
Plotting virtual users against response time (especially when there is not a 1:1 correlation between real and virtual users) can be misleading and result in different impressions of the actual performance of the application. For example, this graph shows that with 500 concurrent virtual users, your system response time is less than 30 seconds. You need to correlate the 500 virtual users to the real users on your site. Sometimes it makes more sense to plot the response times against the number of hits to the site, as in Figure 6.
Figure 6: Average Response Time Plotted with Hits/Second
This graph portrays a possible bottleneck at a throughput rate of 31 hits/sec: beyond that threshold, the response time starts increasing. The throughput of 31 hits per second may correlate to n “real” users, but it may not. It may correlate to 500 real users loading a home page, but only 100 real users logging on to the system.
There is no easy formula for determining the number of virtual users that you need to adequately performance test your site. There are no rules of thumb. Each site must be analyzed to determine what is reality and what is virtual. In the end, what really matters is the reality of your company and your job.
Leslie Segal, President, Testware Associates, Inc.