Gathering Performance Requirements

1 Introduction

Quality of Service attributes such as performance, availability, reliability and supportability are becoming an increasingly important measure of project success. Several studies have indicated that the inability to meet the desired or required performance is one of the main reasons for software deployment failures. A study conducted by Infosys Technologies Ltd. on several poorly performing Internet applications indicated that more than 50% of the problems were caused by a lack of clear performance goals when the software was designed and developed. Further, root cause analysis revealed that these defects were caused by incomplete performance specifications and poorly understood constraints under which the system operates. This is reiterated in Tom Gilb's principle, "Projects without clear goals will not achieve their goals clearly." [1]

So, what are the factors that hinder comprehensive gathering of performance requirements? While there are many techniques for gathering and analyzing functional requirements, there is no industry-standard approach or technique for gathering performance requirements. There are some notational models, such as RT-UML and Coleman's use case template [2], which extend a simple use case with a field for representing non-functional requirements. However, restricting the exercise to the use case view misses some important dimensions of performance, such as fluctuations in the workload and the constraints under which the system will operate. Thus, a shop-usable approach is not readily available. The other factor that impedes the creation or adoption of a standardized approach is the inherent variability in the context of each software development effort. Some of this variability is driven by the different software solution patterns, such as client-server, Enterprise Application Integration, Web-based, or real-time applications, and by the development scenario, such as re-engineering, new development, or maintenance. In this article we describe a systematic approach for gathering performance requirements for an Internet application, and illustrate it with a case study of re-engineering an Internet Banking application from ASP to ASP.NET.

2 Problem Formulation

Consider the following statement, extracted from a real-life project document, that laid out the performance objectives of a software system: "The ability to handle a large workload with minimal processor time, and to increase the maximum achievable workload utilizing standard hardware scaling techniques."

There are several pitfalls in this description of the required system performance that lead to ambiguities in its interpretation and verification. For example, it does not quantify the workload the system needs to meet, the expected processor cycles per unit of workload, or the maximum workload up to which the system is expected to scale linearly. For architects and designers, such an incomplete specification makes it difficult to ascertain whether the architecture will meet the desired goals, or to analyze the trade-offs with respect to other quality of service requirements, such as security or usability, that the system is expected to meet. For IT managers, it can manifest as uncertainties in timelines, budget, available resources, and so on.
The above requirements specification should be more precise in terms of the workload to be supported (for example, the number of concurrent users and/or the expected transaction throughput), the maximum acceptable processor utilization, and the
forecasted workload. These requirements could then be verified in the subsequent lifecycle stages; for example, performance testing would validate the system's adherence to these targets under the operational workload. As noted by Keepence et al. [3], any system requirement should be Specific, Measurable, Attainable, Realizable and Traceable; they refer to this as the SMART criteria.

3 Problem Solution

The performance of a system is a function of its operational workload, the underlying hardware and software infrastructure, and the application's persistent data volume. We therefore define three views:

(a) Deployment View, to capture the hardware and software environment, constraints, and service-level agreements for the deployment environment.
(b) Operational Workload View, to capture the different types of workloads the system is subjected to, the pattern and intensity of requests during the different workload intervals, the desired responsiveness, and the growth projections.
(c) Persistent Data View, to capture the current data size and the data growth over a period of time.

These views capture the key attributes from both business and technological standpoints. Although we define these views, it is difficult to generalize the set of parameters and the level of granularity required to suit every situation. For example, the performance requirements document for a software system designed to be used by tellers in a bank may need to capture the number of users and their think time, whereas for an Internet online shopping site the workload is best quantified as the expected number of request arrivals per unit of time and its arrival pattern. In the following case study, we demonstrate the process of capturing performance requirements using these three views for an Internet application that was being re-engineered.

4 Case Study

The case study is of an Internet Banking application that was undergoing a major re-engineering initiative driven by certain business and technological factors. The user interface and communication layers of the existing application, which were based on the Microsoft technologies ASP, COM, VC++ and VBScript, were re-engineered to use the Microsoft .NET framework. One critical piece of this re-engineering exercise was defining the performance objectives of the new system in a manner that could be verified in the downstream project lifecycle stages. However, the only performance requirement quantified by the business at the start of the project was that the re-engineered application should be as good as the existing application in terms of its responsiveness. Clearly, the situation required us to baseline the current system's performance along with the existing operational workload and data volumes; this baseline could then be used to validate the performance of the re-engineered system. The next section details the process adopted to capture the attributes of the three views described above.
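Before walking through that process, the sketch below shows one way the three views could be captured as a structured, reviewable template. The field names and types are illustrative assumptions for a generic Internet application; they are not a schema prescribed by this approach.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DeploymentView:
    # Hardware/software environment and constraints (illustrative fields).
    servers: Dict[str, str]                        # e.g. {"web01": "2 x 1.4 GHz, 2 GB RAM"}
    link_bandwidth_mbps: Dict[str, float]          # e.g. {"web-app LAN": 100.0}
    max_acceptable_utilization: Dict[str, float]   # e.g. {"web01_cpu": 0.70}
    planned_upgrades: List[str] = field(default_factory=list)

@dataclass
class TransactionWorkload:
    name: str
    arrival_rate_per_hour: float        # per usage period
    response_time_target_s: float       # e.g. a 90th-percentile target
    growth_factor_12m: float
    growth_factor_18m: float
    business_criticality: str

@dataclass
class OperationalWorkloadView:
    usage_periods: Dict[str, str]       # e.g. {"Heavy": "09:00-12:00"}
    transactions: List[TransactionWorkload]
    max_logged_in_users: int

@dataclass
class PersistentDataView:
    current_db_size_gb: float
    largest_tables_gb: Dict[str, float]
    annual_growth_rate: float
    archival_purge_frequency: str

@dataclass
class PerformanceRequirements:
    deployment: DeploymentView
    workload: OperationalWorkloadView
    data: PersistentDataView
```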
Figure 1: Performance Requirements Gathering

4.1 Gathering Data for the Existing System

The exercise began by identifying an analyst as a single point of contact. The analyst was responsible for organizing interviews with the relevant stakeholders and ensuring that their responses were provided on time. These stakeholders were developers from the application development and maintenance team, business analysts, system and database administrators, and the program manager of the re-engineering initiative. Before starting the interviews, a briefing session was organized to explain the purpose of the performance requirements gathering exercise. This session helped in obtaining relevant documentation about the existing system from the stakeholders, provided an opportunity to clarify their queries and concerns, and communicated the need for their commitment to supporting the exercise throughout. The next step was to create and circulate questionnaires with the objective of gathering the data and information required for creating the three views described earlier. Fig. 1 illustrates the relationship between the stakeholder profiles and the views for which they can provide inputs.

4.1.A Deployment View

The existing system architecture document forms a good starting point for the creation of the deployment view. However, it most often needs to be supplemented with a questionnaire to the system administrators and the IT infrastructure hosting and management team to capture more details. Some sample questions extracted from this case study are given below:

What is the bandwidth of the Internet link between the end users and the Web infrastructure?
What is the bandwidth of the network link (LAN) between the Web and application servers, and between the application servers and the database server?
Provide the configuration of each Web server, application server and database server in terms of: (i) number of processors, (ii) type of processor (processor speed), (iii) memory capacity, and (iv) hard disk: disk size, number of disk controllers and RAID level.
Is there any infrastructure change or upgrade planned for the re-engineered application?
Are vital server resources such as processor utilization, memory utilization, network utilization and resource utilization of key processes monitored on the Web, application and database servers? If yes, can this data for the peak workload durations be provided to us?
What are the maximum acceptable utilizations for the resources mentioned above?
Are server logs, such as the Web server access logs, available?

A snapshot from a sample deployment view is shown in Figure 2.
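If the monitoring data mentioned in the questionnaire is available as a file export, it can be summarized to answer the peak-utilization questions. The sketch below is illustrative only; the CSV layout, server names and utilization limits are assumptions, not data from the case study.

```python
import csv
from collections import defaultdict
from statistics import mean

# Hypothetical maximum acceptable utilizations gathered via the questionnaire.
MAX_ACCEPTABLE_UTILIZATION = {"web01": 0.70, "app01": 0.75, "db01": 0.65}

def summarize_peak_utilization(path: str, peak_hours: range = range(9, 12)):
    """Summarize monitored CPU utilization per server during the peak window.

    Assumes a CSV export with columns: timestamp (ISO 8601), server, cpu_util (0-1).
    """
    samples = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            hour = int(row["timestamp"][11:13])   # "YYYY-MM-DDTHH:MM:SS"
            if hour in peak_hours:
                samples[row["server"]].append(float(row["cpu_util"]))

    for server, values in samples.items():
        avg, peak = mean(values), max(values)
        limit = MAX_ACCEPTABLE_UTILIZATION.get(server)
        flag = " (exceeds acceptable limit)" if limit and peak > limit else ""
        print(f"{server}: avg={avg:.0%} peak={peak:.0%}{flag}")

# Example usage (file name is hypothetical):
# summarize_peak_utilization("perfmon_export.csv")
```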
Figure 2: Snapshot from sample Deployment View

4.1.B Operational Workload View

The Operational Workload View captures the information related to the business transactions of the application. The first step in the creation of this view is to identify all the critical business transactions from the functional specification documentation, or with the help of the development team. Further questionnaires are sent to the business stakeholders to obtain expected response times, transaction volume projections for the future, and other known trends or seasonality in the workload. However, the creation of this view also requires analysis of workload data spanning the past few months and/or several geographically distributed users. We describe the approach for this analysis in Section 4.2. Some sample questions extracted from this case study are given below:

List the different usage periods, such as Heavy, Medium, Light and Very Light load, and identify the time intervals during which they occur.
List the business transactions that occur during the above usage periods.
Provide the arrival rate (number of transactions per hour) for each business transaction in the above periods.
Provide any growth projections expected for each transaction over the growth periods (12 and 18 months) for each of the above scenarios.
What is the expected total number of users of the system?
What is the expected maximum number of logged-in users?
Other than the workload mentioned above, are there any other applications or transactions that generate additional load on the servers?
List the business criticality of each business transaction.
Provide the current or expected average, maximum and 90th-percentile response times for the above transactions.
What are the response time limits of the external systems on which the application is dependent?

A snapshot from a sample operational workload view is shown in Figure 3.
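Where measured samples are available, for example from monitoring data or a load-test run, the summary statistics asked for above can be computed directly. A minimal sketch, with made-up measurements for a hypothetical transaction, is shown below.

```python
from statistics import mean, quantiles

def response_time_summary(samples_ms):
    """Return the average, maximum and 90th-percentile of measured response times.

    `samples_ms` is any iterable of response times in milliseconds.
    """
    data = sorted(samples_ms)
    p90 = quantiles(data, n=10)[-1]   # last decile cut point = 90th percentile
    return {"avg_ms": mean(data), "max_ms": data[-1], "p90_ms": p90}

# Example with hypothetical measurements for a "Funds Transfer" transaction:
print(response_time_summary([120, 135, 150, 180, 200, 210, 240, 300, 450, 900]))
```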
Figure 3: Snapshot from sample Operational Workload View

4.1.C Persistent Data View

This view captures the current data volumes and the expected growth. It is very important to understand the data retention policies in order to plan purge or archival cycles. A questionnaire was circulated to the DBAs to obtain information on the database size and organization (both physical and logical). Some sample questions extracted from this exercise are given below:

What are the main data entities used in the application (these may correspond directly to a database table or to a set of related tables)?
What is the current database size?
What is the size of some of the largest tables in the database?
What is the data archival and purge frequency?
What is the expected database growth over the next few years?

A snapshot from a sample Persistent Data View is shown below:
Figure 4: Snapshot from a sample Persistent Data View
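As a small illustration of how the DBAs' answers on current size and expected growth can be turned into a multi-year storage projection, consider the sketch below; the growth rate, horizon and sizes are hypothetical.

```python
def project_db_size(current_gb: float, annual_growth: float, years: int) -> list:
    """Project database size assuming compound annual growth.

    The growth rate and horizon come from the questionnaire answers; the
    figures used in the example are purely illustrative.
    """
    return [round(current_gb * (1 + annual_growth) ** y, 1) for y in range(years + 1)]

# Example: 120 GB today, 25% growth per year, 3-year horizon (hypothetical numbers).
print(project_db_size(120, 0.25, 3))   # [120.0, 150.0, 187.5, 234.4]
```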
The responses to the questionnaires were reviewed and taken up for further analysis. However, it is important to realize that the stakeholders cannot answer all the questions. In the case study, some of the important inputs that could not be provided by the stakeholders were:

Arrival pattern of requests and the percentile distribution of arrival rates for the different workloads.
Operational workload mix, as determined by the users' navigation pattern on the site.
Current and expected response times of the transactions in terms of average, maximum and 90th-percentile values.

The next section details the techniques we used to capture this missing information.

4.2 Analysis Using Statistical and Analytical Techniques

To understand the arrival pattern, the percentile distribution of the workload and the workload mix, we used historic data from the Web server access logs of the existing application. The log files for representative time periods were analyzed to obtain a percentile distribution table of the arrival rates to the site; for more details on the analysis technique, please refer to [4]. Further, the mix and the intensity of the different components of the workload can also be determined from these logs. In the case study, the transaction mix was found to be almost constant for each hour of the day. Using the business workload forecast and a contingency factor, the growth factors for two different growth rates spanning twenty-four months were identified. The request execution times recorded in the access logs give an indication of the current processing time of the various transactions; these numbers were recorded as the baseline for comparison against the re-engineered application's transactions. With all the required information available, it was time to document and communicate the results of the analysis. The results were presented using several graphs and charts so that they were easy to understand, for example, the arrival rates of the top 10 transactions, the transaction mix, and the growth projections over the next two years. Sample data for the growth projections is given in Figure 5.
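The detailed analysis technique is described in [4]. Purely as an illustration of the kind of processing involved, the following sketch derives hourly arrival rates, their percentile distribution and the transaction mix from a Web server access log. The log-line pattern, file name and use of Python's statistics module are assumptions, not part of the case study, and the pattern would need to be adapted to the actual log format.

```python
import re
from collections import Counter
from statistics import quantiles

# Assumed Common Log Format-like layout; adjust the pattern to the real log format.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] "\w+ (?P<url>\S+)'
)

def analyze_access_log(path: str):
    """Derive hourly arrival rates, their percentile distribution and the
    transaction mix from a Web server access log."""
    hits_per_hour = Counter()          # (day, hour) -> request count
    url_counts = Counter()             # transaction (URL) -> request count
    with open(path) as f:
        for line in f:
            m = LOG_LINE.search(line)
            if not m:
                continue
            hits_per_hour[(m["day"], m["hour"])] += 1
            url_counts[m["url"]] += 1

    rates = sorted(hits_per_hour.values())
    rate_deciles = quantiles(rates, n=10)      # decile cut points of hourly arrival rate
    total = sum(url_counts.values())
    top10_mix = {url: count / total for url, count in url_counts.most_common(10)}
    return {"hourly_rates": rates, "rate_deciles": rate_deciles, "top10_mix": top10_mix}

# Example usage (file name is hypothetical):
# summary = analyze_access_log("ex051001.log")
# print(summary["rate_deciles"], summary["top10_mix"])
```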
Figure 5: A snapshot from sample growth projections data

5 Conclusions

Capturing performance goals in a systematic manner is a prerequisite for building performance into an application right from the start of development. Unfortunately, the lack of a well-defined approach has made this a difficult proposition for many requirements analysts. In this article, we discussed a shop-usable approach and illustrated it with the case study of re-engineering an Internet banking application. The performance requirements document produced in the case study, with the help of the three views, provided specific and measurable performance requirements. Some of the ways in which this helped in subsequent lifecycle stages were as follows:
We were able to establish the current capacity baseline by correlating the system resource utilization from the Deployment View with the arrival rates of the business transactions from the Operational Workload View (see the illustrative sketch at the end of this section).
Baselining the current response times of the system gave us the ability to set performance targets for the re-engineered system.
We had specific throughput requirements for the performance validation of the re-engineered system.
We were able to apply growth factors to the workload, which helped us determine the capacity forecast required by the growth projections.
The data volume requirements for the new system helped in testing the system against close-to-real-life data volumes and in planning the database organization accordingly.
We could identify business transactions in the existing system that had unacceptable response times, and re-examine their design and code when they were taken up for re-engineering.

This approach proved to be quite effective in identifying the required Quality of Service attributes and can be applied in any similar situation.
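The article does not prescribe a particular calculation for the capacity-baseline correlation mentioned in the first point above. One standard way to perform it is the Utilization Law from basic queueing theory (utilization = throughput x service demand per request). The sketch below applies it with hypothetical numbers.

```python
# Illustrative capacity-baseline calculation using the Utilization Law (U = X * D).
# All figures are hypothetical, not data from the case study.

measured_cpu_utilization = 0.45      # from the Deployment View (peak hour, CPU busy fraction)
measured_throughput_per_s = 30.0     # from the Operational Workload View (requests/second)

# Per-request CPU demand (seconds of CPU per request) implied by the measurements.
service_demand_s = measured_cpu_utilization / measured_throughput_per_s   # 0.015 s

# Forecast: apply a 24-month growth factor to the workload and check the
# projected utilization against the maximum acceptable limit.
growth_factor_24m = 1.6
max_acceptable_utilization = 0.70

projected_utilization = measured_throughput_per_s * growth_factor_24m * service_demand_s
print(f"Projected CPU utilization after growth: {projected_utilization:.0%}")   # 72%
if projected_utilization > max_acceptable_utilization:
    print("Additional capacity (or tuning) is required to stay within the acceptable limit.")
```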