Types Of Testing

Black box testing

Black box testing takes an external perspective of the test object to derive test cases. These tests can be functional or non-functional, though usually functional. The test designer selects valid and invalid input and determines the correct output. There is no knowledge of the test object's internal structure.

This method of test design is applicable to all levels of software testing: unit, integration, functional, system and acceptance. The higher the level, and hence the bigger and more complex the box, the more one is forced to use black box testing to simplify. While this method can uncover unimplemented parts of the specification, one cannot be sure that all existent paths are tested.

Contents
• 1 Test design techniques
• 2 User input validation
• 3 Hardware

Test design techniques

Typical black box test design techniques include:
• Equivalence partitioning
• Boundary value analysis
• Decision table testing
• Pairwise testing
• State transition tables
• Use case testing
• Cross-functional testing

User input validation

User input must be validated to conform to expected values. For example, if the software program is requesting the price of an item and is expecting a value such as 3.99, the software must check that all invalid cases are handled. A user could enter the price as "-1" and achieve results contrary to the design of the program. Other examples of entries that could cause a failure in the software include "1.20.35", "Abc", "0.000001", and "999999999". These are possible test scenarios that should be entered at each point of user input.

Other domains, such as text input, need to restrict the length of the characters that can be entered. If a program allocates 30 characters of memory space for a name and the user enters 50 characters, a buffer overflow condition can occur. Typically, when invalid user input occurs, the program either corrects it automatically or displays a message telling the user that the input must be corrected before proceeding.
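As a sketch of the kind of validation this implies, consider the price example above. The function name, the accepted range (0.01 to 999999.99) and the 30-character limit are illustrative assumptions, not requirements from the text:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical validator: accepts a price string such as "3.99" and
       rejects the invalid cases discussed above (negative values, multiple
       decimal points, non-numeric text, out-of-range magnitudes). */
    int is_valid_price(const char *input)
    {
        char *end;
        double value;

        if (input == NULL || *input == '\0')
            return 0;                      /* empty input */
        if (strlen(input) > 30)
            return 0;                      /* enforce a length limit up front */

        value = strtod(input, &end);
        if (*end != '\0')
            return 0;                      /* trailing junk: "1.20.35", "Abc" */
        if (value < 0.01 || value > 999999.99)
            return 0;                      /* "-1", "0.000001", "999999999" */
        return 1;
    }

    int main(void)
    {
        const char *tests[] = { "3.99", "-1", "1.20.35", "Abc",
                                "0.000001", "999999999" };
        for (int i = 0; i < 6; i++)
            printf("%-10s -> %s\n", tests[i],
                   is_valid_price(tests[i]) ? "valid" : "invalid");
        return 0;
    }

Of the six test inputs, only "3.99" should be reported as valid.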

Hardware

Functional testing of devices such as power supplies, amplifiers, and many other simple-function electrical devices is common in the electronics industry. Automated functional testing of specified characteristics is used for production testing and as part of design validation.

Code coverage

Code coverage is a measure used in software testing. It describes the degree to which the source code of a program has been tested. It is a form of testing that looks at the code directly, and as such comes under the heading of white box testing. Code coverage techniques were amongst the first techniques invented for systematic software testing. The first published reference was by Miller and Maloney in Communications of the ACM in 1963. There are a number of different ways of measuring code coverage, the main ones being:
• Statement coverage - Has each line of the source code been executed and tested?
• Condition coverage - Has each evaluation point (such as a true/false decision) been executed and tested?
• Path coverage - Has every possible route through a given part of the code been executed and tested?
• Entry/exit coverage - Has every possible call and return of the function been executed and tested?

Safety-critical applications are often required to demonstrate that testing achieves 100% of some form of code coverage. Some of the coverage criteria above are connected; for instance, path coverage implies condition, statement and entry/exit coverage. Statement coverage does not imply condition coverage, as the following code (in the C programming language) shows:

    void foo(int bar)
    {
        printf("This is ");
        if (bar < 0) {
            printf("not ");
        }
        printf("a positive integer.\n");
        return;
    }

If the function "foo" were called with variable "bar = -1", statement coverage would be achieved. Condition coverage, however, would not.

Full path coverage, of the type described above, is usually impractical or impossible. Any module with a succession of n decisions in it can have up to 2^n paths within it; loop constructs can result in an infinite number of paths. Many paths may also be infeasible, in that there is no input to the program under test that can cause that particular path to be executed. However, a general-purpose algorithm for identifying infeasible paths has been proven to be impossible (such an algorithm could be used to solve the halting problem). Techniques for practical path coverage testing instead attempt to identify classes of code paths that differ only in the number of loop executions; to achieve "basis path" coverage the tester must cover all the path classes.

Usually the source code is instrumented and run through a series of tests. The resulting output is then analysed to see what areas of code have not been exercised, and the tests are updated to include these areas as necessary. Combined with other code coverage methods, the aim is to develop a rigorous yet manageable set of regression tests.

Code coverage is ultimately expressed as a percentage, as in "We have tested 67% of the code." The meaning of this depends on what form(s) of code coverage have been used, as 67% path coverage is more comprehensive than 67% statement coverage. The value of code coverage as a measure of test quality is debated.
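To make the instrumentation step concrete, here is a hand-instrumented sketch of the foo example above. Real coverage tools (gcov, for instance) automate this bookkeeping rather than requiring manual counters:

    #include <stdio.h>

    /* Hand-instrumented version of foo: each basic block increments a
       counter, so after a test run the counters reveal which statements
       and which branch outcomes were exercised. */
    static unsigned long hits[4];

    void foo(int bar)
    {
        hits[0]++;                 /* function entry */
        printf("This is ");
        if (bar < 0) {
            hits[1]++;             /* "true" branch taken */
            printf("not ");
        } else {
            hits[2]++;             /* "false" branch taken */
        }
        hits[3]++;                 /* final statement reached */
        printf("a positive integer.\n");
    }

    int main(void)
    {
        foo(-1);   /* with only this call, hits[2] stays 0: every statement
                      of the original foo runs (statement coverage), but the
                      false outcome of the decision is never exercised, so
                      condition coverage is not achieved */
        printf("entry=%lu true=%lu false=%lu exit=%lu\n",
               hits[0], hits[1], hits[2], hits[3]);
        return 0;
    }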

Defect tracking

In engineering, defect tracking is the process of finding defects in a product (by inspection, testing, or recording feedback from customers) and making new versions of the product that fix the defects. Defect tracking is important in software engineering, as complex software systems typically have tens, hundreds, or thousands of defects; managing, evaluating and prioritizing these defects is a difficult task. Defect tracking systems are computer database systems that store defects and help people to manage them. IBM Rational ClearQuest is an industry-leading defect tracking tool; VMS and Bugzilla are bug tracking tools.

Software release life cycle

A software release is the distribution, whether public or private, of an initial or new and upgraded version of a computer software product. Each time a software program or system is changed, the programmers and company doing the work decide how to distribute the program or system, or changes to that program or system. Software patches are one method of distributing the changes, as are downloads and compact discs.

Software release stages

The software release life cycle is composed of different stages that describe the stability of a piece of software and the amount of development it requires before final release. Each major version of a product usually goes through a stage when new features are added (the alpha stage); a stage when it is being actively debugged (the beta stage); and finally a stage when all important bugs have been removed (the stable stage). Intermediate stages may also be recognized. The stages may be formally announced and regulated by the project's developers, but sometimes the terms are used informally to describe the state of a product. Conventionally, code names are often used by many companies for versions prior to the release of the product, though the actual product and features are rarely secret.

Contents
• 1 Software release stages
  o 1.1 Pre-alpha
  o 1.2 Alpha
  o 1.3 Beta
     1.3.1 Origin of 'alpha' and 'beta'
  o 1.4 Release candidate
  o 1.5 Gold/general availability release
  o 1.6 RTM / RTW
     1.6.1 Box copy
  o 1.7 Stable/unstable
• 2 See also
• 3 External links



Software release stages

Pre-alpha

Sometimes a build known as pre-alpha is issued before the release of an alpha or beta. In contrast to alpha and beta versions, the pre-alpha is usually not "feature complete". When it is used, it refers to all activities performed during the software project prior to software testing. These activities can include requirements analysis, software design, software development and unit testing.

Alpha

The alpha version of a product still awaits full testing of all its functionality but satisfies all the software requirements. As the first major stage in the release life cycle, it is named after alpha, the first letter in the Greek alphabet.

The alpha build of the software is the build delivered to the software testers, that is, persons different from the programmers, but usually internal to the organization or community that develops the software. In a rush to market, more and more companies are engaging external customers or value-chain partners in their alpha testing phase. This allows more extensive usability testing during the alpha phase.

In the first phase of testing, developers generally test the software using white box techniques. Additional validation is then performed using black box or grey box techniques, by another dedicated testing team, sometimes concurrently. Moving to black box testing inside the organization is known as alpha release.

Beta

A beta version is the first version released outside the organization or community that develops the software, for the purpose of evaluation or real-world black/grey-box testing. The process of delivering a beta version to the users is called beta release. The users of a beta version are called beta testers. They are usually customers or prospective customers of the organization that develops the software. They receive the software for free or at a reduced price, but act as free testers.

Beta versions test the supportability of the product, the go-to-market messaging (while recruiting beta customers), the manufacturability of the product, and the overall channel flow or channel reach. Beta version software is likely to be useful for internal demonstrations and previews to select customers, but unstable and not yet ready for release. Some developers refer to this stage as a preview, a prototype, a technical preview (TP) or an early access. As the second major stage in the release life cycle, following the alpha stage, it is named after beta, the second letter in the Greek alphabet.

Often this stage begins when the developers announce a feature freeze on the product, indicating that no more feature requirements will be accepted for this version of the product; only software issues (bugs) and unimplemented features will be addressed. Beta versions stand at an intermediate step in the full development cycle. Developers release either a closed beta or an open beta; closed beta versions are released to a select group of individuals for a user test, while open betas are released to a larger group, usually the general public. The testers report any bugs that they find and sometimes minor features they would like to see in the final version.

An example of a major public beta test was when Microsoft started releasing regular Windows Vista Community Technology Previews (CTPs) to beta testers in January 2005. The first of these was build 5219. Subsequent CTPs introduced most of the planned features, as well as a number of changes to the user interface, based in large part on feedback from beta testers. Windows Vista was deemed feature complete with the release of build 5308 CTP on February 22, 2006, and much of the remaining work between that build and the final release of the product focused on stability, performance, application and driver compatibility, and documentation.

When a beta becomes available to the general public it is often widely used by the technologically savvy and those familiar with previous versions as though it were the finished product. Usually developers of freeware or open-source betas release them to the general public, while proprietary betas go to a relatively small group of testers. Recipients of highly proprietary betas may have to sign a non-disclosure agreement.

A release is called feature complete when the product team agrees that functional requirements of the system are met and no new features will be put into the release, but significant software bugs may still exist. Companies with a formal software process will tend to enter the beta period with a list of known bugs that must be fixed to exit the beta period, and some companies make this list available to customers and testers.

As the Internet has allowed for rapid and inexpensive distribution of software, companies have begun to take a more flexible approach to the use of the word "beta". Netscape Communications was infamous for releasing alpha-level versions of its Netscape web browser as public beta releases. In February 2005, ZDNet published an article about the recent phenomenon of beta versions staying in beta for years and being used as if they were at production level [1]. It noted that Gmail and Google News, for example, had been in beta for a long period of time and were not expected to drop the beta status despite being widely used; Google News did, however, leave beta in January 2006. This technique may also allow a developer to delay offering full support and responsibility for remaining issues. In the context of Web 2.0, people even talk of perpetual betas to signify that some software is meant to stay in beta state.

Origin of 'alpha' and 'beta'

The term beta test applied to software comes from an early IBM hardware product test convention dating back to punched card tabulating and sorting machines. Hardware first went through an alpha test for preliminary functionality and small-scale manufacturing feasibility. Then came a beta test, to verify that it actually correctly performed the functions it was supposed to and could be manufactured at the scales necessary for the market, and then a c test to verify safety. With the advent of programmable computers and the first shareable software programs, IBM used the same terminology for testing software. Beta tests were conducted by people or groups other than the developers. As other companies began developing software for their own use, and for distribution to others, the terminology stuck and is now part of the common vocabulary.

Release candidate

The term release candidate refers to a version with the potential to be a final product, ready to release unless fatal bugs emerge. In this stage, the product features all designed functionality and no known showstopper-class bugs. At this phase the product is usually code complete. Microsoft Corporation often uses the term release candidate. During the 1990s, Apple Computer used the term "golden master" for its release candidates, and the final golden master was the general availability release.

Other terms include gamma (and occasionally also delta, and perhaps even more Greek letters) for versions that are substantially complete but still under test, and omega for final testing of versions that are believed to be bug-free and may go into production at any time. (Gamma, delta, and omega are, respectively, the third, fourth, and last letters of the Greek alphabet.) Some users disparagingly refer to release candidates and even final "point oh" releases as "gamma test" software, suggesting that the developer has chosen to use its customers to test software that is not truly ready for general release. Often, beta testers, if privately selected, will be billed for using the release candidate as though it were a finished product.

A release is called code complete when the development team agrees that no entirely new source code will be added to the release. There may still be source code changes to fix defects, and there may still be changes to documentation and data files, and to the code for test cases or utilities. New code may be added in a future release.

Gold/general availability release

The gold or general availability release version is the final version of a particular product. It is typically almost identical to the final release candidate, with only last-minute bugs fixed. A gold release is considered to be very stable and relatively bug-free, with a quality suitable for wide distribution and use by end users. In commercial software releases, this version may also be signed (allowing end users to verify that the code has not been modified since the release). The expression that a software product "has gone gold" means that the code has been completed and is being mass-produced and will be for sale soon. Other terms for the version include gold master, gold release, or gold build.

The term gold anecdotally refers to the "gold master disc" commonly used to send the final version to manufacturers, who use it to create the mass-produced retail copies. It may in this context be a hold-over from music production. In some cases, however, the master disc is still actually made of gold, for both aesthetic appeal and resistance to corrosion.

RTM / RTW

Microsoft and others use the term "release to manufacturing" (RTM) to refer to this version (for commercial products, like Windows XP, as in "Build 2600 is the Windows XP RTM release"), and "release to Web" (RTW) for freely downloadable products. Typically, RTM occurs at least one or two weeks before general availability, because the RTM version must be burnt to disc, boxed, and so on.

Box copy

A box copy is the final product, printed on a disc that is included in the actual release, complete with disc graphic art. This term is used mostly by reviewers to differentiate it from gold master discs. A box copy does not necessarily come enclosed in the actual boxed product; it refers to the disc itself.

Stable/unstable

In open source programming, version numbers or the terms stable and unstable commonly distinguish the stage of development. The term stable refers to a version of software that is substantially identical to a version that has been through enough real-world testing to reasonably assume there are no showstopper problems, or at least that any problems are known and documented. On the other hand, the term unstable does not necessarily mean that there are problems; rather, it means that enhancements or changes have been made to the software that have not undergone rigorous testing, and that more changes are expected to be imminent. Users of such software are advised to use the stable version if it meets their needs, and to use the unstable version only if its new functionality is of interest exceeding the risk that something might simply not work right.

In the Linux kernel, version numbers take the form of three numbers separated by periods. Prior to the 2.6.x series, an even second number was used to represent a stable release and an odd second number an unstable release. As of the 2.6.x series, the even or odd status of the second number no longer holds any significance. The practice of using even and odd numbers to indicate the stability of a release has been used by many other open and closed source projects.

Dynamic program analysis

Dynamic code analysis is the analysis of computer software that is performed by executing programs built from that software on a real or virtual processor (analysis performed without executing programs is known as static code analysis). Such tools may require loading of special libraries or even recompilation of program code.

Examples
• Valgrind runs programs on a virtual processor and can detect memory errors (e.g. misuse of malloc and free) and race conditions in multithreaded programs.
• Dmalloc is a library for checking memory allocation and leaks. Software must be recompiled, and all files must include the special C header file dmalloc.h.
• VB Watch injects dynamic analysis code into Visual Basic programs to monitor their performance, call stack, execution trace, instantiated objects, variables and code coverage.

What is dynamic analysis and how can it be automated? Dynamic analysis uses test data sets to execute software in order to observe its behaviour and produce test coverage reports. This assessment of source code ensures consistent levels of high-quality testing and correct use of capture/playback tools.
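For illustration, the following contrived C program contains the two kinds of defects mentioned above: a heap buffer overrun and a memory leak. It may appear to work when run normally, but running it under a dynamic analysis tool such as Valgrind would typically flag both problems:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *name = malloc(8);
        if (name == NULL)
            return 1;

        /* Heap buffer overrun: writes far past the 8 bytes allocated. */
        strcpy(name, "a string that is far too long");

        /* Memory leak: this block is allocated but never freed. */
        char *copy = malloc(32);
        (void)copy;

        free(name);
        return 0;
    }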

Exploratory testing

Exploratory testing is an approach to software testing with simultaneous learning, test design and test execution. While the software is being tested, the tester learns things that, together with experience and creativity, generate new good tests to run.

Contents
• 1 History
• 2 Description
• 3 Benefits and drawbacks
• 4 Usage

History

Exploratory testing has been performed for a long time, and has similarities to ad hoc testing. In the early 1990s, ad hoc was too often synonymous with sloppy and careless work. As a result, a group of test methodologists (now calling themselves the Context-Driven School) began using the term "exploratory", seeking to emphasize the dominant thought process involved in unscripted testing, and to begin to develop the practice into a teachable discipline. This new terminology was first published by Cem Kaner in his book Testing Computer Software. Exploratory testing can be as disciplined as any other intellectual activity.

Description

Exploratory testing seeks to find out how the software actually works, and to ask questions about how it will handle difficult and easy cases. The quality of the testing is dependent on the tester's skill in inventing test cases and finding defects. The more the tester knows about the product and different test methods, the better the testing will be.

To further explain, a comparison can be made with its antithesis, scripted testing, in which test cases are designed in advance, including steps to reproduce and expected results. These tests are later performed by a tester who compares the actual result with the expected one. When performing exploratory testing, there are no exact expected results; it is the tester who decides what will be verified, critically investigating the correctness of the result.

In reality, testing is almost always a combination of exploratory and scripted testing, but with a tendency towards either one, depending on context. The documentation of exploratory testing ranges from documenting all tests performed to just documenting the bugs. During pair testing, two persons create test cases together; one performs them and the other documents. Session-based testing is a method specifically designed to make exploratory testing auditable and measurable on a wider scale.

Benefits and drawbacks

The main advantages of exploratory testing are that less preparation is needed, important bugs are found quickly, and the approach is more intellectually stimulating than scripted testing. The disadvantages are that the tests can't be reviewed in advance (and thereby prevent errors in code and test cases), and that it can be difficult to show exactly which tests have been run. When exploratory tests are repeated, they will not be performed in exactly the same manner, which can be an advantage if it is important to find new errors, or a disadvantage if it is more important to know that exact things are functional.

Usage

Exploratory testing is especially suitable if requirements and specifications are incomplete, or if there is a lack of time. The method can also be used to verify that previous testing has found the most important defects. It is common to perform a combination of exploratory and scripted testing where the choice is based on risk. An example of exploratory testing in practice is Microsoft's verification of Windows compatibility.

Formal verification

In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of the intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.

Contents
• 1 Explanation
• 2 Usage
• 3 Approaches to formal verification
• 4 Validation and verification
• 5 Program verification

Explanation

Software testing alone cannot prove that a system does not contain any defects, nor can it prove that a system has a certain property. Only the process of formal verification can prove that a system does not have a certain defect or does have a certain property. It is impossible to prove or test that a system has "no defects", since it is impossible to formally specify what "no defects" means. All that can be done is prove that a system does not have any of the defects that can be thought of, and has all of the properties that together make it functional and useful.

Usage

Formal verification can be used, for example, for systems such as cryptographic protocols, combinatorial circuits, digital circuits with internal memory, and software expressed as source code. The verification of these systems is done by providing a formal proof on an abstract mathematical model of the system, the correspondence between the mathematical model and the nature of the system being otherwise known by construction. Examples of mathematical objects often used to model systems are: finite state machines, labelled transition systems, Petri nets, timed automata, hybrid automata, process algebras, and formal semantics of programming languages such as operational semantics, denotational semantics, axiomatic semantics and Hoare logic.

Approaches to formal verification

There are roughly two approaches to formal verification. The first is model checking, which consists of a systematically exhaustive exploration of the mathematical model (this is only possible for a finite model). Usually this consists of exploring all states and transitions in the model, using smart and domain-specific abstraction techniques to consider whole groups of states in a single operation and reduce computing time. Implementation techniques include state space enumeration, symbolic state space enumeration, abstract interpretation, symbolic simulation, and abstraction refinement. The properties to be verified are often described in temporal logics, such as linear temporal logic (LTL) or computational tree logic (CTL).

The second approach is logical inference. It consists of using a formal version of mathematical reasoning about the system, usually with theorem-proving software such as the HOL or Isabelle theorem provers. This is usually only partially automated and is driven by the user's understanding of the system to validate.

Validation and verification

Verification is one aspect of testing a product's fitness for purpose. Validation is the complementary aspect. Often the overall checking process is referred to as V & V.
• Validation: "Are we building the right product?", i.e., does the product do what the user really requires?
• Verification: "Are we building the product right?", i.e., does the product conform to the specifications?

The verification process consists of static and dynamic parts. For example, for a software product one can inspect the source code (static) and run it against specific test cases (dynamic). Validation usually can only be done dynamically, i.e., the product is tested by putting it through typical and atypical usages ("Can we break it?"). See also Verification and validation.

Program verification

Program verification is the process of formally proving that a computer program does exactly what is stated in the program specification it was written to realize. This is a type of formal verification which is specifically aimed at verifying the code itself, not an abstract model of the program.

For functional programming languages, some programs can be verified by equational reasoning, usually together with induction. Code in an imperative language could be proved correct by use of Hoare logic.
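As a small illustration of the Hoare-logic style (a sketch of a single proof step, not a full verification), consider the triple

    { x = n }   x := x + 1   { x = n + 1 }

which asserts that if x equals n before the assignment, then x equals n + 1 afterwards. The assignment axiom of Hoare logic discharges it directly: substituting x + 1 for x in the postcondition yields the precondition x + 1 = n + 1, which simplifies to x = n.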

Fuzz testing

Fuzz testing or fuzzing is a software testing technique that provides random data ("fuzz") to the inputs of a program. If the program fails (for example, by crashing, or by failing built-in code assertions), the defects can be noted. The great advantage of fuzz testing is that the test design is extremely simple, and free of preconceptions about system behavior. Fuzz testing was developed at the University of Wisconsin-Madison in 1989 by Professor Barton Miller and the students in his graduate Advanced Operating Systems class.

Contents
• 1 Uses
• 2 Fuzz testing methods
  o 2.1 Advantages and disadvantages
  o 2.2 Event-driven fuzz
  o 2.3 Character-driven fuzz
  o 2.4 Database fuzz

Uses

Fuzz testing is often used in large software development projects that perform black box testing. These projects usually have a budget to develop test tools, and fuzz testing is one of the techniques which offers a high benefit-to-cost ratio. Fuzz testing is also used as a gross measurement of a large software system's quality. The advantage here is that the cost of generating the tests is relatively low. For example, third-party testers have used fuzz testing to evaluate the relative merits of different operating systems and application programs.

Fuzz testing is thought to enhance software security and software safety because it often finds odd oversights and defects which human testers would fail to find, and which even careful human test designers would fail to create tests for. However, fuzz testing is not a substitute for exhaustive testing or formal methods: it can only provide a random sample of the system's behavior, and in many cases passing a fuzz test may only demonstrate that a piece of software handles exceptions without crashing, rather than behaving correctly. Thus, fuzz testing can only be regarded as a bug-finding tool rather than an assurance of quality.

Fuzz testing methods

As a practical matter, developers need to reproduce errors in order to fix them. For this reason, almost all fuzz testing makes a record of the data it manufactures, usually before applying it to the software, so that if the computer fails dramatically, the test data is preserved. If the fuzz stream is generated by a pseudo-random number generator, it may be easier to store the seed value to reproduce the fuzz attempt.

Modern software has several different types of inputs:
• Event-driven inputs, usually from a graphical user interface, or possibly from a mechanism in an embedded system
• Character-driven inputs, from files or data streams such as sockets
• Database inputs, from tabular data such as relational databases
• Inherited program state, such as environment variables

There are at least two different forms of fuzz testing:
• Valid fuzz attempts to assure that the random input is reasonable, or conforms to actual production data.
• Simple fuzz usually uses a pseudo-random number generator to provide input.
• A combined approach uses valid test data with some proportion of totally random input injected.

By using all of these techniques in combination, fuzz-generated randomness can test the un-designed behavior surrounding a wider range of designed system states. Fuzz testing may use tools to simulate all of these domains.
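As a sketch of the character-driven, simple-fuzz form described above: the parse_record target below is a toy stand-in with a planted defect, and the seed is recorded up front so a failing run can be reproduced, as the methods paragraph recommends:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Toy stand-in for the code under test: it trusts a length byte from
       the input, so it may read uninitialized bytes beyond the valid
       length (a planted defect of the kind fuzzing tends to expose). */
    static int parse_record(const unsigned char *buf, size_t len)
    {
        if (len < 2)
            return -1;
        unsigned char payload_len = buf[0];
        unsigned char sum = 0;
        for (size_t i = 0; i < payload_len; i++)  /* bug: ignores len */
            sum += buf[1 + i];
        return sum;
    }

    int main(void)
    {
        unsigned int seed = (unsigned int)time(NULL);
        printf("fuzz seed: %u\n", seed);  /* record the seed first, so a */
        srand(seed);                      /* failing run can be replayed */

        for (int run = 0; run < 100000; run++) {
            unsigned char buf[256];
            size_t len = (size_t)rand() % sizeof(buf);
            for (size_t i = 0; i < len; i++)
                buf[i] = (unsigned char)(rand() % 256);
            parse_record(buf, len);   /* a crash or assertion failure */
        }                             /* here is a finding             */
        return 0;
    }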

Advantages and disadvantages

The main problem with fuzzing to find program faults is that it generally finds only very simple faults. The problem itself is exponential, and every fuzzer takes shortcuts to find something interesting in a timeframe that a human cares about. A primitive fuzzer may have poor code coverage; for example, if the input includes a checksum which is not properly updated to match other random changes, only the checksum validation code will be exercised. Code coverage tools are often used to estimate how "well" a fuzzer works, but these are only guidelines to fuzzer quality. Every fuzzer can be expected to find a different set of bugs.

On the other hand, bugs found using fuzz testing are frequently severe, exploitable bugs that could be used by a real attacker. This has become even more true as fuzz testing has become more widely known, as the same techniques and tools are now used by attackers to exploit deployed software. This is a major advantage over binary or source auditing, or even fuzzing's close cousin, fault injection, which often relies on artificial fault conditions that are difficult or impossible to exploit.

Event-driven fuzz

Normally this is provided as a queue of data structures. The queue is filled with data structures that have random values. The most common problem with an event-driven program is that it will often simply use the data in the queue, without even crude validation. To succeed in a fuzz-tested environment, software must validate all fields of every queue entry, decode every possible binary value, and then ignore impossible requests.

One of the more interesting issues with real-time event handling is that if error reporting is too verbose, simply providing error status can cause resource problems or a crash. Robust error detection systems will report only the most significant, or most recent, error over a period of time.

Character-driven fuzz

Normally this is provided as a stream of random data. The classic source in UNIX is the random data generator. One common problem with a character-driven program is a buffer overrun, when the character data exceeds the available buffer space. This problem tends to recur in every instance in which a string or number is parsed from the data stream and placed in a limited-size area. Another is that decode tables or logic may be incomplete, not handling every possible binary value.

Database fuzz

The standard database scheme is usually filled with fuzz that is random data of random sizes. Some IT shops use software tools to migrate and manipulate such databases. Often the same schema descriptions can be used to automatically generate fuzz databases. Database fuzz is controversial, because input and comparison constraints reduce the invalid data in a database. However, the database is often more tolerant of odd data than its client software, and a general-purpose interface is available to users. Since major customer and enterprise management software is starting to be open source, database-based security attacks are becoming more credible.

A common problem with fuzz databases is buffer overflow. A common data dictionary, with some form of automated enforcement, is quite helpful and entirely possible. To enforce this, normally all the database clients need to be recompiled and retested at the same time. Another common problem is that database clients may not understand the binary possibilities of the database field type, or legacy software might have been ported to a new database system with different possible binary values. A normal, inexpensive solution is to have each program validate database inputs in the same fashion as user inputs. The normal way to achieve this is to periodically "clean" production databases with automated verifiers.

Integration testing

Integration testing (sometimes called integration and testing, abbreviated I&T) is the phase of software testing in which individual software modules are combined and tested as a group. It follows unit testing and precedes system testing. Integration testing takes as its input modules that have been unit tested, groups them in larger aggregates, applies tests defined in an integration test plan to those aggregates, and delivers as its output the integrated system ready for system testing.

Contents
• 1 Purpose
• 2 Limitations

Purpose

The purpose of integration testing is to verify the functional, performance and reliability requirements placed on major design items. These "design items", i.e. assemblages (or groups of units), are exercised through their interfaces using black box testing, with success and error cases simulated via appropriate parameter and data inputs. Simulated usage of shared data areas and inter-process communication is tested, and individual subsystems are exercised through their input interfaces. Test cases are constructed to test that all components within assemblages interact correctly, for example across procedure calls or process activations, and this is done after testing individual modules, i.e. unit testing. The overall idea is a "building block" approach, in which verified assemblages are added to a verified base which is then used to support the integration testing of further assemblages.

The different types of integration testing are big bang, top-down, bottom-up, and backbone.

Big bang: In this approach, all or most of the developed modules are coupled together to form a complete software system or a major part of the system, which is then used for integration testing. The big bang method is very effective for saving time in the integration testing process. However, if the test cases and their results are not recorded properly, the entire integration process will be more complicated and may prevent the testing team from achieving the goal of integration testing.

Bottom-up: All the bottom or low-level modules, procedures or functions are integrated and then tested. After the integration testing of the lower-level integrated modules, the next level of modules is formed and can be used for integration testing, as in the sketch below. This approach is helpful only when all or most of the modules of the same development level are ready. This method also helps to determine the levels of software developed and makes it easier to report testing progress as a percentage.
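A minimal sketch of the bottom-up idea, with two hypothetical low-level units (tax and ship, assumed already unit tested) integrated under a next-level module and exercised through its interface by a driver:

    #include <assert.h>

    /* Two hypothetical low-level units, assumed already unit tested. */
    static int tax(int cents)  { return cents / 10; }            /* 10% tax */
    static int ship(int cents) { return cents > 5000 ? 0 : 500; }

    /* Next-level module that integrates the two units. */
    static int order_total(int cents) { return cents + tax(cents) + ship(cents); }

    /* Bottom-up integration driver: exercises the aggregate through its
       interface and checks that the units interact correctly. */
    int main(void)
    {
        assert(order_total(1000) == 1000 + 100 + 500);  /* small order ships  */
        assert(order_total(6000) == 6000 + 600 + 0);    /* large order is free */
        return 0;
    }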

Limitations

Any conditions not stated in specified integration tests, outside of the confirmation of the execution of design items, will generally not be tested. Integration tests cannot include system-wide (end-to-end) change testing.

Test case

In software engineering, the most common definition of a test case is a set of conditions or variables under which a tester will determine if a requirement or use case upon an application is partially or fully satisfied. It may take many test cases to determine that a requirement is fully satisfied. In order to fully test that all the requirements of an application are met, there must be at least one test case for each requirement, unless a requirement has sub-requirements; in that situation, each sub-requirement must have at least one test case. This is frequently tracked using a traceability matrix. Some methodologies, like RUP, recommend creating at least two test cases for each requirement: one should perform positive testing of the requirement and the other should perform negative testing.

Written test cases should include a description of the functionality to be tested and the preparation required to ensure that the test can be conducted. If the application is created without formal requirements, then test cases can be written based on the accepted normal operation of programs of a similar class. In some schools of testing, test cases are not written at all, but the activities and results are reported after the tests have been run.

What characterizes a formal, written test case is that there is a known input and an expected output, which is worked out before the test is executed. The known input should test a precondition and the expected output should test a postcondition. Under special circumstances, there could be a need to run the test, produce results, and then have a team of experts evaluate whether the results can be considered a pass. This happens often when determining performance numbers for new products. The first test is taken as the baseline for subsequent test / product release cycles.

Written test cases are usually collected into test suites. A variation of test cases is most commonly used in acceptance testing. Acceptance testing is done by a group of end users or clients of the system to ensure the developed system meets their requirements. User acceptance testing is usually differentiated by the inclusion of happy path or positive test cases.

Structure of test case

Formal, written test cases consist of three main parts with subsections:

• Information contains general information about the test case.
  o Identifier is a unique identifier of the test case for further references, for example while describing a found defect.
  o Test case owner/creator is the name of the tester or test designer who created the test or is responsible for its development.
  o Version of the current test case definition.
  o Name of the test case should be a human-oriented title which allows the reader to quickly understand the test case's purpose and scope.
  o Identifier of the requirement which is covered by the test case. An identifier of a use case or a functional specification item could also appear here.
  o Purpose contains a short description of the purpose of the test and the functionality it checks.
  o Dependencies
• Test case activity
  o Testing environment/configuration contains information about the configuration of hardware or software which must be met while executing the test case.
  o Initialization describes actions which must be performed before test case execution is started; for example, opening some file.
  o Finalization describes actions to be done after the test case is performed; for example, if the test case crashes the database, the tester should restore it before other test cases are performed.
  o Actions: the steps to be carried out to complete the test.
  o Input data description
• Results
  o Expected results contains a description of what the tester should see after all test steps have been completed.
  o Actual results contains a brief description of what the tester saw after the test steps were completed. This is often replaced with a pass/fail; quite often, if a test case fails, a reference to the defect involved should be listed in this column.

Not all written tests require all of these sections. However, the bare bones of a test can be reduced to three essential steps:
• Establish the preconditions
• Exercise the item under test
• Verify the postconditions

It is important to note that if the preconditions cannot be established, the item cannot be tested according to its software requirements specification and the test must not proceed. Verifying the postcondition is equivalent to establishing that the actual results are as expected. Note that several tests may need to be run to challenge the postcondition. For example, testing a user login routine would need at least one case with a known username-password pair and a second case with an unknown username-password pair.
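A sketch of those two cases as executable checks, following the establish/exercise/verify pattern above; the login function here is a hypothetical hard-coded stand-in for the routine under test:

    #include <assert.h>
    #include <string.h>

    /* Hypothetical unit under test: returns 1 on a correct pair, else 0. */
    static int login(const char *user, const char *pass)
    {
        return strcmp(user, "alice") == 0 && strcmp(pass, "secret") == 0;
    }

    int main(void)
    {
        /* Precondition: the account "alice"/"secret" exists (hard-coded). */

        /* Test case 1: known username-password pair; expected: success. */
        assert(login("alice", "secret") == 1);

        /* Test case 2: unknown pair; expected: rejection. */
        assert(login("mallory", "guess") == 0);

        return 0;
    }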

See also System testing and Unit testing.

Traceability matrix

In a software development process, a traceability matrix is a table that correlates any two baselined documents that require a many-to-many relationship, to determine the completeness of the relationship. It is often used to map high-level requirements (sometimes known as marketing requirements) and detailed requirements of the software product to the matching parts of high-level design, detailed design, the test plan, and test cases.

Common usage is to take the identifier for each of the items of one document and place them in the left column. The identifiers for the other document are placed across the top row. When an item in the left column is related to an item across the top, a mark is placed in the intersecting cell. The number of relationships is added up for each row and each column. This value indicates the mapping of the two items. Zero values indicate that no relationship exists and that one must be made. Large values imply that the item is too complex and should be simplified.

To ease the creation of traceability matrices, it is advisable to add the relationships to the source documents for both backward traceability and forward traceability. In other words, when an item is changed in one baselined document, it's easy to see what needs to be changed in the other.

Sample traceability matrix

The sample matrix places requirement identifiers (REQ1 UC 1.0 through REQ1 UC 3.2 and REQ1 TECH 1.1 through 1.3) across the top row and test case identifiers (1.1.1 through 5.6.2) down the left column. An "x" in an intersecting cell marks that the test case covers that requirement; the totals row records how many test cases cover each requirement, the totals column records how many requirements each test case exercises, and a separate count flags requirements that are only tested implicitly.

Unit testing

In computer programming, unit testing is a procedure used to validate that individual units of source code are working properly. A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function, procedure, etc., while in object-oriented programming the smallest unit is a class, which may be a base/super class, abstract class or derived/child class. Units are distinguished from modules in that modules are typically made up of units. Ideally, each test case is independent from the others; mock objects and test harnesses can be used to assist in testing a module in isolation. Unit testing is typically done by the developers and not by end users.

Contents
• 1 Benefit
  o 1.1 Facilitates change
  o 1.2 Simplifies integration
  o 1.3 Documentation
  o 1.4 Separation of interface from implementation
• 2 Limitations of unit testing
• 3 Applications
  o 3.1 Extreme Programming
  o 3.2 Techniques
  o 3.3 Unit testing frameworks

Benefit

The goal of unit testing is to isolate each part of the program and show that the individual parts are correct. A unit test provides a strict, written contract that the piece of code must satisfy. As a result, it affords several benefits.

Facilitates change

Unit testing allows the programmer to refactor code at a later date and make sure the module still works correctly (i.e. regression testing). The procedure is to write test cases for all functions and methods so that whenever a change causes a fault, it can be quickly identified and fixed. Readily available unit tests make it easy for the programmer to check whether a piece of code is still working properly. Good unit test design produces test cases that cover all paths through the unit, with attention paid to loop conditions.

In continuous unit testing environments, through the inherent practice of sustained maintenance, unit tests will continue to accurately reflect the intended use of the executable and code in the face of any change. Depending upon established development practices and unit test coverage, up-to-the-second accuracy can be maintained.

Simplifies integration

Unit testing helps to eliminate uncertainty in the units themselves and can be used in a bottom-up testing style approach. By testing the parts of a program first and then testing the sum of its parts, integration testing becomes much easier.

A heavily debated matter exists in assessing the need to perform manual integration testing. While an elaborate hierarchy of unit tests may seem to have achieved integration testing, this presents a false sense of confidence, since integration testing evaluates many other objectives that can only be proven through the human factor. Some argue that, given a sufficient variety of test automation systems, integration testing by a human test group is unnecessary. Realistically, the actual need will ultimately depend upon the characteristics of the product being developed and its intended uses. Additionally, human or manual testing will greatly depend on the availability of resources in the organization.

Documentation

Unit testing provides a sort of "living document". Clients and other developers looking to learn how to use the module can look at the unit tests to determine how to use the module to fit their needs, and to gain a basic understanding of the API.

Unit test cases embody characteristics that are critical to the success of the unit. These characteristics can indicate appropriate or inappropriate use of a unit, as well as negative behaviors that are to be trapped by the unit. A unit test case, in and of itself, documents these critical characteristics, although many software development environments do not rely solely upon code to document the product in development. On the other hand, ordinary narrative documentation is more susceptible to drifting from the implementation of the program and will thus become outdated (e.g. due to design changes, feature creep, or relaxed practices for keeping documents up to date).

Separation of interface from implementation

Because some classes may have references to other classes, testing a class can frequently spill over into testing another class. A common example of this is classes that depend on a database: in order to test the class, the tester often writes code that interacts with the database. This is a mistake, because a unit test should never go outside of its own class boundary. As a result, the software developer abstracts an interface around the database connection, and then implements that interface with a mock object. By abstracting this necessary attachment from the code (temporarily reducing the net effective coupling), the independent unit can be more thoroughly tested than may have been previously achieved. This results in a higher-quality unit that is also more maintainable.

Limitations of unit testing

Unit testing will not catch every error in the program. By definition, it only tests the functionality of the units themselves. Therefore, it will not catch integration errors, performance problems or any other system-wide issues. In addition, it may not be easy to anticipate all special cases of input the program unit under study may receive in reality. Unit testing is only effective if it is used in conjunction with other software testing activities. It is unrealistic to test all possible input combinations for any non-trivial piece of software. Like all forms of software testing, unit tests can only show the presence of errors; they cannot show the absence of errors.

To obtain the intended benefits from unit testing, a rigorous sense of discipline is needed throughout the software development process. It is essential to keep careful records, not only of the tests that have been performed, but also of all changes that have been made to the source code of this or any other unit in the software. Use of a version control system is essential; if a later version of the unit fails a particular test that it had previously passed, the version control software can provide a list of the source code changes (if any) that have been applied to the unit since that time.

Applications

Extreme Programming

The cornerstone of Extreme Programming (XP) is the unit test. XP relies on an automated unit testing framework, which can be either third party, e.g. xUnit, or created within the development group. Extreme Programming uses the creation of unit tests for test-driven development. The developer writes a unit test that exposes either a software requirement or a defect. This test will fail, either because the requirement isn't implemented yet or because it intentionally exposes a defect in the existing code. Then, the developer writes the simplest code to make the test, along with other tests, pass.

All classes in the system are unit tested. Developers release unit testing code to the code repository in conjunction with the code it tests. XP's thorough unit testing allows the benefits mentioned above, such as simpler and more confident code development and refactoring, simplified code integration, accurate documentation, and more modular designs. These unit tests are also constantly run as a form of regression test.

Techniques

Unit testing is commonly automated, but may still be performed manually. The IEEE [1] does not favor one over the other. A manual approach to unit testing may employ a step-by-step instructional document. Nevertheless, the objective in unit testing is to isolate a unit and validate its correctness. Automation is efficient for achieving this, and enables the many benefits listed in this article. Conversely, if not planned carefully, a careless manual unit test case may execute as an integration test case that involves many software components, and thus preclude the achievement of most if not all of the goals established for unit testing.

Under the automated approach, to fully realize the effect of isolation, the unit or code body subjected to the unit test is executed within a framework outside of its natural environment, that is, outside of the product or calling context for which it was originally created. Testing in an isolated manner has the benefit of revealing unnecessary dependencies between the code being tested and other units or data spaces in the product. These dependencies can then be eliminated.

Using an automation framework, the developer codes criteria into the test to verify the correctness of the unit. During execution of the test cases, the framework logs those that fail any criterion. Many frameworks will also automatically flag and report these failed test cases in a summary. Depending upon the severity of a failure, the framework may halt subsequent testing.

As a consequence, unit testing is traditionally a motivator for programmers to create decoupled and cohesive code bodies. This practice promotes healthy habits in software development. Design patterns, unit testing, and refactoring often work together so that the most ideal solution may emerge.

Unit testing frameworks

Unit testing frameworks, which help simplify the process of unit testing, have been developed for a wide variety of languages. It is generally possible to perform unit testing without the support of a specific framework by writing client code that exercises the units under test and uses assertion, exception, or early-exit mechanisms to signal failure. This approach is valuable in that there is a negligible barrier to the adoption of unit testing. However, it is also limited in that many advanced features of a proper framework are missing or must be hand-coded.
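A sketch of that frameworkless approach, using a hypothetical clamp function as the unit under test and a hand-rolled check in place of a framework:

    #include <stdio.h>

    /* Hypothetical unit under test. */
    static int clamp(int value, int lo, int hi)
    {
        if (value < lo) return lo;
        if (value > hi) return hi;
        return value;
    }

    /* Minimal hand-rolled harness: each check reports a failure instead
       of aborting, and the exit status signals overall pass/fail. */
    static int failures;

    static void expect_eq(int got, int want, const char *name)
    {
        if (got != want) {
            printf("FAIL %s: got %d, want %d\n", name, got, want);
            failures++;
        }
    }

    int main(void)
    {
        expect_eq(clamp(5, 0, 10), 5, "in range");
        expect_eq(clamp(-3, 0, 10), 0, "below lower bound");
        expect_eq(clamp(42, 0, 10), 10, "above upper bound");
        printf("%s\n", failures ? "some tests failed" : "all tests passed");
        return failures ? 1 : 0;
    }

The failure-counting and summary reporting here stand in for the logging and flagging that a proper framework provides automatically.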


Equivalence partitioning

Equivalence partitioning is a software testing related technique with two goals:
1. To reduce the number of test cases to a necessary minimum.
2. To select the right test cases to cover all possible scenarios.

Although in rare cases equivalence partitioning is also applied to the outputs of a software component, typically it is applied to the inputs of a tested component. The equivalence partitions are usually derived from the specification of the component's behaviour. An input has certain ranges which are valid and other ranges which are invalid. This is best explained by the following example of a function which has a pass parameter "month" of a date. The valid range for the month is 1 to 12, standing for January to December. This valid range is called a partition. In this example there are two further partitions of invalid ranges. The first invalid partition would be <= 0 and the second invalid partition would be >= 13.

    .... -2 -1  0 | 1 .............. 12 | 13 14 15 ....
    --------------|---------------------|---------------
     invalid      |   valid partition   |  invalid
     partition 1  |                     |  partition 2

Equivalence partitioning is not a stand-alone method to determine test cases. It has to be supplemented by boundary value analysis. Having determined the partitions of possible inputs, the method of boundary value analysis has to be applied to select the most effective test cases out of these partitions.

Contents
1 The Theory
2 Black Box vs. White Box
3 Types of Equivalence Classes

The Theory

The testing theory related to equivalence partitioning says that only one test case of each partition is needed to evaluate the behaviour of the program for the related partition. In other words, it is sufficient to select one test case out of each partition to check the behaviour of the program. Using more or even all test cases of a partition will not find new faults in the program. The values within one partition are considered to be "equivalent". Thus the number of test cases can be reduced considerably. An additional effect of applying this technique is that you also find the so-called "dirty" test cases. An inexperienced tester may be tempted to use as test cases the input data 1 to 12 for the month and forget to select some out of the invalid partitions. This would lead to a huge number of unnecessary test cases on the one hand, and a lack of test cases for the dirty ranges on the other hand.
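A minimal sketch in C of this idea, assuming a hypothetical is_valid_month() helper: exactly one representative value is drawn from each of the three partitions.

    /* One test case per equivalence partition for the month example. */
    #include <assert.h>

    static int is_valid_month(int m) { return m >= 1 && m <= 12; }

    int main(void) {
        assert(is_valid_month(6)  == 1);   /* valid partition (1..12) */
        assert(is_valid_month(-3) == 0);   /* invalid partition 1 (<= 0) */
        assert(is_valid_month(15) == 0);   /* invalid partition 2 (>= 13) */
        return 0;
    }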

Black Box vs. White Box

The tendency is to relate equivalence partitioning to so-called black box testing, which strictly checks a software component at its interface, without consideration of internal structures of the software. But on closer inspection there are cases where it applies to white box testing as well. Imagine an interface to a component which has a valid range between 1 and 12, as in the example above. Internally, however, the function may differentiate between the values 1 to 6 and the values 7 to 12. Depending on the input value, the software will internally run through different paths to perform slightly different actions. Regarding the input and output interfaces to the component this difference will not be noticed, but in white-box testing you would like to make sure that both paths are examined. To achieve this it is necessary to introduce additional equivalence partitions which would not be needed for black-box testing. For this example the partitions would be:

        ... -2 -1  0 |  1 ..... 6 |  7 ..... 12 |  13 14 15 ...
      ---------------|------------|-------------|---------------
        invalid           P1            P2          invalid
        partition 1     valid partitions           partition 2

To check for the expected results you would need to evaluate some internal intermediate values rather than the output interface.
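A sketch of such a component in C: the hypothetical month_action() takes different internal paths for P1 and P2, and the return codes are made up purely so that the paths can be observed in a test.

    /* Internal differentiation of the valid range into P1 (1..6) and P2 (7..12). */
    #include <assert.h>

    static int month_action(int month) {
        if (month < 1 || month > 12) return -1;   /* invalid partitions */
        if (month <= 6) return 1;                 /* internal path for P1 */
        return 2;                                 /* internal path for P2 */
    }

    int main(void) {
        assert(month_action(3)  == 1);    /* one test case from P1 */
        assert(month_action(9)  == 2);    /* one test case from P2 */
        assert(month_action(0)  == -1);   /* invalid partition 1 */
        assert(month_action(13) == -1);   /* invalid partition 2 */
        return 0;
    }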

Types of Equivalence Classes

• Continuous classes run from one point to another, with no clear separations of values. An example is a temperature range.
• Discrete classes have clear separation of values. Discrete classes are sets, or enumerations.
• Boolean classes have only two values: true or false, on or off, yes or no. An example is whether a checkbox is checked or unchecked.

Boundary value analysis

Boundary value analysis is a software testing design technique to determine test cases covering off-by-one errors. The boundaries of software component input ranges are areas of frequent problems.

Introduction

Testing experience has shown that the boundaries of input ranges to a software component are especially liable to defects. A programmer who has to implement, for example, the range 1 to 12 at an input (standing for the months January to December in a date) will have a line in the code checking for this range. This may look like: if (month > 0 && month < 13)

But a common programming error is to check a wrong range, e.g. starting the range at 0 by writing: if (month >= 0 && month < 13)

For more complex range checks in a program, such a problem is not as easily spotted as in the simple example above.

Applying boundary value analysis

To set up boundary value analysis test cases you first have to determine which boundaries exist at the interface of a software component. This is done by applying the equivalence partitioning technique; boundary value analysis and equivalence partitioning are inevitably linked together. For the example of the month in a date you would have the following partitions:

        ... -2 -1  0 |  1 .............. 12 |  13 14 15 ...
      ---------------|----------------------|---------------
        invalid          valid partition       invalid
        partition 1                            partition 2

Applying boundary value analysis, you now select a test case on each side of the boundary between two partitions. In the above example this would be 0 and 1 for the lower boundary, and 12 and 13 for the upper boundary. Each of these pairs consists of a "clean" and a "dirty" test case. A "clean" test case should give a valid operation result of the program. A "dirty" test case should lead to the correct and specified input error treatment, such as limiting the value, using a substitute value, or, in the case of a program with a user interface, a warning and a request to enter correct data. Boundary value analysis can thus produce six test cases per range: n, n-1, and n+1 for the lower limit, and n, n-1, and n+1 for the upper limit.

A further set of boundaries has to be considered when you set up your test cases. A solid testing strategy also has to consider the natural boundaries of the data types used in the program. If you are working with signed values, this is especially the range around zero (-1, 0, +1). Similar to the typical range check faults, programmers tend to have weaknesses in their programs in this range. For example, there could be a division-by-zero problem where a zero value may occur although the programmer always assumed that the range started at 1, or a sign problem when a value turns out to be negative in some rare cases although the programmer always expected it to be positive. Even if this critical natural boundary is clearly within an equivalence partition, it should lead to additional test cases checking the range around zero. A further natural boundary is the lower and upper limit of the data type itself. For example, an unsigned 8-bit value has the range 0 to 255; a good test strategy would also check how the program reacts to inputs of -1 and 0 as well as 255 and 256.

The tendency is to relate boundary value analysis more to so-called black box testing, which strictly checks a software component at its interfaces, without consideration of internal structures of the software. But looking closer at the subject, there are cases where it also applies to white box testing. After determining the necessary test cases with equivalence partitioning and subsequent boundary value analysis, it is necessary to define the combinations of the test cases when there are multiple inputs to a software component.
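A sketch of the resulting boundary value test cases in C, reusing the hypothetical is_valid_month() helper from the equivalence partitioning example: a clean/dirty pair at each partition boundary, plus the natural boundary around zero.

    /* Boundary value test cases for the month example. */
    #include <assert.h>

    static int is_valid_month(int m) { return m >= 1 && m <= 12; }

    int main(void) {
        assert(is_valid_month(0)  == 0);   /* dirty: just below the lower boundary */
        assert(is_valid_month(1)  == 1);   /* clean: lower boundary */
        assert(is_valid_month(12) == 1);   /* clean: upper boundary */
        assert(is_valid_month(13) == 0);   /* dirty: just above the upper boundary */
        assert(is_valid_month(-1) == 0);   /* natural boundary around zero */
        return 0;
    }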

Decision Table

Decision tables are a precise yet compact way to model complicated logic. Decision tables, like if-then-else and switch-case statements, associate conditions with actions to perform. But, unlike the control structures found in traditional programming languages, decision tables can associate many independent conditions with several actions in an elegant way.

Contents
1 Structure
2 Example
3 Software engineering benefits

Structure

Decision tables are typically divided into four quadrants, as shown below.

    The four quadrants
    +------------+------------------------+
    | Conditions | Condition alternatives |
    +------------+------------------------+
    | Actions    | Action entries         |
    +------------+------------------------+

Each decision corresponds to a variable, relation or predicate whose possible values are listed among the condition alternatives. Each action is a procedure or operation to perform, and the entries specify whether (or in what order) the action is to be performed for the set of condition alternatives the entry corresponds to.

Many decision tables include in their condition alternatives the don't-care symbol, a hyphen. Using don't-cares can simplify decision tables, especially when a given condition has little influence on the actions to be performed. In some cases, entire conditions thought to be important initially are found to be irrelevant when none of the conditions influence which actions are performed.

Aside from the basic four-quadrant structure, decision tables vary widely in the way the condition alternatives and action entries are represented. Some decision tables use simple true/false values to represent the alternatives to a condition (akin to if-then-else), other tables may use numbered alternatives (akin to switch-case), and some tables even use fuzzy logic or probabilistic representations for condition alternatives. In a similar way, action entries can simply represent whether an action is to be performed (check the actions to perform), or, in more advanced decision tables, the sequencing of actions to perform (number the actions to perform).

Example

The limited-entry decision table is the simplest to describe. The condition alternatives are simple Boolean values, and the action entries are check-marks, representing which of the actions in a given column are to be performed. A technical support company writes a decision table to diagnose printer problems based upon symptoms described to them over the phone by their clients.

    Rule                                    1  2  3  4  5  6  7  8
    Conditions
      Printer does not print                Y  Y  Y  Y  N  N  N  N
      A red light is flashing               Y  Y  N  N  Y  Y  N  N
      Printer is unrecognized               Y  N  Y  N  Y  N  Y  N
    Actions
      Check the power cable                       X
      Check the printer-computer cable      X     X
      Ensure printer software is installed  X     X     X     X
      Check/replace ink                     X  X        X  X
      Check for paper jam                      X     X

Of course, this is just a simple example, but it demonstrates how decision tables can scale to several conditions with many possibilities.

Software engineering benefits

Decision tables make it easy to observe that all possible conditions are accounted for. In the example above, every possible combination of the three conditions is given. When conditions are omitted from a decision table, it is obvious even at a glance that logic is missing. Compare this to traditional control structures, where it is not easy to notice gaps in program logic with a mere glance; sometimes it is difficult to follow which conditions correspond to which actions.

Just as decision tables make it easy to audit control logic, they demand that a programmer think of all possible conditions. With traditional control structures, it is easy to forget about corner cases, especially when the else statement is optional. Since logic is so important to programming, decision tables are an excellent tool for designing control logic. In one remarkable anecdote, after a failed six man-year attempt to describe the program logic of a file maintenance system using flow charts, four people solved the problem using decision tables in just four weeks. Choosing the right tool for the problem is fundamental.
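One reason decision tables translate so directly into code is that they can be encoded as data. The following C sketch encodes the printer table above as a lookup array; the structure and function names are illustrative, not part of any standard.

    /* The printer troubleshooting decision table encoded as data. */
    #include <stdio.h>
    #include <stddef.h>

    struct rule {
        int does_not_print, red_light, unrecognized;   /* conditions: 1 = Y, 0 = N */
        const char *actions;                           /* actions for this rule */
    };

    static const struct rule rules[] = {
        {1, 1, 1, "check printer-computer cable; ensure software installed; check/replace ink"},
        {1, 1, 0, "check/replace ink; check for paper jam"},
        {1, 0, 1, "check power cable; check printer-computer cable; ensure software installed"},
        {1, 0, 0, "check for paper jam"},
        {0, 1, 1, "ensure software installed; check/replace ink"},
        {0, 1, 0, "check/replace ink"},
        {0, 0, 1, "ensure software installed"},
        {0, 0, 0, "no action needed"},
    };

    static const char *diagnose(int does_not_print, int red_light, int unrecognized) {
        for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
            if (rules[i].does_not_print == does_not_print &&
                rules[i].red_light == red_light &&
                rules[i].unrecognized == unrecognized)
                return rules[i].actions;
        return "unreachable";   /* all 2^3 condition combinations are listed above */
    }

    int main(void) {
        /* does not print, no red light, printer unrecognized -> rule 3 */
        puts(diagnose(1, 0, 1));
        return 0;
    }

Because the array enumerates all eight condition combinations, the completeness property discussed above can even be checked mechanically.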

System testing

System testing is testing conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements. System testing falls within the scope of black box testing, and as such should require no knowledge of the inner design of the code or logic. [1]

As a rule, system testing takes as its input all of the "integrated" software components that have successfully passed integration testing, as well as the software system itself integrated with any applicable hardware system(s). The purpose of integration testing is to detect any inconsistencies between the software units that are integrated together (called assemblages) or between any of the assemblages and the hardware. System testing is a more limited type of testing; it seeks to detect defects both within the "inter-assemblages" and also within the system as a whole.

Contents
1 Testing the whole system
2 Types of system testing

Testing the whole system

System testing is done on the entire system against the Functional Requirement Specification(s) (FRS) and/or the System Requirement Specification (SRS). Moreover, system testing is an investigatory testing phase, where the focus is to have an almost destructive attitude and to test not only the design, but also the behaviour and even the believed expectations of the customer. It is also intended to test up to and beyond the bounds defined in the software/hardware requirements specification(s). One could view system testing as the final destructive testing phase before user acceptance testing.

Types of system testing

The following examples are different types of testing that should be considered during system testing:

• User interface testing
• Usability testing
• Performance testing
• Compatibility testing
• Error handling testing
• Load testing
• Volume testing
• Stress testing
• User help testing
• Security testing
• Capacity testing
• Sanity testing
• Smoke testing
• Exploratory testing
• Ad hoc testing
• Regression testing
• Reliability testing
• Recovery testing
• Installation testing
• Maintenance testing
• Accessibility testing, including compliance with:
  o Americans with Disabilities Act of 1990
  o Section 508 Amendment to the Rehabilitation Act of 1973
  o Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C)

Although different testing organizations may prescribe different tests as part of System testing, this list serves as a general framework or foundation to begin with.

Usability testing

Usability testing is a means for measuring how well people can use some human-made object (such as a web page, a computer interface, a document, or a device) for its intended purpose, i.e. usability testing measures the usability of the object. Usability testing focuses on a particular object or a small set of objects, whereas general human-computer interaction studies attempt to formulate universal principles. If usability testing uncovers difficulties, such as people having difficulty understanding instructions, manipulating parts, or interpreting feedback, then developers should improve the design and test it again.

During usability testing, the aim is to observe people using the product in as realistic a situation as possible, to discover errors and areas of improvement. Designers commonly focus excessively on creating designs that look "cool", compromising usability and functionality. This is often caused by pressure from the people in charge, forcing designers to develop systems based on management expectations instead of people's needs. A designer's primary function should be more than appearance; it includes making things work with people. Simply gathering opinions on an object or document is market research, rather than usability testing. Usability testing usually involves a controlled experiment to determine how well people can use the product. [1]

Rather than showing users a rough draft and asking, "Do you understand this?", usability testing involves watching people trying to use something for its intended purpose. For example, when testing instructions for assembling a toy, the test subjects should be given the instructions and a box of parts. Instruction phrasing, illustration quality, and the toy's design all affect the assembly process.

Setting up a usability test involves carefully creating a scenario, or realistic situation, wherein the person performs a list of tasks using the product being tested while observers watch and take notes. Several other test instruments, such as scripted instructions, paper prototypes, and pre- and post-test questionnaires, are also used to gather feedback on the product being tested. For example, to test the attachment function of an e-mail program, a scenario would describe a situation where a person needs to send an e-mail attachment, and ask him or her to undertake this task. The aim is to observe how people function in a realistic manner, so that developers can see problem areas and what people like. Techniques popularly used to gather data during a usability test include the think aloud protocol and eye tracking.

Hallway testing (or hallway usability testing) is a specific methodology of software usability testing. Rather than using an in-house, trained group of testers, just five to six random people, indicative of a cross-section of end users, are brought in to test the software (be it an application, web site, etc.); the name of the technique refers to the fact that the testers should be random people who pass by in the hallway. The theory, as adopted from Jakob Nielsen's research, is that 95% of usability problems can be discovered using this technique.

Contents
1 What to measure

What to measure

Usability testing generally involves measuring how well test subjects respond in four areas: time, accuracy, recall, and emotional response. The results of the first test can be treated as a baseline or control measurement; all subsequent tests can then be compared to the baseline to indicate improvement.

• Time on Task -- How long does it take people to complete basic tasks? (For example, find something to buy, create a new account, and order the item.)
• Accuracy -- How many mistakes did people make? (And were they fatal or recoverable with the right information?)
• Recall -- How much does the person remember afterwards or after periods of non-use?
• Emotional Response -- How does the person feel about the tasks completed? (Confident? Stressed? Would the user recommend this system to a friend?)

In the early 1990s, Jakob Nielsen, at that time a researcher at Sun Microsystems, popularized the concept of using numerous small usability tests, typically with only five test subjects each, at various stages of the development process. His argument is that, once it is found that two or three people are totally confused by the home page, little is gained by watching more people suffer through the same flawed design. "Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford." [2] Nielsen subsequently published his research and coined the term heuristic evaluation.

The claim of "five users is enough" was later described by a mathematical model (Virzi, R.A., "Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough?", Human Factors, 1992, 34(4), pp. 457-468), which states for the proportion U of uncovered problems

    U = 1 − (1 − p)^n

where p is the probability of one subject identifying a specific problem and n the number of subjects (or test sessions). The model approaches the total number of existing problems asymptotically as n grows.
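A small C sketch evaluating this model, with an assumed per-subject detection probability of p = 0.31 (a figure often quoted in the usability literature, not taken from this text):

    /* U = 1 - (1 - p)^n for n = 1..10 subjects. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double p = 0.31;   /* assumed detection probability (illustrative) */
        for (int n = 1; n <= 10; n++)
            printf("n = %2d subjects -> U = %.2f\n", n, 1.0 - pow(1.0 - p, n));
        return 0;
    }

With p = 0.31 the computed U already exceeds 0.8 at n = 5, which is the arithmetic behind the "five users is enough" claim; the critique below turns on whether a single, uniform p is realistic.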

In later research, Nielsen's claim has repeatedly been questioned with both empirical evidence [3] and more advanced mathematical models (Caulton, D.A., "Relaxing the homogeneity assumption in usability testing", Behaviour & Information Technology, 2001, 20(1), pp. 1-7). Two of the key challenges to this assertion are: (1) since usability is related to the specific set of users, such a small sample size is unlikely to be representative of the total population, so the data from such a small sample is more likely to reflect the sample group than the population it may represent; and (2) many usability problems encountered in testing are likely to prevent exposure of other usability problems, making it impossible to predict the percentage of problems that can be uncovered without knowing the relationship between existing problems. Most researchers today agree that, although five users can generate a significant amount of data at any given point in the development cycle, in many applications a sample size larger than five is required to detect a satisfying number of usability problems.

Bruce Tognazzini advocates close-coupled testing: "Run a test subject through the product, figure out what's wrong, change it, and repeat until everything works. Using this technique, I've gone through seven design iterations in three-and-a-half days, testing in the morning, changing the prototype at noon, testing in the afternoon, and making more elaborate changes at night." [4] This testing can be useful in research situations.

Load testing

Load testing is the process of creating demand on a system or device and measuring its response.

In mechanical systems it refers to the testing of a system to certify it under the appropriate regulations (LOLER in the UK - Lifting Operations and Lifting Equipment Regulations). Load testing is usually carried out to a load 1.5 times the SWL (Safe Working Load); periodic recertification is required.

In software engineering it is a blanket term that is used in many different ways across the professional software testing community. Load testing generally refers to the practice of modeling the expected usage of a software program by simulating multiple users accessing the program's services concurrently. As such, this testing is most relevant for multi-user systems, often ones built using a client/server model, such as web servers. However, other types of software systems can be load-tested also. For example, a word processor or graphics editor can be forced to read an extremely large document, or a financial package can be forced to generate a report based on several years' worth of data. The most accurate load testing occurs with actual, rather than theoretical, results.

When the load placed on the system is raised beyond normal usage patterns in order to test the system's response at unusually high or peak loads, it is known as stress testing. The load is usually so great that error conditions are the expected result, although there is no clear boundary at which an activity ceases to be a load test and becomes a stress test. There is little agreement on what the specific goals of load testing are; the term is often used synonymously with performance testing, reliability testing, and volume testing.
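A minimal sketch of a load generator in C using POSIX threads; handle_request() stands in for a call into the system under test, and the user and request counts are arbitrary.

    /* Simulate USERS concurrent users issuing REQUESTS calls each.
       Build with: cc load.c -pthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define USERS    50
    #define REQUESTS 1000

    static void handle_request(void) { }   /* stand-in for the service under test */

    static void *user(void *arg) {
        (void)arg;
        for (int i = 0; i < REQUESTS; i++)
            handle_request();
        return NULL;
    }

    int main(void) {
        pthread_t t[USERS];
        time_t start = time(NULL);
        for (int i = 0; i < USERS; i++)
            pthread_create(&t[i], NULL, user, NULL);
        for (int i = 0; i < USERS; i++)
            pthread_join(t[i], NULL);
        printf("%d requests completed in %ld s\n",
               USERS * REQUESTS, (long)(time(NULL) - start));
        return 0;
    }

Raising USERS well beyond the expected population turns this same harness into the stress test described below.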

Volume testing

Volume testing belongs to the group of non-functional tests, which are often misunderstood and/or used interchangeably. Volume testing refers to testing a software application with a certain data volume. In generic terms, this volume can be the database size, or it can be the size of an interface file that is the subject of volume testing. For example, if you want to volume test your application with a specific database size, you expand your database to that size and then test the application's performance on it. Another example would be a requirement for your application to interact with an interface file (any file, such as .dat or .xml); this interaction could be reading from and/or writing to the file. You create a sample file of the size you want and then test the application's functionality with that file to check performance.
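A minimal C sketch of preparing such a volume: generate an interface file of a chosen size before pointing the application at it. The file name, record layout, and target size are all illustrative.

    /* Write target_mb megabytes of dummy 1 KiB records to a test file. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const long target_mb = 500;                  /* desired test volume */
        char record[1024];
        memset(record, 'x', sizeof record);

        FILE *f = fopen("volume_test.dat", "wb");    /* hypothetical interface file */
        if (!f) return 1;
        for (long i = 0; i < target_mb * 1024; i++)  /* 1024 records per MiB */
            fwrite(record, 1, sizeof record, f);
        fclose(f);
        return 0;
    }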

Stress testing

Stress testing is a form of testing that is used to determine the stability of a given system or entity. It involves testing beyond normal operational capacity, often to a breaking point, in order to observe the results. Stress testing may have a more specific meaning in certain industries.

Contents
1 IT industry
2 Medicine
3 Financial sector

IT industry

In software testing, stress testing often refers to tests that put a greater emphasis on robustness, availability, and error handling under a heavy load than on what would be considered correct behavior under normal circumstances. In particular, the goals of such tests may be to ensure the software does not crash in conditions of insufficient computational resources (such as memory or disk space), unusually high concurrency, or denial of service attacks. Examples:

• A web server may be stress tested using scripts, bots, and various denial of service tools to observe the performance of a web site during peak loads.

Medicine

• A cardiac stress test is used most commonly to detect marked imbalances in blood flow to the heart muscle.

Financial sector

• Instead of doing financial projection on a "best estimate" basis, a company may do stress testing to see how robust a financial instrument is in certain crashes. They may test the instrument under, for example, the following stresses:
  o What happens if the market crashes by more than x% this year?
  o What happens if interest rates go up by at least y%?
  o What happens if half the instruments in the portfolio terminate their contracts in the fifth year?
  o What happens if oil prices rise by 200%?

This type of analysis has become increasingly widespread, and has been taken up by various governmental bodies (such as the FSA in the UK) as a regulatory requirement on certain financial institutions, to ensure adequate capital allocation levels to cover potential losses incurred during extreme, but plausible, events. This emphasis on adequate, risk-adjusted determination of capital has been further enhanced by modifications to banking regulations such as Basel II. Stress testing models typically allow not only the testing of individual stressors, but also combinations of different events. There is also usually the ability to test the current exposure to a known historical scenario (such as the Russian debt default in 1998 or the 9/11 terrorist attacks) to ensure the liquidity of the institution.

Sanity testing

A sanity test or sanity check is a basic test to quickly evaluate the validity of a claim or calculation. In mathematics, for example, when multiplying by three or nine, verifying that the sum of the digits of the result is a multiple of 3 or 9 respectively is a sanity test.

In computer science it is a very brief run-through of the functionality of a computer program, system, calculation, or other analysis, to assure that the system or methodology works as expected, often prior to a more exhaustive round of testing.

Sanity tests are sometimes mistakenly equated to smoke tests. Where a distinction is made between sanity testing and smoke testing, it is usually in one of two directions. Either sanity testing is a focused but limited form of regression testing (narrow and deep, but cursory), or it is broad and shallow, like a smoke test, but concerned more with the possibility of "insane behavior", such as slowing the entire system to a crawl or destroying the database, while not being as thorough as a true smoke test. Generally, a smoke test is scripted (either using a written set of tests or an automated test), whereas a sanity test is usually unscripted.

With the evolution of test methodologies, sanity tests are useful both for initial environment validation and for future iterative increments. The process of sanity testing begins with the execution of some online transactions and batch programs of various modules, to see whether the software runs without any hindrance or abnormal termination. This practice can help identify most of the environment-related problems.

A classic example of this in programming is the hello world program. If a person has just set up a computer and a compiler, a quick sanity test can be performed to see if the compiler actually works: write a program that simply displays the words "hello world".

A sanity test can also refer to various order-of-magnitude and other simple rule-of-thumb devices applied to cross-check mathematical calculations. For example:

• If one were to attempt to square 738 and calculated 53,874, a quick sanity check could show that this cannot be true: 500 < 738, yet 500² = 5² × 100² = 250,000 > 53,874. Since squaring preserves inequality for positive numbers, the calculation must be wrong.
• In multiplication, 918 × 155 is not 142,135, since 918 is divisible by three but 142,135 is not (its digits add up to 16, which is not a multiple of three).
• When talking about quantities in physics, the power output of a car cannot be 700 kJ, since that is a unit of energy, not power (energy per unit time). See dimensional analysis.
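The digit-sum rule from the multiplication example can itself be automated as a programmatic sanity check; the following C sketch is illustrative.

    /* Divisibility by three is preserved by the decimal digit sum. */
    #include <stdio.h>

    static int digit_sum(long n) {
        int s = 0;
        for (n = n < 0 ? -n : n; n > 0; n /= 10)
            s += (int)(n % 10);
        return s;
    }

    int main(void) {
        long claimed = 142135;   /* the (wrong) claimed product of 918 x 155 */
        /* 918 is divisible by 3, so any product involving it must be too. */
        if (digit_sum(claimed) % 3 != 0)
            puts("sanity check failed: 918 x 155 cannot equal 142135");
        return 0;
    }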

Smoke testing

Smoke testing is a term used in plumbing, woodwind repair, electronics, and computer software development. It refers to the first test made after repairs or first assembly, to provide some assurance that the system under test will not catastrophically fail. After a smoke test proves that the pipes will not leak, the keys seal properly, the circuit will not burn, or the software will not crash outright, the assembly is ready for more stressful testing.



In computer programming and software testing, smoke testing is a preliminary to further testing, which should reveal simple failures severe enough to reject a prospective software release. In this case, the smoke is metaphorical.

Smoke testing in software development

Smoke testing is done by developers before the build is released, or by testers before accepting a build for further testing.

In software engineering, a smoke test generally consists of a collection of tests that can be applied to a newly created or repaired computer program. Sometimes the tests are performed by the automated system that builds the final software. In this sense, a smoke test is the process of validating code changes before the changes are checked into the larger product's official source code collection. After code reviews, smoke testing is the most cost-effective method for identifying and fixing defects in software; some even believe that it is the most effective of all.

In software testing, a smoke test is a collection of written tests that are performed on a system prior to being accepted for further testing. This is also known as a build verification test. This is a "shallow and wide" approach to the application. The tester "touches" all areas of the application without getting too deep, looking for answers to basic questions like "Can I launch the test item at all?", "Does it open to a window?", and "Do the buttons on the window do things?". There is no need to get down to field validation or business flows. If you get a "No" answer to basic questions like these, then the application is so badly broken that there is effectively nothing there to allow further testing. These written tests can either be performed manually or using an automated tool. When automated tools are used, the tests are often initiated by the same process that generates the build itself.
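A minimal sketch of such an automated smoke test in C: launch the freshly built program and accept or reject the build based on its exit status. The binary name ./myapp and its --version flag are hypothetical.

    /* Build verification: can the new build be launched at all? */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int status = system("./myapp --version");   /* hypothetical binary */
        if (status != 0) {
            fprintf(stderr, "smoke test failed: build rejected\n");
            return 1;
        }
        puts("smoke test passed: build accepted for further testing");
        return 0;
    }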

Exploratory testing

Exploratory testing is an approach to software testing with simultaneous learning, test design and test execution. While the software is being tested, the tester learns things that, together with experience and creativity, generate new good tests to run.

Contents
1 History
2 Description
3 Benefits and drawbacks
4 Usage

History

Exploratory testing has been performed for a long time, and has similarities to ad hoc testing. In the early 1990s, ad hoc was too often synonymous with sloppy and careless work. As a result, a group of test methodologists (now calling themselves the Context-Driven School) began using the term "exploratory", seeking to emphasize the dominant thought process involved in unscripted testing, and to begin to develop the practice into a teachable discipline. This new terminology was first published by Cem Kaner in his book Testing Computer Software. Exploratory testing can be as disciplined as any other intellectual activity.

Description

Exploratory testing seeks to find out how the software actually works, and to ask questions about how it will handle difficult and easy cases. The quality of the testing is dependent on the tester's skill in inventing test cases and finding defects. The more the tester knows about the product and different test methods, the better the testing will be.

To further explain, a comparison can be made with its antithesis, scripted testing, in which test cases are designed in advance, including steps to reproduce and expected results. These tests are later performed by a tester who compares the actual result with the expected one. When performing exploratory testing, there are no exact expected results; it is the tester who decides what will be verified, critically investigating the correctness of the result.

In reality, testing is almost always a combination of exploratory and scripted testing, with a tendency towards either one, depending on context. The documentation of exploratory testing ranges from documenting all tests performed to just documenting the bugs. During pair testing, two persons create test cases together; one performs them, and the other documents. Session-based testing is a method specifically designed to make exploratory testing auditable and measurable on a wider scale.

Benefits and drawbacks

The main advantages of exploratory testing are that less preparation is needed, important bugs are found quickly, and the approach is more intellectually stimulating than scripted testing. The disadvantages are that the tests cannot be reviewed in advance (and thereby prevent errors in code and test cases), and that it can be difficult to show exactly which tests have been run. When exploratory tests are repeated, they will not be performed in exactly the same manner, which can be an advantage if it is important to find new errors, or a disadvantage if it is more important to know that exact things are functional.

Usage

Exploratory testing is especially suitable if requirements and specifications are incomplete, or if there is a lack of time. The approach can also be used to verify that previous testing has found the most important defects. It is common to perform a combination of exploratory and scripted testing, where the choice between the two is based on risk. An example of exploratory testing in practice is Microsoft's verification of Windows compatibility.

Regression testing

Regression testing is any type of software testing which seeks to uncover regression bugs. Regression bugs occur whenever software functionality that previously worked as desired stops working, or no longer works in the way that was previously planned. Typically, regression bugs occur as an unintended consequence of program changes.

Common methods of regression testing include re-running previously run tests and checking whether previously fixed faults have re-emerged. Experience has shown that, as software is developed, this kind of re-emergence of faults is quite common. Sometimes it occurs because a fix gets lost through poor revision control practices (or simple human error in revision control), but just as often a fix for a problem will be "fragile": if some other change is made to the program, the fix no longer works. Finally, it has often been the case that when some feature is redesigned, the same mistakes are made in the redesign that were made in the original implementation of the feature.

Therefore, in most software development situations it is considered good practice that when a bug is located and fixed, a test that exposes the bug is recorded and regularly retested after subsequent changes to the program. Although this may be done through manual testing procedures using programming techniques, it is often done using automated testing tools. Such a test suite contains software tools that allow the testing environment to execute all the regression test cases automatically; some projects even set up automated systems to automatically re-run all regression tests at specified intervals and report any regressions. Common strategies are to run such a system after every successful compile (for small projects), every night, or once a week.

Regression testing is an integral part of the extreme programming software development method. In this method, design documents are replaced by extensive, repeatable, and automated testing of the entire software package at every stage in the software development cycle.
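As a minimal sketch of the practice described above, the following C example records a regression test for a previously fixed bug; parse_price() and the bug it once had (rejecting "0.00") are purely illustrative.

    /* A recorded regression test, re-run after every change to the program. */
    #include <assert.h>
    #include <stdlib.h>

    static int parse_price(const char *s, double *out) {   /* unit that once had the bug */
        char *end;
        *out = strtod(s, &end);
        return (end != s && *end == '\0') ? 0 : -1;
    }

    int main(void) {
        double v;
        assert(parse_price("0.00", &v) == 0 && v == 0.0);   /* the once-failing case */
        assert(parse_price("3.99", &v) == 0);               /* previously passing cases */
        assert(parse_price("abc", &v) != 0);
        return 0;
    }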

Contents
1 Types of regression
2 Mitigating regression risk
3 Uses

Types of regression

• Local - changes introduce new bugs.
• Unmasked - changes unmask previously existing bugs.
• Remote - changing one part breaks another part of the program. For example, Module A writes to a database and Module B reads from the database. If changes to what Module A writes break Module B, this is remote regression.

There is another way to classify regression:

• New feature regression - changes to code that is new to release 1.1 break other code that is new to release 1.1.
• Existing feature regression - changes to code that is new to release 1.1 break code that existed in release 1.0.

Mitigating regression risk

• Complete test suite repetition
• Regression test automation (GUI, API, CLI)
• Partial test repetition based on traceability and analysis of technical and business risks
• Customer or user testing:
  o Beta - early release to both potential and current customers
  o Pilot - deploy to a subset of users
  o Parallel - users use both old and new systems simultaneously
• Use larger releases. Testing new functions often covers existing functions: the more new features in a release, the more "accidental" regression testing.
• Emergency patches - these patches are released immediately, and will be included in future maintenance releases.

Uses

Regression testing can be used not only to test the correctness of a program, but also to track the quality of its output. For instance, in the design of a compiler, regression testing should track the code size, simulation time, and compilation time of the test suites.

Installation testing

Implementation testing, sometimes called installation testing, is typically completed by the software testing engineer in conjunction with the configuration manager. Implementation testing is usually defined as testing which takes place using the compiled version of the code in the testing or pre-production environment, from which it may or may not progress into production. This generally takes place outside of the development environment, to limit code corruption from other future releases which may reside on the development environment.

The ideal installation might simply appear to be running an install program, sometimes called package software. This package software typically uses a setup program which acts as a multi-configuration wrapper, and which may allow the software to be installed on a variety of machines and/or operating environments. Every possible configuration should receive extensive testing before it can be used with confidence.

In distributed systems, particularly where software is to be released into an already live target environment (such as an operational web site), installation (or software deployment, as it is sometimes called) can involve database schema changes as well as the installation of new software. Deployment plans in such circumstances may include backout procedures whose use is intended to roll the target environment back in the event that the deployment is unsuccessful. Ideally, the deployment plan itself should be tested in an environment that is a replica of the live environment. A factor that can increase the organizational requirements of such an exercise is the need to synchronize the data in the test deployment environment with that in the live environment with minimum disruption to live operation. This type of implementation testing may include testing of the processes which take place during the installation or upgrade of a multi-tier application. This type of testing is commonly compared to a dress rehearsal, or may even be called a "dry run".

Implementation testing covers testing of full, partial, or upgrade install/uninstall processes.
