What is black-box/white-box testing?

Black-box and white-box are test design methods. Black-box test design treats the system as a "black box", so it doesn't explicitly use knowledge of the internal structure. Black-box test design is usually described as focusing on testing functional requirements. Synonyms for black-box include: behavioral, functional, opaque-box, and closed-box. White-box test design allows one to peek inside the "box", and it focuses specifically on using internal knowledge of the software to guide the selection of test data. Synonyms for white-box include: structural, glass-box, and clear-box.

While black-box and white-box are terms that are still in popular use, many people prefer the terms "behavioral" and "structural". Behavioral test design is slightly different from black-box test design because the use of internal knowledge isn't strictly forbidden, though it is still discouraged. In practice, it hasn't proven useful to use a single test design method. One has to use a mixture of different methods so that they aren't hindered by the limitations of any particular one. Some call this "gray-box" or "translucent-box" test design, but others wish we'd stop talking about boxes altogether.

It is important to understand that these methods are used during the test design phase, and their influence is hard to see in the tests once they're implemented. Note that any level of testing (unit testing, system testing, etc.) can use any test design method. Unit testing is usually associated with structural test design, but this is because testers usually don't have well-defined requirements at the unit level to validate.
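As a concrete illustration, consider the sketch below. The function, its one-line "spec", and the test values are all invented for this answer; the point is only to show how the two design methods pick different test data for the same unit.

    # Toy unit under test. Its (invented) spec: orders of $100 or more get
    # a 10% discount; orders of $500 or more get a 20% discount.
    def discount_rate(order_total):
        if order_total >= 500:
            return 0.20
        if order_total >= 100:
            return 0.10
        return 0.0

    # Black-box design: cases come from the requirements alone -- the
    # boundaries of the specified partitions -- without reading the code.
    black_box_cases = [(99.99, 0.0), (100, 0.10), (499.99, 0.10), (500, 0.20)]

    # White-box design: cases chosen by reading the implementation so that
    # every branch is exercised at least once.
    white_box_cases = [(600, 0.20), (250, 0.10), (50, 0.0)]

    for total, expected in black_box_cases + white_box_cases:
        assert discount_rate(total) == expected, (total, expected)

Here the two methods happen to produce overlapping cases, which is common for simple code; for larger units the structural cases typically expose paths the specification never mentions.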
What are unit, component and integration testing?

The following definitions are from a posting by Boris Beizer on the topic of "Integration Testing" in the comp.software.testing newsgroup. The definitions of integration tests are after Leung and White. Note that the definitions of unit, component, integration, and integration testing are recursive.

Unit: the smallest compilable component. A unit typically is the work of one programmer (at least in principle). As defined, it does not include any called sub-components (for procedural languages) or communicating components in general.

Unit testing: in unit testing, called components (or communicating components) are replaced with stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The unit is tested in isolation.

Component: a unit is a component. The integration of one or more components is a component. Note: the reason for "one or more", as contrasted with "two or more", is to allow for components that call themselves recursively.

Component testing: the same as unit testing, except that all stubs and simulators are replaced with the real thing.

Two components (actually one or more) are said to be integrated when: (a) they have been compiled, linked, and loaded together, and (b) they have successfully passed the integration tests at the interface between them. Thus, components A and B are integrated to create a new, larger component (A, B). Note that this does not conflict with the idea of incremental integration -- it just means that A is a big component and B, the component added, is a small one.

Integration testing: carrying out integration tests.
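A minimal sketch of unit testing in this sense appears below. The names (shipping_quote, RateServiceStub) and the numbers are invented for illustration: the called component (a rate service) is replaced with a stub, and the test function itself plays the role of the driver. In component testing, the stub would be swapped for the real service.

    # Unit under test: it calls another component (a rate service).
    def shipping_quote(weight_kg, destination, rate_service):
        return round(weight_kg * rate_service.rate_per_kg(destination), 2)

    # Stub standing in for the called component, so the unit is tested
    # in isolation (a simulator or trusted component would also do).
    class RateServiceStub:
        def rate_per_kg(self, destination):
            return 4.0  # canned answer, independent of the real service

    # The test acts as the driver (the stand-in for calling components).
    def test_shipping_quote():
        assert shipping_quote(2.5, "Oslo", RateServiceStub()) == 10.0

    test_shipping_quote()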
Integration tests (after Leung and White) are given here for procedural languages; this is easily generalized for OO languages by using the equivalent constructs for message passing. In the following, the word "call" is to be understood in the most general sense of a data flow and is not restricted to formal subroutine calls and returns -- for example, passage of data through global data structures and/or the use of pointers.

Let A and B be two components in which A calls B. Then:

Ta: the component-level tests of A.

Tb: the component-level tests of B.

Tab: the tests in A's suite that cause A to call B.

Tbsa: the tests in B's suite for which it is possible to sensitize A -- the inputs are to A, not to B.

Tbsa + Tab == the integration test suite (+ = union).

Note: "sensitize" is a technical term. It means inputs that will cause a routine to go down a specified path. The inputs are to A. Not every input to A will cause A to traverse a path in which B is called; Tbsa is the set of tests that do cause A to follow a path in which B is called. The outcome of the test of B may or may not be affected. (A small sketch of this construction appears at the end of this section.)

There have been variations on these definitions, but the key point is that it is pretty darn formal, and there's a goodly hunk of testing theory, especially as concerns integration testing, OO testing, and regression testing, based on them.

As to the difference between integration testing and system testing: system testing specifically goes after behaviors and bugs that are properties of the entire system, as distinct from properties attributable to components (unless, of course, the component in question is the entire system). Examples of system testing issues: resource loss bugs, throughput bugs, performance, security, recovery, and transaction synchronization bugs (often misnamed "timing bugs").

What's the difference between load and stress testing?

Boris Beizer says: One of the most common but unfortunate misuses of terminology is treating "load testing" and "stress testing" as synonymous. The consequence of this ignorant semantic abuse is usually that the system is neither properly "load tested" nor subjected to a meaningful stress test.

1. Stress testing is subjecting a system to an unreasonable load while denying it the resources (e.g., RAM, disc, mips, interrupts) needed to process that load. The idea is to stress a system to the breaking point in order to find bugs that will make that break potentially harmful. The system is not expected to process the overload without adequate resources, but to behave (e.g., fail) in a decent manner (e.g., not corrupting or losing data). Bugs and failure modes discovered under stress testing may or may not be repaired, depending on the application, the failure mode, the consequences, etc. The load (incoming transaction stream) in stress testing is often deliberately distorted so as to force the system into resource depletion.

2. Load testing is subjecting a system to a statistically representative (usually) load. The two main reasons for using such loads are in support of software reliability testing and in performance testing. The term "load testing" by itself is too vague and imprecise to warrant use; for example, do you mean "representative load", "overload", "high load", etc.? In performance testing, load is varied from a minimum (zero) to the maximum level the system can sustain without running out of resources or having transactions suffer (application-specific) excessive delay.

3. A third use of the term is as a test whose objective is to determine the maximum sustainable load the system can handle. In this usage, "load testing" is merely testing at the highest transaction arrival rate in performance testing.
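Returning to the integration test suite defined earlier, the sketch below renders the Leung and White construction in executable form. The test names and the tagging scheme are invented for illustration; in practice, deciding membership in Tab and Tbsa requires path analysis of A, not a lookup table.

    # Component-level suites for A and B (invented names). For A's tests,
    # calls_B records whether the test's inputs drive A down a path that
    # calls B; for B's tests, sensitized_via_A records whether inputs to A
    # (not to B) can put B in the situation the test covers.
    Ta = {"ta1": {"calls_B": True}, "ta2": {"calls_B": False},
          "ta3": {"calls_B": True}}
    Tb = {"tb1": {"sensitized_via_A": True}, "tb2": {"sensitized_via_A": False}}

    Tab  = {name for name, tags in Ta.items() if tags["calls_B"]}
    Tbsa = {name for name, tags in Tb.items() if tags["sensitized_via_A"]}

    # Tbsa + Tab, with + meaning set union.
    integration_suite = Tbsa | Tab
    print(sorted(integration_suite))   # ['ta1', 'ta3', 'tb1']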
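To make the load/stress distinction concrete, here is a minimal sketch of the two shapes of test. The transaction, rates, and durations are invented; a real harness would drive the actual system and constrain its resources, but the contrast in intent is the point.

    import random
    import threading
    import time

    def transaction():
        time.sleep(0.01)   # stand-in for real work against the system

    def load_test(arrivals_per_sec, duration_sec):
        """Load test: offer a statistically representative load (roughly
        Poisson arrivals here) and measure delay; in performance testing,
        this rate is varied from near zero up to the maximum the system
        sustains without excessive, application-specific delay."""
        deadline = time.time() + duration_sec
        worst = 0.0
        while time.time() < deadline:
            start = time.time()
            transaction()
            worst = max(worst, time.time() - start)
            time.sleep(random.expovariate(arrivals_per_sec))  # inter-arrival gap
        return worst

    def stress_test(n_concurrent):
        """Stress test: a deliberately distorted load -- everything at
        once, no pacing -- aimed at resource depletion. The check is not
        throughput but decent failure: no lost or corrupted data."""
        threads = [threading.Thread(target=transaction) for _ in range(n_concurrent)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    print("worst observed delay:", load_test(arrivals_per_sec=20, duration_sec=1))
    stress_test(n_concurrent=200)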