Preetam Reddy
GEEKSNACK
ThoughtWorks    August 03, 2009    Episode 4
Measuring Value of Automation Tests

Value and purpose of test automation

The value of test automation is often described in terms of the cost savings from reducing manual testing effort (and the resources it requires) and the fast feedback automated tests provide. However, this rests on a key assumption: that the automated tests are serving their primary purpose - to repeatedly, consistently, and quickly validate that the application is within the threshold of acceptable defects. Since it is impossible to know most of the defects in an application without using it over a period of time (either through a manual testing team or through users in production), we need statistical concepts and models to help us design the automated tests and confirm that they are indeed serving their primary purpose.
Definitions

                          Manual Confirmation of Defects
Automation Test Results   Is a defect                  Is not a defect
Failure / Positive        Defective code correctly     Good code wrongly
                          identified as defective -    identified as defective -
                          Caught Defects (CD)          Not A Defect (NAD)
                                                       (aka Type I Error /
                                                       False Positive)
Pass / Negative           Defective code wrongly       Good code correctly
                          identified as good -         identified as good -
                          Missed Defects (MD)          Eureka! (E)
                          (aka Type II Error /
                          False Negative)

Sensitivity = CD / (CD + MD)
Positive Predictive Value = CD / (CD + NAD)
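As a minimal sketch (the function names are illustrative, not from the article), the two metrics can be computed directly from the cell counts above:

    # Minimal sketch: test-suite efficacy metrics computed from the
    # confusion-matrix cell counts defined above. Names are illustrative.

    def sensitivity(cd: int, md: int) -> float:
        """P(test fails | component is defective).
        cd = Caught Defects, md = Missed Defects (false negatives)."""
        return cd / (cd + md)

    def positive_predictive_value(cd: int, nad: int) -> float:
        """P(component is defective | test fails).
        cd = Caught Defects, nad = Not A Defect results (false positives)."""
        return cd / (cd + nad)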
The sensitivity of a test is the probability that it will identify a defect when run against a defective component. A sensitivity of 100% means that the tests recognize all defects as such; thus, with a high-sensitivity test suite, a pass result can be used to rule out defects. The positive predictive value of a test is the probability that a component is indeed defective when the test fails. Predictive values are inherently dependent on the prevalence of defects (see the sketch below).

The threshold of acceptable defects has to be traded off against the cost of achieving it - test development costs, test maintenance costs, longer test run times, etc. Tests also involve a trade-off between the acceptable number of Missed Defects (false negatives) and the acceptable number of "Not a Defect" results (false positives).

E.g., in order to prevent hijacking, airport security has to screen all baggage for arms being carried onto the airplane. This can be done by checking all the cabin baggage manually, as was briefly done for domestic flights in India. However, manual checking is prone to human error, which raises the probability of Missed Defects (false negatives), while NAD (false positives) would be low in this case. How would this change if the manual check were replaced with metal detectors?
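To make the prevalence point concrete, here is a short, hypothetical calculation using Bayes' rule; the 90% sensitivity and 5% false-positive rate are assumed purely for illustration and do not come from any project data:

    # Hypothetical illustration: PPV for the same test suite at three
    # defect prevalence levels, via Bayes' rule. The 90% sensitivity and
    # 5% false-positive rate are assumed, not taken from the article.

    def ppv_at(sens: float, false_positive_rate: float, prevalence: float) -> float:
        true_positives = sens * prevalence
        false_positives = false_positive_rate * (1 - prevalence)
        return true_positives / (true_positives + false_positives)

    for prevalence in (0.01, 0.10, 0.50):
        print(f"prevalence {prevalence:.0%}: PPV = {ppv_at(0.90, 0.05, prevalence):.0%}")
    # prevalence 1%: PPV = 15%
    # prevalence 10%: PPV = 67%
    # prevalence 50%: PPV = 95%

The same suite that is quite trustworthy when defects are common produces mostly false alarms when defects are rare, which is why PPV cannot be judged in isolation from prevalence.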
Hypothesis

The efficacy of automated tests should be measured by their sensitivity and by the probability of Missed Defects / false negatives when the application is subjected to these tests.
Data from a project

                          Manual Confirmation of Defects
Automation Test Results   Is a defect    Is not a defect
Failure / Positive        58 (CD)        20 (NAD)
Pass / Negative           113 (MD)       not shown (E)

Sensitivity = 58 / (58 + 113) = 34%
Positive Predictive Value = 58 / (58 + 20) = 74%
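Plugging the project's counts into the sketch functions above reproduces the reported figures:

    # Applying the sketch functions to the project's counts.
    cd, md, nad = 58, 113, 20

    print(f"Sensitivity: {sensitivity(cd, md):.0%}")                              # -> 34%
    print(f"Positive Predictive Value: {positive_predictive_value(cd, nad):.0%}") # -> 74%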
Comments and discussion: www.geeksnack.com
Want to author the next episode? Email us: [email protected]