The effectiveness of risk management: measuring what didn't happen

John F. McGrew, Pacific Bell, San Ramon, California, USA
John G. Bilotta, Charles Schwab & Co., Inc., San Francisco, California, USA

Keywords

Risk management, Measurement, Project management

Abstract

Two of the most common reasons for not implementing a risk management program are cost and benefit. This paper focuses on whether the benefits of intervention can be shown to justify the costs. A confounding factor is that the acts of intervention during a risk management program may alter the outcome in ways we cannot separate and therefore cannot cost out. A second confounding factor is response bias, the tendency of individuals consistently to underestimate or overestimate risk, resulting in interventions that may be ineffective or excessively wasteful. The authors demonstrate that signal detection theory (SDT) can be used to analyze data collected during a risk management program to disambiguate the confounding effects of intervention and response bias. SDT can produce an unbiased estimate of percent correct for a risk management program. Furthermore, this unbiased estimator allows comparison of results from one program to another.

Management Decision 38/4 [2000] 293-300
© MCB University Press [ISSN 0025-1747]

Introduction

Development of software is an inherently risky business. The probability of schedule slippage on large software engineering projects is nearly 100 per cent (Jones, 1994). The probability of cost overruns exceeds 50 per cent, while the probability of outright failure is about 10 per cent (Jones, 1994). One in ten large system efforts in the USA is never finished or delivered (Putnam and Myers, 1992). Any effort on the scale of software development involving significant risk factors requires that risks be actively managed if the effort is to succeed.

Project risk management is the overall process of analyzing and addressing risk. The process has three major components: assessment, mitigation, and contingency planning (Charette, 1989). In the assessment phase, risks are analyzed for likelihood and impact on project objectives. The mitigation phase generates action plans to minimize the risks. As the project progresses, the effectiveness of the mitigation plans is reviewed and adjustments are made as necessary. Finally, contingency plans are developed to offset the consequences of failure should the risk mitigation plans fail (Bilotta, 1995). An effective program of risk management is an ongoing process of assessment, intervention and fallback planning (Boehm, 1989).

Yet the implementation of risk management programs on software development projects is rare in commercial business (Jones, 1994). The reasons for not implementing risk management include cost, benefits, and expertise. The most common rationalizations are that the project is too small (or too large) to justify the time and expense of a review; that the benefits cannot be determined and, therefore, the costs are assumed to outweigh the benefits; and that the effort is unlikely to uncover anything that is not already well known to everyone involved in the project. Nearly every project risk is known to at least one person on the team; success results from assuring that someone owns the responsibility for addressing each risk. The issue is to identify, document, and manage risks, not just to know about them.

This paper focuses on the first two concerns, which come down to whether the benefits of risk management justify the costs, a question that can only be answered if we can measure the effectiveness of a risk management program. It is difficult to argue against the objection that the benefits of risk management cannot be demonstrated. If a team assesses the risks associated with a project, intervenes to minimize them, and the project succeeds, was the intervention program successful, or would the project have succeeded anyway? The act of intervention may have altered the outcome in ways we cannot separate out, or it may not have altered it at all. The act of intervention confounds any effort to quantify the effectiveness of a risk management program.

A second confounding factor is response bias. We can best understand the impact of response bias by examining two extreme hypothetical cases: Projects A and B. The managers of Projects A and B are both anxious to minimize their risks. They each hold risk assessment and planning sessions to identify risks and to create risk mitigation and contingency plans. The team members of Project A are very confident about their ability to deliver on all tasks; they see little or no risk associated with any of their commitments. Rather than assessing each activity objectively by reference to its actual risks, the team is nay-saying, that is, minimizing the assessment of risk across the board. Consequently, their analysis generates few, if any, mitigation or contingency plans. The result is that, as


problems do arise, the impacts are greater than expected and the team is unprepared to deal with them. In contrast, the team members of Project B see significant problems hidden in every task, even the simplest routine work. They err on the side of yea-saying. In their minds, every task is overflowing with risk that must be controlled. As a result, their analysis generates dozens or hundreds of action items for staff members to implement, at considerable cost in terms of people and resources with, proportionately, little in the way of payback.

Assuming that both projects have moderate, and roughly equal, amounts of risk, it is not too difficult to imagine their respective outcomes. Project A demonstrated a strong negative response bias by consistently underestimating the risk associated with the project and, consequently, underpreparing for the consequences. Project B, on the other hand, demonstrated a strong positive response bias by overestimating the risks and, therefore, overpreparing for the consequences. What will the outcome be? Project A, which underestimated the risks, will experience significant problems in delivering its product and will dismiss risk assessment as ineffective because it did not prevent the problems from occurring. Project B, which overestimated the risk, will likely avoid most of the real problems in delivering its product, and all of the imaginary ones as well. It is likely that it too will dismiss the value of risk management, since it was so costly in terms of people and resources yet the project was successful anyway.

Ideally, we want to optimize the value of a risk management program. To achieve that goal, we must ask whether it is possible, based on an evaluation of a team's risk management efforts relative to the actual outcome of the project, to determine just how skillful the team was in correctly identifying significant risks, dismissing insignificant risks, and intervening to head off risks. In short, can we, from the data, determine a team's skill in risk assessment and risk mitigation while controlling for response bias? Can we provide an unbiased estimate of the effectiveness of risk management? Put another way, can we measure what we may have caused not to happen?

This paper discusses techniques for measuring and understanding the effectiveness of risk management programs while controlling for response bias. Our research data focus on risk programs that were implemented for two software development teams over several product releases. The measure of effectiveness is based on an analysis that contrasts the results of the risk assessment, mitigation, and contingency planning efforts with the actual outcome of the project in terms of the tasks that failed and the tasks that did not.

There are two prerequisites for such an analysis. The first is that the risk assessment, mitigation, and contingency data be tracked. The second is that an independent audit or post-implementation review of the project be conducted in a way that allows the actual outcome of each task (in terms of its degree of failure) to be assessed. This information can then be organized into two contingency tables (examples with hypothetical data for an imaginary project are provided by Tables I and II).

The rows of Table I separate the tasks into two groups: those which the team estimated to be of high risk and those which it estimated to be of low risk. The rows of Table II also separate the tasks into two groups: those which the team judged to require intervention and those which required no intervention. Similarly, the columns in both Tables I and II separate the tasks into two groups: those found, after completion of the project, to have failed and those found not to have failed. If we look at the second column of each table, we can see the nature of the analytical problem facing us.

Table I Contingency table for the outcome of a hypothetical risk assessment program

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks estimated to be high risk      21                  4                         25
Tasks estimated to be low risk       9                   32                        41
Column totals                        30                  36                        66

Table II Contingency table for the outcome of a hypothetical risk intervention program

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks intervened                     19                  12                        31
Tasks not intervened                 11                  24                        35
Column totals                        30                  36                        66


At the end of the project, 36 of the tasks were classified as having been completed successfully, that is, they did not fail. Of those, 32 were judged to be of low risk (cf. the lower right-hand cell of Table I), so no intervention was planned for 24 of them (cf. the same cell of Table II). On the other hand, the team did intervene in the four tasks judged to be of high risk, taking steps, either through risk mitigation or contingency planning, to minimize the risk.

The core analytical problem is that the team intervened on behalf of four tasks based on its estimate of the risk inherent in each task. Consequently, it is not possible to say how many of those four tasks succeeded because the team intervened, as opposed to the number that would have succeeded even if the team had not intervened. After all, it also intervened on eight other low-risk tasks that were successful (the team intervened in a total of 12 tasks that did not fail, as shown in Table II, but only four of the 12 were assessed to be of high risk in Table I). This is an instance of a positive response bias, that is, seeing problems where they don't exist.

In the same way, if we look at the first column of each table, we see that the team estimated 21 of the 30 tasks that failed to be of high risk but intervened on only 19, yet all 30 tasks failed. Is this a skill problem on the part of the team in intervening on the correct tasks, or is it an example of a negative response bias? If an intervened task fails, did the intervention prevent a worse failure or did it actually cause the failure? If the task succeeds, did the intervention cause the success or would it have succeeded anyway?

In other words, we need a method to disambiguate the interaction of risk and intervention. We need a tool to extract from the risk assessment and intervention contingency tables any bias toward yea-saying or nay-saying if we are to estimate accurately the effectiveness of the overall risk management program. At the end of any risk management program, a project team will be unable to say much about its effectiveness. It will know how many tasks failed and how many did not, and it will have some estimate of its overall success rate, but it will not know whether its intervention had an impact on the tasks that succeeded or whether their success or failure was owing to an overly positive or negative response bias.

This paper will show that, using the tools of signal detection analysis, the skill of the team, that is, the effectiveness of the team's intervention program in mitigating and controlling risk, can be estimated while controlling for response bias. Understanding response bias provides the means to assess and, through repeated use, to optimize the value of a risk management program.

Applying signal detection theory to the problem

Signal detection theory (SDT) was developed in the communications industry by Abraham Wald (Swets, 1996) to provide a tool for determining whether a signal could be accurately separated from background noise. The technique was adapted by Swets and Tanner (Swets, 1996) for use in psychophysics to separate a subject's skill from response bias when detecting a signal. Since that time, SDT has been used extensively to describe the interaction of a person's sensitivity and bias in perceptual, auditory, and other sensory tasks (Macmillan and Creelman, 1991; Swets, 1996). It has also been extended beyond sensory discrimination tasks to describe the interaction between a person's understanding of the world and the world as it actually exists (McGrew and Jones, 1976; McGrew, 1983, 1994). Used in this way, it provides a method to separate an individual's ability to understand the realities of a situation from his or her biases about the situation. In the context of risk management, it can separate a team's ability to accurately detect and intervene in risks from its tendency to nay-say or yea-say.

The advantage of SDT over traditional inferential and descriptive statistics is that, instead of comparing a sample distribution to a theoretical distribution, it compares two sample distributions directly. The ability to work directly with sample distributions makes SDT an ideal tool for assessing the success or failure of a risk management program because there is no theoretical distribution for risk.

The comparison of sample distributions is configured as a contingency table. The contingency table contrasts the perception of the world with the world as it really is. Table III shows the contingency table layout for a typical SDT analysis. In Table III, A and B represent the column marginals, which sum to 1; C and D represent the row marginals, which do not necessarily sum to 1. The row labels in Table III (event reported and event not reported) represent a perception of the world.

Table III Contingency table for a typical signal detection analysis

                      Event occurred        Event did not occur   Row totals
Event reported        Hit                   False alarm           C
Event not reported    Incorrect rejection   Correct rejection     D
Column totals         A = 1                 B = 1


The column labels in Table III (event occurred and event did not occur) represent the real world. In the case of a risk management program, using Table II as an example, we can interpret the row ``tasks intervened'' as event reported and ``tasks not intervened'' as event not reported. We can also, in Table II, interpret the column ``tasks that failed'' as event occurred and ``tasks that did not fail'' as event did not occur. The row and column headings of Table I can be interpreted in the same way.

The upper left-hand cell of Table III (labeled ``hit'') represents the correct perceptions of true events (event reported and event occurred). The lower right-hand cell of Table III (labeled ``correct rejection'') represents the correct perceptions of false events (event not reported and event did not occur). Together these two cells represent the only situations in which the team has correctly discerned the nature of the events it has examined or in which it has intervened. That is, it has correctly called true events true and false events false. In the context of risk management, we can say the team has correctly identified high risks that require intervention and low risks that do not. In contrast, the other diagonal of Table III represents those items the team has incorrectly perceived. That is, these are events which are false but which the team has judged to be true (false alarms, the upper right-hand cell) or events which are true but which the team has judged to be false (incorrect rejections, the lower left-hand cell).

Owing to the confounding effect of intervention, however, the only two cells of Table III that offer the possibility of an unambiguous interpretation fall along the bottom row (the incorrect rejections and the correct rejections). In the top row, the truth of a task (that is, whether it failed) is partly due to the actions of the risk management team. If you intervene to mitigate a risk and it does occur (that is, the task fails), you don't know whether it would have failed anyway or whether the intervention increased or decreased the degree of failure. The same problem exists for an intervention for which the risk does not occur (that is, the task does not fail). You don't know whether the intervention succeeded or whether the risk simply did not materialize. Interpretation of the top row is confounded by the act of intervention and the human propensity toward response bias. However, in the bottom row, it is clear when a risk occurs and you do not intervene (incorrect rejection), and it is also clear when you do not intervene and a risk does not occur (correct rejection).

Traditionally, SDT uses the hit and false alarm rates (the top row of Tables I, II and III)

for its analysis. One can just as appropriately use the correct and incorrect rejections in the bottom row to perform the analysis, owing to the symmetry of the theory. The structure of an SDT contingency table is such that we can infer the correct values for the top row of the table from the bottom row by virtue of the fact that the columns must sum to 1. In this analysis, we use the incorrect and correct rejection rates to determine the skill and bias of the team because they provide a direct and unambiguous measure of the effectiveness of the team's risk management program. The outcome would be the same if we used the inferred values. However, the inferred values are confounded with the actions of the team in an unknowable way.

SDT provides a model for understanding the contingency table summarizing the outcome of a risk management program in spite of the analytical complications introduced by the team's acts of intervention. An SDT analysis of the contingency tables for risk assessment and risk intervention allows us to determine three critical metrics about the effectiveness of a risk management program: an estimate of the team's skill, an estimate of its response bias, and an unbiased estimate of the team's percent correct, an easily understood indicator of effectiveness.

We could, alternatively, take the very simple approach of adding up the hits and correct rejections (all of the team's correct responses) and dividing by the total number of tasks to determine a simple percent-correct metric from the contingency table. The straight percent-correct metric, however, is easily distorted by response bias, which can cause a gross over- or underestimate of the true percent correct. In our empirical data, we will show just such a large effect for one of the project teams.
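To make this inference concrete, here is a minimal sketch (our illustration, not part of the original paper) in Python, using the hypothetical intervention counts from Table II; it recovers the confounded top-row rates from the unambiguous bottom row alone:

```python
# Hypothetical intervention counts from Table II.
hits = 19                  # intervened, task failed
false_alarms = 12          # intervened, task did not fail
incorrect_rejections = 11  # not intervened, task failed
correct_rejections = 24    # not intervened, task did not fail

A = hits + incorrect_rejections        # tasks that failed: 30
B = false_alarms + correct_rejections  # tasks that did not fail: 36

# Each column sums to 1 when expressed as rates, so the top-row rates
# can be inferred from the bottom-row rejection rates.
hit_rate = 1 - incorrect_rejections / A        # H/A  = 19/30
false_alarm_rate = 1 - correct_rejections / B  # FA/B = 12/36
print(hit_rate, false_alarm_rate)
```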

Method

The data presented here were collected from two different project teams across three product releases. The first team, Project T, implemented a risk management program during the final months of the project's overall schedule. The team consisted of almost 200 management and technical professionals working over a period of five years (Bilotta, 1995). The project management team requested that a risk program be set up to address concerns about the readiness of the project. This was the team's first attempt at formal risk management. Fifty-two tasks remaining in the project schedule were reviewed once a month for six months by the key technical staff under the


direction of one of the authors. Likelihood of failure and impact of failure were estimated by the team for each task and were used to assess risk according to a process first documented by the Air Force (US Air Force, 1988). Separate sessions were held to develop action plans to mitigate identified risks or to develop contingency plans to minimize the impact of failure.

After the application had been successfully implemented, a quality assurance team from outside the project performed a post-conversion review covering all aspects of the development effort. It provided an independent assessment of the root causes of success and failure that could be linked back to project activities, including the 52 tasks monitored under the risk management program (Bilotta, 1995). This, plus the original risk assessment, mitigation, and contingency planning data, provided the material to construct risk assessment and risk intervention contingency tables for the team.

The same methodology was used to establish a risk management program for a second project, Project N, being developed in multiple phases by a small team of a dozen technical staff. Risk assessments, facilitated by the authors, were done once for each of two separate project phases. Again, follow-up assessments of success and failure for individual tasks were conducted by a quality assurance manager.

To begin the SDT analysis for risk assessment, the team's assessment of risk for each task (high or low) was matched with the task's corresponding outcome (failed or did not fail); see Tables IV, VI and VIII. The same procedure was followed to analyze risk intervention: the team's action relative to each task (intervened or did not intervene) was matched with the task's outcome (again, failed or did not fail); see Tables V, VII and IX. In this way, each task was placed in one of four categories corresponding to the four cells of the contingency table: hits, false alarms, incorrect rejections, and correct rejections.
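As an illustration of this bookkeeping step (our sketch; the record layout and field names are hypothetical, not taken from the authors' process), each task's decision can be crossed with its audited outcome to produce the four cell counts:

```python
from collections import Counter

# Hypothetical task records: "flagged" means the team rated the task high
# risk (assessment tables) or intervened in it (intervention tables);
# "failed" is the independent reviewer's judgment of the outcome.
tasks = [
    {"flagged": True,  "failed": True},   # -> hit
    {"flagged": True,  "failed": False},  # -> false alarm
    {"flagged": False, "failed": True},   # -> incorrect rejection
    {"flagged": False, "failed": False},  # -> correct rejection
]

def cell(flagged, failed):
    """Return the contingency-table cell for one task."""
    if flagged:
        return "hit" if failed else "false alarm"
    return "incorrect rejection" if failed else "correct rejection"

counts = Counter(cell(t["flagged"], t["failed"]) for t in tasks)
print(counts)
```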

Table IV Risk assessment contingency table for Project T

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks estimated to be high risk      18                  3                         21
Tasks estimated to be low risk       7                   24                        31
Column totals                        25                  27                        52

Table V Risk intervention contingency table for Project T

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks intervened                     16                  15                        31
Tasks not intervened                 9                   12                        21
Column totals                        25                  27                        52

Table VI Risk assessment contingency table for Project N, Phase I

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks estimated to be high risk      3                   5                         8
Tasks estimated to be low risk       1                   6                         7
Column totals                        4                   11                        15

Table VII Risk intervention contingency table for Project N, Phase I

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks intervened                     3                   7                         10
Tasks not intervened                 1                   4                         5
Column totals                        4                   11                        15

Table VIII Risk assessment contingency table for Project N, Phase II

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks estimated to be high risk      1                   0                         1
Tasks estimated to be low risk       0                   27                        27
Column totals                        1                   27                        28

Table IX Risk intervention contingency table for Project N, Phase II

                                     Tasks that failed   Tasks that did not fail   Row totals
Tasks intervened                     0                   11                        11
Tasks not intervened                 1                   16                        17
Column totals                        1                   27                        28


A correct rejection is a task which did not fail and which, during assessment, was judged to be of low risk or which, during intervention, was not acted upon. An incorrect rejection is a task which did fail during implementation but which, during assessment, was judged to be of low risk or which, during intervention, was not acted upon. The left-hand cell of the upper row contains the risk estimates and interventions that were successful (hits), while the right-hand cell contains the risk estimates and interventions that were not (false alarms).

The reciprocal relationship between the cells in the upper and lower rows is made clear in Equations 1 and 2, where H stands for hits, FA for false alarms, IR for incorrect rejections, and CR for correct rejections. A is the column total for ``event occurred'' (task failed) and B is the column total for ``event did not occur'' (task did not fail). The relationships in Equations 1 and 2 are what enable us to estimate the effect of events that did not happen.

1 = (H/A) + (IR/A)    (1)

1 = (FA/B) + (CR/B)    (2)

The primary measures derived from a signal detection analysis are d', a measure of the observer's sensitivity or skill, and c, a measure of the observer's response bias. The statistic d' is the distance, in standard deviation units, between the mean of the distribution of tasks that failed and the mean of the distribution of tasks that did not. d' can be interpreted in the same way as differences between standard normal scores (or z-scores). A d' of 0 indicates a complete overlap of the sample distributions. A d' of 1.65 indicates a distance between the means of the sample distributions of 1.65 standard normal scores, or about a 5 per cent overlap of the distributions, while a d' of 3 indicates an overlap of no more than 0.3 per cent.

The statistic c is the acceptance cutoff point for choosing between the two sample distributions. Its value, positive or negative, reflects the shift in the team's criterion for accepting false alarms and incorrect rejections, the two types of errors that can be made. As c shifts, the team is adjusting its level of acceptance of incorrect rejections or false alarms. When c = 0, the team has been able to minimize the acceptance of both. c translates to observer bias, the tendency to yea-say or nay-say. Both risk assessment and risk intervention decisions are considered unbiased when c = 0. Because c indexes the tendency to say no, when c > 0 the bias is towards nay-saying, or denying that a risk exists. When c < 0 the bias is towards yea-saying, or seeing risk where it doesn't exist. The range of c runs from 0 to any value, positive or negative, although it is unusual for it to be beyond ±2. The closer c is to 0, the more efficient the risk management program, because response bias is not leading the team toward underestimating or overestimating the risk. When c = 0, risk intervention actions are taken when required, but only when required.

Both d' and c are based on a comparison of the hit and false alarm rates or, alternatively, the correct and incorrect rejections, as in our case. Other measures can also be derived from an SDT analysis, including p(c), the raw (but biased) percent correct described earlier, and p(c)unb, an unbiased estimate of the percent correct. The statistic p(c) is the sum of the hits and the correct rejections divided by the total number of tasks in all the cells. p(c) yields a biased estimate of the true percent correct because the underlying sample distributions may be skewed. The statistic p(c)unb yields an unbiased estimate of the true percent correct and is calculated from d', which is a measure of skill that is independent of bias. Equations 3, 4, 5, and 6 are used to calculate d', c, p(c), and p(c)unb respectively.

d' = z(H/A) − z(FA/B)    (3)

c = −0.5[z(H/A) + z(FA/B)]    (4)

p(c) = (H + CR)/(A + B)    (5)

p(c)unb = Φ(d'/2)    (6)

In Equations 3 and 4, z is the standard normal variate expressed in standard deviations. In Equation 6, Φ is the standard normal cumulative probability.
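To show how Equations 3 through 6 fit together, the following sketch (ours; the paper prescribes no implementation) computes all four statistics from the cell counts, inferring the hit and false alarm rates from the rejection rates as discussed above. Note that a rate of exactly 0 or 1, as occurs in Project N, Phase II, needs a continuity correction before z can be taken; the paper does not say which correction the authors used.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal: inv_cdf is z(.), cdf is Phi(.)

def sdt_metrics(h, fa, ir, cr):
    """Return d', c, p(c) and p(c)unb for one 2x2 contingency table.

    h, fa, ir, cr are the hit, false alarm, incorrect rejection and
    correct rejection counts. Hit and false alarm rates are inferred
    from the rejection rates (Equations 1 and 2), so only the
    unambiguous bottom row of the table is trusted directly.
    """
    A = h + ir   # column total: tasks that failed
    B = fa + cr  # column total: tasks that did not fail

    z_hit = _N.inv_cdf(1 - ir / A)  # z(H/A); raises ValueError at 0 or 1
    z_fa = _N.inv_cdf(1 - cr / B)   # z(FA/B)

    d_prime = z_hit - z_fa           # Equation 3: skill
    c = -0.5 * (z_hit + z_fa)        # Equation 4: response bias
    p_c = (h + cr) / (A + B)         # Equation 5: raw percent correct
    p_c_unb = _N.cdf(d_prime / 2)    # Equation 6: unbiased percent correct
    return d_prime, c, p_c, p_c_unb
```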

Results

The risk management program for Project T focused on the 52 most critical tasks remaining in the project schedule. Ninety-two risk mitigation action items and 23 contingency plans were created to offset the risk associated with 33 of the 52 tasks. The risk mitigation action items were never implemented for two of the tasks, leaving 31 tasks for which intervention was planned and taken. Twenty-nine of the 52 tasks were estimated to be of low risk and no intervention was planned or taken.

Table IV is the contingency table for Project T's risk assessment data. It compares the project team's estimates of risk for the 52 tasks with the post-implementation review team's determination of the actual degree of failure or success for the same tasks. Table V is the contingency table for Project T's risk intervention data. It contrasts the


project team's intervention actions for the 52 tasks with the post-implementation review team's determination of the degree of failure or success for the same tasks. Table V makes clear that the team was able to control the risk associated with only 15 of the 31 tasks for which it intervened; the other 16 tasks in which the team intervened failed. On the other hand, the risk management team was successful in predicting that 12 of the 27 tasks which did not fail did not require intervention.

Tables VI and VII summarize the assessment and intervention results of the risk management program established by Project N for Phase I of its project. Fifteen critical tasks were reviewed by the team and the outcome, at the end of Phase I, was assessed by a quality assurance manager. Tables VIII and IX summarize the assessment and intervention results of Project N's risk management program for Phase II of its project. Twenty-eight critical tasks were reviewed by the team and again the outcome was assessed by a quality assurance manager.

The values of d' and c were calculated for each contingency table using Equations 3 and 4. The raw and unbiased estimates of the percent correct were calculated using Equations 5 and 6. The results for Project T and Project N, Phases I and II, are summarized in Table X.

Table X Skill and bias estimates for Project T and Project N, Phases I and II

                          d'        c         p(c)              p(c)unb
                          "Skill"   "Bias"    "Raw % correct"   "Unbiased % correct"
Project T
  Risk assessment         1.81      0.32      81                82
  Risk intervention       0.37      -0.34     54                57
Project N, Phase I
  Risk assessment         0.80      -0.20     60                66
  Risk intervention       0.30      -0.45     47                62
Project N, Phase II
  Risk assessment         12.00     0.00      100               100
  Risk intervention       5.70      3.15      57                99

The d' values in Table X show that both teams were effective at assessing risk. Team T had a d' of 1.81 in risk assessment; Team N had a d' of 0.80 in Phase I and a d' of 12 in Phase II. Both teams were less effective at risk intervention: Team T had a d' of 0.37, while Team N had a risk intervention d' of 0.30 in Phase I and 5.70 in Phase II. The assessment and intervention d' values for Team N show a strong learning effect from Phase I to Phase II.

The c values in Table X indicate that Team N overestimated the risk during both assessment and intervention for Phase I. In Phase II, it provided an unbiased estimate during risk assessment but underestimated the risk during intervention. In contrast, Team T tended to minimize the risk during assessment but to overreact to the risk during intervention. Bias results of this type are an indicator of systemic problems within an organization and a zeitgeist in which teams publicly deny problems but scramble behind the scenes to correct them. This is frequently a consequence of a ``no negative feedback'' culture.

Although p(c), the raw percent correct shown in the third column of Table X, has the advantage of being more familiar to most people than d', it is only accurate in situations where there is no bias in the observer's judgments. So long as c is near zero, the difference in interpretation between p(c)unb and p(c) is small, but if there is bias, whether positive or negative, the difference between p(c)unb and p(c) will grow. For instance, a comparison of the raw and unbiased percent correct shows that if -0.30 < c < +0.30, the raw percent correct is fairly accurate. If the bias exceeds ±0.30, the raw percent correct underestimates the actual percent correct. Team N, Phase I, had a p(c) of 47 per cent for its risk intervention efforts but a p(c)unb of 62 per cent, a 15 per cent underestimation. Team N, Phase II, had a p(c) of 57 per cent for risk intervention but a p(c)unb of 99 per cent, a 42 per cent underestimation. This is not surprising given that the magnitude of the bias is 0.45 in the first case and 3.15 in the second. In contrast, all three risk assessments show little difference between p(c) and p(c)unb because the corresponding bias scores all fall within approximately ±0.30. Thus, it can be seen that p(c) can give seriously erroneous estimates of a risk management program's effectiveness.

Team T made only one pass at risk management, so we can say nothing about its ability to learn from experience. However, Team N shows clearly that this technique is most valuable in its repetition. By using risk management and SDT analysis from one phase to the next, a team is able to build on its past experiences to improve its future performance, and the metrics d', c, and p(c)unb provide the means to quantify and demonstrate for management the level of improvement and the value of a risk management program.
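As a spot check (our calculation), feeding Project T's risk assessment counts from Table IV into the sdt_metrics sketch above reproduces the first row of Table X to within rounding:

```python
# Project T risk assessment (Table IV): H = 18, FA = 3, IR = 7, CR = 24.
d, c, pc, pc_unb = sdt_metrics(18, 3, 7, 24)
print(f"d'={d:.2f}  c={c:.2f}  p(c)={100 * pc:.0f}%  p(c)unb={100 * pc_unb:.0f}%")
# -> approximately d'=1.80  c=0.32  p(c)=81%  p(c)unb=82%
#    (cf. Table X: 1.81, 0.32, 81, 82)
```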

Discussion

Is it possible to measure what you may have prevented from happening? The answer is yes. Ultimately, the effectiveness of a team's risk management program is probably best


represented by the effectiveness of its intervention strategy. As our opening discussion showed, a team following a strategy of intervention to minimize risk has no way of judging the effectiveness of its efforts on its own. It is unable to do so because it is not possible to separate the set of all successful tasks into those that owe their success to intervention and those that do not. Signal detection theory, however, enables us to separate them out by looking at the correct and incorrect rejections. The prerequisites for doing this are that potential risks be identified and tracked, and that an independent ``after-action'' review of the results be done.

When arranged in contingency tables and subjected to SDT analysis, the data can be used to extract estimates of a team's skill and bias in assessing and intervening in risks. SDT also provides a means to calculate an unbiased estimate of the percent correct, a direct indicator of effectiveness. The unbiased percent correct p(c)unb is derived from the value of d' and has the advantage of being more easily and intuitively understood by those less familiar with SDT, especially if percent correct is a measure being used to report the effectiveness of risk management programs. In the context of risk management, either d' or p(c)unb is a valuable and unbiased estimator of effectiveness.

The positive news is that even teams inexperienced in risk management are able to assign risk estimates with a remarkable degree of accuracy. The risk assessment team for Project T had never participated in a risk management program, yet it was able to identify correctly the risk associated with 82 per cent of the tasks. On the other hand, controlling risk through intervention is more difficult to do. Although Project T could identify the correct level of risk 82 per cent of the time, it was able to mitigate only 57 per cent of the identified risk. Team N increased its ability to assess risk between Phase I, where p(c)unb = 66 per cent, and Phase II, where p(c)unb = 100 per cent. Its risk intervention skill improved from 62 per cent in Phase I to 99 per cent in Phase II.

We have no way of knowing whether the effectiveness ratings for risk assessment and mitigation for the projects in this study are above or below average relative to other

software development projects. For the project teams in this study, we do have some indication that teams inexperienced in risk management can be effective in managing risk, if only from their own reports to the authors of their satisfaction with the results of the program. An accumulation of data from a number of projects will help establish the true range of risk control. Because p(c)unb is monotonically related to d', either d' or p(c)unb can be used to support the direct comparison of results across widely different projects.

References

Bilotta, J.G. (1995), ``A study in risk management'', presented at the Bay Area Software Process Improvement Network, 18 January.
Boehm, B. (1989), Software Risk Management, IEEE Computer Society Press, Los Alamitos, CA.
Charette, R.N. (1989), Software Engineering Risk Analysis and Management, McGraw-Hill, New York, NY.
Jones, C. (1994), Assessment & Control of Software Risks, Prentice-Hall, Englewood Cliffs, NJ.
Macmillan, N.A. and Creelman, C.D. (1991), Detection Theory: A User's Guide, Cambridge University Press, New York, NY.
McGrew, J.F. (1983), ``Human performance modeling using the signal detection paradigm'', presented to the Human Factors Society, Norfolk, VA.
McGrew, J.F. (1994), ``Measuring the success of the SEI model using signal detection theory: an exploratory evaluation'', presented to the Software Engineering Process Group National Meeting, Dallas, TX, 27 April.
McGrew, J.F. and Jones, W. (1976), ``Likelihood ratio shift as an indicator of motivational shift in multiple choice test items'', presented to the annual meeting of the Midwestern Association of Behavior Analysis, Chicago, IL, 4 May.
Putnam, L.H. and Myers, W. (1992), Measures for Excellence, Prentice-Hall, Englewood Cliffs, NJ.
Swets, J.A. (1996), Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers, Lawrence Erlbaum Associates, Mahwah, NJ.
US Air Force (1988), Software Risk Abatement, AFSC/AFLC Pamphlet 800-45.

Application questions

1 What is your organisation's risk management plan? Are there any areas which you think might need addressing based on the authors' arguments/discussions?


2 Which of your projects do you feel carry the highest risk? Do you think this idea could help you?
