EUROPEAN JOURNAL OF OPERATIONAL RESEARCH ELSEVIER
European Journal of Operational Research 85 (1995) 39-52
i
Theory and Methodology
A proposal on improved procedures for estimating task-tiime distributions in PERT Hon-Shiang
L a u a, C. S o m a r a j a n
b,.
a College of Business Administration, Oklahoma State University, Stillwater, OK 74078, USA b College of Business Administration, Southeast Missouri State University, Cape Girardeau, MO 63701, USA
Received September 1991; revised October 1992
Abstract The objective of using the well-known PERT task-time estimation formulas has two components: (i) to elicit subjective time estimates from an 'expert'; and (ii) to translate these estimates into 'something' that canbe used as inputs to probabilistic network analysis. By incorporating the more current techniques in subjective probability elicitation and distribution theory, this paper proposes and justifies a practical and more logical two-step! procedure for achieving both components of the objective. The first step uses a seven-point fractile-estimation procedure, the second step determines the parameters of a beta and a Ramberg-Schmeiser distribution function that minimize the mean square deviation between the distribution function and the estimated fractiles. The procedure can be easily implemented by practitioners using a canned program. Keywords: PERT; Project management
1. Introduction In the 'classical' P E R T procedure, the stochastic time (duration) of a task is estimated as follows: (i) elicit the time estimates a, m and b, representing respectively the optimistic, most likely and pessimistic times; (ii) assume that the stochastic time is beta distributed, therefore compute the mean (/~) and standard deviation (o,) of the task time as ix=~(a+4m+b),
tr = ~ ( b - a ) .
(1)
Many authors have discussed the logic behind these formulas; see, e.g., Healy (1961), Grubbs (1962), Donaldson (1965), M a c C r i m m o n and Ryavec (1964), M o d e r and Rodgers (1968), Swanson and Pazer (1971), Sasieni (1986), Littlefield and Randolph (1987), Gallagher (1987), etc. From their discussions, it is apparent that formulas (1) can at best be considered as something marginally defensible only iafter one recognizes certain implicit restrictions that are not clearly reasonable. Therefore, some of the preceding
* Corresponding author. 0377-2217/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved SSDI 0377-2217(93)E0213-H
40
H-S. Lau, C. Somarajan/ European Journal of Operational Research 85 (1995) 39-52
authors have implied that formulas (1) are simply illogical and incorrect. Nevertheless, formulas (1) are still widely taught and cited, perhaps because of their simplicity. In classrooms, the need to teach formulas (1) stated in the standard textbooks and then to point out their logical errors often cause unnecessary confusion among students and embarrassment to the teacher. The justification of computational simplicity is becoming less and less valid; given today's universal availability of computers, computational requirement should be a very minor factor, and greater importance should be attached to the use of a theoretically correct and logical procedure. The basic objective behind using formulas (1) can be stated in two components: Component 1: elicit subjective time estimates from a knowledgeable manager; Component 2: translating these estimates into 'something' that can be used as inputs to probabilistic network analysis. Depending on the level of sophistication in the subsequent network analysis, the "something" may range from merely a mean value to a complete mathematical distribution function. By incorporating the more current techniques in subjective probability elicitation and distribution theory, this paper proposes a practical and more logical procedure for achieving both components of the objective behind using formulas (1). Section 2 discusses approaches for making a coordinated set of subjective time estimates. Section 3 presents a mathematically correct procedure for fitting a set of time estimates to a beta distribution. Section 4 points out the theoretical necessity of using another distribution to supplement the beta distribution in order to cover all possible shapes of task-time distributions, and procedures for handling this additional distribution are presented. Programs for implementing the procedures in Sections 3 and 4 are obtainable from the authors. Section 5 explains the superiority of Perry-Greig's (1975) formulas over formulas (1) in situations where only very simple formulas are acceptable. Section 6 gives a summary and points out issues for further consideration.
2. A more logical approach for making time estimates
The fractile method of subjective probability elicitation In PERT, estimating the optimistic, most likely and pessimistic times is the initial step for obtaining a "subjective probability distribution" for a task's completion time. That is, the task's completion time T is a random variable, and one attempts to elicit the expert's perception of the cumulative distribution function (cdf) F(T). From the huge literature on the elicitation of subjective probability distributions (see, e.g., Hampton, Moore and Thomas, 1973, Chesley, 1975, Spetzler and Stael von Holstein, 1975, Wallsten and Budescu, 1983, etc.), it is apparent that the most straightforward and common method for estimating F ( T ) is the "fractile method": Specify a number of (say, n) required fractiles ai's (i = 1 . . . . . n), elicit the corresponding time estimates ti's. For example, if one of the a / s is (say) a 3 = 0.4, then ask the expert to estimate the magnitude of the target time t 3 such that the probability of T not exceeding t 3 is ot3 = 0.4. Or, more briefly, one estimates To.4. Details on training managers to give good fractile estimates and on the actual wording of fractileeliciting questions can be found in many papers and textbooks, including Winkler (1967), Hampton et al. (1973), Vatter, Bradley, Frey and Jackson (1978), Solomon (1982) and Winterfeldt and Edwards (1986). For example, Solomon (1982) presented his "fractile elicitation instrument" and also described how he trained public accountants to estimate seven fractiles (the 0.01, 0.10, 0.25, 0.50, 0.75, 0.90 and 0.99 fractiles) of an uncertain accounting item (the account balances); Vatter et al. (1978, p. 169) gave a
H-S. Lau, C Somarajan / European Journal of Operational Research 85 (1995) 39-52
41
detailed example of the steps that one might follow in determining the five fractiles (the 0.01, 0.25, 0.50, 0.75 and 0.99 fractiles) of one's own subjective probability distribution.
Classical PERT as an improper fractile procedure Estimating 'a', 'm' and 'b' in classical PERT can be considered as a poorly defined fractile method, with the following ambiguities: 1. First, there is considerable confusion in the literature and the standard textbooks on what fractiles 'a' and 'b' should correspond to. To the original PERT developers (Malcolm, Roseboom, Clark and Fazar, 1959), a and b correspond to the 'absolute endpoints' TO and T l, respectively. A good number of standard textbooks (e.g., Hillier and Lieberman, 1986; Markland and Sweigart, 1987; Stevenson, 1990) do not specify explicitly what fractiles a and b correspond to, but their statements and diagrams imply that a and b correspond to TO and TI, thus following Malcolm et al.'s original formula. In contrast, some (e.g., Moder and Rodgers, 1968; Perry and Greig, 1975) advocate that a and b should be T's 0.05 and 0.95 fractiles. However, the majority of current operations management textbooks (e.g., Buffa and Miller, 1979, Chase and Aquilano, 1989, to name but a few) state that a and b should be T's 0.01 and 0.99 fractiles. Finally, as restated recently by Littlefield and Randolph (1987) and Gallagher (1987), one of the necessary implicit assumptions for justifying formulas (1) is tha~ a and b correspond to TO and T 1. 2. Estimating 'm' (a modal value) is not estimating a fractile. Although there is no evidence that an expert can estimate a central fractile (e.g., the median) more accurately than a modal value, a large amount of empirical knowledge has been accumulated in the probability elicitation literature on fractile estimation; in contrast, little is known about modal estimation. Furthermore, when a person is asked to make two fractile estimations (for a and b) and one modal estimation, it appears reasonable to conjecture that the person may confuse the mode (which is not a fractile) with the median (a fractile).
The recommended fractile procedure for task time estimation The preceding discussions suggest that it appears reasonable to estimate task times with a 'clean' fractile method. The next question is: How many and which fractiles should be estimated? The density function of the beta distribution used in classical PERT is (X-- u)P-I(V-x)
q-1
ft3(X)=B(p,q)(V_U)p+q_l,
U<x
p>0,
q>0,
(2)
where B(p, q) is the beta function evaluated at (p, q). ft,(x) has four parameters (U, V, p, q). Since the elicited fractiles will be used to fit this four-parameter distribution, it appears somewhat illogical to have less than four fractile estimates (this issue will be elaborated in Section 3). While the classical PERT procedure calls for estimating three points of the subjective probability distribution, empirical investigations on other applications of subjective probability elicitation have typically called for estimating more fractiles. For example, Alpert and Raiffa (1982) asked subjects to estimate five fractiles (the three 'central' fractiles: 0.25, 0.50, 0.75; and the 'extreme' fractiles: either 0.01 and 0.99 or 0.001 and 0.999); Schaefer and Borcherding (1973) asked subjects to assess seven fractiles (0.01, 0.125, 0.25, 0.50, 0.75, 0.875, 0.99), while Selvidge (1975) asked subjects to assess a different set of seven fractiles (0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99). Many other variants have also been considered (see, e.g., Lichtenstein et al., 1982, for a partial listing). Even after the set of required fractiles is given,
42
H-S. Lau, C. Somarajan/European Journalof OperationalResearch 85 (1995) 39-52
one still has to decide on the order of estimation. For example, given Selvidge's set of fractiles, should the fractiles be estimated in the order as given above (i.e., 0.01 first, and 0.99 last), or should we start from the center (i.e., 0.50 first) and estimate the extreme fractiles last? After empirically comparing several fractile-estimation schemes, Selvidge (1980) showed that the following fractile estimation procedure performed best: 1. Asssess seven fractiles. That is, the three central fractiles: the 0.25, 0.50 and 0.75 fractiles; and the four extreme fractiles: the 0.01, 0.10, 0.90 and 0.99 fractiles. 2. Assess the central fractiles first. Even intuitively, this procedure appears more 'reasonable' than the PERT's (a, m, b) procedure. The requirement of assessing the central fractiles first is also intuitively attractive because the estimated central fractiles can then act as 'anchors' for estimating the extreme fractiles. Moreover, several empirical studies (Alpert and Raiffa, 1982; Murphy and Winkler, 1974; Selvidge, 1980; Lichtenstein, Fischhoff and Philips, 1982) have confirmed that people can estimate central fractiles more accurately than extreme fractiles. Considering the background review above, it appears reasonable to suggest that: (i) Selvidge's (1980) fractile procedure be used to estimate stochastic task times; (ii) if one prefers to estimate a smaller number of fractiles (say, for cost reasons), the more extreme fractiles should be the ones to drop. However, when these time estimates are to be used in the planning of large projects, it appears unlikely that the extreme fractiles can be dropped on the basis of their marginal costs of estimation.
3. Fitting the estimated fractiles to a beta distribution
Assume for the time being that the classical assumption of beta-distributed T is valid, and therefore the fractiles estimated in the preceding section should be fitted to a beta distribution. Define F~(TIU, V, p, q) as the cumulative distribution function (cdf) of f~(T) stated in (2). Given a set of fractile-estimates (ai's, ti's), the current mathematical problem can be stated as: Determine (U, V, p, q) such that Ft3(t i [U, V, p, q) = Oli for all i.
(3)
For example, if for the fractiles a~'s = (0.05, 0.55, 0.95), the corresponding times estimates are t/s = (9, 15, 21); then the problem is to determine (U, V, p, q) such that Ft3(91U, V, p, q) =0.05,
Ft3(151U, V, p, q) =0.55,
F~(211U, V, p, q) =0.95.
(4)
At first glance, it might appear that since Ft3 has four parameters, a set of three fractile-estimates may give an infinite number of solutions (U, V, p, q) for (3), whereas a set of four fractile-estimates will give a unique solution for (3). However, Ft3 is much more restrictive than a general fourth-degree polynomial; among other things, Ft3 is monotonically increasing with at most one inflexion point. Therefore, even for the set of three fractile-estimates, there may not be a solution for (3) (this situation will be illustrated later). Clearly, with more than four fractile-estimates, a solution to (3) usually does not exist. Therefore, (3) should be modified to: Determine F~ ( T I U, V, p, q) that minimizes M S D = ( l / n ) × ~_,[Ft3(tilU, V, p, q) - a i ] 2,
(5)
n
where n is the number of fractile estimates, and MSD (mean of square of deviations) is a goodness-of-fit measure. Solution of this nonlinear programming problem is explained in Appendix A; the problem cannot be solved analytically, but it can be conveniently solved with a computer. While using formulas (1)
H-S. Lau, C. Somarajan/ EuropeanJournalof OperationalResearch85 (1995)39-52
43
does not give T's distribution function, the parameters (U, V, p, q) obtained by solving (5) fully defines T's distribution function, which is required for more sophisticated probabilistic network analysis (as in, e.g., Van Slyke, 1963, Burt and Garman, 1971, Robillard and Trahan, 1977, etc.). To obtain T's mean and standard deviation, one uses the formulas (see, e.g., Johnson and Kotz 1970)
Iz = U + ( V - U ) p / ( p ( V - U) f or= ( p + q ~ X ¢
+ q), pq
(6)
p+q+l
In contrast to formulas (1), formulas (6) are mathematically exact. There are no 'hidden' assumptions besides the explicit beta assumption.
Numerical examples Numerical example I. If the
O~i'S = (0.05, 0.55, 0.95)
and
the
ti'S
(9, 15, 21),
=
(7a)
executing the procedure explained in Appendix A gives U = 2.915,
V = 35.510,
p = 6.306,
q = 11.139;
MSD = 1.9 × 10 -11 .
(7b)
Noting that the a;'s in (5) are probabilities with magnitudes between 0 and 1, an MSD of 0.0012 (or 10 -6) may be used as the benchmark of a very good fit. Thus, the extremely small MSD in (7b) indicates that we have found a beta distribution that provides a practically perfect fit for the given (ai's, ti's). The a;'s in (7a) differ from the standard a / s recommended in Section 2; this is to emphasize that the fitting procedure can be used with any a / s . Numerical Example 2. If the
O~i'S = (0.05, 0.55, 0.95)
and
the
t / s = (9, 15, 16),
(8a)
the procedure in Appendix A gives U = 2.051,
V = 16.022,
p = 3.249,
q = 0.571;
MSD = 6.5 × 10 - 6 .
(8b)
The original a;'s and the fitted a;'s implied by (8) are compared below:
ti
9
ai
0.050
0.550
0.950
Ft3(tilU, V, p, q) from (8)
0.048
0.552
0.947
15
21
(9)
Although the differences between the O/i'S and the Ft3(tilU, V, p, q)'s in (9) are still quite small, the fact that (8b) has a much larger MSD than (7b) nevertheless illustrates the fact that even though the four-parameter beta distribution needs to fit only three given fractiles, a perfect fit still may not exist. A more serious case of 'misfit' will be given in Numerical Example 4. Numerical Example 3. If we use the ai's recommended by Selvidge (1980), i.e., ai's = (0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99),
(10a)
44
H-S. Lau, C. Somarajan / European Journal of Operational Research 85 (1995) 39-52
and assume that t~'s = (10.8, 13.1, 14.5, 16.2, 17.7, 19.0, 21.1),
(10b)
the procedure in Appendix A gives U=4.35,
V=25.38,
p=11.12,
q=8.76;
M S D = 6 . 6 × 1 0 -8 .
(11)
For each of the t/s given in (10b), the values of a~ and F~(tiLU, V, p, q) agree to at least the third decimal place. Numerical Example 4. Using the same Selvidge's a~'s as in (10a), but assume that the t~'s are ti's = (9.6, 14.6, 16.9, 19.5, 22.5, 26.1, 35.5).
(12)
The procedure in Appendix A gives U = 6.251,
V = 128.553,
p = 8.714,
q = 69.140;
MSD = 6.5 >( 10 -4.
(13)
The original a~'s and the fitted ai's implied by (13) are compared below:
ti
9.6
ai
0.010
0.100
0.250
0.500
0.750
0.900
0.990
F~(tgIU, V, p, q)
0.000
0.096
0.258
0.497
0.741
0.912
0.998
14.6
16.9
19.5
22.5
26.1
35.5 (14)
The magnitudes of the MSD's in (11) and (13) suggest the following: the estimates (a/'s, ti'S) given in (10a) and (10b) correspond closely to a beta distribution, but the (ai's, ti's) given in (10a) and (12) do not correspond closely to a beta distribution. There are three alternatives for handling the situation of a large MSD presented in (13). First, the fit can be taken as it is, since the MSD-magnitude and the fit depicted in (14) are not unreasonable, even though they are significantly inferior to the corresponding figures in Numerical Example 3. Second, asserting that the estimate should follow the beta distribution, the expert could be asked to reconsider those estimates with the largest discrepancy between oti and F~(tilU, V, p, q); e.g., the ti'S for the 0.01 and 0.90 fractiles shown in (14). Third, one may attempt to obtain a better fit for the (ai's, ti's) given in (10a) and (12) with another theoretical distribution, as illustrated in the following section. These issues will be taken up again in Section 7.
4. Using a supplementary fitting distribution for full versatility The original developers of PERT (Malcolm et al., 1959) chose the beta distribution to represent task-time distributions because they assumed that the beta distribution can accommodate all the varieties of 'shapes' taken up by subjective task-time distributions. This assumption is repeated in most standard textbooks that cover PERT. We now show that a supplementary distribution is necessary to accommodate all the 'shapes' (as defined below) that can be assumed by subjective task-time distributions. For a random variable x, let/z x be its expected or mean value,/Zk(X) = E ( x - Ixx) k be its k-th central moment, a l ( X ) =/.L3(X)//l./~2(x)l'5 , /31(X) = [O~I(X)] 2, /32(X) = ]/~4(X)//l.t2(X) 2. t~ 1 (or/31 ) and/32 are respectively the standardized skewness and kurtosis indicators. In statistical theory it is known that the four main 'shape' or 'appearance' characteristics of a distribution are its location (/Xx), dispersion (/z 2 or variance), skewness (a 1 or /31) and kurtosis (/32). Different empirical distributions have different combinations of these four characteristics, which can vary independently among themselves. In order to be sufficiently versatile to model these four independently varying characteristics, a distribution must
H-S. Lau, C. Somarajan / European Journal of Operational Research 85 (1995) 39-52
45
have at least four parameters. For example, ft3 in (2) has four parameters. However, a distribution with four-parameter distribution does not necessarily have the capability to model all possible combinations of the four main distribution characteristics. K. Pearson (1895) first delineated the preceding principles, and he proceeded to develop his classical four-parameter 'Pearson system of distributions' that can collectively model all the possible combinations of the four main distribution characteristics. Since most distribution functions can handle any combination of location and dispersion, the standard way to explain and compare the ability of different distribution functions to handle different shapes is with a 'skewness-kurtosis' or '(/3~,/32)' diagram shown in Fig. 1 (reproduced from Fig. 6-1 in Hahn and Shapiro, 1967) and Fig. 2 (reproduced from Fig. 1 of Ramberg et al., 1979). Figs. 1 and 2 illustrate some well-known facts; e.g., the uniform, normal and exponential distributions can each have only one possible (/31,/32) combination (or 'point' in Fig. 1). The possible (/31,/32) combinations of a gamma diistribution (i.e., the Pearson 'Type III' in Fig. 1) or a lognormal distribution (see Fig. 2) are represented by lines; they are clearly still too restrictive. The Pearson system has three'main types' of distribution functions;
NOIqklAL
m
B2
2
131 Fig. 1. A (ill, t2 ) diagram.
4
H-S. Lau, C. Somarajan/ European Journal of Operational Research 85 (1995) 39-52
46
namely, Types I, IV and VI. Fig. 1 shows that these 'main types' of distributions cover two-dimensional 'areas' of (/31,/32) combinations. The Type-I distribution is now more commonly known as the beta distribution; its area of possible (/31,/32) combinations is subdivided into three sub-areas in Fig. 1: the shaded Type I area for bell-shaped beta distributions that have a distinct mode, and Types I(U) and I(J) sub-areas for respectively the U-and J-shaped beta distributions (the two latter sub-Types are quite unimportant in practice). For our purpose, the important point overlooked in the PERT literature is that while the Types I, IV and VI areas collectively cover the entire possible (/31,/32) area for all empirical distributions, the beta (or Type I) distribution by itself does not cover it. Specifically, it does not cover the (/31,/32) area below the gamma line. For example, if a probability distribution has (/31,/32) = (1,7) as represented by point P1 in Fig. 1, then it cannot be modeled by a beta distribution. There are several distribution functions that can cover either a larger (/31,/32) area than or the (/31,/32) area left off by the beta distribution. For example, the Schmeisen-Deutsch (1977) and Johnson (1949) distributions cover the entire feasible (/31,/32) area (below the 'Impossible Area'), and the Pearson Types IV and VI distributions together cover all the (/31,/32) areas left off by the beta distribution. For a comprehensive review of other versatile distributions with four or more parameters, see, e.g., Kendall and Stuart (1969), Ord (1972), and Schmeiser (1977). Different distributions have different advantages and disadvantages. For our purpose, we note that: (i) the estimated ai's and ti's will be fitted to the distribution function, therefore it is preferable that the distribution has a closed-form cdf or inverse cdf; (ii) an important approach of analyzing a stochastic network is simulation (see, e.g., Van Slyke, 1963), therefore it is preferable that random variates can be easily generated from the distribution. The Ramberg-Schmeiser (1974) distribution seems to fit our purposes best because: (i) it has a closed-from inverse cdf with parameters (a, b, c, d):
R(p)
=F~l(t)=a+
[pC--(l--p)d]/b,
0
(15)
(ii) the closed-form inverse cdf makes it very easy to generate random variate for simulation using the inverse transform method; (iii) it covers the shaded (/31,/32) area in Fig. 2, therefore it complements very well the beta distribution. A computerized procedure for fitting fractiles to the RS distribution is presented in Appendix B. However, given a set of fractile estimates (a;'s, ti's), it is not clear what the (/31,/32) of the fitting distribution will be; this is especially true when the number of fractile estimates is small. In other words, 0
1
2
$
4
EXP(~IENTIAL
Fig. 2. A (/3 z, ~2 ) diagram showing the Ramberg-Schmeiser region and the Iognormal line.
H-S. Lau, C. Somarajan / European Journal of OperationalResearch85 (1995) 39-52
47
given a set of fractile estimates, one cannot ascertain whether the beta or the Ramberg-Schmeiser (hereafter 'RS') distribution should be used for fitting. Fortunately, given today's computing power, this problem can be easily solved by fitting a set of given fractiles to both the beta and the RS distributions and choosing the one that gives the lowest MSD.
Numerical examples
Numerical Example 5. The same (ai's, t/s) given in Numerical Example 2 were fitted to an RS distribution with the procedure explained in Appendix B. The procedure found many sets of (a, b, c, d)-parameters for the RS distribution that give MSD's that are at least 100 times smaller than the MSD stated in (8b) for the beta fit. One example is a=9.386,
b=0.151,
c=0.070,
d=2.722,
MSD=5×10
-8.
(16)
Given (a, b, c, d), the mean and standard deviation of the corresponding RS distribution can be easily computed with formulas (B.1).
Numerical Example 6. The same (ag'S, tg's) given in Numerical Example 4 were fitted to an RS distribution. The result is a = 18.505,
b = -0.023,
c = -0.041,
d = -0.072,
MSD = 1.6 × 10 -6.
(17)
This MSD is more than 100 times smaller than the MSD for the beta fit stated in (13). The better fit provided here by the RS distribution can also be seen by the following table, which is the counterpart of (14) for comparing the original ai's with the fitted Fn(ti)'s. ti
9.6
14.6
16.9
19.5
22.5
26.1
35.5
ai
0.010
0.100
0.250
0.500
0.750
0.900
0.990
Fn(ti[a, b, c, d)
0.010
0.101
0.250
0.502
0.747
0.899
0.990
(18)
Since the RS fit has a considerably lower MSD than the beta fit given in (13), it should be chosen as the appropriate distribution for fitting the fractiles given in (10a) and (12). The ability of the RS distribution to give a substantially better fit than the beta in this case can be explained with the concepts reviewed in Section 4. Substituting the (a, b, c, d)-parameters in (17) into (B.1), (B.4) and (B.5) in Appendix B gives the skewness-kurtosis of the distribution as /3 t = 0.955;
/32 = 6.76.
(19)
Figs. 1 and 2 clearly show that this (/31,/32) is outside the beta region but is inside the RS area.
5. An approximation approach with closed-form formulas Although the fitting procedures in Sections 3 and 4 are mathematically correct and practical with today's computer resources, they lack the one important advantage of formulas (1), i.e., formulas (1) have closed-forms and can be easily presented and 'plugged in'. The is often the overriding factor, especially when one has to introduce the P E R T concept in (say) a required undergraduate operations managem e n t / m a n a g e m e n t science course. Based on Pearson and Tukey's (1965) numerical experiments, Perry and Greig (1975) proposed the following as substitutes for formulas (1): = To.5 + 0.185(To.o5 + T0.gs - 2T0.5),
o- = (T0.95 - 7"o.o5)/3.25.
(20)
48
H-S. Lau, C. Somarajan / European Journal of Operational Research 85 (1995) 39-52
Formulas (20) are superior to formulas (1) because: (i) they use fractiles consistently and the fractiles are explicitly defined; (ii) they do not require fractiles that are too extreme such as T O or T0.0fi (iii) Pearson and Tukey (1965) have shown that they are accurate for all bell-shaped beta distributions, whereas Gallagher (1987) have shown that formulas (1) are only valid for a very small subset of beta distributions. Therefore, formulas (20) should be used if one requires simple 'plug-in' formulas for an introductory course. However, formulas (20) do not provide T's distribution function, it also cannot benefit from Selvidge's approach of using more than 3 fractiles. Moreover, from extensive computerized numerical experiments, we have found that formulas (20) can become very inaccurate if T falls outside the beta (Type I) region shown in Fig. 1, whereas it has been shown in Section 4 that there is a very large feasible (/31,/32) area for T that is outside the beta region. Detailed results on the applicability of 'convenient' formulas such as (20) to random variables in different (/31,/32) regions will be reported separately.
6. Summary and conclusion
We proposed and justified logical procedures for implementing the two components of estimating task-time distributions for PERT: (i) use Selvidge's (1980) 7-fractile approach to estimate the fractiles of a task-time distribution; (ii) find the parameters of a best-fitting beta or RS distribution function by solving nonlinear programming problems that minimize the MSD between the fractiles and the fitted distribution function. The second component is conceptually straightforward; although it is not computationally simplistic, it can be readily performed with a canned program, and practitioners need not follow the technical details fully. The procedure provides a complete distribution function for the task time. If only the task-time's mean and variance are required, and simple formulas are essential, formulas (20) proposed by Perry and Greig (1975) are pedagogically superior to formulas (1). We pointed out in Section 4 that there exists many different mathematical distribution functions for any given combination of characteristics (/z x,/z 2,/31,/32). Although it is incorrect to claim that the beta distribution can handle all (tz x,/z2,/31,/32) combinations, it is reasonable (but not necessary) to use the beta distribution to model empirical distributions with (/31,/32) combinations falling within the shaded beta region depicted in Fig. 1. Similarly, using the RS distribution to handle (/31,/32) combinations below the gamma line is equal reasonable, but also equally arbitrary. The truth, of course, is that a real-life random variable probably does not follow any of the mathematical distributions. In choosing arbitrarily a distribution function such as (say) the RS instead of (say) a Johnson distribution to model (say) the (O~i'S, /i'S) considered in Numerical Examples 5 and 6, the implicit assumption is that there is little difference between the RS and Johnson functions with the same (/z x,/z2,/31, /32). However, this assumption is not always true (see, e.g., Pearson 1963). Therefore, if a set of (ai's, ti's) does not fit the arbitrarily chosen functions (the beta and RS distributions in our case) well, it is theoretically difficult to ascertain whether the poor fitting is due to: (i) inaccurate a n d / o r unreasonable estimates by the expert; or (ii) the selection of an inappropriate distribution function. A practitioner should always bear these issues in mind whenever a mathematical distribution function is used to model an empirical random variable.
Appendix A. Fitting a beta distribution to a given set of fractiles
In ft3(x) given in (2), U and V represent respectively the left and right end points of the beta distribution, p and q together control the distribution's skewness and kurtosis; the distribution becomes
H-S. Lau, C. Somarajan / European Journal of Operational Research 85 (1995) 39-52
49
an uniform distribution when p = q = 1, and the distribution approaches normality rapidly when both p and q exceed (say) 15. For practical purposes p and q very rarely exceed 50. See Chapter 24 of Johnson and Kotz (1970) for a comprehensive collection of properties of the beta distribution. Our solution procedure for solving the nonlinear programming problem stated in (5) can be considered in three components. Component 1: Generation o f initial solution points (U, V, p, q)
Assume that the given ai's ( a 1, a 2. . . . . a n) and ti's (tl, t 2 , . . . , t n) are ranked, i.e., t n > t n_ 1. The likely range of (U, V, p, q)-values are set as follows. The range of p and q is taken to be from 1 to 50. The upper bound of possible U-values is taken to be t 1 - 0.01, and the lower bound of possible V-values is taken to be t n + 0.01. Let 'the range of time estimates' be R t = t n - t v The lower bound of possible U-values is taken to be t~ - 2 R t, and the upper bound of possible V-values is taken to be t n + 2 R t. If one prefers to restrict U to be nonnegative, the lower bound of U-values can be taken as the manimum of 0 and t 1 - 2 R r For most practical situations it does not matter whether U is restricted to be n0nnegative, since even if the fitted U is negative, the area of the negative tail will be negligible; this is similar to the situation of using the normal distribution to model a nonnegative random variable with small coefficient of variation. Within the boundaries of (U, V, p, q) set above, the IMSL procedure G G U E S (executing the procedure of Aird and Rice, 1977) is used to generate 100 initial starting (U, V, p, q) solution points. This generation procedure optimizes the dispersion of the initial solution points within the bounded 4-dimensional space of (U, V, p, q). Incidentally, note that the (U, V, p, q) boundaries are set for the sole purpose of generating initial solution points. They do not restrict the freedom of the final solution; e.g., the final solution for (say) p can be greater than 50. Component 2: Numerical search for the optimal solution
With each initial solution obtained above, the unconstrained quasi-Newton minimization IMSL subroutine U M I N F is used to obtain a local optimum for the problem stated in (5). From the resultant 100 local optima, the one with the best MSD is selected as the final solution. We have used 100 initial points because we have found that the objective function in (5) has many local optima. However, we have also found that using 10 instead of 100 initial points gives Practically the same quality of final solution. Using more or less initial points is a trivial issue today, since the computer cost of implementing any management science procedure is usually insignificant compared to the human cost or the cost of making incorrect decisions. Component 3: Evaluating M S D at each iteration o f the numerical search
The IMSL subroutine B E T D F is used to compute the values of F~(t i) in (5).
Appendix B. Fitting a R a m b e r g - S c h m e i s e r distribution to a given set of fractiles
In R ( p ) given in (15), parameters c and d together control the distribution's skewness and kurtosis, while a and b are respectively the location and scale parameter. Similar to Appendix A, the procedure for fitting an RS distribution to a given set of fractiles also has three components.
H-S. Lau, C. Somarajan~European Journal of OperationalResearch 85 (1995)39-52
50
Generation of initial solution points (a, b, c, d) Ramberg and Schmeiser (1974) indicated that practically all RS distributions will have c and d such that - 0 . 3 3 < (c, d) < 1 and c + d < 1. We therefore consider initial points with 10 different (c, d) combinations within these limits. For each (c, d) combination, we consider 9 different (a, b) combinations, giving a total of 90 initial points. To generate the (a, b) combinations, one needs to set the probable range of a and b under each given (c, d) combination. Ramberg and Schmeiser (1974) have shown that the mean and standard deviation of an RS distribution are:
Iz = a + ( 1 / c 1 - 1 / d l ) / b , 0"2= { [ 1 / c 2 + 1 / d 2 - 2 " B ( c 1, dl) ] - ( 1 / c ~ -
1 / d l ) Z } / b 2,
(B.1)
where c 1 = c + 1, d I = d + 1, c z = 2c + 1, dE = 2d + 1, and B(x, y) is the beta function evaluated at
(x, y). Given the n ranked time estimates ti'S, compute R t = t n - t 1. Assuming that the upper and lower limits of/~ are t 1 and t n respectively, and that the upper and lower limits of or are Rt/1.6 and R t / l O respectively, these limits can be substituted into (B.1) to estimate the range of a and b. Note that the basic idea here is to generate reasonable initial points within roughly reasonable and conservative ranges.
Numerical search for the optimal solution When fitting fractiles to a beta distribution in Appendix A, we minimized Z ( F , ( t i) - ai) 2 as stated in (5). However, since the RS distribution has a closed-form inverse cdf but not a closed-form cdf, we capitalized this property by minimizing instead (see (15)) M S D ' = ( l / n ) × ~[ R(otila, b, c, d) - - t i ] 2.
(B.2)
As in the beta fit, the IMSL subroutine U M I N F is used to obtain a local minimum of (B.2) for each initial point. Among the 90 local solutions, the set of (a, b, c, d) that gives the lowest MSD' is selected as the final solution. However, in order to compare the qualities of a beta and an RS fit, the MSD (but not MSD') of the RS fit is needed to compare with the MSD of the beta fit. Therefore, after the final (a, b, c, d) is obtained, the value of the fitted RS fractile Fn(t ~) corresponding to a given t~ is obtained by finding the root to the equation
a + [pC-(1-p)a]/b-ti=O.
(B.3)
This root is obtained with the IMSL subroutine ZBREN. After the roots Fn(t;)'s are obtained, the MSD of the RS fit is simply {E[Fn(t i) - ai]2}//n.
Evaluating MSD' at each iteration of the numerical search The values R ( o t i) needed in (B.2) can be rapidly computed since a closed-form function, (15), exists in this case. This is a major computational advantage of fitting fractiles to an RS distribution (instead of, say, to a beta distribution).
Computing the mean and central moments of an RS distribution Ramberg and Schmeiser (1974) gave the following formulas for computing the third and fourth central moments of an RS distribution from its parameters (a, b, c, d):
H-S. Lau, C. Somarajan /European Journal of Operational Research 85 (1995) 39-52
N3=b-3X(
[1/C 3
-
-
2 " B ( c 2, d , ) + 3 "B(c 1, d2)
-
1/d3]
- 3 [ 1 / c 2 - 2"B(Cl, all) + l/d2] [1/c,- l/d1) ] + 2[1/c I - 1/dl)] 3} P'4 = b - 4 X ( [ 1 / c 4 - 4 - B ( c 3 , -4[1/c
3 -3"B(c
dl)+
6"B(c2,
51
d2) - 4 . B ( C l ,
(B.4)
d3) + l / d 4 ]
2, d~) + 3 . B ( C l , d 2 ) - l / d 3 ] [ 1 / c ~ -
1/d,]
+611/c2- 2 .B(q, d l ) + 1/d2][1/c,- 1/d~] 2
(B.5)
- - 3 [ 1 / c I -- 1 / d l ] 4}
w h e r e c 3 = 3c + 1, c 4 = 4 c + 1, d 3 = 3d + 1,
d 4 =
4 d + 1; Cl, cz, dl, d 2 and B(x, y ) are defined in (B.1).
References Aird, T., and Rice, J. (1977), "Systematic search in high-dimensional sets", SIAM Journal on Numerical Analysis 14, 296-312. Alpert, M., and Raiffa, H. (1982), "A progress report on the training of probability assessors", in: D. Kahneman, P. Slovic and A. Tversky (eds.), Judgement under Uncertainty: Heuristics and Biases, Cambridge University Press, New York. Box, M. (1966), "A comparison of several current optimization methods, and the use of transformations in constrained problems", Computer Journal 9, 67-77. Buffa, E., and Miller, J. (1979), Production-Inventory Systems, Irwin, IL. Burt, J., and Garman, M. (1971), "Monte Carlo techniques for stochastic PERT network analysis", Infor 9, 248-262. Chase, R., and Aquilano, N. (1989), Production and Operations Management, Irwin, IL. Chesley, G. (1975), "Elicitation of subjective probabilities: A review", The Accounting Review 60/2, 325-337. Donaldson, W. (1965), "Estimation of the mean and variance of a PERT activity time", Operations Research 13, 382-385. Grubbs, F. (1962), "Attempts to validate some PERT statistics or "Picking on PERT" ", Operations Research 10, 912-915. Hahn, G., and Shapiro, S. (1967), Statistical Models in Engineering, Wiley, New York. Hampton, J., Moore, P., and Thomas, H. (1973), "Subjective probability and its measurements", Journal of the Royal Statistical Society Series A 136/1, 21-42. Healy, T. (1971), "Activity subdivision and PERT probability statements", Operations Research 9, 341-348. Hiilier, F., and Lieberman, G. (1986), Introduction to Operations Research, Holden-Day, San Francisco CA. IMSL (1987), "User's Manual", IMSL, Houston, TX. Johnson, N. (1949), "Systems of frequency curves generated by methods of translation", Biometrika 36, 149. Johnson, N., and Kotz, S. (1970), Continuous Univariate Distributions, Vol. 2, Houghton Mifflin, New York. Kendall, M., and Stuart, A. (1969), The Advanced Theory of Statistics, Hafner, New York. Lichtenstein, S., Fischhoff, B., and Phillips, L. (1982), "Calibration of probabilities: The state of the art to 1980", in: D. Kahneman, P. Slovic and A. Tversky (eds.), Judgement under Uncertainty: Heuristics and Biases, Cambridge University Press, New York. MacCrimmon, K., and Ryavec, C. (1964), "Analytical study of PERT assumptions", Operations Research 12, 16-36. Malcolm, D., Roseboom, J., Clark, C., and Fazar, W. (1959), "Application of a technique for research and development program evaluation", Operations Research 7, 646-669. Markland, R., and Sweigart, J. (1987), Quantitative Methods: Applications to Managerial Decision Making, Wiley, New York. Moder, J., and Rodgers, E. (1968), "Judgement estimates of the moments of PERT type distributions", Management Science 15, B76-B83. Murphy, A., and Winkler, R. (1974), "Subjective probability forecasting experiments in meteorology: Some preliminary results", Bulletin of the American Meteorological Society 55, 1206-1216. Ord, J. (1962), Families of Frequency Distributions, Hafner, New York. Pearson, E. (1963), "Some problems arising in approximating to probability distributions, using moments", Biometrika 50, 95-112. Pearson, E., and Tukey, J. (1965), "Approximate means and standard deviations based on distances between percentage points of frequency curves", Biometrika 52, 533-546. Pearson, K. (1895), "Skew variation in homogeneous material", Philosophical Transactions of the Royal Society 186, 343-414. Perry, C., and Greig, I. (1975), "Estimating the mean and variance of subjective distributions in PERT and decision analysis", Management Science 21, 1477-1480. Ramberg, J., Dudewicz, E., Tadikamalla, P., and Mykytka, E. (1979), "A probability distribution and its uses in fitting data", Technometrics 21,201-214.
52
H-S. Lau, C. Somarajan / European Journal of Operational Research 85 (1995) 39-52
Ramberg, J., and Schmeiser, B. (1974), "An approximate method for generating asymmetric random variables", Communications of the ACM 17/2, 78-82. Robillard, P., and Trahan, M. (1977), "The completion yime of PERT networks", Operations Research 25/1, 15-29. Sasieni, M. (1986), "A note on PERT times", Management Science 32/12, 1652-1653. Schaefer, R., and Borcherding, K. (1973), "The assessment of subjective probability distribution: A training experiment", Acta Psychologica 37, 117-129. Schmeiser, B., and Deutsch, S. (1977), "A versatile four parameter family of probability distributions suitable for simulation", AIIE Transactions 9, 176-181. Selvidge, J. (1975), "Experimental comparison of different methods for assessing the extremes of probability distributions by the fractile method", Report 75-13, Management Science Report Series, Graduate School of Business Administration, University of Colorado. Selvidge, J. (1980), "Assessing the extremes of probability distributions by the fractile method", Decision Sciences 11, 493-502. Solomon, I. (1982), "Probability assessment by individual auditors and audit teams: An empirical investigation", Journal of Accounting Research 20, 689-710. Spetzler, C., and Stael von Holstein, C.-A. (1975), "Probability encoding in decision analysis", Management Science 22, 340-358. Stevenson, W. (1990), Production/Operations Management, Irwin, IL. Swanson, L., and Pazer, H. (1971), "Implications of the underlying assumptions of PERT", Decision Sciences 2, 461-480. Van Slyke, R. (1963), "Monte Carlo methods and the PERT problem", Operations Research 11,839-860. Vatter, P., Bradley, S., Frey, S., and Jackson, B. (1978), Quantitative Methods in Management, Irwin, IL. Wallsten, T., and Budescu, D. (1983), "Encoding subjective probabilities; a psychological and psychometric review", Management Science 29, 151-173. Winkler, R. (1967), "The assessment of prior distributions in Bayesian analysis", Journal of the American Statistical Association 62, 776-800. Winterfeldt, D., and Edwards, W. (1986), Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge, MA.