Evaluation Of Different Imputation Methods

  • November 2019
5.4 Evaluation of Different Imputation Methods

To determine the effect of the nonresponse rate on the results of each imputation method (IM), the different IMs were evaluated. The results of each IM are discussed separately, in the following order: (1) the bias of the mean of the imputed data, (2) the distribution of the imputed data, assessed with the Kolmogorov-Smirnov goodness-of-fit test, and (3) other measures of variability, namely the mean deviation (MD), mean absolute deviation (MAD), and root mean square deviation (RMSD).

Each table of results contains the following columns: (a) variable of interest (VI), (b) nonresponse rate (NRR), (c) the bias of the mean of the imputed data, Bias(y'), (d) the percentage, out of 1000 trials, in which the imputed data had the same distribution as the actual data set (PCD), (e) MD, (f) MAD, and (g) RMSD.
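The following is a minimal sketch of how criteria (c) and (e) to (g) could be computed for one imputed data set. The formulas are assumptions, since this section does not state them: the bias is taken as the mean of the imputed data set minus the mean of the actual data set over all records, while MD, MAD and RMSD are taken over the imputed records only, with each deviation measured as imputed minus true value. The function name `imputation_criteria` and the list-based inputs are hypothetical.

```python
from math import sqrt
from statistics import fmean


def imputation_criteria(actual, imputed, missing):
    """Bias of the mean, MD, MAD and RMSD for one imputed data set.

    actual:  complete (true) values for all records
    imputed: the same records after imputation (equal to actual where observed)
    missing: True where the value was deleted and then imputed
    """
    # (c) Bias of the mean, over all records (assumed definition).
    bias = fmean(imputed) - fmean(actual)
    # Deviations imputed - true, over the imputed records only (assumed).
    diffs = [i - a for a, i, m in zip(actual, imputed, missing) if m]
    return {
        "bias": bias,
        "MD": fmean(diffs),                         # (e) mean deviation
        "MAD": fmean(abs(d) for d in diffs),        # (f) mean absolute deviation
        "RMSD": sqrt(fmean(d * d for d in diffs)),  # (g) root mean square deviation
    }
```

For instance, with one of four values imputed (an NRR of 25%), `imputation_criteria([10, 20, 30, 40], [10, 20, 25, 40], [False, False, True, False])` gives a bias of -1.25 and an MD of -5, so under these definitions the bias is the NRR times the MD.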

5.4.1 Overall Mean Imputation

Table 8 shows the results of the different criteria in evaluating the imputed data using the overall mean imputation (OMI) method.

Table 8: Criteria results for the OMI method

| (a) VI | (b) NRR | (c) Bias(y') | (d) PCD | (e) MD | (f) MAD | (g) RMSD |
|---|---|---|---|---|---|---|
| TOTEX2 | 10% | 640.66 | 0.00% | -6406.60 | 56929.61 | 108547.82 |
| TOTEX2 | 20% | 499.43 | 0.00% | -2497.14 | 59555.36 | 119193.32 |
| TOTEX2 | 30% | -222.76 | 0.00% | 20310.91 | 90396.26 | 271775.35 |
| TOTIN2 | 10% | -597.84 | 0.00% | 5978.39 | 77502.27 | 167206.24 |
| TOTIN2 | 20% | -2855.49 | 0.00% | 14277.43 | 87469.87 | 244758.00 |
| TOTIN2 | 30% | -6093.27 | 0.00% | 742.53 | 62388.11 | 151740.94 |

1. Bias of the mean of the imputed data. Column (c) of Table 8 shows that, as the NRR increases, the bias for TOTEX2 slowly decreases in magnitude. This decrease follows from the decrease in the magnitude of the respondents' mean as the NRR increases: as the respondents' mean decreases, the variability caused by imputing a single value that is higher than the mean of the actual data set (the mean of TOTEX1, the total expenditure from the first-visit data, equal to 105566.9) also decreases.

On the other hand, the results for TOTIN2 are the opposite of those for TOTEX2: the bias of the mean of the imputed data for TOTIN2 increases rapidly in magnitude as the NRR increases. The rationale is again the decrease in the magnitude of the respondents' mean as the NRR increases. However, unlike for TOTEX2, the imputed value (the mean of TOTIN1, the total income from the first-visit data, equal to 121820.7) is much lower than the actual mean of the data set.
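The bias in column (c) can be related to the imputed cases alone, assuming the bias is the mean of the imputed data set minus the mean of the actual data set and that respondents' values are left unchanged. Writing p for the NRR, ȳ_r for the respondents' mean, ȳ_m for the mean of the deleted true values, and ȳ*_m for the mean of the imputed values (symbols introduced here for illustration), a short derivation gives:

```latex
\bar{y}' - \bar{y}
  = \left[(1-p)\,\bar{y}_r + p\,\bar{y}^{*}_m\right]
  - \left[(1-p)\,\bar{y}_r + p\,\bar{y}_m\right]
  = p\left(\bar{y}^{*}_m - \bar{y}_m\right)
```

Under OMI, ȳ*_m is the first-visit mean (105566.9 for TOTEX2, 121820.7 for TOTIN2), so the bias is the NRR times the average gap between that constant and the deleted values. For example, 640.66 / 0.10 = 6406.6 and 499.43 / 0.20 ≈ 2497.1, which match the magnitudes of the 10% and 20% TOTEX2 MD entries in Table 8.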

2. Distribution of the imputed data. Column (d) of Table 8 shows that, for all NRRs and VIs, the OMI method failed to maintain the distribution of the actual data. This was expected, because every missing observation of a VI was replaced by a single value: the overall mean of that VI from the first visit.
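A minimal sketch of the OMI step, assuming missing second-visit values are coded as `None` and the fill value is the mean over all first-visit records; the function name is hypothetical. Because every gap receives the same number, all OMI imputed values necessarily fall in one frequency class.

```python
from statistics import fmean


def overall_mean_impute(second_visit, first_visit):
    """Overall mean imputation: every missing second-visit value (None)
    is replaced by the mean of the first-visit variable."""
    fill = fmean(first_visit)
    return [fill if y is None else y for y in second_visit]
```

For example, `overall_mean_impute([5, None, 7, None], [4, 6, 8])` returns `[5, 6.0, 7, 6.0]`: both missing entries receive the same value.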

Related studies that performed OMI likewise found it to be one of the worst IMs, since it distorts the distribution of the data: the distribution becomes too peaked, which makes the method unsuitable for many subsequent analyses (Cheng & Sy, 1999).

3. Other measures of variability. Columns (e), (f) and (g) of Table 8 show the other measures of variability of the imputed data. For TOTEX2, the MAD and RMSD increase in magnitude as the NRR increases, and the data with the highest percentage of imputed values have the highest values for all three measures. Notably, all three criteria increase sharply from the twenty to the thirty percent NRR for TOTEX2.

For TOTIN2, the data with twenty percent imputed observations have the highest values for all three measures of variability. Surprisingly, and unlike for TOTEX2, the highest NRR yields the lowest values for all three measures.

5.4.2 Hot Deck Imputation

Table 9 shows the results of the different criteria in evaluating imputed data using the hot deck imputation (HDI3) method with three imputation classes.

Table 9: Criteria results for the HDI3 method

| (a) VI | (b) NRR | (c) Bias(y') | (d) PCD | (e) MD | (f) MAD | (g) RMSD |
|---|---|---|---|---|---|---|
| TOTEX2 | 10% | 491.91 | 100.00% | 4919.40 | 78071.61 | 79251.22 |
| TOTEX2 | 20% | 179.42 | 96.90% | 897.18 | 78292.63 | 67149.16 |
| TOTEX2 | 30% | -606.37 | 0.00% | -2021.19 | 81395.79 | 71390.65 |
| TOTIN2 | 10% | -717.52 | 100.00% | -7175.25 | 105369.15 | 242022.99 |
| TOTIN2 | 20% | -3095.41 | 100.00% | -15477.09 | 111748.04 | 297151.50 |
| TOTIN2 | 30% | -6508.65 | 1.00% | -21695.52 | 115087.13 | 313814.92 |

1. Bias of the mean of the imputed data. As with the OMI method, the bias of the mean of the imputed data for TOTIN2 increases rapidly as the NRR increases. For TOTEX2, the bias fluctuates as the NRR increases. For both variables, the data with the highest NRR have the largest bias. For TOTEX2, the twenty percent NRR gives the smallest bias, whereas for TOTIN2 the lowest NRR gives the smallest bias.

2. Distribution of the imputed data. Column (d) shows that, for TOTIN2, the data with ten and twenty percent of the observations imputed maintained the distribution of the actual data. For TOTEX2, only the data with ten percent imputed observations maintained the distribution of the actual data in all 1000 data sets; with twenty percent imputed observations, 969 of the 1000 data sets maintained it.

For both TOTEX2 and TOTIN2, the data with the highest number of imputed observations failed to maintain the distribution of the actual data. Worse, none of the simulated data sets for TOTEX2 had the same distribution as the actual data, and for TOTIN2 only 1.00% of them did. The researchers consider the possibility that more than one recipient received the same donor.
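How often recipients share a donor can be estimated under an assumption this section does not state: within a class of n_r respondents, each of the n_m recipients draws its donor uniformly at random, with replacement. The expected number of distinct donors used, D, and the expected number of recipients whose donor was already used are then:

```latex
E[D] = n_r\left[1 - \left(1 - \frac{1}{n_r}\right)^{n_m}\right],
\qquad
E[\text{reused}] = n_m - E[D]
```

For a hypothetical class of 1000 records at 30% NRR (n_r = 700, n_m = 300), E[D] ≈ 244, so about 56 recipients would receive an already-used donor; at 10% NRR (n_r = 900, n_m = 100) the same formula gives about 5. Donor reuse therefore grows quickly as the NRR rises, because the donor pool shrinks while the number of recipients grows.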

3. Other measures of variability. Columns (e), (f) and (g) of Table 9 show the other measures of variability of the imputed data. For TOTEX2: (i) the data with twenty percent imputed values yielded the smallest MD and RMSD, (ii) the data with the fewest imputations yielded the largest MD and RMSD, and (iii) MAD is the only criterion whose values increase as the NRR increases.

For TOTIN2: (i) all three criteria increase as the NRR increases, (ii) the values of all three criteria are larger than for TOTEX2, and (iii) the data with the most imputations yielded the highest values for all three criteria.
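The HDI3 method evaluated above can be sketched as a random hot deck within imputation classes. This is a minimal sketch assuming each nonrespondent (recipient) receives the value of a respondent (donor) drawn at random, with replacement, from its own class; how the three classes were formed is not described in this section, so the class labels and the function name `hot_deck_impute` are hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(values, classes, rng):
    """Random hot deck within imputation classes.

    values:  second-visit values, with None marking nonrespondents
    classes: imputation-class label of each record (hypothetical labels)
    Each recipient gets the value of a donor drawn at random, with
    replacement, from the respondents in its own class.
    """
    donors = defaultdict(list)
    for y, c in zip(values, classes):
        if y is not None:
            donors[c].append(y)
    # rng.choice raises IndexError if a class has no respondents.
    return [rng.choice(donors[c]) if y is None else y
            for y, c in zip(values, classes)]


if __name__ == "__main__":
    rng = random.Random(2019)
    print(hot_deck_impute([1, None, 3, None, 10, None],
                          ["a", "a", "a", "b", "b", "b"], rng))
```

Because donors are drawn with replacement, one donor can serve several recipients, which is the possibility the researchers raise for the 30% NRR results.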

5.4.3 Deterministic Regression Imputation

Table 10 shows the results of the different criteria in evaluating the imputed data using the deterministic regression imputation method with three imputation classes (DRI3).

Table 10: Criteria results for the DRI3 method

| (a) VI | (b) NRR | (c) Bias(y') | (d) PCD | (e) MD | (f) MAD | (g) RMSD |
|---|---|---|---|---|---|---|
| TOTEX2 | 10% | 536.32 | 100.00% | 5363.47 | 33683.48 | 70553.64 |
| TOTEX2 | 20% | 1080.12 | 98.40% | 5400.71 | 33782.60 | 72487.39 |
| TOTEX2 | 30% | 398.39 | 100.00% | 1328.06 | 32449.49 | 72803.60 |
| TOTIN2 | 10% | 897.11 | 100.00% | 9043.98 | 51363.17 | 106374.39 |
| TOTIN2 | 20% | -1815.39 | 100.00% | -9076.98 | 57429.24 | 148278.49 |
| TOTIN2 | 30% | 356.50 | 100.00% | 1188.31 | 51886.73 | 131429.61 |

1. Bias of the mean of the imputed data. Column (c) of Table 10 shows that the bias increases in magnitude as the NRR increases for both TOTEX2 and TOTIN2. Compared with OMI and HDI3, where the bias increases tremendously with the NRR, the bias for DRI3 increases much more slowly: the bias at twenty percent NRR is only about twice the bias at ten percent NRR. For TOTEX2, however, this method produces a larger bias of the mean of the imputed data than OMI and HDI3 at every NRR.

2. Distribution of the imputed data. In contrast to the OMI results for this criterion, column (d) shows that the imputed data maintained the distribution of the actual data for all NRRs and VIs. DRI3 even outperforms HDI3 here, since all imputed data sets under all NRRs and VIs preserved the distribution of the actual data.

Interestingly, the regression models used in this study did not behave as the related literature led the researchers to expect, and gave a distinct result. Earlier studies that used categorical auxiliary variables (matching variables transformed into dummy variables) concluded that DRI is equivalent to mean imputation. In this study, however, the independent variable was the first-visit value of each VI, and the model fitted in each imputation class had a good R².
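The regression setup described above, with the first-visit value as the independent variable and a separate model for each imputation class, can be sketched as follows. This is a minimal sketch assuming a simple linear model with intercept, fitted by ordinary least squares to the respondents of each class; the function names and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def deterministic_regression_impute(y2, y1, classes):
    """Impute missing second-visit values (None) with the fitted value
    a_c + b_c * y1 from the regression fitted within each class c."""
    pairs = defaultdict(lambda: ([], []))  # class -> (first-visit xs, second-visit ys)
    for v2, v1, c in zip(y2, y1, classes):
        if v2 is not None:
            pairs[c][0].append(v1)
            pairs[c][1].append(v2)
    models = {c: fit_line(xs, ys) for c, (xs, ys) in pairs.items()}
    out = []
    for v2, v1, c in zip(y2, y1, classes):
        if v2 is None:
            a, b = models[c]
            out.append(a + b * v1)  # no residual: the value lies on the fitted line
        else:
            out.append(v2)
    return out
```

Because no residual is added, every imputed value lies exactly on its class's fitted line, so the imputed values tend to be less spread out than the true values whenever the fit is imperfect.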

3. Other measures of variability. Columns (e), (f) and (g) of Table 10 show the other measures of variability of the imputed data. First, the values of all three criteria are nearly stable as the NRR increases for both TOTEX2 and TOTIN2; the MD, MAD and RMSD change much less than under OMI and HDI3. Second, the MAD and RMSD are smaller than for OMI and HDI3 for both variables. Fitting models with a high R² was the key factor that made this method better than the two IMs evaluated previously.

5.4.4 Stochastic Regression Imputation

Table 11 shows the results of the different criteria in evaluating the imputed data using the stochastic regression imputation method with three imputation classes (SRI3).

Table 11: Criteria results for the SRI3 method

| (a) VI | (b) NRR | (c) Bias(y') | (d) PCD | (e) MD | (f) MAD | (g) RMSD |
|---|---|---|---|---|---|---|
| TOTEX2 | 10% | 536.32 | 100.00% | 5363.47 | 33683.48 | 70553.64 |
| TOTEX2 | 20% | 1080.12 | 98.40% | 5400.71 | 33782.60 | 72487.39 |
| TOTEX2 | 30% | 398.39 | 100.00% | 1328.06 | 32449.49 | 72803.60 |
| TOTIN2 | 10% | 897.11 | 100.00% | 9043.98 | 51363.17 | 106374.39 |
| TOTIN2 | 20% | -1815.39 | 100.00% | -9076.98 | 57429.24 | 148278.49 |
| TOTIN2 | 30% | 356.50 | 100.00% | 1188.31 | 51886.73 | 131429.61 |

1. Bias of the mean of the imputed data. Column (c) of Table 11 shows that, for both TOTEX2 and TOTIN2, this method yields much better results than DRI3. Unlike the pattern described for the previous methods, in which the bias increases with the NRR, the biases for SRI3 fluctuate from one NRR to another. At the highest NRR, SRI3 gives the smallest bias of the four methods for TOTIN2: while the other methods reached a four-digit bias, SRI3 produced only a three-digit bias. Moreover, the disparity at the third NRR is large, with SRI3 producing less than twenty percent of the bias of its deterministic counterpart.

2. Distribution of the imputed data. SRI3, which like HDI3 was simulated 1000 times, performed better than HDI3. Unlike HDI3, SRI3 maintained the distribution of the actual data in all imputed data sets at the first and third nonresponse rates, and it also outperformed HDI3 at the twenty percent NRR. Earlier studies likewise found that stochastic regression imputation performs better than the other three methods used here.

The random residual was added to the deterministic predicted value to preserve the distribution of the data.
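The stochastic step can be sketched for a single imputation class: fit the same simple linear regression on the respondents, then add a random residual to each predicted value. This is a minimal sketch in which the residual is assumed to be drawn from a normal distribution with the estimated residual standard deviation; drawing from the respondents' empirical residuals is a common alternative, and this section does not say which was used. The function name is hypothetical, and for SRI3 it would be applied within each of the three classes.

```python
import random
from statistics import fmean


def stochastic_regression_impute(y2, y1, rng):
    """Impute missing y2 values (None) in one imputation class as the
    fitted value plus a random normal residual (assumed residual model)."""
    xs = [x for x, y in zip(y1, y2) if y is not None]
    ys = [y for y in y2 if y is not None]
    mx, my = fmean(xs), fmean(ys)
    # Ordinary least squares slope and intercept on the respondents.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    # Residual standard deviation with n - 2 degrees of freedom (needs n >= 3).
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    # Deterministic prediction plus a random draw from N(0, s^2).
    return [a + b * x + rng.gauss(0.0, s) if y is None else y
            for x, y in zip(y1, y2)]


if __name__ == "__main__":
    print(stochastic_regression_impute([3, 4, 8, 9, None], [1, 2, 3, 4, 5],
                                       random.Random(42)))
```

Setting the residual to zero recovers the deterministic regression value, so the only difference from DRI3 in this sketch is the added random draw.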

3. Other measures of variability. Columns (e), (f) and (g) of Table 11 show the other measures of variability of the imputed data. First, as with the bias of the mean, the TOTIN2 results for all three criteria fluctuate from one NRR to another. Second, for TOTEX2, only the RMSD increases as the NRR increases, while the MAD and MD fluctuate. Third, the data with the highest NRR yielded the lowest MD. Fourth, for TOTIN2, the data with twenty percent NRR yielded the largest values for all three criteria.

5.5 Distribution of the True vs. Imputed Values

To provide additional information on the distribution of the imputed data discussed above, the distributions of the true (deleted) values (TVs) and of the imputed values (IVs) from each IM were obtained for all VIs and NRRs. Tables 12, 13, and 14 show these frequency distributions, as relative frequencies (RFs), for the first, second, and third NRR, respectively. For HDI3 and SRI3, the RFs of the 1000 simulated data sets were averaged. The first column gives the frequency classes (FCs) of each VI; these are the same classes used in the Kolmogorov-Smirnov goodness-of-fit test to estimate the percentage of imputed data sets with the same distribution as the actual data. For each NRR, the table lists: (a) VIs, (b) FCs, (c) RFs of the TVs (TV), (d) RFs of OMI (OMI), (e) RFs of HDI3 (HDI3), (f) RFs of DRI3 (DRI3), and (g) RFs of SRI3 (SRI3).
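A minimal sketch of the grouped Kolmogorov-Smirnov comparison described above: relative frequencies are computed over fixed frequency classes, the statistic is the largest gap between the two cumulative relative frequencies, and the PCD is the percentage of simulated data sets whose statistic does not exceed the critical value. Assumptions, not stated in this section: the large-sample two-sample critical value c(α)·√((n+m)/(nm)) with c(0.05) = 1.358, class boundaries given as ascending cut points, RFs as proportions when compared against the critical value, and hypothetical function names.

```python
from bisect import bisect_right
from math import sqrt


def relative_frequencies(values, cuts):
    """Share of values in each frequency class.

    cuts are ascending class boundaries: class 0 is v < cuts[0],
    class k is cuts[k-1] <= v < cuts[k], and the last is v >= cuts[-1].
    """
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    return [c / len(values) for c in counts]


def grouped_ks_statistic(rf_a, rf_b):
    """Largest absolute gap between two cumulative relative frequency curves."""
    cum_a = cum_b = gap = 0.0
    for a, b in zip(rf_a, rf_b):
        cum_a += a
        cum_b += b
        gap = max(gap, abs(cum_a - cum_b))
    return gap


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample two-sample K-S critical value (c_alpha = 1.358 for alpha = 0.05)."""
    return c_alpha * sqrt((n + m) / (n * m))


def pcd(stats, n, m):
    """Percentage of simulated data sets whose statistic does not exceed the critical value."""
    crit = ks_critical_value(n, m)
    return 100.0 * sum(d <= crit for d in stats) / len(stats)
```

Applied, purely to illustrate the arithmetic, to the TV and averaged HDI3* relative frequencies for TOTEX2 in Table 12 (entered as percentages), `grouped_ks_statistic` returns a largest cumulative gap of 3.5 percentage points, reached at the second and third classes. The averaged RFs are not themselves a sample, so this is not a test result.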

Table 12: Distribution of the TVs and IVs: 10% NRR

(a) VI: TOTEX2

| (b) FCs | (c) TV | (d) OMI | (e) HDI3* | (f) DRI3 | (g) SRI3* |
|---|---|---|---|---|---|
| <37869.5 | 10.90% | 0.00% | 13.90% | 7.70% | 9.50% |
| 37869.5 – 47056.5 | 9.70% | 0.00% | 10.20% | 8.70% | 8.70% |
| 47056.5 – 54922.0 | 9.70% | 0.00% | 9.70% | 11.40% | 6.10% |
| 54922.0 – 62365.0 | 11.40% | 0.00% | 8.90% | 12.30% | 9.50% |
| 63265.0 – 73868.0 | 8.70% | 0.00% | 9.10% | 11.10% | 11.40% |
| 73868.0 – 86103.0 | 9.70% | 0.00% | 9.40% | 12.60% | 11.10% |
| 86103.0 – 101947.0 | 10.90% | 0.00% | 9.40% | 8.00% | 11.10% |
| 101947.0 – 126254.5 | 11.10% | 100.00% | 8.90% | 11.40% | 8.50% |
| 126254.5 – 169964.0 | 9.00% | 0.00% | 8.90% | 9.00% | 12.20% |
| >169964 | 8.90% | 0.00% | 11.60% | 7.70% | 12.10% |

(a) VI: TOTIN2

| (b) FCs | (c) TV | (d) OMI | (e) HDI3* | (f) DRI3 | (g) SRI3* |
|---|---|---|---|---|---|
| <40570 | 9.70% | 0.00% | 15.10% | 6.10% | 9.10% |
| 40570.0 – 51564.0 | 10.20% | 0.00% | 11.90% | 8.70% | 7.90% |
| 51564.0 – 62006.5 | 9.40% | 0.00% | 10.10% | 14.50% | 8.30% |
| 62006.5 – 73900.5 | 10.20% | 0.00% | 9.50% | 10.70% | 10.00% |
| 73900.5 – 88127.0 | 9.00% | 0.00% | 9.60% | 12.80% | 12.40% |
| 88127.0 – 104801.0 | 10.90% | 0.00% | 9.30% | 9.20% | 9.00% |
| 104801.0 – 128000.0 | 11.90% | 100.00% | 9.80% | 9.90% | 10.50% |
| 128000.0 – 161669.0 | 11.40% | 0.00% | 7.80% | 11.10% | 9.30% |
| 161669.0 – 233907.0 | 7.70% | 0.00% | 8.00% | 10.70% | 11.20% |
| >233907 | 9.90% | 0.00% | 8.90% | 6.30% | 12.30% |

(*) RF for each class was obtained by taking the average of the 1000 simulated data sets.

Table 13: Distribution of the TVs and IVs: 20% NRR

(a) VI: TOTEX2

| (b) FCs | (c) TV | (d) OMI | (e) HDI3* | (f) DRI3 | (g) SRI3* |
|---|---|---|---|---|---|
| <37869.5 | 9.40% | 0.00% | 14.30% | 7.40% | 8.20% |
| 37869.5 – 47056.5 | 9.70% | 0.00% | 10.40% | 9.60% | 7.60% |
| 47056.5 – 54922.0 | 11.60% | 0.00% | 9.70% | 9.00% | 8.20% |
| 54922.0 – 62365.0 | 10.00% | 0.00% | 9.00% | 11.00% | 7.90% |
| 63265.0 – 73868.0 | 9.60% | 0.00% | 9.20% | 12.30% | 10.30% |
| 73868.0 – 86103.0 | 8.40% | 0.00% | 9.40% | 12.50% | 11.90% |
| 86103.0 – 101947.0 | 9.60% | 0.00% | 9.30% | 9.90% | 10.30% |
| 101947.0 – 126254.5 | 11.30% | 100.00% | 8.70% | 10.80% | 11.80% |
| 126254.5 – 169964.0 | 9.70% | 0.00% | 8.70% | 8.80% | 11.70% |
| >169964 | 10.70% | 0.00% | 11.30% | 8.70% | 12.10% |

(a) VI: TOTIN2

| (b) FCs | (c) TV | (d) OMI | (e) HDI3* | (f) DRI3 | (g) SRI3* |
|---|---|---|---|---|---|
| <40570 | 10.00% | 0.00% | 15.70% | 4.80% | 11.80% |
| 40570.0 – 51564.0 | 10.30% | 0.00% | 12.10% | 11.90% | 12.20% |
| 51564.0 – 62006.5 | 11.70% | 0.00% | 10.10% | 10.20% | 11.30% |
| 62006.5 – 73900.5 | 10.20% | 0.00% | 9.60% | 11.70% | 9.90% |
| 73900.5 – 88127.0 | 8.60% | 0.00% | 9.50% | 11.90% | 8.50% |
| 88127.0 – 104801.0 | 9.40% | 0.00% | 9.30% | 9.60% | 10.10% |
| 104801.0 – 128000.0 | 9.10% | 100.00% | 9.70% | 11.70% | 9.00% |
| 128000.0 – 161669.0 | 9.20% | 0.00% | 7.60% | 9.80% | 8.30% |
| 161669.0 – 233907.0 | 11.30% | 0.00% | 7.80% | 9.70% | 8.90% |
| >233907 | 10.20% | 0.00% | 8.70% | 8.70% | 10.10% |

(*) RF for each class was obtained by taking the average of the 1000 simulated data sets.

Table 14: Distribution of the TVs and IVs: 30% NRR

(a) VI: TOTEX2

| (b) FCs | (c) TV | (d) OMI | (e) HDI3* | (f) DRI3 | (g) SRI3* |
|---|---|---|---|---|---|
| <37869.5 | 9.80% | 0.00% | 14.30% | 7.80% | 10.30% |
| 37869.5 – 47056.5 | 8.80% | 0.00% | 10.40% | 9.00% | 9.60% |
| 47056.5 – 54922.0 | 9.60% | 0.00% | 9.70% | 9.40% | 8.30% |
| 54922.0 – 62365.0 | 9.50% | 0.00% | 8.90% | 10.80% | 9.30% |
| 63265.0 – 73868.0 | 11.00% | 0.00% | 9.20% | 12.70% | 10.10% |
| 73868.0 – 86103.0 | 10.70% | 0.00% | 9.40% | 11.50% | 10.60% |
| 86103.0 – 101947.0 | 10.70% | 0.00% | 9.40% | 12.10% | 9.80% |
| 101947.0 – 126254.5 | 9.40% | 100.00% | 8.70% | 8.80% | 10.10% |
| 126254.5 – 169964.0 | 11.00% | 0.00% | 8.70% | 9.00% | 8.10% |
| >169964 | 9.50% | 0.00% | 11.30% | 9.00% | 13.70% |

(a) VI: TOTIN2

| (b) FCs | (c) TV | (d) OMI | (e) HDI3* | (f) DRI3 | (g) SRI3* |
|---|---|---|---|---|---|
| <40570 | 9.40% | 0.00% | 15.60% | 6.50% | 8.90% |
| 40570.0 – 51564.0 | 9.00% | 0.00% | 12.10% | 10.40% | 8.20% |
| 51564.0 – 62006.5 | 9.90% | 0.00% | 10.10% | 10.80% | 8.80% |
| 62006.5 – 73900.5 | 10.70% | 0.00% | 9.60% | 11.50% | 10.10% |
| 73900.5 – 88127.0 | 10.20% | 0.00% | 9.50% | 12.20% | 11.00% |
| 88127.0 – 104801.0 | 10.30% | 0.00% | 9.30% | 10.70% | 10.20% |
| 104801.0 – 128000.0 | 10.30% | 100.00% | 9.70% | 10.50% | 10.40% |
| 128000.0 – 161669.0 | 9.80% | 0.00% | 7.60% | 11.20% | 10.80% |
| 161669.0 – 233907.0 | 10.70% | 0.00% | 7.70% | 8.20% | 10.30% |
| >233907 | 9.90% | 0.00% | 8.70% | 8.00% | 11.30% |

(*) RF for each class was obtained by taking the average of the 1000 simulated data sets.

In all NRRs, the results clearly illustrate how OMI distorts the distribution. Since the OMI method assigns the first-visit mean of the VI to every missing case, all of its imputed values fall in a single frequency class. The three other methods, which used imputation classes, gave a better outcome than OMI by spreading out the distribution of the imputed data.

For HDI3, at all nonresponse rates, the imputed observations clustered most heavily in the first frequency class, that is, below 37869.5 for TOTEX2 and below 40570.0 for TOTIN2. Clusters also formed in the last frequency class for TOTEX2 at the first and third nonresponse rates, and in the second frequency class for TOTIN2 at all nonresponse rates. The percentage of imputed data in the lowest class for TOTEX2 and TOTIN2 ranges from 14% to 16% across the nonresponse rates, compared with only 9% to 11% for the true values.

While HDI3 overrepresents these classes, it underrepresents the interval 86103–126254.5 in the 10% and 20% nonresponse imputed data sets and the interval 63265–101947 in the 30% nonresponse imputed data sets. For the 10% and 20% NRRs, the true values in the indicated interval total about 30%, whereas the imputed values total less than 30%.

Unlike hot deck and OMI, which produced major clusters, the two regression imputation methods produced a more spread-out distribution, although some areas are underrepresented. Because deterministic regression does not add a random residual term, it severely underrepresents some classes, in particular the first frequency class. SRI, which adds a random residual, gave better results than DRI; however, in some areas the added random residual produced a significant excess, mostly in the last frequency class.

5.6 Choosing the Best Imputation Method

In this section, the rankings on all the tests are used to determine which IM is best for this particular study and data. The best method is selected separately for each VI and NRR. The rankings use a four-point system in which a rank of 4 denotes the worst IM for a given criterion and 1 denotes the best. In case of ties, the average rank is used. The IM with the smallest rank total is declared the best IM for that VI and NRR. The ranking covers the following criteria: (a) the bias of the mean of the imputed data (N.B.), (b) the percentage of correct distributions (PCD), and (c) the other measures of variability, namely MD, MAD and RMSD. In all, each IM is ranked on five criteria.
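The ranking procedure can be sketched as follows, assuming each criterion is first converted to a score where lower is better (for example the absolute bias, |MD|, MAD, RMSD, and the negated PCD); the function names are hypothetical. Under standard average ranks, a three-way tie for first place gives each method a rank of 2, whereas Tables 15 to 17 report 1.3 for such ties, so this sketch will not reproduce those particular cells.

```python
def average_ranks(scores):
    """Rank scores in ascending order (1 = best); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def best_method(criteria):
    """criteria maps each criterion to {method: score}, lower score = better.

    Returns the rank total per method and the method with the smallest total.
    """
    methods = list(next(iter(criteria.values())))
    totals = dict.fromkeys(methods, 0.0)
    for scores in criteria.values():
        ranks = average_ranks([scores[m] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return totals, min(totals, key=totals.get)
```

The final step is simply the smallest total: with the TOTEX2 rank totals of Table 17 (OMI 13, HDI3 15, DRI3 11.5, SRI3 10.5), it selects SRI3.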

Tables 15, 16 and 17 show the rankings of the different IMs for the 10%, 20% and 30% NRR, respectively. For each NRR, the table lists: (a) VIs, (b) criteria, (c) OMI, (d) HDI3, (e) DRI3, and (f) SRI3.

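Table 15: Ranking of the different IMs: 10% NRR

| (a) VI | (b) Criteria | (c) OMI | (d) HDI3 | (e) DRI3 | (f) SRI3 |
|---|---|---|---|---|---|
| TOTEX2 | N.B. | 3 | 1 | 4 | 2 |
| | PCD | 4 | 1.3 | 1.3 | 1.3 |
| | MD | 3 | 1 | 4 | 2 |
| | MAD | 3 | 4 | 1 | 2 |
| | RMSD | 4 | 3 | 1 | 2 |
| | Total | 17 | 10.3 | 11.3 | 9.3 |
| | Category rank | 4th | 2nd | 3rd | 1st |
| TOTIN2 | N.B. | 1 | 2 | 4 | 3 |
| | PCD | 4 | 1.3 | 1.3 | 1.3 |
| | MD | 1 | 2 | 4 | 3 |
| | MAD | 3 | 4 | 1 | 2 |
| | RMSD | 3 | 4 | 1 | 2 |
| | Total | 12 | 13.3 | 11.3 | 11.3 |
| | Category rank | 3rd | 4th | 1st | 1st |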

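Table 16: Ranking of the different IMs: 20% NRR

| (a) VI | (b) Criteria | (c) OMI | (d) HDI3 | (e) DRI3 | (f) SRI3 |
|---|---|---|---|---|---|
| TOTEX2 | N.B. | 2 | 1 | 4 | 3 |
| | PCD | 4 | 3 | 1 | 2 |
| | MD | 2 | 1 | 4 | 3 |
| | MAD | 3 | 4 | 1 | 2 |
| | RMSD | 4 | 2 | 1 | 3 |
| | Total | 15 | 11 | 11 | 13 |
| | Category rank | 4th | 1st | 1st | 3rd |
| TOTIN2 | N.B. | 3 | 4 | 2 | 1 |
| | PCD | 4 | 1.3 | 1.3 | 1.3 |
| | MD | 3 | 4 | 2 | 1 |
| | MAD | 3 | 4 | 1 | 2 |
| | RMSD | 3 | 4 | 1 | 2 |
| | Total | 16 | 17.3 | 7.3 | 7.3 |
| | Category rank | 3rd | 4th | 1st | 1st |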

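Table 17: Ranking of the different IMs: 30% NRR

| (a) VI | (b) Criteria | (c) OMI | (d) HDI3 | (e) DRI3 | (f) SRI3 |
|---|---|---|---|---|---|
| TOTEX2 | N.B. | 1 | 3 | 4 | 2 |
| | PCD | 4 | 3 | 1.5 | 1.5 |
| | MD | 1 | 3 | 4 | 2 |
| | MAD | 3 | 4 | 1 | 2 |
| | RMSD | 4 | 2 | 1 | 3 |
| | Total | 13 | 15 | 11.5 | 10.5 |
| | Category rank | 3rd | 4th | 2nd | 1st |
| TOTIN2 | N.B. | 3 | 4 | 2 | 1 |
| | PCD | 4 | 3 | 1.5 | 1.5 |
| | MD | 3 | 4 | 2 | 1 |
| | MAD | 3 | 4 | 1 | 2 |
| | RMSD | 3 | 4 | 1 | 2 |
| | Total | 16 | 19 | 7.5 | 7.5 |
| | Category rank | 3rd | 4th | 1st | 1st |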

The rankings show that the two regression IMs gave better results than their model-free counterparts. For TOTIN2, the two regression methods tied as the best IM at every nonresponse rate and, surprisingly, HDI3 finished as the worst IM, behind OMI. For TOTEX2, the rankings were mixed across nonresponse rates, although the regression methods still performed well: SRI3 finished first at 10% and 30% NRR and third at 20% NRR, while DRI3 finished third, first and second at 10%, 20% and 30% NRR, respectively. While HDI3 was the worst IM for TOTIN2, OMI was the worst IM for TOTEX2, ranking last at both 10% and 20% NRR and third at 30% NRR.

In conclusion, based on the 1997 FIES data, the best imputation method for this study is SRI3, very closely followed by DRI3. SRI3 never ranked last on any criterion, NRR or VI, whereas DRI3 ranked last on the bias of the mean of the imputed data and on the MD criterion. The researchers selected HDI3 as the worst IM in this study: it fared worst on most of the criteria, in particular the other measures of variability at 20% and 30% NRR.
