Comparison Of Imputation Methods

  • November 2019
E. Comparison of Imputation Methods

E.1. Accuracy and Precision of the imputed data

1.3. Kalton’s measures of the effectiveness of the four IMs for imputing for nonresponse observations under TOTEX2 and TOTIN2

TOTEX2, 10% NONRESPONSE RATE

METHODS   MD               Rank   MAD              Rank   RMSD             Rank
OMI       -6406.603632     3      56929.613669     3      108547.820041    4
HDI3      4919.395646      1      78071.611061     4      79251.216593     3
DRI3      -7204.560545     4      23839.817150     1      57726.615582     1
SRI3      5363.466919      2      33683.480259     2      70553.643550     2
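The three measures reported in these tables can be sketched in a few lines of Python. The source does not print the formulas, so this is a minimal sketch that assumes the usual definitions, with each deviation taken as the imputed value minus the actual (deleted) value; that sign convention is inferred from the text, which reads a negative MD as underestimation. The function name `kalton_measures` and the sample values are hypothetical and are not taken from the study data.

```python
import math


def kalton_measures(actual, imputed):
    """Return (MD, MAD, RMSD) of imputed values against the deleted actual values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # Assumed convention: deviation = imputed - actual,
    # so a negative MD means the imputed values underestimate on average.
    deviations = [imp - act for act, imp in zip(actual, imputed)]
    n = len(deviations)
    md = sum(deviations) / n                               # mean deviation (bias)
    mad = sum(abs(d) for d in deviations) / n              # mean absolute deviation
    rmsd = math.sqrt(sum(d * d for d in deviations) / n)   # root mean square deviation
    return md, mad, rmsd


# Hypothetical values, for illustration only (not from the study data).
actual = [100.0, 200.0, 300.0, 400.0]
imputed = [110.0, 180.0, 300.0, 390.0]
md, mad, rmsd = kalton_measures(actual, imputed)
print(md, mad, rmsd)
```

MD lets positive and negative deviations cancel, so it measures bias, while MAD and RMSD measure how close individual imputed values are to the deleted ones. This is why a method can rank first on MD and last on MAD in the same table.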

In the table above, the mean deviations (MD) vary across the IMs. OMI and DRI3 generated negative mean deviations, indicating that their imputed values on average underestimated the actual (deleted) values of TOTEX2. HDI3 and SRI3, which both use randomization, generated positive mean deviations, indicating that their imputed values on average overestimated the actual (deleted) values of TOTEX2. HDI3 had the smallest mean deviation in absolute value and so fared best in generating unbiased estimates, while DRI3 finished last. All of the imputation methods generated more biased estimates than the bias in estimating the population mean ignoring the nonresponse values. The hot-decking of the values in HDI3 and the random residuals in SRI3 seemed effective.

DRI3 fared best on both the mean absolute deviation (MAD) and the root mean square deviation (RMSD), having the smallest value on each. The two procedures that use a regression model, DRI3 and SRI3, yielded the best estimates for both measures. Surprisingly, HDI3 fared worst in MAD and third in RMSD, while OMI fared third in MAD and last in RMSD.

TOTIN2, 10% NONRESPONSE RATE

METHODS   MD               Rank   MAD              Rank   RMSD             Rank
OMI       5978.393462      1      77502.270650     3      167206.240181    3
HDI3      -7175.254063     2      105369.153855    4      242022.994540    4
DRI3      -11284.461504    4      32115.804981     1      77228.476946     1
SRI3      9043.982223      3      51363.168122     2      106374.388796    2
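The ranks in the table above are consistent with ordering the methods by the absolute value of MD (closest to zero is best) and by the plain value of MAD and RMSD (smallest is best). The following is a minimal sketch of that rule using the TOTIN2, 10% values; the helper name `rank_methods` is hypothetical, and ties are not handled because none occur in these tables.

```python
def rank_methods(values, by_magnitude=False):
    """Rank methods from 1 (best) upward, where a smaller value is better."""
    if by_magnitude:
        # For MD the sign only shows direction, so compare distances from zero.
        ordered = sorted(values, key=lambda method: abs(values[method]))
    else:
        ordered = sorted(values, key=lambda method: values[method])
    return {method: position for position, method in enumerate(ordered, start=1)}


# TOTIN2, 10% nonresponse rate, copied from the table above.
md = {"OMI": 5978.393462, "HDI3": -7175.254063,
      "DRI3": -11284.461504, "SRI3": 9043.982223}
mad = {"OMI": 77502.270650, "HDI3": 105369.153855,
       "DRI3": 32115.804981, "SRI3": 51363.168122}
rmsd = {"OMI": 167206.240181, "HDI3": 242022.994540,
        "DRI3": 77228.476946, "SRI3": 106374.388796}

md_ranks = rank_methods(md, by_magnitude=True)
mad_ranks = rank_methods(mad)
rmsd_ranks = rank_methods(rmsd)
print(md_ranks, mad_ranks, rmsd_ranks)
```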

In the table above, the mean deviations vary from one method to another. Unlike in TOTEX2 with the same nonresponse rate, HDI3 and DRI3 generated negative estimates for MD, indicating that the imputed values underestimated the actual values. The other two methods, OMI and SRI3, generated positive estimates for MD, indicating that the imputed values overestimated the actual values. While OMI was second to the worst in generating unbiased estimates under TOTEX2, here it generated the least biased estimate. DRI3 had the same result as under TOTEX2, again having the largest MD in absolute value. Both procedures that use models to impute missing values fared worse in MD than the other imputation procedures.

For MAD and RMSD, almost the same results were generated as in TOTEX2, except that HDI3 fared worst in both measures. DRI3 again fared best, having the smallest value on both, and the two procedures that use a regression model, DRI3 and SRI3, yielded the best estimates for both measures. The DRI3 estimates for both measures were roughly one third of the HDI3 estimates.

TOTEX2, 20% NONRESPONSE RATE

METHODS   MD               Rank   MAD              Rank   RMSD             Rank
OMI       -2497.141162     2      59555.355146     3      119193.320481    4
HDI3      897.178516       1      78292.631254     4      67149.156877     2
DRI3      -7347.862975     4      23231.654338     1      53180.024322     1
SRI3      5400.712813      3      33782.602655     2      72487.392833     3

In the table above, the results for the second nonresponse rate differ from those for the lowest nonresponse rate. Only DRI3 kept the same rank on all three measures; OMI and SRI3 swapped places in MD, and HDI3 and SRI3 swapped places in RMSD. As with the first nonresponse rate, OMI and DRI3 generated negative mean deviations while HDI3 and SRI3 generated positive values. HDI3 fared best in mean deviation, generating the smallest estimate in absolute value. Both regression IMs fared worse than the other methods in generating unbiased estimates.

Again, DRI3 yielded the best mean absolute deviation and root mean square deviation. For TOTEX2 at both the ten and twenty percent nonresponse rates, HDI3 generated the worst MAD and OMI the worst RMSD. Similar to the results for TOTIN2 and TOTEX2 at the ten percent nonresponse rate, the DRI3 estimate for MAD is roughly one third of the HDI3 estimate.

TOTIN2, 20% NONRESPONSE RATE

METHODS   MD               Rank   MAD              Rank   RMSD             Rank
OMI       14277.427361     3      87469.865623     3      244757.995335    3
HDI3      -15477.092677    4      111748.043527    4      297151.501265    4
DRI3      -11059.090609    2      35274.032863     1      114957.425614    1
SRI3      -9076.980826     1      57429.236572     2      148278.489381    2

This data set and nonresponse rate produced a different result. Of the four methods, only OMI generated a positive mean deviation, which implies that it overestimates the actual values. The three other methods generated negative values, which implies that they underestimate the actual values. Unlike in the results above, HDI3 and OMI were the worst and the second to the worst IM, respectively, on every measure of effectiveness. All methods generated larger biases than the bias in estimating the population mean ignoring the nonresponse observations. On all three measures, both regression imputation methods proved better than the two other methods. Again, the HDI3 estimates were well above those of DRI3: about 3.2 times as large for MAD and about 2.6 times as large for RMSD.

TOTEX2, 30% NONRESPONSE RATE

METHODS   MD               Rank   MAD              Rank   RMSD             Rank
OMI       742.526312       1      62388.111315     3      151740.940804    4
HDI3      -2021.192269     3      81395.790207     4      71390.645266     2
DRI3      -7554.607707     4      24082.880071     1      59795.667757     1
SRI3      1328.061322      2      32449.488519     2      72803.602152     3
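How the OMI mean deviation under TOTEX2 moves with the nonresponse rate depends on whether the signed value or its magnitude is compared. The short sketch below uses the OMI entries from the three TOTEX2 tables; the variable names are illustrative only.

```python
# OMI mean deviations for TOTEX2, copied from the 10%, 20%, and 30% tables.
omi_md = {10: -6406.603632, 20: -2497.141162, 30: 742.526312}

rates = sorted(omi_md)
signed = [omi_md[rate] for rate in rates]
magnitudes = [abs(value) for value in signed]

# The signed MD rises with the nonresponse rate...
signed_increasing = all(a < b for a, b in zip(signed, signed[1:]))
# ...while its distance from zero, which is what the ranks use, shrinks.
magnitude_decreasing = all(a > b for a, b in zip(magnitudes, magnitudes[1:]))

print(signed_increasing, magnitude_decreasing)
```

Both checks hold for these three values, so the bias of OMI under TOTEX2 moves from underestimation toward a small overestimation as the nonresponse rate rises.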

In the results above, OMI fared best in mean deviation. As with the other nonresponse rates, DRI3 finished last in generating unbiased results for nonresponse observations. The mean deviations of OMI and SRI3 were better than the bias in estimating the population mean ignoring the nonresponse observations. Like the results for the first nonresponse rate of the TOTIN2 variable, both HDI3 and DRI3 generated negative values for mean deviation. It is interesting to note that, among all nonresponse rates under TOTEX2, the mean deviation of OMI is smallest in absolute value at this rate. As the nonresponse rate increases, the OMI mean deviation under this variable rises from negative to positive, and its magnitude shrinks.

Similar to the previous nonresponse rates, HDI3 fared worst in mean absolute deviation and OMI fared worst in root mean square deviation. DRI3, on the other hand, bested all methods on both measures of closeness at every nonresponse rate under TOTEX2.

TOTIN2, 30% NONRESPONSE RATE

METHODS   MD               Rank   MAD              Rank   RMSD             Rank
OMI       20310.908394     3      90396.258755     3      271775.351071    3
HDI3      -21695.518140    4      115087.128145    4      313814.916561    4
DRI3      -13792.599597    2      34537.359021     1      103253.122885    1
SRI3      1188.309972      1      51886.726288     2      131429.611649    2
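The recurring remark that the HDI3 estimates are about three times the DRI3 estimates can be checked for MAD directly from the six tables. A minimal sketch, with the values copied from the HDI3 and DRI3 entries and with illustrative variable names:

```python
# (HDI3 MAD, DRI3 MAD) copied from the six tables in this section.
mad_pairs = {
    ("TOTEX2", 10): (78071.611061, 23839.817150),
    ("TOTIN2", 10): (105369.153855, 32115.804981),
    ("TOTEX2", 20): (78292.631254, 23231.654338),
    ("TOTIN2", 20): (111748.043527, 35274.032863),
    ("TOTEX2", 30): (81395.790207, 24082.880071),
    ("TOTIN2", 30): (115087.128145, 34537.359021),
}

# Ratio of the worst-ranked MAD (HDI3) to the best-ranked MAD (DRI3).
mad_ratios = {key: hdi3 / dri3 for key, (hdi3, dri3) in mad_pairs.items()}

for (variable, rate), ratio in mad_ratios.items():
    print(f"{variable} {rate}%: HDI3 MAD is {ratio:.2f} times DRI3 MAD")
```

For MAD the ratio is a little over three in every table. The same comparison for RMSD is less uniform, since under TOTEX2 the HDI3 and DRI3 values are much closer together.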

The rankings are exactly the same as those for the second nonresponse rate. The HDI3 values for both measures of closeness were about three times as large as those of the best IM in those categories. The regression imputation methods again bested the other two methods on all criteria. As at the ten percent nonresponse rate under this variable, HDI3 and DRI3 underestimated TOTIN2 while OMI and SRI3 overestimated it; at the twenty percent rate, SRI3 also underestimated. Among all the methods, SRI3 has the smallest MD in absolute value and is the only method with a value below ten thousand. Whereas under TOTEX2 the mean deviation of OMI was smallest at this nonresponse rate, under TOTIN2 it was largest. In both variables the extreme value occurs at the largest nonresponse rate, but under TOTEX2 the mean deviation improves as the rate rises, while here it worsens.

To summarize the results shown above, the majority of the measures of the effectiveness of the four IMs for imputing nonresponse observations were won by deterministic regression imputation with three imputation classes. It is followed closely by its stochastic counterpart in regression imputation. The MAD and RMSD for overall mean imputation under both nonresponse variables grew as the nonresponse rate grew. Hot deck imputation under income showed the same problem. In hot deck, the mean deviation under income decreased with each nonresponse rate, becoming more negative.
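The summary claim that most of the measures were won by deterministic regression imputation can be tallied from the ranks reported in the six tables. The sketch below counts first-place finishes per method; the ranks are copied from the tables, and the data layout and variable names are illustrative only.

```python
from collections import Counter

METHODS = ("OMI", "HDI3", "DRI3", "SRI3")

# Ranks copied from the six tables, listed in METHODS order as (MD, MAD, RMSD).
ranks = {
    ("TOTEX2", 10): ((3, 1, 4, 2), (3, 4, 1, 2), (4, 3, 1, 2)),
    ("TOTIN2", 10): ((1, 2, 4, 3), (3, 4, 1, 2), (3, 4, 1, 2)),
    ("TOTEX2", 20): ((2, 1, 4, 3), (3, 4, 1, 2), (4, 2, 1, 3)),
    ("TOTIN2", 20): ((3, 4, 2, 1), (3, 4, 1, 2), (3, 4, 1, 2)),
    ("TOTEX2", 30): ((1, 3, 4, 2), (3, 4, 1, 2), (4, 2, 1, 3)),
    ("TOTIN2", 30): ((3, 4, 2, 1), (3, 4, 1, 2), (3, 4, 1, 2)),
}

first_places = Counter()
for measure_ranks in ranks.values():
    for per_method in measure_ranks:
        # The method holding rank 1 wins this measure in this table.
        winner = METHODS[per_method.index(1)]
        first_places[winner] += 1

print(first_places.most_common())
```

Of the 18 rankings (three measures in each of six tables), DRI3 takes first place in 12, namely every MAD and RMSD ranking. The six MD rankings are split evenly among OMI, HDI3, and SRI3.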
