Chapter 8
Recommendations and Issues for Further Research
In this study, we compared four imputation methods commonly used to handle partial nonresponse under the assumption that the data are missing at random (MAR). However, other methods are currently being developed and improved. For example, multiple imputation independently imputes more than one value for each missing value. It is an important and powerful form of imputation, and it has the advantage that variance estimation under imputation can be carried out comparatively easily (Kalton, 1983).
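To illustrate why variance estimation is comparatively easy under multiple imputation, the following Python sketch imputes a variable several times and pools the completed-data means with Rubin's combining rules: the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The function names, the single imputation class, the simple random sampling variance formula, and the approximate Bayesian bootstrap used to draw donors are all assumptions made for illustration; none of them is taken from the procedures used in this study.

```python
import random
import statistics


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, total


def multiply_impute_mean(values, m=5, seed=0):
    """Estimate a mean from data with missing items (None), imputing m times.

    Hypothetical helper: one imputation class, simple random sampling assumed.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Approximate Bayesian bootstrap: resample the donor pool first,
        # then draw each missing value from the resampled pool.
        donors = rng.choices(observed, k=len(observed))
        completed = [v if v is not None else rng.choice(donors) for v in values]
        estimates.append(statistics.fmean(completed))
        # Variance of the completed-data mean, treating the data as a simple random sample.
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    data = [12.0, None, 15.0, 9.0, None, 11.0, 14.0]
    pooled_mean, pooled_variance = multiply_impute_mean(data, m=5, seed=1)
    print(pooled_mean, pooled_variance)
```

The between-imputation term is what a single imputation cannot supply: it reflects how much the estimate changes from one set of imputed values to another.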
Regarding variance estimation, further studies should use the jackknife variance estimator, which is commonly used when comparing the variance estimates of most imputation methods. Rao and Shao (1992) proposed an adjusted jackknife variance estimator for use with imputation methods related to the hot deck imputation procedure. This variance estimator is said to be asymptotically unbiased.
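The idea behind an adjusted jackknife can be sketched in Python as follows. The naive version treats imputed values as if they were observed. The adjusted version shifts the imputed values whenever a respondent is deleted, by the difference between the respondent mean without that unit and the full respondent mean. This is only a simplified sketch under strong assumptions that are not taken from Rao and Shao (1992) or from this study: a single imputation class, simple random sampling, and deterministic mean imputation instead of hot deck. The function names are hypothetical.

```python
def naive_jackknife_variance(completed):
    """Delete-one jackknife variance of the mean, treating imputed values as real."""
    n = len(completed)
    total = sum(completed)
    full = total / n
    replicates = [(total - y) / (n - 1) for y in completed]
    return (n - 1) / n * sum((t - full) ** 2 for t in replicates)


def adjusted_jackknife_variance(values):
    """Jackknife variance of the mean after mean imputation, with adjustment.

    values: list of floats, with None marking nonrespondents.
    """
    n = len(values)
    respondents = [v for v in values if v is not None]
    r = len(respondents)
    resp_total = sum(respondents)
    resp_mean = resp_total / r
    completed = [v if v is not None else resp_mean for v in values]
    full = sum(completed) / n
    replicates = []
    for j, deleted in enumerate(values):
        if deleted is None:
            shift = 0.0  # deleting a nonrespondent leaves imputed values unchanged
        else:
            # Respondent mean without unit j, minus the full respondent mean.
            shift = (resp_total - deleted) / (r - 1) - resp_mean
        total = 0.0
        for i, (original, z) in enumerate(zip(values, completed)):
            if i == j:
                continue
            total += z + shift if original is None else z
        replicates.append(total / (n - 1))
    return (n - 1) / n * sum((t - full) ** 2 for t in replicates)


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, None, None]
    filled = [2.0, 4.0, 6.0, 4.0, 4.0]  # mean-imputed version of data
    print(naive_jackknife_variance(filled))
    print(adjusted_jackknife_variance(data))
```

In this small example the naive estimate is smaller than the adjusted one, because the naive version counts the imputed values as additional observations.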
Future researchers may test other methods on the same data set and compare the results with those presented in this paper. They could also compare the results of this study with those obtained using multiple imputation and the Rao-Shao jackknife variance estimator. These procedures, however, require more advanced knowledge of statistics, including Bayesian statistics. The complexity of the methods, especially the two regression imputation methods, could hinder future researchers in applying modern variance estimators.
It is also suggested that the matching variable be selected using an advanced modern statistical method such as CHAID analysis. The acronym CHAID stands for Chi-squared Automatic Interaction Detector. It is one of the oldest tree classification methods, originally proposed by Kass (1980); according to Ripley (1996), the CHAID algorithm is a descendant of THAID, developed by Morgan and Messenger (1973). CHAID "builds" nonbinary trees (i.e., trees where more than two branches can attach to a single root or node) using a relatively simple algorithm that is particularly well suited to the analysis of larger data sets. Also, because the CHAID algorithm often yields many multi-way frequency tables (e.g., when classifying a categorical response variable with many categories based on categorical predictors with many classes), it has been particularly popular in marketing research, in the context of market segmentation studies (Statsoft, 2003).
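The core step of this approach, choosing the candidate variable most strongly associated with the variable to be imputed, can be sketched in Python with a Pearson chi-squared test of independence. This is not a full CHAID implementation: it omits the merging of categories, the Bonferroni adjustment of p-values, and the recursive growing of the tree. The function names and the variables in the example (region, sex, income group) are hypothetical and are not taken from the data used in this study.

```python
import math
from collections import Counter


def chi2_sf(stat, dof):
    """Upper-tail probability of a chi-squared variable with integer dof >= 1."""
    x = stat / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-x)              # closed form for 2 degrees of freedom
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))   # closed form for 1 degree of freedom
    while a < dof / 2.0:
        # Recurrence of the regularized upper incomplete gamma function.
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_test(predictor, response):
    """Pearson chi-squared test of independence for two categorical lists."""
    n = len(response)
    cells = Counter(zip(predictor, response))
    rows = Counter(predictor)
    cols = Counter(response)
    stat = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    p_value = chi2_sf(stat, dof) if dof > 0 else 1.0
    return stat, dof, p_value


def pick_matching_variable(candidates, response):
    """Return the candidate with the smallest p-value, plus all p-values."""
    p_values = {name: chi2_test(column, response)[2]
                for name, column in candidates.items()}
    return min(p_values, key=p_values.get), p_values


if __name__ == "__main__":
    income_group = ["low", "low", "high", "high", "low", "low", "high", "high"]
    candidates = {
        "region": ["north", "north", "south", "south", "north", "north", "south", "south"],
        "sex": ["m", "f", "m", "f", "m", "f", "m", "f"],
    }
    best, p_values = pick_matching_variable(candidates, income_group)
    print(best, p_values)
```

Ranking by p-value rather than by the raw statistic keeps candidates with different numbers of categories comparable, which matters when the matching variables have many classes.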
In pursuing regression imputation, dummy variables should be included in a single model instead of fitting a separate model for each imputation class. Building one model per class is time-consuming and can be frustrating, since the models will not all behave in the same way. The dummy variables represent the categories of the matching variables. This approach would save time and money, since only one model is built and tested.
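A single model with dummy variables can be sketched in Python as follows. The sketch fits one ordinary least squares regression on the respondents, with an intercept, one auxiliary variable, and one dummy variable for each imputation class except a reference class, and then predicts the missing values. It shows deterministic regression imputation only; a stochastic version would add a random residual to each prediction. The function names and the example data are hypothetical, and the sketch assumes every class contains at least one respondent.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def design_row(x, cls, levels):
    """Intercept, auxiliary variable, and dummies for every class but the first."""
    return [1.0, x] + [1.0 if cls == level else 0.0 for level in levels[1:]]


def impute_with_dummies(xs, classes, ys):
    """Fill None entries of ys using one regression fitted on the respondents."""
    levels = sorted(set(classes))
    rows = [design_row(x, c, levels) for x, c in zip(xs, classes)]
    observed = [(row, y) for row, y in zip(rows, ys) if y is not None]
    p = len(rows[0])
    # Normal equations (X'X) beta = X'y, built from respondents only.
    xtx = [[sum(row[i] * row[j] for row, _ in observed) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in observed) for i in range(p)]
    beta = solve(xtx, xty)
    return [y if y is not None else sum(b * v for b, v in zip(beta, row))
            for row, y in zip(rows, ys)]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 4.0]
    classes = ["A", "A", "A", "B", "B", "B", "A", "B"]
    ys = [3.0, 5.0, 7.0, 6.0, 8.0, 10.0, None, None]
    print(impute_with_dummies(xs, classes, ys))
```

Because the dummy variables shift only the intercept, this model assumes a common slope across classes. Separate models per class would also allow the slopes to differ, which is the trade-off for fitting one model instead of many.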
The researchers strongly recommend using a statistical package that can generate imputations faster and more easily, and that yields less biased estimates, than custom programming. Using such a package would save time compared with writing a computer program, where debugging eats up a majority of the research time, and would help prevent computer crashes caused by memory overload.