Before imputation took place, the variables needed in the program were declared. Two variables were common to all programs: SIMI and SIME, matrices containing the data with nonresponse observations for the income and expenditure variables, respectively. Each matrix had five columns and 4,130 rows. The five columns were broken down as follows: the first column held the education status categories; the second and third columns held the first and second visit data, respectively; the fourth column held the second visit data containing nonresponse observations; and the last column held the flags explained earlier in the simulation of nonresponse observations. After the variables were declared, the files containing the data sets for each nonresponse variable were loaded into the program. The files were in comma-separated values (CSV) format, with no extra spaces after each observation, to ensure that the program did not generate errors. New files were also created to save the output for later post-analysis. For the imputation methods that impute a constant value to the nonresponse observations, the researchers opted to write a program instead of directly inserting a value or an equation. This ensured that the imputed values were valid across all nonresponse rates in the case of mean imputation, and it avoided the risk of mistyping an equation in the case of deterministic regression, which involved eighteen different equations.
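The five-column layout and the idea of computing the imputed constant in code, rather than typing it in, can be illustrated with a minimal Python sketch. This is not the researchers' program: the column constants, the function names `load_matrix` and `overall_mean_impute`, the sample values, and the convention that a flag of 1 marks a nonresponse observation are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column positions following the five-column layout in the text.
EDUC, VISIT1, VISIT2, VISIT2_NR, FLAG = range(5)


def load_matrix(csv_text):
    """Parse CSV text into a list of rows of floats, one row per case."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[float(cell) for cell in row] for row in reader if row]


def overall_mean_impute(matrix):
    """Compute the first-visit mean and fill flagged second-visit values with it.

    Assumption: FLAG == 1 marks a nonresponse observation.
    """
    mean_v1 = sum(row[VISIT1] for row in matrix) / len(matrix)
    imputed = [mean_v1 if row[FLAG] == 1 else row[VISIT2_NR] for row in matrix]
    return mean_v1, imputed


# Made-up data: the first two cases are nonrespondents (placeholder value 0).
sample = "1,100,110,0,1\n1,200,190,0,1\n2,300,310,310,0\n2,400,420,420,0\n"
simi = load_matrix(sample)
mean_v1, imputed = overall_mean_impute(simi)
print(mean_v1, imputed)
```

Because the mean is recomputed from whatever file is loaded, the same code applies unchanged at every nonresponse rate. The sketch checks the flag on each row for generality; a program that knows the nonrespondents occupy the first rows could skip that check.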
In the overall mean imputation method, the mean computed from the first visit variable was substituted for each nonresponse observation in the second visit variable. Since the method has only one imputation class and the nonresponse observations occupy the first set of observations, the if-then conditional statement was unnecessary. The three other methods, which used more than one imputation class when loading the data files, required the if-then conditional statement. These statements are important in the programs for the three other methods because they distinguish the nonresponse observations from the response observations when values are substituted according to each imputation procedure.
In the hot deck imputation procedure, each nonresponse observation was replaced by an observation chosen at random from the donor record, which is the first visit variable. The random observation was selected using the variables PECK# and PICK# for expenditure and income, respectively. Both variables use the general formula INT(RND * N#) + 1, where the command RND generates a random number, the command INT converts the resulting value to an integer, and # indicates the imputation class to which the observation belongs.
To generate one thousand simulated data sets, the trials were broken down into ten runs, each generating one hundred sets. This strategy was implemented to prevent run-time errors and to keep the computer from hanging or crashing, as happened in earlier theses related to this study, and to make post-analysis of the results feasible. The matrix variables that would hold the results were then declared. The total number of rows needed to hold the one hundred simulated sets was computed as the number of cases in the actual data set multiplied by the number of trials specified by the researcher.
For the deterministic and stochastic regression imputation methods, the computation of the imputed values was split into two source codes, even though both methods share the same starting point.
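The hot deck selection and the sizing of a run described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original source code: the tuple layout, the names `pick_index` and `hot_deck_impute`, the sample values, and the flag convention (1 marks a nonresponse observation) are hypothetical, and the donor pool is assumed to be every first-visit value in the same imputation class.

```python
import random

CASES = 4130          # rows in the actual data set, as stated in the text
SETS_PER_RUN = 100    # simulated sets generated in one run
RUNS = 10             # ten runs give one thousand simulated sets in total

# Rows needed to hold the output of one run.
ROWS_PER_RUN = CASES * SETS_PER_RUN


def pick_index(n, rng):
    """Mirror INT(RND * N) + 1: a uniform integer from 1 to n inclusive."""
    return int(rng.random() * n) + 1


def hot_deck_impute(rows, rng):
    """Impute flagged values with a random first-visit donor from the same class.

    rows: list of (imputation_class, visit1, visit2_with_nonresponse, flag).
    Assumption: flag == 1 marks a nonresponse observation.
    """
    donors = {}
    for cls, visit1, _, _ in rows:
        donors.setdefault(cls, []).append(visit1)

    imputed = []
    for cls, _, visit2_nr, flag in rows:
        if flag == 1:
            pool = donors[cls]
            # pick_index is 1-based, as in the formula; list indices are 0-based.
            imputed.append(pool[pick_index(len(pool), rng) - 1])
        else:
            imputed.append(visit2_nr)
    return imputed


# Made-up data with two imputation classes.
rows = [
    (1, 100.0, 0.0, 1),
    (1, 120.0, 125.0, 0),
    (2, 300.0, 0.0, 1),
    (2, 320.0, 310.0, 0),
]
rng = random.Random(2024)  # fixed seed so a run can be repeated
imputed = hot_deck_impute(rows, rng)
print(ROWS_PER_RUN, imputed)
```

Since `rng.random()` returns a value in the half-open interval from 0 to 1, `int(rng.random() * n) + 1` always falls between 1 and n, which is the behavior the formula relies on. With 4,130 cases and one hundred sets per run, a result matrix for one run needs 413,000 rows.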
Before the nonresponse observations were imputed in deterministic regression, the regression equation for each imputation class was computed by the program. The computed equation for each imputation class was then applied to all observations in the first visit variable. In generating the imputed data set, however, only the predicted values for observations set to nonresponse replaced values in the second visit data set.
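The deterministic regression steps above can be sketched in Python as follows. The sketch assumes a simple linear regression of the second visit value on the first visit value, fitted within each imputation class using only the responding cases. The source does not state the exact model form, so this is an assumption, as are the names `fit_line` and `regression_impute`, the sample values, and the flag convention (1 marks a nonresponse observation).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values in a class are not all identical
    return mean_y - slope * mean_x, slope


def regression_impute(rows):
    """rows: list of (imputation_class, visit1, visit2_with_nonresponse, flag)."""
    # Step 1: fit one equation per imputation class from the respondents.
    by_class = {}
    for cls, x, y, flag in rows:
        if flag == 0:
            xs, ys = by_class.setdefault(cls, ([], []))
            xs.append(x)
            ys.append(y)
    coefs = {cls: fit_line(xs, ys) for cls, (xs, ys) in by_class.items()}

    # Step 2: apply the class equation to every first-visit observation.
    predicted = [coefs[cls][0] + coefs[cls][1] * x for cls, x, _, _ in rows]

    # Step 3: keep predictions only where the observation is a nonresponse.
    return [p if flag == 1 else y for p, (_, _, y, flag) in zip(predicted, rows)]


# Made-up data: class 1 follows y = 5 + 2x, class 2 follows y = 1 + 2x.
rows = [
    (1, 10.0, 25.0, 0),
    (1, 20.0, 45.0, 0),
    (1, 30.0, 65.0, 0),
    (1, 40.0, 0.0, 1),
    (2, 1.0, 3.0, 0),
    (2, 3.0, 7.0, 0),
    (2, 5.0, 0.0, 1),
]
imputed = regression_impute(rows)
print(imputed)
```

Fitting the coefficients in code, instead of typing each equation by hand, is what removes the risk of mistyping when many class-specific equations are involved.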