Chapter 4
Methodology
4.1 Source of Data
This section gives an overview of the data used in this study, the Family Income and Expenditures Survey (FIES) 1997.
4.1.1 General Background
The Family Income and Expenditures Survey (FIES) 1997 is one round of a nationwide survey that the National Statistics Office (NSO) conducts every three years, with two visits to the same households per survey period. The objectives of the survey are as follows:
a. to gather data on family income and family living expenditures and related information affecting income and expenditure levels and patterns in the Philippines;
b. to determine the sources of income and income distribution, levels of living and spending patterns, and the degree of inequality among families;
c. to provide benchmark information to update weights in the estimation of the consumer price index; and
d. to provide information for the estimation of the country's poverty threshold and incidence.
4.1.2 Sampling Design and Coverage
The FIES 1997 used a stratified multi-stage sampling design consisting of 3,416 Primary Sampling Units (PSUs) for provincial-level estimates, with a subsample of 2,247 PSUs serving as the master sample for regional-level estimates (National Statistics Office [NSO], 1997-2005).
This multi-stage sampling design involved three stages: first, the selection of sample barangays; second, the selection of sample enumeration areas, which are physically delineated portions of a barangay; and third, the selection of sample households. The sampling frame and stratification for the three stages were based on the 1995 Census of Population (POPCEN) and the 1990 Census of Population and Housing (CPH). Under this design, a sample of 41,000 households participated in the survey (NSO, 1997-2005).
4.1.3 Survey Characteristics
The FIES 1997 questionnaire contains about 800 data items, which the interviewer asks of the respondent of the selected sample household. A respondent is defined as the household head, the person who manages the finances of the family, or any member of the family who can give reliable answers to the questionnaire (NSO, 1997-2005). The items or variables gathered in the survey are as follows:
Table 2: The Variables Gathered in the FIES 1997

Part I – Identification
  A. Identification of the Household and Other Information
  B. Other Information:
    1. Particulars about the Head of the Family
       a) Sex
       b) Age as of Last Birthday
       c) Marital Status
       d) Highest Grade Completed
       e) Employment Status
       f) Occupation
       g) Kind of Industry / Business
       h) Class of Worker
    2. Other Information about the Household
       a) Type of Household
       b) Number of Family Members Enumerated
       c) Number of Boarders, Helpers and Other Non-Relatives
       d) Number of Family Members who are Employed for Pay or Profit

Part II – Expenditures and Other Disbursements
  A. Food, Alcoholic Beverages and Tobacco
    1. Food Consumed at Home
       a) Cereals and Cereal Preparations
       b) Roots and Tubers
       c) Fruits and Vegetables
       d) Meat and Meat Preparations
       e) Dairy Products and Eggs
       f) Fish and Marine Products
       g) Coffee, Cocoa and Tea
       h) Non-Alcoholic Beverages
       i) Food Not Elsewhere Classified
    2. Food Regularly Consumed Outside the Home
    3. Alcoholic Beverages
    4. Tobacco
    5. Food Items, Alcoholic Beverages and Tobacco Received as Gifts
  B. Fuel, Light and Water, Transportation and Communication and Household Operation
  C. Personal Care and Effects, Clothing, Footwear and Other Wear
  D. Education, Recreation and Medical Care
  E. Furnishings and Equipment
  F. Taxes
  G. Housing, House Maintenance and Minor Repairs
  H. Miscellaneous Expenditures
  I. Other Disbursements

Part III – Income and Other Receipts
  A. Salaries and Wages from Employment
  B. Net Share of Crops, Fruits and Vegetables Produced or Livestock and Poultry Raised by Other Households
  C. Other Sources of Income
    1. Cash Receipts, Gifts, Support, Relief and Other Forms of Assistance from Abroad
    2. Cash Receipts, Support, Assistance and Relief from Domestic Source
    3. Rentals Received from Non-Agricultural Lands, Buildings, Spaces and Other Properties
    4. Interest
    5. Pension and Retirement, Workmen's Compensation and Social Security Benefits
    6. Net Winnings from Gambling, Sweepstakes and Raffle
    7. Dividends from Investment
    8. Profits from Sale of Stocks, Bonds and Real and Personal Property
    9. Back Pay and Proceeds from Insurance
    10. Inheritance
  D. Other Receipts
4.1.4 Survey Nonresponse
Two types of nonresponse occurred in the 1997 FIES. The first type, item nonresponse, resulted from factors such as the respondent being unaware of the answer to a question or unwilling to provide it, or the omission of the question during the interview (NSO, 1997-2005).
The second type, partial nonresponse, occurred when households could not be interviewed during the second visit because they were temporarily away, on vacation, or not at home, or because their dwelling had been demolished or they had transferred residence. This type of nonresponse amounted to only 3.6% of the total number of respondents (NSO, 1997-2005).
The NSO has devised only deductive imputation to address item nonresponse, and no specific method was developed to compensate for partial nonresponse (NSO, 1997-2005).
Hence, the researchers will focus on comparing imputation procedures for partial nonresponse. The first choice made by the researchers was the regional data set to which the imputation techniques would be applied. The National Capital Region (NCR) was chosen because it was noted as the region with the highest nonresponse rate. The data set consists of 4,130 observations; 39 of its variables are categorical, and the rest are continuous variables pertaining to the income and expenditures of the respondents. Using Nordholt's criteria for selecting the variables to be imputed, such as the importance of the variable in the survey and its percentage of nonresponse (Nordholt, 1998), the researchers chose Total Income (TOTIN) and Total Expenditure (TOTEX) as the variables of interest.
4.2 The Simulation Method
To make an empirical comparison of the statistical properties of estimates computed from data imputed by the selected imputation methods, a data set with missing observations was simulated. The simulation creates an artificial data set in which observations are deleted and marked, indicating which values are to be imputed.
The algorithm for this simulation procedure is as follows:
1. A matrix of random numbers was generated in order to satisfy the assumption that the data are Missing Completely at Random (MCAR).
2. This matrix of random numbers was matched to each observation of the FIES 1997 second visit variables TOTIN and TOTEX.
3. The second visit observations were sorted in ascending order according to their corresponding random numbers.
4. To get the number of nonresponse observations, the number of observations in the FIES 1997 data set (4,130) was multiplied by the indicated nonresponse rate. The nonresponse rates used for this study were 10%, 20% and 30%. Different nonresponse rates were set because the study aims to investigate the effect of varying nonresponse rates on each imputation method.
5. The observations that were set as nonresponse were identified and deleted. The deleted observations were flagged in order to distinguish the imputed values from the actual values.
6. To ensure that the data satisfy the MCAR assumption and to prevent the selection of an odd sample of deleted cases (Kalton, 1983), the simulation method was replicated 1,000 times.
This simulation method was implemented in the Decimal Basic program SIMULATION.BAS (see Appendix for the source code). The program stores the matrices containing the simulated nonresponse observations for income and expenditure in the files Simulated Values for Income (SIMI) and Simulated Values for Expenditure (SIME), which are then used in applying the imputation methods.
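The simulation steps above can be illustrated with a minimal Python sketch. It is not the SIMULATION.BAS program; the function name `simulate_mcar`, the rounding of the number of deleted cases, the choice to delete the observations with the smallest random numbers, and the toy income values are all assumptions made for illustration.

```python
import random

def simulate_mcar(values, rate, seed=None):
    """Delete a fraction `rate` of `values` completely at random.

    Returns (simulated, flags): `simulated` holds None where a value
    was deleted, and `flags` marks those positions with True.
    """
    rng = random.Random(seed)
    n = len(values)
    # Steps 1-2: attach one uniform random number to each observation.
    keys = [rng.random() for _ in range(n)]
    # Step 3: order the observations by their random numbers.
    order = sorted(range(n), key=lambda i: keys[i])
    # Step 4: number of nonrespondents = n * nonresponse rate
    # (rounding to the nearest integer is an assumption).
    n_missing = round(n * rate)
    # Step 5: delete and flag the first n_missing observations in the
    # sorted order (assumed; any fixed block of the order would do).
    deleted = set(order[:n_missing])
    flags = [i in deleted for i in range(n)]
    simulated = [None if flag else v for v, flag in zip(values, flags)]
    return simulated, flags

# Hypothetical second visit TOTIN values for 20 households.
totin_second_visit = [1000.0 + 50.0 * i for i in range(20)]
for rate in (0.10, 0.20, 0.30):
    simulated, flags = simulate_mcar(totin_second_visit, rate, seed=1)
    print(rate, sum(flags))
```

Step 6 would correspond to calling the function 1,000 times with different seeds and storing each replicate.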
4.3 Formation of Imputation Classes
Imputation classes are stratification classes that divide the data into more homogeneous groups. Assuming that units with the same characteristics tend to give similar responses, the formation of imputation classes would help reduce the bias of the estimates.
The steps undertaken in the formation of the imputation classes are as follows:
1. The researchers identified the potential matching variables, which are the candidate variables that could have a relationship with the variables of interest (i.e. TOTIN and TOTEX).
2. A candidate variable must meet three criteria in order to be selected as a matching variable. First, the variable must be known. Second, it must be easy to measure. Third, the probability of missing observations for the variable must be small. A candidate variable that meets all three criteria can be used as a matching variable.
3. For variables with many categories, the researchers reduced the number of categories, because having too many categories can increase the heterogeneity and the bias of the estimates. This was done using the Recode function of the software Statistica.
4. The association between the matching variables and the variables of interest was tested. The Chi-Squared test was applied first, to determine whether each matching variable is significantly associated with the variables of interest.
5. Other measures of the association between the matching variables and the variables of interest followed. The purpose of these measures is to find the best matching variable for dividing the data into imputation classes. Three measures were used, namely the Phi coefficient, Cramer's V and the contingency coefficient. The matching variable with the greatest degree of association is chosen as the variable used in the formation of imputation classes.
All of these tests were performed using the statistical packages Statistica and SPSS. The results are presented in the next chapter.
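For reference, the association measures named above can be computed directly from a contingency table, as in the following Python sketch. It is only an illustration and not the Statistica or SPSS procedure: it assumes that the variable of interest has been grouped into categories so that a cross-tabulation with the matching variable exists, and the function names and the 2 x 2 table of counts are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and total count for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def association_measures(table):
    """Phi, Cramer's V and the contingency coefficient from chi-square."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1  # min(rows, cols) - 1
    return {
        "chi2": chi2,
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * k)),
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
    }

# Hypothetical counts: rows are categories of a candidate matching
# variable, columns are low/high groups of TOTIN.
table = [[30, 10],
         [10, 30]]
print(association_measures(table))
```

Under this sketch, the candidate matching variable with the largest value of the chosen measure would be the one used to form the imputation classes.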
4.4 Performing the Imputation Techniques
4.4.1 Overall Mean Imputation
Overall Mean Imputation (OMI) is an imputation procedure in which the missing observations are replaced with the mean of the available units for that variable. As noted in the previous chapter, this method does not require the formation of imputation classes, which makes it the simplest of the four methods in this study. The steps in applying Overall Mean Imputation (OMI) are as follows:
1. The overall mean of each variable of interest, TOTIN and TOTEX, was computed from the first visit. The formula used for the computation of the overall mean is:

\[ \bar{y}_{OMI} = \frac{\sum_{i=1}^{r} y_{ri}}{r} \]

where:
\bar{y}_{OMI} is the overall mean for the first visit TOTIN or TOTEX,
y_{ri} is the i-th first visit observation for the variable TOTIN or TOTEX, and
r is the total number of responding units for the first visit variable TOTIN or TOTEX.

2. Using the output from the simulation method, the missing observations for the second visit variables TOTIN and TOTEX were replaced with the overall means of the first visit TOTIN and TOTEX.
The Overall Mean Imputation (OMI) was implemented in the Decimal Basic program OMI.BAS (see Appendix for the source code).
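The two steps of the Overall Mean Imputation can be sketched in Python as follows. This is an illustration only and not the OMI.BAS program; the function name `overall_mean_impute`, the use of `None` to mark a simulated nonresponse, and the sample values are assumptions.

```python
def overall_mean_impute(first_visit, second_visit):
    """Replace missing second visit values with the first visit mean.

    Missing values are represented by None. Returns the imputed list,
    a list of flags marking imputed positions, and the mean used.
    """
    # Step 1: overall mean of the r responding first visit units.
    respondents = [y for y in first_visit if y is not None]
    overall_mean = sum(respondents) / len(respondents)
    # Step 2: substitute the mean for every missing second visit value,
    # keeping a flag so imputed values stay distinguishable.
    flags = [y is None for y in second_visit]
    imputed = [overall_mean if y is None else y for y in second_visit]
    return imputed, flags, overall_mean

# Hypothetical TOTIN values for four households.
totin_first = [100.0, 200.0, 300.0, 400.0]
totin_second = [110.0, None, 310.0, None]  # None = simulated nonresponse
imputed, flags, mean = overall_mean_impute(totin_first, totin_second)
print(mean)     # 250.0
print(imputed)  # [110.0, 250.0, 310.0, 250.0]
```

Because every missing value receives the same number, the imputed values carry no variation of their own, which is worth keeping in mind when comparing this method with the others.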
4.4.2 Hot Deck Imputation
Hot Deck (HD) Imputation is an imputation procedure in which the missing observations are replaced by values chosen from the set of available units. The steps undertaken in applying Hot Deck (HD) Imputation are as follows:
1. The donor and recipient files were sorted before allocating values to the missing observations.
2. The values substituted for the missing observations in the second visit were randomly chosen from the donor records, which are the first visit records within each imputation class.
3. Using the output of the simulation method, the missing observations for the second visit variables TOTIN and TOTEX were replaced with the selected donor records from the first visit TOTIN and TOTEX.
The Hot Deck (HD) Imputation was implemented in the Decimal Basic program HOT DECK.BAS (see Appendix for the source code).
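A minimal Python sketch of random hot deck imputation within imputation classes is given below. It is not the HOT DECK.BAS program: the function name `hot_deck_impute`, the selection of donors with replacement, the use of `None` to mark nonresponse, and the class labels and values are assumptions made for illustration.

```python
import random

def hot_deck_impute(first_visit, second_visit, classes, seed=None):
    """Fill missing second visit values with a random first visit donor
    drawn from the same imputation class (None marks a missing value)."""
    rng = random.Random(seed)
    # Build the donor pool: first visit respondents grouped by class.
    donors = {}
    for y, c in zip(first_visit, classes):
        if y is not None:
            donors.setdefault(c, []).append(y)
    imputed = []
    for y, c in zip(second_visit, classes):
        if y is None:
            # Draw one donor at random, with replacement (assumed).
            # A class with no donors raises KeyError in this sketch.
            imputed.append(rng.choice(donors[c]))
        else:
            imputed.append(y)
    return imputed

# Hypothetical data: two imputation classes, "A" and "B".
classes = ["A", "A", "A", "B", "B", "B"]
totex_first = [100.0, 120.0, 110.0, 500.0, 520.0, 480.0]
totex_second = [105.0, None, 115.0, None, 510.0, 490.0]
print(hot_deck_impute(totex_first, totex_second, classes, seed=3))
```

Restricting donors to the recipient's own class is what lets the matching variable chosen in Section 4.3 influence the imputed values.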