Categorical Data Analysis

Chapter 8: Categorical Data Analysis

Murtaza Haider, Ph.D. ([email protected])

What are Categorical Data?

Categorical data deal with situations where the outcome of an experiment or a process can be categorized into a finite number of mutually exclusive classes or categories. For instance, a survey of labour force participation will record an adult's status as either employed or unemployed. The individual's status can thus be expressed as a dichotomous variable (1/0), where 1 denotes the outcome of being employed and 0 denotes the outcome of being unemployed. Other examples of categorical data involve a choice among makes of new automobiles. For instance, consider a situation where the national make of the automobile is being researched: automobile executives are interested in learning about the determinants of consumer choice. The choice could again be represented as a dichotomous variable carrying the value '1' if the consumer chooses an American make, and '0' otherwise. Similarly, if the choice were between American, European, or Japanese manufacturers, the categorical variable representing the choice would have three categories: 1 (American), 2 (European), and 3 (Japanese). A categorical variable that represents more than two outcomes is called a multinomial variable.

The above examples involve unordered outcomes. We could have coded American as 2, Japanese as 1, and European as 3 in the previous example; the change in coding has no impact on the analysis because the order of alternatives is arbitrary. There are, however, scenarios where the ordering of outcomes matters. For instance, consider a study of automobile ownership where households are coded as follows:

Table 1: Coding of ordinal variables

Category   Description
0          Household without cars
1          Household owning 1 car
2          Household owning 2 cars
3          Household owning 3 cars
4          Household owning 4 or more cars

In the above example, there is a natural ordering: households categorized as 2 own more cars than households categorized as 1 or 0. In this particular case, we cannot arbitrarily change the order. Such data, where the order of outcomes is systematic rather than arbitrary, are called ordered data, which is also a type of categorical data. The use of categorical variables as explanatory variables, such as gender, is common in OLS regression models. As an explanatory variable, a categorical variable captures systematic differences latent in the data that cannot be accounted for by other variables in the model. For instance, if there is a systematic difference between the wages of men and women in a particular profession, the gender variable will capture the gender-based wage differentials.

However, when the dependent variable in an econometric model is categorical rather than continuous, the use of conventional regression (OLS) techniques is no longer appropriate. OLS models are therefore modified to account for categorical dependent variables. Such modified models are called discrete choice, categorical, limited dependent variable, or qualitative response models. We begin this chapter with a discussion of binomial variables and their analysis, followed by a discussion of multinomial variables and then of discrete choice models. This chapter explains the theory and estimation of Binomial, Multinomial, and Conditional Logit models. The discussion of estimation uses examples of the estimation routines available in SPSS.

Analysis of categorical data

A wide variety of statistical techniques, methods, and models are available to analyze categorical data. Simple cross-tabulations are commonly used to analyze categorical variables. To illustrate this point, we use a data set from a study of labour force participation of women (Mroz, 1987). The data contain information on 753 white, married women between the ages of 30 and 60 years. The dependent variable, lfp, reports a woman's status as employed (1) or otherwise (0). The other variables are described below.

Table 2: Description of variables in the labour force study

Variable   Description
lfp        Paid labour force: 1=yes, 0=no
k5         Number of children less than 6 years old
k618       Number of children between 6 and 18 years of age
age        Wife's age in years
wc         Wife attended college: 1=yes, 0=no
hc         Husband attended college: 1=yes, 0=no
lwg        Log of wife's estimated wages
inc        Family income excluding wife's, in thousands

Table 3: Simple tabulation of the labour force data

Paid Labor Force: 1=yes 0=no
          Frequency   Percent   Valid Percent   Cumulative Percent
NotInLF   325         43.2      43.2            43.2
inLF      428         56.8      56.8            100.0
Total     753         100.0     100.0

The above table shows that 325 women (43.2%) in the sample were not in the labour force, whereas the remaining 56.8% were employed. We are interested in determining the relationship between the educational attainment of husband and wife and a woman's status in the labour force. The hypothesis that we would like to test is the following: if a woman has received college education, she may be more likely to be in the labour force. For this, we perform a cross-tabulation in SPSS and select the chi-square option in the dialogue box.

Table 4: Cross-tabulation of labour force participation and woman's education

                                         Wife College: 1=yes 0=no
Paid Labor Force                         NoCol     College   Total
NotInLF   Count                          257       68        325
          % within Wife College          47.5%     32.1%     43.2%
inLF      Count                          284       144       428
          % within Wife College          52.5%     67.9%     56.8%
Total     Count                          541       212       753
          % within Wife College          100.0%    100.0%    100.0%

The above table suggests that 67.9% of college-educated women were employed, against 52.5% of the women who did not attend college.

Now we would like to know whether the association between a wife's education and her participation in the labour force is statistically significant. We use the chi-square statistic to test the significance of the association between the two variables.
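The Pearson chi-square statistic that SPSS reports can be reproduced by hand from the observed counts. A minimal sketch in Python, using only the counts from the cross-tabulation above (the variable names are ours):

```python
# Observed counts from the cross-tabulation of labour force status
# (rows: NotInLF, inLF) by wife's college education (cols: NoCol, College).
obs = [[257, 68],
       [284, 144]]

row_tot = [sum(r) for r in obs]          # [325, 428]
col_tot = [sum(c) for c in zip(*obs)]    # [541, 212]
n = sum(row_tot)                         # 753

# Expected counts under independence: (row total * column total) / n
exp = [[rt * ct / n for ct in col_tot] for rt in row_tot]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))

print(round(chi2, 3))                       # ~14.780, matching the SPSS output
print(round(min(min(r) for r in exp), 2))   # minimum expected count, ~91.50
```

The statistic has 1 degree of freedom for a 2x2 table, so a value of 14.78 lies far in the tail of the chi-square distribution, which is why SPSS reports a significance of .000.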


Table 5: Chi-square test for the association between woman's education and employment status

Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             14.780b   1    .000
Continuity Correction(a)       14.158    1    .000
Likelihood Ratio               15.076    1    .000
Fisher's Exact Test                           Exact Sig. (2-sided) .000; Exact Sig. (1-sided) .000
Linear-by-Linear Association   14.761    1    .000
N of Valid Cases               753

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 91.50.

The column Asymp. Sig. (2-sided) in the above table indicates the significance of the relationship. A significance value of 0.05 or less suggests that there may be a relationship between the two variables. In the above case, the significance value of .000 suggests that there is a statistically significant relationship between women's education and their participation in the labour force. But what about the relationship between the educational attainment of a woman's husband and her participation in the workforce? A cross-tabulation suggested that 60% of the women whose husbands received college education were employed, against 55% of the women whose husbands did not attend college. The chi-square test returned a high significance value of .160, suggesting that there was no relationship between the husband's educational attainment and the wife's participation in the labour force.

Econometric models of binomial data

We derive the binary Logit model as a latent variable model, following Long and Freese (2005). Assume a latent variable y* that ranges from −∞ to ∞. The latent variable y* is related to the observed independent variables by the following equation:

y*_i = x_i β + ε_i    [1]

where i indexes the observations and ε is the random error. The above equation is similar to the OLS model; however, the dependent variable is unobserved. The following equation links the latent variable y* with the observed variable y:

y_i = 1 if y*_i > 0
y_i = 0 if y*_i ≤ 0

Now returning to the example of the labour force survey of married women, we set y = 1 if the woman is in the labour force and y = 0 if she is not. The independent variables include number of children, education, and expected income. Now consider that one woman may be about to leave her job while another woman is steadfast in her career. Regardless of their intentions, in both instances we only observe y = 1. Suppose there is an "underlying propensity to work" that manifests itself as being employed (y = 1) or unemployed (y = 0). We are not able to observe the propensity directly; however, at some point a change in y* results in a change in y from 0 to 1 or from 1 to 0. Thus we can express the model as follows:

Pr(y = 1 | x) = Pr(y* > 0 | x)    [2]

We can substitute the structural model in the above equation to get the following:

Pr(y = 1 | x) = Pr(x_i β + ε_i > 0 | x) = Pr(ε_i > −x_i β | x)    [3]

Assuming that ε is distributed logistically with variance π²/3, the Logit model can be expressed as

Pr(y = 1 | x) = exp(x_i β) / (1 + exp(x_i β))    [4]

Unlike the OLS model, where the variance can be estimated because the dependent variable is observed, in the binary Logit model the variance is assumed because the dependent variable, y*, is latent. One could argue that OLS can be used to estimate the model; however, this leads to serious estimation problems. The first and foremost problem is heteroscedastic error terms. Since the observed outcome can only be 0 or 1, either x_i β + ε_i = 0 or x_i β + ε_i = 1, which leaves ε_i equal to either −x_i β or 1 − x_i β. In such a case, the variance is given by (Greene (1985), p. 874):

Var(ε_i | x) = x_i β (1 − x_i β)    [5]

The above equation suggests that as x increases, so does the variance of ε_i. Other major problems include the fact that x_i β cannot be constrained to the 0−1 interval and that negative variances cannot be avoided.

Estimation of binary Logit models
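Binary Logit coefficients are obtained by maximum likelihood. As a sketch of what estimation software does internally, the following Newton-Raphson routine fits a one-regressor Logit; the data (x, y) are made up for illustration and are not from the Mroz sample:

```python
import math

# Hypothetical toy data: one regressor x and a 0/1 outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 1, 0, 1, 1]

b0, b1 = 0.0, 0.0   # start at zero, as most ML routines do
for _ in range(25):  # Newton-Raphson iterations
    # predicted probabilities under the current coefficients
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    # gradient of the log-likelihood
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
    # negative Hessian (information matrix) for the 2x2 system
    w = [pi * (1.0 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    # Newton step: b <- b + H^{-1} g
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (-h01 * g0 + h00 * g1) / det

p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
```

At the maximum the score equations are satisfied, so the mean predicted probability equals the sample share of 1s, a useful sanity check on any Logit fit.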

To identify the binary model, we always set one category or alternative as the base case. The estimated coefficients are then interpreted as a comparison with the base case. Using the labour force example, the probability of being employed is given by:

Pr(work) = exp(V_w) / (exp(V_w) + exp(V_u))    [7]

and the probability of being unemployed is given by:

Pr(unemp) = 1 − Pr(work)    [8]

In this particular case, we set the coefficients for being unemployed to 0. This implies that

V_u = 0 = β_0 + β_1 X_1 + ... + β_n X_n

Therefore, since exp(0) = 1,

Pr(work) = exp(V_w) / (exp(V_w) + 1)

If we divide both the numerator and the denominator by exp(V_w), we have

Pr(work) = 1 / (1 + 1/exp(V_w)) = 1 / (1 + exp(−V_w))    [9]

Pr(unemp) = 1 − Pr(work) = 1 − 1/(1 + exp(−V_w)) = exp(−V_w) / (1 + exp(−V_w))    [10]
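The algebra of equations [7] to [10] is easy to verify numerically. The following sketch, with purely illustrative utility values, checks that the two forms of Pr(work) agree and that the two probabilities sum to one:

```python
import math

# Numeric check: with the unemployed alternative's utility normalized to
# zero, exp(Vw)/(exp(Vw)+1) and 1/(1+exp(-Vw)) are the same quantity.
for v_w in (-2.0, -0.5, 0.0, 1.3, 4.0):   # illustrative utility values
    p_a = math.exp(v_w) / (math.exp(v_w) + 1.0)        # exp(Vw)/(exp(Vw)+exp(0))
    p_b = 1.0 / (1.0 + math.exp(-v_w))                 # equation [9]
    p_unemp = math.exp(-v_w) / (1.0 + math.exp(-v_w))  # equation [10]
    assert abs(p_a - p_b) < 1e-12
    assert abs(p_b + p_unemp - 1.0) < 1e-12
```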

Odds Ratio

The odds ratio between the two outcomes is expressed as follows:

Pr(work) / Pr(unemp) = [1/(1 + exp(−V_w))] / [exp(−V_w)/(1 + exp(−V_w))] = 1/exp(−V_w) = exp(V_w)    [11]

Log of odds

And finally, the log of the odds is expressed as

ln[Pr(work)/Pr(unemp)] = ln(exp(V_w)) = V_w, which is equal to x_i β.

Let us revisit the dataset of labour force participation of married women and estimate a Binary Logit model of participation in the labour force using income, children, and the education of women and their husbands as explanatory variables.

Interpreting binary Logit models and statistical inference

Table 6 lists the coefficients of a Binary Logit model that estimates the probability of being employed for married, white women. The column B lists the estimated coefficients (betas), while the odds ratios are presented in the last column, Exp(B). We begin with the interpretation of the coefficients. It is easier to interpret the model through the odds ratios than through the raw coefficients. Variable k5 represents the number of young children under the age of 6 in a household. The coefficient, β, is equal to −1.463, and the odds are expressed as exp(−1.463) = 0.232. This implies that each additional young child decreases the odds of the mother being employed by a factor of 0.23, all else being equal. The odds of college-educated women being employed are 2.242 times those of women who did not receive college education.


Table 6: Estimation of Binary Logit model of labour force participation

Parameter Estimates (outcome modelled: inLF)

Variable    B        Std. Error   Wald     Sig.   Exp(B)
Intercept   3.182    .644         24.387   .000
k5          -1.463   .197         55.144   .000   .232
k618        -.065    .068         .902     .342   .937
age         -.063    .013         24.189   .000   .939
wc          .807     .230         12.321   .000   2.242
hc          .112     .206         .294     .588   1.118
lwg         .605     .151         16.076   .000   1.831
inc         -.034    .008         17.611   .000   .966

a. The reference category is: NotInLF.
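The Exp(B) column in Table 6 is simply the exponentiated coefficient. A quick check in Python, with the coefficients copied from the table:

```python
import math

# Coefficients (B) from Table 6; Exp(B) is the exponentiated coefficient.
betas = {"k5": -1.463, "k618": -0.065, "age": -0.063,
         "wc": 0.807, "hc": 0.112, "lwg": 0.605, "inc": -0.034}

odds = {name: math.exp(b) for name, b in betas.items()}
# odds["k5"] is about 0.232: each additional child under 6 multiplies the
# odds of being employed by roughly 0.23.
# odds["wc"] is about 2.24: college-educated women have roughly 2.2 times
# the odds of being employed.
```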

The impact of age can be interpreted as follows: with an increase in age of one year, the odds of being employed decline by a factor of 0.94. But what if one is interested in determining the impact of being 10 years older rather than one year older? The general formula for the odds is exp(βδ), where δ is the change in the number of units. For a 10-year change in age, the odds of working decline by a factor of exp(−0.0628 × 10) = 0.53; that is, the odds decline by almost 50%. If we are interested in determining the odds of not working, we can simply take the inverse of exp(β). Thus, each additional young child increases the odds of being unemployed by a factor of 1/0.232 = 4.31. We can also interpret the results as a percentage change in odds using the formula 100[exp(β_k δ) − 1]. Again, each additional young child decreases the odds of being employed by 77% (100[exp(−1.4629) − 1] = −76.8%).

Statistical Inference of Binary Logit models

Logit models are interpreted similarly to OLS models. Instead of the t-statistic (or Z-statistic) to evaluate statistical significance, SPSS uses the Wald statistic, which is expressed as (coefficient/SE)².

Estimation software also reports the significance level for the Wald statistic. It has been observed that when an estimated coefficient is very large, the corresponding standard error is also very large, thus returning a very small value for the Wald statistic. This often leads one to fail to reject the null hypothesis that the estimated parameter is equal to 0. In cases where the model returns a large coefficient for a variable, the Wald statistic may not be the best instrument to evaluate the parameter; one may want to rescale the variable in such a case. Another, more informative and reliable, method is the likelihood ratio test. Each variable is eliminated from the final model in turn and the reduced model is estimated to obtain −2 × log-likelihood (−2LL). The procedure is repeated for every variable in the final model. The likelihood ratio test returns the change in the value of −2LL when the effect is removed from the final model. The difference between −2LL for the final model and −2LL for the reduced model has a chi-square distribution when the coefficient for the variable is 0. The significance level for the chi-square can thus be used to evaluate the relative significance of the effect. The overall fit can also be evaluated from −2LL for the model; a smaller value of −2LL suggests a better fit, and if a model returns a perfect fit, the likelihood = 1 and −2LL = 0. The model's chi-square is given by:

χ² = (−2LL_intercept) − (−2LL_final)    [12]

If the observed significance level for χ² is small (e.g., 0.000), we can reject the null hypothesis that the coefficients of the variables in the final model are equal to 0. The interpretation of this statistic is similar to that of the F-statistic in the OLS tradition.

The SPSS output for the above-mentioned model suggests that the significance level of the test is very small, leading us to reject the null hypothesis that the coefficients of the variables in the final model are equal to 0.

Table 7: Goodness-of-fit statistics

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1029.746
Final            905.266             124.480      7    .000

Pseudo R-Square
Cox and Snell   .152
Nagelkerke      .204
McFadden        .121
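The model chi-square of equation [12] and the McFadden pseudo R-square can both be recomputed from the −2LL values in Table 7:

```python
# Model fit statistics recomputed from the -2LL values in Table 7.
ll2_intercept = 1029.746   # -2LL, intercept-only model
ll2_final = 905.266        # -2LL, final model

chi_sq = ll2_intercept - ll2_final          # equation [12]
# The factor of -2 cancels in the ratio, so -2LL values can be used
# directly in the McFadden formula.
mcfadden = 1.0 - ll2_final / ll2_intercept

print(round(chi_sq, 3))    # prints 124.48
print(round(mcfadden, 3))  # prints 0.121
```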

Other goodness-of-fit statistics include the McFadden R-square, expressed as follows:

R²_McFadden = (l_0 − l_B) / l_0 = 1 − l_B / l_0    [13]

where l_0 is the kernel of the log-likelihood of the intercept-only model (the only information in the model being the sample shares), while l_B is the kernel of the log-likelihood of the final model. This formulation of the McFadden R-square has been adopted in the logistic regression estimation routines of some software, e.g. SPSS, which automatically generates this and other goodness-of-fit statistics. For Logit models, an R-square of 0.07 or higher reflects a good fit. In fact, Louviere et al. (2000) have argued that ρ² values of 0.2 to 0.4 are "considered to be indicative of extremely good model fits", citing a simulation experiment by Domencich and McFadden (1975) who "equivalenced this range to 0.7 to 0.9 for a linear function."

Multinomial Logit Models

The preceding discussion leads us into the workings of the Multinomial Logit model. Consider the travel mode choice problem where an individual has the following three options to commute to work: auto drive, public transit, and a non-motorized mode such as bike or walk. We can code the choice set as 1, 2, 3. The model is represented as follows:

Prob(Y_i = j) = exp(β_j′ x_i) / Σ_{k=1..3} exp(β_k′ x_i)    [14]

In the above equation, β_j is the coefficient vector for x_i when Y_i = j. The subscript i on x indicates that it varies across decision-makers (i), and the subscript j on β indicates that it varies across choices (j). The above model returns a set of probabilities for the J alternatives faced by a decision-maker with characteristics x_i. It also returns J − 1 non-redundant baseline logits. As mentioned earlier, we normalize the Multinomial Logit model by setting one set of parameters equal to 0, i.e., β_1 = 0, so that exp(β_1′ x_i) = 1. The choice of the base case, whose coefficients are set to 0, is completely arbitrary. The probabilities are therefore expressed as follows:

Prob(Y = j) = exp(β_j′ x_i) / (1 + Σ_{k=2..J} exp(β_k′ x_i)), for j = 2, ..., J    [15]

Prob(Y = 1) = 1 / (1 + Σ_{k=2..J} exp(β_k′ x_i))    [16]

Remember that we arbitrarily set the coefficients of alternative 1 to 0. For the multinomial case, let us say we have three modes: i) auto, ii) transit, iii) walk. We will have two logits, i.e., two sets of parameters for two choices, while the third choice serves as the reference category. Let us make walk the reference category in the following example:

g_1 = ln(P(auto)/P(walk)) = β_a0 + β_a1 X_1 + ... + β_an X_n    [17]

g_2 = ln(P(transit)/P(walk)) = β_t0 + β_t1 X_1 + ... + β_tn X_n    [18]

g_3 = 0    [19]

P(auto) = exp(g_1) / (exp(g_1) + exp(g_2) + exp(g_3)) = exp(g_1) / (1 + exp(g_1) + exp(g_2))    [20]

P(transit) = exp(g_2) / (1 + exp(g_1) + exp(g_2))    [21]

P(walk) = 1 / (1 + exp(g_1) + exp(g_2))    [22]
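Equations [17] to [22] can be sketched as a small function; the logit values below are illustrative inputs, not estimates from any real model:

```python
import math

def mnl_probs(g1, g2):
    """Probabilities for auto, transit, walk given the two logits of
    equations [17]-[18], with walk as the reference category (g3 = 0)."""
    denom = 1.0 + math.exp(g1) + math.exp(g2)   # exp(g3) = exp(0) = 1
    return math.exp(g1) / denom, math.exp(g2) / denom, 1.0 / denom

# Illustrative logit values:
p_auto, p_transit, p_walk = mnl_probs(1.1, 0.2)
# The three probabilities sum to one, and the alternative with the
# larger logit gets the larger probability.
```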

Interpretation of Multinomial Logit models

The interpretation of coefficients in a Multinomial Logit model remains a complicated affair. It is possible for P_ij to decline with an explanatory variable x_ij that carries a positive coefficient. A model should therefore be interpreted in terms of odds ratios. In the log-odds equation, ln(P_ij / P_i0) = β_j′ x_i, a positive coefficient for a continuous explanatory variable suggests that as that variable increases, the odds of an observation falling in category j rather than in the reference category increase. Similarly, a negative coefficient for the explanatory variable suggests that the chances of the baseline outcome are higher than those of the outcome for category j. Here we reproduce a model from Powers and Xie (2000) to explain the interpretation of an estimated logistic regression model. A sample of 978 observations of young men between the ages of 20 and 22 was collected, with their major activity coded as working, school, or inactive. Regressors included a binary variable Black (1 if Black), NONINT (1 if the family is not intact), FCOL (1 if the father has some or more college education), FAMINC (family income in thousands of dollars), UNEMP80 (local unemployment rate in 1980), and ASVAB (a scholastic test score). The reference category in the model was being inactive, as opposed to being employed or being in school.

Table 8: Output from a Multinomial Logit model

Working
Variable   Coefficient   SE      t-Stat   Exp(B)
Constant   0.726         0.347   2.091    2.07
Black      -0.444        0.219   -2.032   0.64
NONINT     -0.134        0.192   -0.699   0.87
FCOL       0.180         0.241   0.745    1.20
FAMINC     0.407         0.211   1.930    1.50
UNEMP80    -0.071        0.037   -1.903   0.93
ASVAB      0.308         0.110   2.794    1.36

School
Variable   Coefficient   SE      t-Stat   Exp(B)
Constant   0.359         0.333   1.078    1.43
Black      0.229         0.196   1.166    1.26
NONINT     -0.547        0.186   -2.941   0.58
FCOL       0.241         0.235   1.025    1.27
FAMINC     0.268         0.209   1.283    1.31
UNEMP80    0.012         0.035   0.361    1.01
ASVAB      0.177         0.107   1.658    1.19

The first difference you will notice is that there are two sets of coefficients: one set estimates the odds of working rather than being inactive, and the other measures the odds of being in school rather than being inactive. The odds of Black men being in the labour force were exp(−0.444) = 0.64 times those of whites and others. Stated otherwise, the odds of non-Black men working were 1/0.64 = 1.56 times those of Black men. The estimated coefficient (−0.444) measures the change in the log-odds, ln(Prob(work)/Prob(inactive)), when the variable Black is increased by one unit, i.e., from 0 to 1, whereas exp(−0.444) = 0.64 gives the ratio of the odds of working against being inactive when Black = 1 to the odds when Black = 0. Note that if one would like to determine the odds of Black men being inactive rather than working, the odds are 1/exp(−0.444) = 1.56 times, the same as the odds of non-Black men working rather than being inactive. Similarly, the odds of young men from intact families being in school rather than inactive were 1/exp(−0.547) = 1.73 times as high as those of young men from broken homes. As for the continuous explanatory variables, we can see that the odds of working or being in school, as against being inactive, increase with family income and test score. A unit increase in family income increases the odds of attending school by (exp(0.268) − 1) × 100 = 30.7%.
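The odds calculations above can be verified directly from the coefficients in Table 8:

```python
import math

# Coefficients taken from Table 8 (Working and School panels).
b_black_work = -0.444     # Black, working vs. inactive
b_nonint_school = -0.547  # non-intact family, school vs. inactive
b_faminc_school = 0.268   # family income, school vs. inactive

odds_black = math.exp(b_black_work)                   # about 0.64
odds_nonblack = 1.0 / odds_black                      # about 1.56
odds_intact_school = 1.0 / math.exp(b_nonint_school)  # about 1.73
pct_faminc = (math.exp(b_faminc_school) - 1.0) * 100  # about 30.7 (%)
```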

Conditional Logit Models

So far our discussion has focussed on models where the characteristics of the decision-maker, such as age, income and the like, have been used as regressors. But what about situations where the attributes of the choices may also have an impact on the outcome? The Binary and Multinomial Logit models explained in the previous sections cannot deal with situations where the outcome is also affected by the attributes of the choices. Consider, for instance, the travel mode choice problem explained earlier: the characteristics of the decision-maker and the attributes of each choice, such as travel time and cost by mode, both affect the outcome. To deal with this problem, Professor Daniel McFadden, the 2000 Nobel Laureate in Economics, developed the Conditional or McFadden Logit model. The Conditional Logit model has been widely applied to modelling choices in fields as diverse as market research, economics, psychology, and travel demand analysis.

Random Utility Model

The Conditional Logit model can be derived using random utility theory. Let us assume that a decision-maker is faced with two choices, a and b. Let U_a represent the utility of alternative a and U_b the utility of alternative b. The rational decision-maker will opt for the alternative that maximizes his or her utility. In addition, the utility can be divided into two components, an observed and an unobserved part. The linear random utility model can be expressed as

U_a = β_a′ x + ε_a  and  U_b = β_b′ x + ε_b

If Y = 1 denotes the consumer's choice of alternative a, then

Prob(Y = 1 | x) = Prob(U_a > U_b)
               = Prob(β_a′ x + ε_a − β_b′ x − ε_b > 0 | x)
               = Prob((β_a − β_b)′ x + (ε_a − ε_b) > 0 | x)
               = Prob(β′ x + ε > 0 | x)    [23]

If Y is assumed to be a random variable, it can be shown that

Prob(Y_i = j) = exp(β′ z_ij) / Σ_{j=1..J} exp(β′ z_ij)    [24]

For the Conditional Logit model, z_ij = [x_ij, w_i]. Here x_ij represents the attributes of the choices; the subscript ij on x indicates that it varies across both decision-makers (i) and choices (j). In contrast, w_i represents the characteristics of the decision-maker (i) and hence does not vary across alternatives. We can rewrite the above equation as follows:

Prob(Y_i = j) = exp(β′ x_ij + α′ w_i) / Σ_{j=1..J} exp(β′ x_ij + α′ w_i) = exp(β′ x_ij) exp(α′ w_i) / Σ_{j=1..J} exp(β′ x_ij) exp(α′ w_i)    [25]

It is useful to note that terms that do not vary across alternatives, that is, those specific to the individual, fall out of the probability. Therefore the above equation simplifies to:

Prob(Y_i = j) = exp(β′ x_ij) / Σ_{j=1..J} exp(β′ x_ij)    [26]

To create individual-specific effects, Greene (1997) suggests that a set of dummy variables be created for the choices, which can then be multiplied by w_i. This method is analogous to the creation of interaction terms in OLS models. For example, we can use the attributes of shopping centres as regressors along with the characteristics of the shoppers while modelling the choice of a shopping centre. The assumption is that a shopper is likely to choose the destination that helps minimize his or her shopping trip distance and offers the most diverse shopping experience (number of stores). Note that for each shopping centre, the number of shops, the distance from the shopper's trip origin, and so on differ across trip makers, whereas the characteristics of the trip maker are the same for all alternatives. In the following table, two decision-makers are faced with three choices of shopping destination. The regressors are the number of stores at each location, the distance to the shopping centre, and income. It is obvious from the table that income does not change over alternatives for each decision-maker; hence, if added as a regressor, income will fall out of the probability equation.

Table 9: Data sample for Conditional Logit models

No.   Shopper        Alternative     Stores   Distance (km)   Income (000)   Choice
1     David Miller   Eaton Centre    125      1.8             145            1
1     David Miller   Square 1 Mall   175      15.4            145            0
1     David Miller   Fairview Mall   100      7.5             145            0
2     Mel Lastman    Eaton Centre    125      7.5             250            0
2     Mel Lastman    Square 1 Mall   175      12.8            250            1
2     Mel Lastman    Fairview Mall   100      3.5             250            0

The way to accommodate income as a regressor is to introduce alternative-specific dummy variables and multiply them with the common characteristics of the individual decision-maker.


Table 10: Example of alternative-specific income variables for Conditional Logit models

Shopper        Alternative     Stores   Distance (km)   Inc-Eaton   Inc-Sq.One   Inc-Fairview
David Miller   Eaton Centre    125      1.8             145         0            0
David Miller   Square 1 Mall   175      15.4            0           145          0
David Miller   Fairview Mall   100      7.5             0           0            145
Mel Lastman    Eaton Centre    125      7.5             250         0            0
Mel Lastman    Square 1 Mall   175      12.8            0           250          0
Mel Lastman    Fairview Mall   100      3.5             0           0            250
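The data conditioning in Tables 9 and 10 can be sketched in a few lines. The function and column names below are our own illustration, not an SPSS routine:

```python
# Sketch: expanding a decision-maker's income into alternative-specific
# interaction columns, one row per alternative (as in Tables 9-10).
alternatives = ["Eaton Centre", "Square 1 Mall", "Fairview Mall"]

def expand_income(shopper, income, attrs):
    """attrs maps alternative -> (stores, distance); returns one row per
    alternative, with income interacted with each alternative dummy."""
    rows = []
    for alt in alternatives:
        stores, dist = attrs[alt]
        row = {"shopper": shopper, "alt": alt, "stores": stores, "dist": dist}
        for a in alternatives:
            # income enters only the row of its own alternative
            row["inc_" + a.split()[0].lower()] = income if a == alt else 0
        rows.append(row)
    return rows

rows = expand_income("David Miller", 145,
                     {"Eaton Centre": (125, 1.8),
                      "Square 1 Mall": (175, 15.4),
                      "Fairview Mall": (100, 7.5)})
```

Each decision-maker contributes one row per alternative, and the interacted income columns vary across those rows even though income itself does not.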

The income variable is introduced in the model as an alternative-specific variable. For example, the variable Inc-Eaton captures the impact of income on the utility of shopping at Eaton Centre, whereas the variable Inc-Fairview captures the impact of income on the utility of shopping at Fairview Mall. One can see that by interacting the characteristics of the decision-maker with the alternative-specific dummies (not shown in the above table), we have created new variables that vary across alternatives for each decision-maker. Also, remember not to include all of the interacted income variables in the utility functions: added together they reproduce the original income variable, which would again fall out of the equation during estimation. In the above example, include any two of the interacted income variables in the model. Unlike the Multinomial Logit model, the Conditional Logit model returns one set of parameters regardless of the number of alternatives. However, the data set has to be conditioned so that each decision-maker is repeated for the number of available alternatives, as is evident from the above two tables. The total number of rows in the data set is therefore equal to the number of decision-makers (i) times the number of available alternatives (j). This is only true if all decision-makers face the same choice set; the Conditional Logit model allows the modeller to restrict the alternatives available to a decision-maker. Consider the example of mode choice: a trip-maker without a valid driver's licence can be offered a choice set that excludes the auto-drive mode. The marginal effect of any variable x_k can be computed by differentiating the Logit model with respect to x_k. The marginal effects are given by the following equation:

δ_jk = ∂P_j / ∂x_k = P_j [1(j = k) − P_k] β    [27]

The elasticities of the probabilities can be expressed as follows:

∂ln P_j / ∂ln x_km = x_km [1(j = k) − P_k] β_k = x_km (1 − P_k) β_k    [28]

The above is referred to as the direct elasticity, where m indexes the regressor (attribute) variable and j, k index the alternatives. Consider the following example, where we would like to determine the direct elasticity of the auto-drive mode with respect to the cost of driving. The direct elasticity calculation requires the following inputs: x_km is the cost of driving, P_k is the probability of auto-drive, and β_k is the estimated coefficient for the cost variable. Cross elasticities can be computed as follows:

∂ln P_j / ∂ln x_km = −x_km P_k β_k, where k ≠ j    [29]
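As a numeric sketch of the direct and cross elasticity formulas, with all input values below hypothetical:

```python
# Numeric sketch of equations [28]-[29]: direct and cross elasticities of
# Logit choice probabilities. All values are illustrative, not estimates.
beta_cost = -0.05   # hypothetical cost coefficient
cost_auto = 10.0    # attribute level x_km for the auto alternative
p_auto = 0.6        # predicted probability of choosing auto-drive

# Direct elasticity of P(auto) with respect to auto cost: x(1 - P)beta
direct = cost_auto * (1.0 - p_auto) * beta_cost    # = -0.2

# Cross elasticity of any other alternative with respect to auto cost:
# -xPbeta. Note it is the same for every other alternative, a consequence
# of the model's independence from irrelevant alternatives (IIA) property.
cross = -cost_auto * p_auto * beta_cost            # = 0.3
```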

The cross elasticity for the change in the auto-drive mode with respect to changes in transit costs requires the following inputs: x_km is the cost of transit, P_k is the probability of transit, and β_k is the estimated coefficient for the cost variable. In estimating Conditional Logit models, one is not restricted by the number of choices; the "size of the estimation problem is independent of the number of choices" (Greene, 1997, p. 920). Greene further argues that the number of choices should be restricted to 100. The fact remains that even with 100 choices, interpretation of the model becomes a major concern: from a behavioural perspective, a decision-maker seldom evaluates 100 choices simultaneously, and to assume that a rational decision-maker can do so is debatable at best. The Conditional Logit model does not contain a constant term (β_0 in the OLS tradition); it can only include J − 1 alternative-specific constants. In the above-mentioned mode choice problem involving three alternatives, we can create alternative-specific constants for any two alternatives. In Conditional Logit models we do not set any category as the base case or set its systematic utility to 0. The binary choice is presented in the Conditional Logit form as:

P(auto) = exp(V_a) / (exp(V_a) + exp(V_t))    [30]

If we divide both the numerator and the denominator by exp(V_a), we have

P(auto) = 1 / (1 + exp(V_t)/exp(V_a)) = 1 / (1 + exp(V_t − V_a))    [31]

The above equation presents an interesting property of Conditional Logit models: we do not observe the actual utilities, but the difference in the utilities of the two choices.

P(transit) = 1 − 1/(1 + exp(V_t − V_a)) = exp(V_t − V_a) / (1 + exp(V_t − V_a))    [32]

The odds ratio for the Conditional Logit is therefore given by

P_auto / P_transit = {1 / [1 + exp(V_t − V_a)]} / {exp(V_t − V_a) / [1 + exp(V_t − V_a)]} = exp(V_a − V_t)   [33]

And the log of the odds is given by

ln(P_auto / P_transit) = ln exp(V_a − V_t) = V_a − V_t   [34]
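A quick numerical check of these identities, with made-up systematic utilities:

```python
import math

def p_auto(V_a, V_t):
    """Binary logit probability, eq. [31]: P_auto = 1 / (1 + exp(V_t - V_a))."""
    return 1.0 / (1.0 + math.exp(V_t - V_a))

# Illustrative (made-up) systematic utilities
V_a, V_t = -1.2, -1.8

# Eq. [33]: the odds equal exp(V_a - V_t)
odds = p_auto(V_a, V_t) / (1 - p_auto(V_a, V_t))
assert abs(odds - math.exp(V_a - V_t)) < 1e-12

# Only the utility difference matters: shifting both utilities by any
# constant leaves the probability unchanged.
assert abs(p_auto(V_a + 5.0, V_t + 5.0) - p_auto(V_a, V_t)) < 1e-12
```

The second assertion is the identification point made above: because only V_t − V_a enters the probability, utilities are identified only up to an additive constant.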

If we had a third choice, walk, with the utility function V_w, the probabilities are expressed as:

P_auto = exp(V_a) / [exp(V_a) + exp(V_t) + exp(V_w)]
P_transit = exp(V_t) / [exp(V_a) + exp(V_t) + exp(V_w)]
P_walk = exp(V_w) / [exp(V_a) + exp(V_t) + exp(V_w)]

If you examine the probability function carefully, we are still dealing with differences in utilities. Dividing both the numerator and the denominator of P_auto by exp(V_a) gives

P_auto = 1 / [1 + exp(V_t)/exp(V_a) + exp(V_w)/exp(V_a)] = 1 / [1 + exp(V_t − V_a) + exp(V_w − V_a)]   [35]
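The three-alternative probabilities can be sketched the same way (the utility values are invented for illustration):

```python
import math

def mode_probabilities(V):
    """Conditional logit probabilities over a dict of systematic utilities."""
    expV = {j: math.exp(v) for j, v in V.items()}
    total = sum(expV.values())
    return {j: e / total for j, e in expV.items()}

# Made-up utilities for auto, transit, and walk
V = {"auto": -0.5, "transit": -1.0, "walk": -2.0}
P = mode_probabilities(V)

# Probabilities sum to one
assert abs(sum(P.values()) - 1.0) < 1e-12

# Eq. [35]: P_auto depends only on the differences V_t - V_a and V_w - V_a
p_auto = 1.0 / (1.0 + math.exp(V["transit"] - V["auto"])
                + math.exp(V["walk"] - V["auto"]))
assert abs(P["auto"] - p_auto) < 1e-12
```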

Interpretation of Conditional Logit models

Let us define the following two probabilities for alternatives j and j′:

P_ij = exp(β′x_ij) / Σ_{j=1}^{J} exp(β′x_ij)   [36]

P_ij′ = exp(β′x_ij′) / Σ_{j=1}^{J} exp(β′x_ij)   [37]

Therefore the odds of opting for j over j′ are given by the following:

P_ij / P_ij′ = exp(β′x_ij) / exp(β′x_ij′) = exp[β′(x_ij − x_ij′)]   [38]

while the Logit is expressed as

ln(P_ij / P_ij′) = ln exp[β′(x_ij − x_ij′)] = β′(x_ij − x_ij′)   [39]
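A minimal numerical sketch of equations [38] and [39], with invented coefficients and attributes:

```python
import math

# Hypothetical coefficients and alternative-specific attributes (made up)
beta = [-0.05, -0.4]   # e.g., time and cost coefficients
x_j  = [30.0, 5.0]     # attributes of alternative j
x_k  = [20.0, 8.0]     # attributes of alternative j'

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Eq. [38]: odds ratio from the difference in systematic utilities
odds = math.exp(dot(beta, x_j) - dot(beta, x_k))

# Eq. [39]: log-odds as a weighted difference of the attributes
log_odds = dot(beta, [xj - xk for xj, xk in zip(x_j, x_k)])

assert abs(math.log(odds) - log_odds) < 1e-12
```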

Equation [39] suggests that the log-odds of choosing j over j′ are given by the "weighted difference between the individual’s values on the explanatory variables for the two alternatives, with the weights being the estimated parameters", i.e., the βs. The interpretation is illustrated using a model estimated by David Hensher, which has been reproduced by Greene (1997) and Powers and Xie (2000). The example is a classic mode choice problem in which 152 respondents were surveyed. The original model consisted of four choices: air, bus, car, and train. Powers and Xie (2000) excluded air as an alternative and reported the results for a three-mode choice set, where the choices are 1 = train, 2 = bus, and 3 = car. We have retained the Powers and Xie (2000) results in this discussion. The explanatory variables are terminal wait time (TTME), in-vehicle time (INVT), in-vehicle cost (INVC), and GC, a generalised cost measure computed as INVC + (INVT × value of time).

Table 11: Estimates from a Conditional Logit Model with alternative-specific attributes


Variable   Coefficient   SE      t-Stat
TTME       -0.002        0.007   -0.314
INVC       -0.435        0.133   -3.277
INVT       -0.077        0.019   -3.991
GC          0.431        0.133    3.237

The log-odds for an individual of choosing train (1) over bus (2) are given as:

ln(P_i1 / P_i2) = −0.002(TTME_1 − TTME_2) − 0.435(INVC_1 − INVC_2) − 0.077(INVT_1 − INVT_2) + 0.431(GC_1 − GC_2)
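Plugging the Table 11 coefficients into this expression for one hypothetical traveller (the attribute values below are made up, not from the original data) gives:

```python
import math

# Table 11 coefficients
b_ttme, b_invc, b_invt, b_gc = -0.002, -0.435, -0.077, 0.431

# Hypothetical attribute values for one traveller (illustrative only)
train = {"TTME": 30.0, "INVC": 20.0, "INVT": 60.0, "GC": 50.0}
bus   = {"TTME": 25.0, "INVC": 15.0, "INVT": 90.0, "GC": 55.0}

log_odds = (b_ttme * (train["TTME"] - bus["TTME"])
            + b_invc * (train["INVC"] - bus["INVC"])
            + b_invt * (train["INVT"] - bus["INVT"])
            + b_gc * (train["GC"] - bus["GC"]))

odds = math.exp(log_odds)  # odds of train over bus for this traveller
```

For these made-up attributes the log-odds come to −2.03, i.e., this traveller is roughly exp(−2.03) ≈ 0.13 times as likely to choose train as bus.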

These estimates suggest that the odds of choosing a mode decline with increases in wait time, in-vehicle travel time, and in-vehicle cost. The odds of choosing a mode, however, increase with an increase in the generalised cost. When the attributes of the choices as well as the characteristics of the decision-makers explain the utility of the alternatives, the model can contain alternative-specific variables as well as individual-specific covariates, the latter entering after being multiplied by alternative-specific dummies. The model is presented as follows:

Prob(Y_i = j) = exp(β′x_ij + α_j′w_i) / Σ_{j=1}^{J} exp(β′x_ij + α_j′w_i) = exp(β′x_ij) exp(α_j′w_i) / Σ_{j=1}^{J} exp(β′x_ij) exp(α_j′w_i)   [40]

where x_ij are the alternative-specific covariates and w_i are the individual-specific attributes. Interpretation of the above model is similar to that of the Conditional Logit model discussed earlier. In the Conditional Logit model we included only alternative-specific variables. Now we include an individual-specific variable, household income (HHINC), which does not vary across alternatives. As mentioned earlier, HHINC is multiplied by the alternative-specific dummies to enter the model as a regressor. Powers and Xie (2000) omit the lowest coded category (train) and create two alternative-specific constants, DB (dummy for bus) and DC (dummy for car). The new variables are:

HHINC × Dummy for bus = HHINC_DB
HHINC × Dummy for car = HHINC_DC
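The construction of these interaction variables can be sketched as follows (the respondent IDs and incomes are made up):

```python
# An individual-specific covariate (household income) enters a Conditional
# Logit data set by being multiplied with alternative-specific dummies.
# Train is the omitted base category, as in the text.
rows = []
for person_id, hhinc in [(1, 35.0), (2, 70.0)]:
    for mode in ("train", "bus", "car"):
        db = 1 if mode == "bus" else 0   # DB: dummy for bus
        dc = 1 if mode == "car" else 0   # DC: dummy for car
        rows.append({"id": person_id, "mode": mode,
                     "DB": db, "DC": dc,
                     "HHINC_DB": hhinc * db, "HHINC_DC": hhinc * dc})
```

Each respondent contributes one row per alternative; HHINC_DB and HHINC_DC are zero except in the row of the alternative whose dummy they carry.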


Table 12: Estimates from a mixed Conditional Logit model

Variable   Coefficient   SE      t-Stat
TTME       -0.074        0.017   -4.360
INVC       -0.619        0.152   -4.067
INVT       -0.096        0.022   -4.361
GC          0.581        0.150    3.883
DB         -2.108        0.739   -2.577
HHINC_DB    0.031        0.021    1.404
DC         -6.147        1.029   -5.974
HHINC_DC    0.048        0.023    2.682

The results indicate that an increase in household income increases the odds in favour of bus and car against train. However, a look at the t-statistics reveals that only HHINC_DC returns a statistically significant coefficient.

Independence of Irrelevant Alternatives

It was shown in the previous section that the odds ratio P_j/P_k is independent of the remaining probabilities. This property of Logit models is termed the Independence of Irrelevant Alternatives (IIA), and it is rooted in the earlier assumption that the error terms are independent and homoscedastic. From the estimation point of view this is a highly desirable property, but the IIA assumption imposes strong restrictions on consumer behaviour. Problems resulting from this assumption are highlighted in the literature as the red bus/blue bus problem (McFadden, 1974, cited in Powers and Xie, 2000).

Let us assume that a commuter’s choice set consists of four modes: red bus, blue bus, car, and train. Let us also assume that commuters are equally likely to take any mode, so each mode’s share is 25% and the odds between any two alternatives are 1. Finally, assume that the red bus and the blue bus are perfect substitutes for each other. If the blue bus is removed from service (we can simply paint the blue buses red), the blue-bus riders will shift to the red bus, raising the red bus’s mode share from 25% to 50%, while the mode shares for train and car remain at 25%. This, however, is not what a Logit model predicts: IIA dictates that with the exclusion of the blue bus, the mode shares for red bus, car, and train will all equal 33.33%, maintaining the odds between any two alternatives at 1.

Hausman and McFadden (1984) have posited that one can eliminate a subset of choices from the universal choice set, provided the subset is "truly irrelevant" (Greene, 1997, p. 921). The elimination of such a subset will not "influence the parameter estimates systematically." However, if the odds ratios of the remaining alternatives are not completely independent, the exclusion of the subset will return inconsistent parameter estimates.
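The red bus/blue bus arithmetic is easy to verify directly: with equal systematic utilities a Logit model assigns equal shares, and removing one alternative simply renormalises the rest.

```python
import math

def shares(V):
    """Logit mode shares for a dict of systematic utilities."""
    total = sum(math.exp(v) for v in V.values())
    return {k: math.exp(v) / total for k, v in V.items()}

# Four modes with equal systematic utility: each gets a 25% share.
V = {"red_bus": 0.0, "blue_bus": 0.0, "car": 0.0, "train": 0.0}
assert all(abs(p - 0.25) < 1e-12 for p in shares(V).values())

# Remove the blue bus: IIA forces the logit to spread its share evenly,
# giving every remaining mode 1/3 -- not the intuitive 50/25/25 split.
del V["blue_bus"]
assert all(abs(p - 1/3) < 1e-12 for p in shares(V).values())
```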
Hausman’s specification test checks for this independence and is presented below:

χ² = (β̂_s − β̂_f)′ [V̂_s − V̂_f]⁻¹ (β̂_s − β̂_f)   [41]

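A numerical sketch of this statistic for a two-parameter model. All the numbers below are invented for illustration: β̂_s and V̂_s would come from the model estimated on a restricted choice set, and β̂_f and V̂_f from the full choice set.

```python
# Hypothetical estimates: restricted choice set (s) and full choice set (f)
beta_s = [-0.450, -0.080]
beta_f = [-0.435, -0.077]
V_s = [[0.0200, 0.0010], [0.0010, 0.00050]]
V_f = [[0.0177, 0.0009], [0.0009, 0.00036]]

# Difference vector and difference of covariance matrices
d = [bs - bf for bs, bf in zip(beta_s, beta_f)]
D = [[V_s[i][j] - V_f[i][j] for j in range(2)] for i in range(2)]

# Invert the 2x2 difference matrix and form the quadratic form d' D^{-1} d
det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
inv = [[D[1][1] / det, -D[0][1] / det],
       [-D[1][0] / det, D[0][0] / det]]
stat = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
```

Here the statistic works out to about 0.14, well below the 5% chi-squared critical value of 5.99 for K = 2 degrees of freedom, so these made-up numbers would not reject the IIA assumption.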
where s represents the estimators obtained for the subset of the choice set and f represents the estimators obtained for the complete choice set. V̂_s and V̂_f are the estimates of the corresponding asymptotic covariance matrices. It has been shown that the test statistic is asymptotically distributed as chi-squared with K degrees of freedom. If the above-mentioned test suggests violations of the IIA assumption, one can estimate a Nested Logit model or a Probit model instead.

Estimating Logit models in SPSS

Estimating Multinomial Logit models in SPSS is straightforward. The command NOMREG estimates the Multinomial Logit model, and it can also be used to estimate the Binary Logit model. The dependent variable can be a dichotomous variable taking the values 1/0, or a polytomous variable taking values such as 1, 2, and 3, as was the case in the mode choice problem. Remember that the estimated model will return J − 1 sets of estimated coefficients, where J is the total number of alternatives.

The Conditional Logit model is not directly available within SPSS. However, one can trick the Cox Proportional Hazard model in SPSS into running a Conditional Logit model, because the likelihood function of the Cox Proportional Hazard model is the same as that of the Conditional Logit model. The mechanics and theory of hazard models are not explained here; we only offer the definitions required to restructure the data for the Cox Proportional Hazard model in SPSS.

The Cox Proportional Hazard model estimation requires three additional variables: a status variable, a failure time variable, and a strata variable. Remember that the choice variable assumes the value 1 for the chosen alternative and 0 for the non-chosen alternatives (see Table 9). We use the choice variable as the status variable in the Cox Proportional Hazard model. Make sure that you identify 1 as the single-value event in the "Define Event" option for the status variable.
For the failure time variable in SPSS, the preferred choice (i.e., the chosen mode) should occur at time = 1, while the other modes (choices) should occur at time > 1. Therefore, the failure time variable should assume the value 1 for the chosen alternative and 2 for the other alternatives for every individual. This can be achieved by creating a new variable t as follows:

t = 2 − choice variable   [42]

Lastly, we need a variable to control for stratification in the Cox Proportional Hazard model. The strata variable identifies individual decision-makers. Since each decision-maker is represented by multiple observations, the strata variable has a unique ID for each individual and thus acts as a grouping variable. The SPSS code for the Conditional Logit is as follows:

Coxreg t with aasc casc tasc gc ttme hinca
  /status=mode(1)
  /strata=subject.
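The same restructuring can be sketched in any language; a minimal Python version, with two hypothetical respondents and two modes, is shown below. The field names mirror the SPSS setup.

```python
# Restructuring choice data for the Cox trick, eq. [42]: the chosen
# alternative "fails" at t = 1, non-chosen alternatives at t = 2.
# The two respondents below are made up for illustration.
long_data = [
    {"subject": 1, "mode": "auto",    "choice": 1},
    {"subject": 1, "mode": "transit", "choice": 0},
    {"subject": 2, "mode": "auto",    "choice": 0},
    {"subject": 2, "mode": "transit", "choice": 1},
]

for row in long_data:
    row["t"] = 2 - row["choice"]   # failure time: 1 if chosen, else 2
    row["status"] = row["choice"]  # event indicator for COXREG
    # "subject" already serves as the strata (grouping) variable
```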


References

Domencich, T., McFadden, D. (1975). Urban travel demand: A behavioral analysis. North-Holland. Amsterdam.

Greene, William H. (1997). Econometric analysis. 3rd edition. Prentice Hall.

Hausman, J., McFadden, D. (1984). Specification tests for the multinomial logit model. Econometrica. Vol. 52, no. 5. pp. 1219-1240.

Long, J. Scott, Freese, Jeremy. (2005). Regression models for categorical dependent variables using Stata. Stata Press. Texas.

Louviere, J. J., Hensher, D. A., and Swait, J. D. (2000). Stated choice methods: Analysis and application. Cambridge University Press.

McFadden, D. (1974). The measurement of urban travel demand. Journal of Public Economics. Vol. 3, no. 4. pp. 303-328.

Mroz, T. A. (1987). The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica. Vol. 55, no. 4. pp. 765-799.

Powers, Daniel A., Xie, Yu. (2000). Statistical methods for categorical data analysis. Academic Press. California.
