page 1 Chapter 21 Logistic Regression for Binomial Response Variables Response variable is binomial counts, denoted by counts of binary variables. That is, if X~bernoulli(p), then Y = ∑ X i ∼ Binomial ( n, p ) i
Example 1 Case Study 21.1: Island size and bird extinctions: On each island we count the number of species that went extinct out all the species on the island. What is the relationship between the area of an island and the probability of extinction of birds present on the island? Example 2 Case study 21.2: Moth coloration and natural selection: At each distance from Liverpool we count the number of moths from each morph that were taken by predators. What is the relationship between the distance from Liverpool, where trees are dark from industrial soot, and the probability of predation on the light and dark morphs of the moth Carbonaria?
Logistic regression model for binomial counts Y = the number of successes in m binomial trials. For example, how many species went extinct in the 10 year period of the study on each island? Yi ~Binomial(mi,πi) where i is the ith island and πi is the probability of extinction on each island. X 1 ,..., X p : explanatory variables, in the extinction example, X is the area of the island.
For a particular set of values of the explanatory variables,
µ (Y X 1 ,…, X p ) = π = probability of success. Y/m = the binomial proportion. Note that the sample size in the bird extinction study is the number of islands, not the number of species. •
We model µ (Y ) = π just as we did for the binary response model:
•
logit(π) = η = β 0 + β1 X 1 + … + β p X p
•
As before:
π=
eη 1 + eη
page 2 Continuous versus Counted Proportions: Not all proportions are appropriate to model with logistic regression. We model proportions like fat calories/total calories, etc., using normal theory, usually. The only proportions that are appropriate in this context are those that result from an integer count of a certain outcome over the total number of trials or outcomes.
Variance Since the response variable, Y, is a Binomial random variable we have:
µ (Yi X 1i ,…, X pi ) = πi SD (Yi X 1i ,… , X pi ) =
miπ i (1 − π i )
Variance of Y is not constant across all i.
Estimation of Logistic Regression Coefficients As for the binary response model we use the maximum likelihood estimators (MLE’s). Model Assessment Estimated versus observed: One was to assess the appropriateness of the model and the efficacy of the estimation routine is to plot the estimated probability, πˆi , against the observed response Y proportion, π i = i . Additionally, plots of the observed logits versus one or more of the mi explanatory variables are useful for visual examination as we do for ordinary scatterplots in linear regression. See Display 21.2.
Residual analysis: As in the binary response case, we have two widely used residuals for binomial counts models. Residuals There are two standard ways to define a residual in logistic regression. 1. Pearson residual =
yi − miπˆi . miπˆi (1 − πˆi )
⎧⎪ ⎛ Y ⎞ ⎛ m − Y ⎞ ⎫⎪ 2. Deviance residual = Dresi = sign (Yi − miπˆi ) 2 ⎨Yi log ⎜ i ⎟ + ( mi − Yi ) log ⎜ i ⎟⎬ ⎝ miπˆi ⎠ ⎝ mi − miπˆi ⎠ ⎭⎪ ⎩⎪
The Pearson residual is more easily understood, but the deviance residual directly gives the contribution of each point to the lack of fit of the model.
page 3 Since the data are grouped, the residuals in a binomial counts logistic regression (either Pearson or deviance) are more useful than in the binary response regression. The residuals should be plotted against the predicted values for πis and examined for outliers or remaining patterns. Comparing two models As with the binary response case, we use the value –2ln(Maximized likelihood function) to compare models. Recall that the MLE’s of the β i ’s are the values that maximize the likelihood function of the data. So, we find that the values for the β i ’s that maximize the likelihood function, take the natural log and multiply by –2. •
The quantity –2 ln(Maximized likelihood) is also called the deviance of a model since larger values indicate greater deviation from the assumed model. Comparing two nested models by the difference in deviances is a drop-in-deviance test.
•
The difference between the values of –2ln(Maximized likelihood function) for a full and reduced model has approximately a chi-square distribution if the null hypothesis that the extra parameters are all 0 is true. The d.f. is the difference in the number of parameters for the two models.
Model Selection Both AIC and BIC can be used as model selection criteria. As with linear regression models, they are only relative measures of fit, not absolute measures of fit.
AIC = Deviance + 2p BIC = Deviance + plog(n) where p is the number of parameters in the model. Stepwise model selection methods are available in SPSS using likelihood ratio tests or Wald’s test. The LR methods are preferred. Other software programs, like S-Plus, have stepwise procedures using AIC or BIC.
Goodness-of-fit Tests Since we have multiple counts per cell there is a goodness-of-fit test similar to that in linear regression. We can compare the model with the log odds or logit is linear in the parameters to the model where each cell has a separate mean. So we are comparing the logistic regression model with p predictors to the model n different parameters, where p is the number of predictors in the logistic regression model and n is the number of categorical treatment combinations. That is we are testing
Reduced model: logit(π i ) = β 0 + β1 x1 + ... + β p x p
(p parameters)
Saturated model: logit(π i ) = α i
(n parameters)
page 4 ⎧⎪ ⎛ Y ⎞ ⎛ m −Y The test statistic is D 2 = ∑ Di 2 = ∑ 2 ⎨Yi log ⎜ i ⎟ + ( mi − Yi ) log ⎜ i i i ⎝ miπˆi ⎠ ⎝ mi − miπˆi ⎩⎪
⎞ ⎫⎪ ⎟⎬ ⎠ ⎭⎪
Both the denominators, miπˆi and mi − miπˆi , need to be large for the distribution of the test
{
statistic to be approximately χ 2 n − p . The pvalue for the test is then Pr χ 2 n − p ≥ D 2
}
Wald Test and Confidence Intervals for Single Coefficients. The Wald test performs similarly as in the binary counts case. The normal approximation used by this test is adequate so long as n is moderately large and the mπ is greater than 5. Fitting the model in Matlab.
Below is the code for fitting the bird extinction model in Matlab. case2101 =[185.8000 75.0000 105.8000 67.0000 3.0000 30.7000 66.0000 10.0000 8.5000 51.0000 6.0000 4.8000 28.0000 3.0000 4.5000 20.0000 4.0000 4.3000 43.0000 8.0000 3.6000 31.0000 3.0000 2.6000 28.0000 5.0000 1.7000 32.0000 6.0000 1.2000 30.0000 8.0000 0.7000 20.0000 2.0000 0.7000 31.0000 9.0000 0.6000 16.0000 5.0000 0.4000 15.0000 7.0000 0.3000 33.0000 8.0000 0.2000 40.0000 13.0000 0.0700 6.0000 3.0000];
5.0000
extinct=case2101(:,3); atrisk=case2101(:,2); area=case2101(:,1); [b,dev,stats]=glmfit(area,[extinct atrisk],'binomial'); x = 1:10:180; y = glmval(b,x,'logit'); plot(area,extinct./atrisk,'x',x,y,'r-')