Chap. 10, page 1 Chapter 10 Inferences About Regression Coefficients
Math 445
Chapter 10 concerns statistical inferences about individual regression coefficients, about linear combinations of coefficients, and about sets of coefficients. All these inferences, which are based on either the t or the F distribution, depend on the assumptions of normality of the residuals, constant variance, and independence. Assessment of these assumptions is covered in Chapter 11.

Example 1: Rainfall data

Consider the additive model (a model without interactions is called “additive” because the effects of the variables are additive and don’t depend on the levels of the other variables):
µ(Precip | Latitude, Altitude, Rainshadow) = β0 + β1 Latitude + β2 Altitude + β3 Rainshadow

Assume that the linear regression model assumptions are satisfied; that is, that the model fits, that the residuals are normal with constant variance, and that the observations are independent. The formal inferences we make below are valid only if these assumptions are satisfied.

Coefficients(a)

                      Unstandardized Coefficients                      95% Confidence Interval for B
Term                  B            Std. Error    t         Sig.      Lower Bound    Upper Bound
(Constant)            -97.557      24.554        -3.973    .0005     -148.028       -47.085
Latitude (degrees)    3.428        .667          5.139     .0000     2.057          4.800
Altitude (ft)         .00115       .00085        1.352     .1880     -.00060        .00290
Rainshadow            -19.688      3.439         -5.725    .0000     -26.758        -12.619

a. Dependent Variable: Precipitation (in)
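The t statistics and confidence limits in this table follow directly from the B and Std. Error columns: t = B / SE, and the interval is B ± t* × SE. The sketch below checks this for the Latitude row. It assumes 26 residual degrees of freedom (30 cases and 4 coefficients, which is an inference from the case numbers shown later in these notes, not something the output states) and uses the usual tabled value 2.056 for the 97.5th percentile of t with 26 df. The function names are illustrative.

```python
# Reproduce the t statistic and 95% confidence interval for one coefficient
# from its estimate (B) and standard error, as printed in the SPSS table.

def t_statistic(estimate, std_error):
    """t statistic for the two-sided test of H0: coefficient = 0."""
    return estimate / std_error


def confidence_interval(estimate, std_error, t_crit):
    """Two-sided interval: estimate +/- t_crit * standard error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# ASSUMPTION: 26 residual df (30 cases minus 4 coefficients); 2.056 is the
# tabled 97.5th percentile of the t distribution with 26 df.
T_CRIT_26_DF = 2.056

latitude_b, latitude_se = 3.428, 0.667  # Latitude row of the table
t_lat = t_statistic(latitude_b, latitude_se)
ci_lat = confidence_interval(latitude_b, latitude_se, T_CRIT_26_DF)

print("t for Latitude:", round(t_lat, 3))
print("95% CI for Latitude:", round(ci_lat[0], 3), "to", round(ci_lat[1], 3))
```

Small differences from the printed limits in the last digit come from the rounding of B and Std. Error in the table.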
• The t statistic and P-value for each coefficient are for a two-sided test of the hypothesis that the true coefficient is 0.
• Is there evidence of an effect of latitude on precipitation? This is addressed by a two-sided test of the hypothesis H0: β1 = 0. There is convincing evidence (P < .00005) that β1 is greater than 0. In addition, we estimate that mean precipitation rises about 3.43 inches for every one-degree increase in latitude (95% confidence interval: 2.06 to 4.80 inches), given that altitude and rain shadow remain the same.
• The test of H0: β1 = 0 (and the confidence interval) is for the model that also contains Altitude and Rainshadow. Thus, it is a test of the effect of Latitude after the linear effects of Altitude and Rainshadow have been adjusted for. This is different from a test of H0: β1 = 0 without Altitude and Rainshadow in the model.
• We do not have convincing evidence (P = .188) that mean precipitation changes with altitude, given that latitude and rain shadow remain fixed. We estimate that mean precipitation increases by 1.15 inches for every 1000-foot increase in altitude (95% confidence interval: 0.60 inch decrease to 2.90 inch increase).
• Do locations in the rain shadow differ from those not in the rain shadow, after adjusting for the effects of latitude and altitude? (In other words, is there evidence that β3 ≠ 0?) There is completely convincing evidence (P < .00005) that locations in the rain shadow receive less precipitation on average than locations of the same latitude and altitude not in the rain shadow. What is more interesting is that locations in the rain shadow are estimated to have mean precipitation 19.7 inches less (95% confidence interval: 26.8 inches to 12.6 inches less) than equivalent locations (on altitude and latitude) not in the rain shadow.

Inferences and interpretation when there are interactions in the model

When interactions are present in a model, the test of significance for the coefficient on a term that is involved in a higher-order interaction is not useful because we must always include this term in the model anyway. In addition, the coefficient on this term does not have a meaningful interpretation.

Example: In the Chapter 9 notes, we fit the following model to the rainfall data:
µ(Precip | Latitude, Rainshadow) = β0 + β1 Latitude + β2 Rainshadow + β3 Latitude*Rainshadow

Coefficients(a)

                       Unstandardized Coefficients    Standardized Coefficients
Term                   B            Std. Error        Beta         t         Sig.
(Constant)             -175.457     26.177                         -6.703    .000
Latitude (degrees)     5.581        .705              .895         7.912     .000
Rainshadow             139.839      39.019            4.240        3.584     .001
Latitude*Rainshadow    -4.315       1.051             -4.871       -4.105    .000

a. Dependent Variable: Precipitation (in)
The coefficient on rainshadow is large and positive – but it does not mean that locations in the rainshadow are estimated to have mean precipitation 139.8 inches greater than locations of the same latitude not in the rainshadow! Why not?
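One way to see why: in this model the estimated difference between rain-shadow and non-rain-shadow locations at a given latitude is β̂2 + β̂3 × Latitude, so 139.8 is the estimated difference only at Latitude = 0, far outside the latitudes in these data (the cases listed later in this chapter fall between roughly 33 and 42 degrees). The short sketch below evaluates that difference using the two coefficients from the table above; the function name and the latitudes chosen are illustrative, not part of the original notes.

```python
# Estimated coefficients from the interaction model table above.
B_RAINSHADOW = 139.839         # coefficient on Rainshadow
B_LAT_BY_RAINSHADOW = -4.315   # coefficient on Latitude*Rainshadow


def rainshadow_effect(latitude):
    """Estimated mean precipitation (in) for a rain-shadow location minus a
    non-rain-shadow location at the same latitude."""
    return B_RAINSHADOW + B_LAT_BY_RAINSHADOW * latitude


# Illustrative latitudes: 0 is where the raw coefficient applies; the others
# are in the general range of the rainfall data.
for lat in (0.0, 33.0, 37.0, 41.0):
    print(lat, round(rainshadow_effect(lat), 1))
```

At the illustrative latitudes in the 30s and 40s the estimated rain-shadow difference is negative and becomes more negative as latitude increases, which is what the negative interaction coefficient describes.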
The statistical significance of the coefficients on the first-order terms (Latitude and Rainshadow) is also irrelevant, since both are involved in the second-order term. In particular, if either coefficient were not statistically significantly different from 0 (large P-value), that would not mean that we had no evidence of an effect of that variable. For example, if the coefficient for Latitude in the above model had been statistically nonsignificant, that would not mean that we had no evidence of an effect of latitude, because the effect of latitude also comes through the Latitude*Rainshadow interaction, which is statistically significant.

Suppose we fit the following model with a 3-way interaction:

µ(Precip | Altitude, Latitude, Rainshadow) = β0 + β1 Altitude + β2 Latitude + β3 Rainshadow + β4 Altitude*Latitude + β5 Altitude*Rainshadow + β6 Latitude*Rainshadow + β7 Altitude*Latitude*Rainshadow
• We must include all two-way interactions that are part of the 3-way interaction.
• The coefficient on the 3-way interaction is interpreted as the difference in the two-way interaction between any pair of variables across levels of the third variable. For example, β7 represents the difference in the Altitude by Latitude interaction between locations in and not in the rain shadow.
• The coefficients on all the terms below the 3-way interaction have no useful interpretation as long as the 3-way interaction is in the model, and the tests of significance of these terms are not meaningful.
• The test of significance on the coefficient on the 3-way interaction is meaningful: here we have no evidence (P = .711) that there is a 3-way interaction among these variables in their association with precipitation. That’s good: we generally don’t want to include a 3-way interaction unless we have strong evidence that it is needed.
• Interactions will be addressed further in the model-building chapter.

Coefficients(a)

                       Unstandardized Coefficients    Standardized Coefficients
Term                   B            Std. Error        Beta         t         Sig.
(Constant)             -178.154     26.390                         -6.751    .000
Altitude (ft)          .0248        .0172             3.129        1.444     .163
Latitude (degrees)     5.5929       .7191             .897         7.778     .000
Rainshadow             72.7033      50.9637           2.205        1.427     .168
Altitude*Latitude      -.0006       .0004             -2.953       -1.358    .188
Altitude*Rainshadow    .0067        .0233             .572         .289      .776
Latitude*Rainshadow    -2.4465      1.3797            -2.761       -1.773    .090
Alt*Lat*Rainshadow     -.0002       .0006             -.746        -.376     .711

a. Dependent Variable: Precipitation (in)
Inferences for linear combinations of parameters

Sometimes, the effect of interest is a linear combination of parameters.

Example 2: Exercise 9.18, p. 263, Speed of Evolution. There are two binary variables: Sex and Continent. Suppose they are coded as indicator variables as follows:

Sex: 0 = Female, 1 = Male
Continent: 0 = NA, 1 = EU

Consider the model

µ(Wing | Latitude, Sex, Continent) = β0 + β1 Latitude + β2 Sex + β3 Continent + β4 Sex*Continent

This model implies the following relationships between Wing size and Latitude:
Female, NA: µ(Wing | Latitude, Sex = 0, Continent = 0) = β0 + β1 Latitude
Female, EU: µ(Wing | Latitude, Sex = 0, Continent = 1) = β0 + β1 Latitude + β3
Male, NA:   µ(Wing | Latitude, Sex = 1, Continent = 0) = β0 + β1 Latitude + β2
Male, EU:   µ(Wing | Latitude, Sex = 1, Continent = 1) = β0 + β1 Latitude + β2 + β3 + β4

• The slope coefficients are identical for all four groups since there are no interactions with Latitude.
• The intercepts are different, and the differences represent the vertical distances between the parallel lines relating Wing size to Latitude.
• β3 represents the difference in mean Wing size between females in EU and females in NA (EU minus NA, at the same latitude); a test of H0: β3 = 0 and a confidence interval for β3 can be obtained directly from the regression output.
• The difference in mean Wing size between males in EU and males in NA is β3 + β4. An estimate of this difference is β̂3 + β̂4; however, the SE and a confidence interval cannot be easily obtained from the regression output. SE(β̂3 + β̂4) depends on the SEs of β̂3 and β̂4 individually, but also on the covariance of β̂3 and β̂4. Although you can obtain the needed covariance from the SPSS regression output to calculate SE(β̂3 + β̂4), it is easier to simply reparameterize the model to obtain this directly from the regression output.
• Reparameterization: reverse the coding on Sex, letting 0 be male and 1 be female. The “Male” and “Female” labels are then switched in the above set of equations, and β3 in this new model represents the difference in mean wing size between males in EU and males in NA; i.e., it is the same as β3 + β4 in the old model. The SE of the estimated difference can be obtained directly from the regression output.
• Reparameterizing changes the interpretation of individual parameters but it doesn’t change the model.
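For readers who do want the direct calculation, the dependence on the covariance mentioned above is the standard variance rule Var(β̂3 + β̂4) = Var(β̂3) + Var(β̂4) + 2 Cov(β̂3, β̂4). The sketch below applies it. The standard errors and covariance used are hypothetical placeholders chosen only for illustration; they are not results from the Speed of Evolution data, and the function name is likewise illustrative.

```python
import math


def se_of_sum(se_a, se_b, cov_ab):
    """Standard error of (a_hat + b_hat):
    sqrt(Var(a_hat) + Var(b_hat) + 2 * Cov(a_hat, b_hat))."""
    variance = se_a ** 2 + se_b ** 2 + 2.0 * cov_ab
    return math.sqrt(variance)


# HYPOTHETICAL values, not taken from any regression output in these notes.
se_b3 = 0.010          # SE of the Continent coefficient
se_b4 = 0.014          # SE of the Sex*Continent coefficient
cov_b3_b4 = -0.00010   # estimated covariance of the two coefficients

se_combo = se_of_sum(se_b3, se_b4, cov_b3_b4)
se_if_cov_ignored = se_of_sum(se_b3, se_b4, 0.0)

print("SE with covariance:   ", round(se_combo, 5))
print("SE ignoring covariance:", round(se_if_cov_ignored, 5))
```

With a negative covariance, as in these made-up numbers, ignoring the covariance term overstates the standard error, which is why the two individual SEs alone are not enough.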
Inferences about the mean response at some combination of X’s

The estimated mean of Y at any combination of X’s is obtained by plugging these values into the estimated regression equation. The standard error of the mean response can be obtained in SPSS by including an extra case in the data file that has the desired X’s but a missing value for Y. Then, as with simple linear regression, in the regression dialog box, choose Save…SE of mean predictions for the SE of the mean, Prediction Intervals…Mean for confidence intervals for the mean response, and Prediction Intervals…Individual for prediction intervals for an individual response. These are individual confidence intervals and prediction intervals, not simultaneous.
Example 1: Rainfall data. Here are some results when the additive model was fit.

µ(Precip | Latitude, Altitude, Rainshadow) = β0 + β1 Latitude + β2 Altitude + β3 Rainshadow

The fitted model is

µ̂(Precip | Latitude, Altitude, Rainshadow) = −97.557 + 3.428 * Latitude + 0.00115 * Altitude − 19.688 * Rainshadow

The predicted values, standard error of the mean (SEP), 95% confidence interval for the mean (LMCI, UMCI), and 95% prediction interval (LICI, UICI) are shown for cases 26-30 plus two new sets of X values. These confidence intervals are valid only if the assumptions of the regression model are satisfied; we have not checked these assumptions yet.
Case   Precip   Altitude   Latitude   Shadow   Pred     SEP     LMCI     UMCI     LICI      UICI
26     9.94     19         32.7       0        14.574   3.846   6.669    22.479   -6.151    35.299
27     4.25     2105       34.1       1        2.047    3.184   -4.499   8.593    -18.198   22.292
28     1.66     -178       36.5       1        7.687    2.565   2.415    12.959   -12.183   27.557
29     74.87    35         41.7       0        45.448   4.460   36.281   54.615   24.210    66.686
30     15.95    60         39.2       1        17.217   2.989   11.072   23.362   -2.902    37.336
(new)  .        1000       35.0       0        23.586   2.892   17.640   29.531   3.527     43.645
(new)  .        3000       40.0       1        23.337   3.126   16.911   29.763   3.130     43.544
According to this model, the estimated mean annual precipitation for locations at 3000 feet and 40 degrees latitude which are in the rain shadow is 23.34 inches (95% confidence interval 16.9 to 29.8 inches). A 95% prediction interval for the annual precipitation at an individual location like this is 3.13 to 43.5 inches.
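The numbers quoted for this location can be traced back to the fitted equation and the SEP column. The sketch below plugs the X values into the fitted model, rebuilds the confidence interval for the mean as Pred ± t* × SEP, and then uses the standard relation SE(individual) = sqrt(σ̂² + SEP²) to back out the residual standard deviation implied by the printed prediction interval. Two things here are assumptions rather than content of the notes: the 26 residual degrees of freedom (t* ≈ 2.056) and the back-derived residual SD, which the SPSS output shown does not report.

```python
import math

# Coefficients of the fitted additive model, as given in the text.
B0, B_LAT, B_ALT, B_SHADOW = -97.557, 3.428, 0.00115, -19.688

# ASSUMPTION: 26 residual df (30 cases, 4 coefficients); tabled t value.
T_CRIT_26_DF = 2.056


def fitted_mean(latitude, altitude, rainshadow):
    """Estimated mean precipitation (in) from the fitted equation."""
    return B0 + B_LAT * latitude + B_ALT * altitude + B_SHADOW * rainshadow


def interval(center, std_error, t_crit=T_CRIT_26_DF):
    """Two-sided interval: center +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return center - half_width, center + half_width


# Plug-in estimate at 40 degrees latitude, 3000 ft, in the rain shadow.
# It differs slightly from the tabled 23.337 because the printed
# coefficients are rounded.
pred = fitted_mean(latitude=40.0, altitude=3000.0, rainshadow=1)

# Confidence interval for the mean from the tabled Pred and SEP.
pred_table, sep = 23.337, 3.126
lmci, umci = interval(pred_table, sep)

# Back out the residual SD implied by the tabled prediction interval.
lici, uici = 3.130, 43.544
se_individual = (uici - lici) / (2.0 * T_CRIT_26_DF)
sigma_implied = math.sqrt(se_individual ** 2 - sep ** 2)

print("plug-in estimate:", round(pred, 3))
print("CI for mean:", round(lmci, 2), "to", round(umci, 2))
print("implied residual SD:", round(sigma_implied, 2))
```

The prediction interval is much wider than the confidence interval for the mean because it adds the variability of an individual location around the mean to the uncertainty in the estimated mean itself.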
Extra-Sum-of-Squares Tests

We sometimes want to test a hypothesis about a set of parameters in a regression model. Recall that we did this in an ANOVA model where the overall F test tested H0: µ1 = µ2 = … = µI and where an extra sum-of-squares F test was used to compare two models. This test is valid only if the assumptions of the regression model (normality, constant variance, independence) are satisfied.

Example 1: Meadowfoam study, Case Study 9.1
Suppose we fit the model regressing number of Flowers on Timing (binary variable; early or late) and Light Intensity where Light Intensity is treated as a factor with 6 levels. Thus there is an indicator variable for Timing called early (1 for early, 0 for late) and 5 indicator variables for Intensity, called L300, L450, L600, L750, L900 with 150 treated as the reference level. There are no interactions so the model is:
µ(Flowers | early, LIGHT) = β0 + β1 early + β2 L300 + β3 L450 + β4 L600 + β5 L750 + β6 L900

A shorthand way of describing the model (see Section 9.3.5, p. 249) is:

µ(Flowers | early, LIGHT) = early + LIGHT

Suppose we want to test the hypothesis that there is no effect of light intensity given that the Timing variable is in the model. What hypothesis about the regression parameters do we want to test?
To test this hypothesis, we fit a full model with early and all the indicator variables for LIGHT in the model. Then we fit a reduced model with just early in the model and carry out an extra sum-of-squares F-test just as we did in Chapter 5.

Full model results:

ANOVA(b)

Source        Sum of Squares   df   Mean Square   F        Sig.
Regression    3570.464         6    595.077       13.181   .000(a)
Residual      767.472          17   45.145
Total         4337.936         23

a. Predictors: (Constant), Early, L900, L750, L600, L450, L300
b. Dependent Variable: Flowers

Coefficients(a)

              Unstandardized Coefficients    Standardized Coefficients
Term          B           Std. Error         Beta         t         Sig.
(Constant)    67.196      3.629                           18.518    .000
L300          -9.125      4.751              -.253        -1.921    .072
L450          -13.375     4.751              -.371        -2.815    .012
L600          -23.225     4.751              -.644        -4.888    .000
L750          -27.750     4.751              -.769        -5.841    .000
L900          -29.350     4.751              -.814        -6.178    .000
Early         12.158      2.743              .452         4.432     .000

a. Dependent Variable: Flowers
Reduced model results:

ANOVA(b)

Source        Sum of Squares   df   Mean Square   F       Sig.
Regression    886.950          1    886.950       5.654   .027(a)
Residual      3450.986         22   156.863
Total         4337.936         23

a. Predictors: (Constant), Early
b. Dependent Variable: Flowers

Coefficients(a)

              Unstandardized Coefficients    Standardized Coefficients
Term          B           Std. Error         Beta         t         Sig.
(Constant)    50.058      3.616                           13.845    .000
Early         12.158      5.113              .452         2.378     .027

a. Dependent Variable: Flowers
Carry out the F-test (the coefficients above are not necessary for this test, only the ANOVA table).
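As a check on the hand calculation, the sketch below computes the extra sum-of-squares F statistic from the Residual rows of the two ANOVA tables: F = [(SSE_reduced − SSE_full) / (df_reduced − df_full)] / (SSE_full / df_full). In terms of the parameters, the hypothesis being tested is that the five LIGHT coefficients (β2 through β6) are all 0. The function name is illustrative. The code stops at the F statistic; the P-value would come from an F distribution with 5 and 17 degrees of freedom, whose tabled 5% critical value is about 2.81.

```python
def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra sum-of-squares F statistic comparing a reduced model to a
    full model. Returns (F, numerator df, denominator df)."""
    extra_ss = sse_reduced - sse_full      # extra sum of squares
    extra_df = df_reduced - df_full        # number of parameters tested
    mse_full = sse_full / df_full          # residual mean square, full model
    f_stat = (extra_ss / extra_df) / mse_full
    return f_stat, extra_df, df_full


# Residual sums of squares and df from the two ANOVA tables above.
f_stat, df_num, df_den = extra_ss_f(
    sse_reduced=3450.986, df_reduced=22,
    sse_full=767.472, df_full=17,
)

print("F =", round(f_stat, 2), "on", df_num, "and", df_den, "df")
```

An F statistic this far above the tabled critical value indicates strong evidence of a light intensity effect after accounting for Timing, provided the model assumptions hold.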