5 The Classical Model: Ordinary Least Squares

5.1 Introduction and Basic Assumptions

• We have developed the simple regression model, in which we included only an intercept term and one right-hand side variable.

• Often this is rather naive: we are not able to explain much of the variation in Y, and we may have theoretical justification for including other variables.

• We have already hinted at the idea of including more than one right-hand side variable in a model. Indeed, we often have any number of regressors on the right-hand side.

• To this end, we develop the OLS model with more than one right-hand side variable. To make the notation easier, we will use matrix notation.

• We note that the econometric model must be linear in parameters:

$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

or, stacking all N observations,

$$Y = X\beta + \varepsilon$$

• We assume that y = [N × 1], X = [N × k] and includes a constant term (represented by a column of ones), β = [k × 1], and ε = [N × 1].

• We can only include observations with fully defined values for each Y and X.

• The Full Ideal Conditions:

1. The model is linear in parameters.
2. Explanatory variables (X) are fixed in repeated samples (non-stochastic).
3. X has rank k, where k < N.
4. The ε_i are independent and identically distributed.
5. All error terms have a zero mean: E[ε_i] = 0.
6. All error terms have a constant variance: E[εε'] = σ²I.

7. Condition 6 implies that the error terms have no covariance: E[ε_i ε_j] = 0 for all i ≠ j.

• The linear estimator for β, denoted β̂, is found by minimizing the sum of squared errors over β, where

$$SSE = \sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (y_i - x_i'\beta)^2$$

or in matrix notation,

$$SSE = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta).$$

5.2 Aside: Matrix Differentiation

• Expanding the SSE, we know that ε'ε = y'y − 2β'X'y + β'X'Xβ.

• The second term is clearly linear in β, since X'y is a k-element vector of known scalars, whereas the third term is quadratic in β.

• Looking at the linear term, one could write it as

$$f(\beta) = a'\beta = a_1\beta_1 + a_2\beta_2 + \cdots + a_k\beta_k = \beta'a$$

where a = X'y.

• Taking partial derivatives with respect to each of the β_i and arranging the results in a column vector yields

$$\frac{\partial(\beta'a)}{\partial\beta} = \frac{\partial(a'\beta)}{\partial\beta} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_k \end{bmatrix} = a$$

• For the linear term in the SSE, it immediately follows that

$$\frac{\partial(2\beta'X'y)}{\partial\beta} = 2X'y$$

• The quadratic term can be rewritten as β'Aβ, where A is a matrix of known constants, i.e., A = X'X. For the case k = 3 we can write this as

$$f(\beta) = \begin{bmatrix} \beta_1 & \beta_2 & \beta_3 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = a_{11}\beta_1^2 + a_{22}\beta_2^2 + a_{33}\beta_3^2 + 2a_{12}\beta_1\beta_2 + 2a_{13}\beta_1\beta_3 + 2a_{23}\beta_2\beta_3$$

where the expansion uses the symmetry of A (a_{ij} = a_{ji}).

• The vector of partial derivatives is then written as

$$\frac{\partial f(\beta)}{\partial\beta} = \begin{bmatrix} \partial f/\partial\beta_1 \\ \partial f/\partial\beta_2 \\ \partial f/\partial\beta_3 \end{bmatrix} = \begin{bmatrix} 2(a_{11}\beta_1 + a_{12}\beta_2 + a_{13}\beta_3) \\ 2(a_{12}\beta_1 + a_{22}\beta_2 + a_{23}\beta_3) \\ 2(a_{13}\beta_1 + a_{23}\beta_2 + a_{33}\beta_3) \end{bmatrix} = 2A\beta$$

• This result holds for any symmetric quadratic form; that is,

$$\frac{\partial(\beta'A\beta)}{\partial\beta} = 2A\beta$$

for any symmetric A. From our SSE we have A = X'X, and substituting we obtain

$$\frac{\partial(\beta'X'X\beta)}{\partial\beta} = 2(X'X)\beta.$$

5.3 Derivation of the OLS Estimator

• The minimization of the SSE leads to the first-order necessary conditions:

$$\frac{\partial SSE}{\partial\beta} = -2X'y + 2X'X\beta = 0$$

• This is the matrix version of the normal equations from the simple regression model. There is one first-order condition for every parameter to be estimated.

• We solve these k first-order conditions by dividing by 2, taking X'y to the right-hand side, and solving for β.

• Unfortunately, we cannot divide when it comes to matrices, but we do have the matrix analogue of division: the inverse matrix.

• Pre-multiply both sides of the equation by (X'X)⁻¹ to obtain

$$(X'X)^{-1}(X'X)\beta = (X'X)^{-1}X'y$$

• The first two matrices on the left-hand side multiply to the identity matrix, à la A⁻¹A = I, which can be suppressed, so that the estimator for β, denoted β̂, is

$$\hat\beta = (X'X)^{-1}X'y$$

• Note that the matrix-notation version of β̂ is very analogous to the scalar version derived in the simple regression model.

• (X'X)⁻¹ is the matrix analogue of dividing by the denominator of the simple regression estimator β̂, Σ x_i².

• Likewise, X'y is the matrix analogue of the numerator of the simple regression estimator β̂, Σ x_i y_i.

• Remember that β̂ is a vector of estimated parameters, not a scalar.

• We look again at the first two moments of β̂ in matrix form: E[β̂] and var(β̂).

• The expectation of β̂: E[β̂] = E[(X'X)⁻¹X'y], where y = Xβ + ε. Therefore,

$$\begin{aligned}
E[\hat\beta] &= E[(X'X)^{-1}X'(X\beta + \varepsilon)] \\
&= E[\beta + (X'X)^{-1}X'\varepsilon] && \text{but } \beta \text{ is a constant, so} \\
&= \beta + (X'X)^{-1}X'E[\varepsilon] && \text{but } E[\varepsilon] = 0, \text{ so} \\
&= \beta
\end{aligned}$$

(X is non-stochastic, so (X'X)⁻¹X' passes through the expectation.)

• The cov(β̂) is found by taking E[(β̂ − β)(β̂ − β)']. This leads to the following:

$$\begin{aligned}
cov(\hat\beta) &= E[(\beta + (X'X)^{-1}X'\varepsilon - \beta)(\beta + (X'X)^{-1}X'\varepsilon - \beta)'] \\
&= E[((X'X)^{-1}X'\varepsilon)((X'X)^{-1}X'\varepsilon)'] \\
&= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] \\
&= (X'X)^{-1}X'E[\varepsilon\varepsilon']X(X'X)^{-1} \\
&= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} \\
&= \sigma^2(X'X)^{-1}X'X(X'X)^{-1} \\
&= \sigma^2(X'X)^{-1}
\end{aligned}$$

• How do we get an estimate of σ²? We use the fitted residuals ε̂ and adjust for the appropriate degrees of freedom:

$$\hat\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{N - k}$$

where k is the number of right-hand side variables (including the constant term).

• Note that the fitted residuals are orthogonal to the regressors:

$$X'\hat\varepsilon = X'[y - X\hat\beta] = X'y - X'X(X'X)^{-1}X'y = X'y - X'y = 0$$
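• The applications below use Stata, but the algebra above is easy to illustrate with a short NumPy sketch (the simulated data and variable names are my own, not from the notes): it computes β̂ = (X'X)⁻¹X'y, σ̂² = ε̂'ε̂/(N − k), and cov(β̂) = σ̂²(X'X)⁻¹, and confirms that X'ε̂ = 0.

import numpy as np

rng = np.random.default_rng(42)
N, k = 200, 3                                   # k counts the constant term
beta_true = np.array([5.0, 1.5, -2.0])

X = np.column_stack([np.ones(N),                # constant: a column of ones
                     rng.normal(size=(N, 2))])
y = X @ beta_true + rng.normal(scale=2.0, size=N)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)        # (X'X)^{-1}X'y, via solve()

resid = y - X @ beta_hat                        # fitted residuals
sigma2_hat = resid @ resid / (N - k)            # SSE / (N - k)
cov_beta_hat = sigma2_hat * np.linalg.inv(XtX)  # estimated cov(beta_hat)

print(beta_hat)                                 # close to beta_true
print(np.sqrt(np.diag(cov_beta_hat)))           # estimated standard errors
print(np.allclose(X.T @ resid, 0))              # True: residuals orthogonal to X

Using solve() rather than explicitly inverting X'X is the standard numerically stable way to evaluate the estimator.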

5.4 The Properties of the OLS Estimator

• Having shown that E[β̂] = β and cov(β̂) = σ²(X'X)⁻¹, we move on to prove the Gauss-Markov Theorem.

• The Gauss-Markov Theorem states that β̂ is BLUE: the Best Linear Unbiased Estimator. Our estimator is "best" because it has the minimum variance among all linear unbiased estimators.

• The proof is relatively straightforward.

• Consider another linear estimator β̃ = C'y, where C is some [N × k] matrix.

• For β̃ to be unbiased, note that

$$E[\tilde\beta] = E[C'y] = E[C'(X\beta + \varepsilon)] = E[C'X\beta + C'\varepsilon] = C'X\beta$$

Thus, for E[β̃] = β it must be true that C'X = I.

• Now consider the following lemmas:

Lemma 1: β̃ = β̂ + [C' − (X'X)⁻¹X']y

Proof: β̃ = C'y; adding and subtracting (X'X)⁻¹X'y gives

$$\tilde\beta = C'y + (X'X)^{-1}X'y - (X'X)^{-1}X'y = \hat\beta + [C' - (X'X)^{-1}X']y$$

Lemma 2: β̃ = β̂ + [C' − (X'X)⁻¹X']ε

Proof: Substituting y = Xβ + ε into Lemma 1 gives

$$\tilde\beta = \hat\beta + [C' - (X'X)^{-1}X'][X\beta + \varepsilon] = \hat\beta + C'X\beta - \beta + [C' - (X'X)^{-1}X']\varepsilon$$

but C'X = I from before, so the middle terms cancel and

$$\tilde\beta = \hat\beta + [C' - (X'X)^{-1}X']\varepsilon$$

• With these two lemmas we can complete the proof of the Gauss-Markov theorem. We have determined that both β̂ and β̃ are unbiased. Now we must show that cov(β̂) ≤ cov(β̃), where

$$cov(\tilde\beta) = E[(\tilde\beta - E[\tilde\beta])(\tilde\beta - E[\tilde\beta])']$$

• Taking advantage of our lemmas and the unbiasedness of β̃, and writing [·] ≡ C' − (X'X)⁻¹X' for brevity,

$$\begin{aligned}
cov(\tilde\beta) &= E[(\hat\beta + [\cdot]\varepsilon - \beta)(\hat\beta + [\cdot]\varepsilon - \beta)'] \\
&= E[((X'X)^{-1}X'\varepsilon + [\cdot]\varepsilon)((X'X)^{-1}X'\varepsilon + [\cdot]\varepsilon)'] \\
&= \sigma^2(X'X)^{-1} + \sigma^2[C' - (X'X)^{-1}X'][C' - (X'X)^{-1}X']'
\end{aligned}$$

where the cross terms vanish because X'C = (C'X)' = I, so X'[C − X(X'X)⁻¹] = I − I = 0.

• The matrix [C' − (X'X)⁻¹X'][C' − (X'X)⁻¹X']' is positive semi-definite. This is the matrix analogue of saying "greater than or equal to zero."

• Thus cov(β̂) ≤ cov(β̃), and β̂ is BLUE.
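• A small simulation (my own sketch, not in the original notes) makes the theorem concrete: build an alternative linear unbiased estimator β̃ = C'y with C'X = I and verify that its excess covariance over β̂ is positive semi-definite.

import numpy as np

rng = np.random.default_rng(3)
N, k = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])

XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)        # (X'X)^{-1}X'

# C' = (X'X)^{-1}X' + D' with D' = 0.05 Z'M, where M = I - X(X'X)^{-1}X'
# annihilates X (MX = 0), so C'X = I and beta_tilde = C'y is unbiased.
M = np.eye(N) - X @ XtX_inv_Xt
Z = rng.normal(size=(N, k))
Dt = 0.05 * Z.T @ M
Ct = XtX_inv_Xt + Dt

print(np.allclose(Ct @ X, np.eye(k)))             # True: unbiasedness condition

# cov(beta_tilde) - cov(beta_hat) = sigma^2 D'D, which must be PSD
excess = Dt @ Dt.T
print((np.linalg.eigvalsh(excess) >= -1e-12).all())   # True: PSD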

• Is β̂ consistent? Assume

$$\lim_{N\to\infty} \frac{1}{N}(X'X) = Q_{xx}, \quad \text{which is nonsingular.}$$

• Theorem: plim β̂ = β.

• Proof: β̂ is (asymptotically) unbiased; that is, lim_{N→∞} E[β̂] = β. Rewrite cov(β̂) as

$$cov(\hat\beta) = \sigma^2(X'X)^{-1} = \frac{\sigma^2}{N}\left(\frac{1}{N}X'X\right)^{-1}$$

Then

$$\lim_{N\to\infty} cov(\hat\beta) = \lim_{N\to\infty} \frac{\sigma^2}{N} Q_{xx}^{-1} = 0$$

so the covariance matrix of β̂ collapses to zero, which, together with asymptotic unbiasedness, implies plim β̂ = β.
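• A companion simulation (again my own sketch) illustrates consistency: the sampling variance of the slope estimate shrinks roughly like σ²/N as the sample grows.

import numpy as np

rng = np.random.default_rng(7)
beta_true = np.array([1.0, 2.0])

def ols_draws(N, reps=500):
    """Return `reps` OLS estimates, each from a fresh sample of size N."""
    out = np.empty((reps, 2))
    for r in range(reps):
        X = np.column_stack([np.ones(N), rng.normal(size=N)])
        y = X @ beta_true + rng.normal(size=N)
        out[r] = np.linalg.solve(X.T @ X, X.T @ y)
    return out

for N in (25, 100, 400, 1600):
    print(N, ols_draws(N)[:, 1].var())   # variance of the slope collapses toward 0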

5.5 Multiple Regression Example: The Price of Gasoline

• Some express concern that there might be price manipulation in the retail gasoline market. To see whether this is true, monthly price, tax, and cost data were gathered from the Energy Information Administration (www.eia.gov) and the Tax Foundation (www.taxfoundation.org).

• Here is a time plot of the retail and wholesale price of gasoline (U.S. average):

[Figure: gasprice (retail) and wprice (wholesale), in cents per gallon, plotted against the monthly observation index obs.]

• Here are the results of a multiple regression analysis:

. reg allgradesprice fedtax avestatetax wholesaleprice obs

      Source |       SS       df       MS           Number of obs =     264
-------------+------------------------------       F(  4,   259) = 8917.25
       Model |  408989.997     4  102247.499       Prob > F      =  0.0000
    Residual |  2969.76256   259  11.4662647       R-squared     =  0.9928
-------------+------------------------------       Adj R-squared =  0.9927
       Total |   411959.76   263  1566.38692       Root MSE      =  3.3862

----------------------------------------------------------------------------
      gasprice |    Coef.  Std. Err.      t    P>|t|     [95% Conf. Int]
---------------+------------------------------------------------------------
        fedtax |    1.268      .159     7.94   0.000       .953      1.583
   avestatetax |     .725      .203     3.57   0.000       .325      1.125
wholesaleprice |    1.091      .011    92.62   0.000      1.068      1.115
         trend |     .033      .009     3.62   0.000       .015       .051
         _cons |    5.281     2.698     1.96   0.051      -.031     10.594
----------------------------------------------------------------------------

• The dependent variable is measured in pennies per gallon, as are all independent variables.

• The results suggest:

1. For every penny in federal tax, the retail gasoline price increases by 1.268 pennies.

2. For every penny in state sales tax, the price increases by only 0.725 cents.

3. For every penny in wholesale price, the retail price increases by 1.091 pennies.

4. The time trend, which advances by one unit for every month starting in January 1985, indicates that the average real price of gasoline increases by about 0.03 cents per gallon per month, everything else equal.

5. The multiple regression results do not suggest a tremendous amount of pricing power on the part of retail outlets.

6. The R² is very high; approximately 99.3% of the variation in retail gasoline prices is explained by the variables included in the model (although it should be noted that the data are time-series in nature and therefore a high R² is expected).

7. To return to the conspiracy theory that prices are actively manipulated by retailers: the 95% confidence interval of the wholesale price parameter is [1.068, 1.115]. At the maximum, the historical pre-tax markup on marginal cost at the retail level is approximately 11.5%, which is consistent with the rest of the retail sector.

8. One other conclusion is that while wholesale price increases are associated with retail price increases, it is also true that wholesale price decreases are associated with retail price decreases. Or is this conclusion too strong for the given estimation?

• What if we defined a dummy variable that takes a value of one when the wholesale price of gasoline declined from one month to the next and included it as an additional regressor? If the retail market reacts symmetrically to increases and decreases in the wholesale price, this dummy variable should have an insignificant parameter.

. tsset obs
        time variable:  obs, 1 to 264

. gen wpdown = (wholesaleprice < L.wholesaleprice)

. reg allgradesprice fedtax avestatetax wprice wpdown obs

-----------------------------------------------gasprice | Coef. Std. Err. t P>|t| -------------+---------------------------------fedtax | 1.328 .132 10.00 0.000 avestatetax | .741 .168 4.39 0.000 wprice | 1.101 .009 112.04 0.000 wpdown | 3.777 .348 10.84 0.000 obs | .027 .007 3.63 0.000 _cons | 2.245 2.258 0.99 0.321 -------------------------------------------------

• The historical data suggest that the price of gasoline is 3.777 cents higher in months when the wholesale price declines. This suggests that there is an asymmetric effect of wholesale price changes on retail prices of gasoline.
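• For readers who prefer a matrix-language version of the wpdown construction, here is a hedged NumPy sketch (the series are simulated stand-ins for the EIA data, the tax variables are omitted, and no asymmetry is built into the simulated data, so the estimates will not match the table above):

import numpy as np

rng = np.random.default_rng(1)
T = 264

wprice = 100 + np.cumsum(rng.normal(size=T))                 # simulated wholesale price
gasprice = 40 + 1.1 * wprice + rng.normal(scale=3, size=T)   # simulated retail price

# wpdown = 1 in months where the wholesale price fell from the previous month
wpdown = np.concatenate([[0.0], (np.diff(wprice) < 0).astype(float)])

trend = np.arange(1, T + 1)
X = np.column_stack([np.ones(T), wprice, wpdown, trend])
beta_hat, *_ = np.linalg.lstsq(X, gasprice, rcond=None)
print(beta_hat)                                              # [_cons, wprice, wpdown, trend]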

5.6 Multiple Regression Example: Software Piracy and Economic Freedom

• Many in information-providing industries are anxious about software piracy. Many policy suggestions have been made, and the industry is pursuing legal remedies against individuals.

• However, there might be economic influences on the prevalence of software piracy. Bezmen and Depken (2006, Economics Letters) look at the impact of various socio-economic factors on estimated software piracy rates in the United States in 1999, 2000, and 2001.

• Consider the simple regression model in which piracy is related to per capita income (measured in thousands):

. reg piracy lninc

Regression with robust standard errors         Number of obs =     150
                                               F(  1,   148) =   38.48
                                               Prob > F      =  0.0000
                                               R-squared     =  0.2045
                                               Root MSE      =  8.3097
------------------------------------------------------------------------
        |              Robust
 piracy |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
--------+---------------------------------------------------------------
  lninc | -24.39808   3.932914   -6.20   0.000      -32.17   -16.62616
  _cons |  114.1777   13.55488    8.42   0.000    87.39163    140.9638
------------------------------------------------------------------------

• As expected, states with greater income levels have lower levels of software piracy.

• What if we include other factors, such as economic freedom (from the Fraser Institute), the level of taxation (from the Tax Foundation), unemployment (from the Bureau of Labor Statistics), and two dummy variables to control for the years 2000 and 2001?

. reg piracy sfindex statetax lninc unemp yr00 yr01

Regression with robust standard errors         Number of obs =     150
                                               F(  6,   143) =   18.87
                                               Prob > F      =  0.0000
                                               R-squared     =  0.2887
                                               Root MSE      =   7.994

------------------------------------------------------------------------
         |              Robust
  piracy |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------
 sfindex | -2.652959   1.331309   -1.99   0.048   -5.284546   -.0213714
statetax | -1.914663   .6571426   -2.91   0.004   -3.213632   -.6156949
   lninc | -24.83959   3.248853   -7.65   0.000   -31.26157   -18.41761
   unemp |  .2433874   .7928428    0.31   0.759   -1.323819    1.810594
    yr00 |  1.568842   1.390505    1.13   0.261   -1.179758    4.317442
    yr01 |  3.611868   1.611408    2.24   0.027    .4266101    6.797126
   _cons |  150.7459   18.60083    8.10   0.000    113.9778     187.514
------------------------------------------------------------------------

• The parameter estimate on income did not change very much. The results suggest:

1. Because income enters in logs and piracy is measured as a rate, a 1% increase in income corresponds with a reduction in the piracy rate of roughly 0.25 percentage points (−24.84/100; see the arithmetic sketch below).

2. States with greater economic freedom tend to pirate less.

3. States with greater taxation (which might proxy for enforcement efforts) tend to pirate less.

4. States with greater unemployment do not pirate more.

5. The parameter on yr01 suggests that piracy was greater in 2001 than in 2000 or 1999.

• The upshot: software piracy seems to be an inferior good.
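• As a quick check of the level-log arithmetic in point 1 (a sketch; only the coefficient is taken from the table above):

import numpy as np

beta_lninc = -24.83959                  # estimated coefficient on ln(income)

# Level-log model: a 1% income increase raises ln(income) by ln(1.01) = 0.00995,
# so the predicted change in the piracy rate, in percentage points, is:
print(beta_lninc * np.log(1.01))        # = -0.247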
