Regression Review and Robust Regression

Slides prepared by Elizabeth Newton (MIT)

S-Plus Oil City Data Frame

Monthly Excess Returns of Oil City Petroleum, Inc. Stocks and the Market

SUMMARY: The oilcity data frame has 129 rows and 2 columns. The sample runs from April 1979 to December 1989. This data frame contains the following columns:

VALUE:
Oil: monthly excess returns of Oil City Petroleum, Inc. stocks.
Market: monthly excess returns of the market.

E Newton This output was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.

Oil City Data (continued)

  • Returns = relative change in the stock price over a one-month interval.
  • Excess returns are computed relative to the monthly return of a 90-day US Treasury bill, which is taken as the risk-free rate.
  • Financial economists use least squares to fit a straight line predicting a particular stock return from the market return.
  • Beta = estimated coefficient of the market return. It measures the riskiness of the stock in terms of standard deviation and expected returns.
  • Large beta -> the stock is risky compared to the market, but expected returns from the stock are also large.
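
The straight-line fit behind beta can be sketched in a few lines. This is a minimal illustration in Python rather than the S-PLUS code used for the slides; the function name `fit_line` and the toy return values are made up for the example and are not the oilcity data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope: the stock's estimated beta
    a = y_bar - b * x_bar  # intercept
    return a, b

# Hypothetical excess returns (market, stock), for illustration only.
market = [-0.10, -0.05, 0.00, 0.05, 0.10]
stock = [-0.21, -0.09, 0.01, 0.09, 0.20]
alpha, beta = fit_line(market, stock)
print(round(alpha, 4), round(beta, 4))
```

In this toy sample the stock moves about twice as much as the market, so the fitted slope is about 2.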

[Figure: Plot of Market returns vs. month. x-axis: Month; y-axis: oilcity$Market.]

[Figure: Plot of Oil City Petroleum return vs. month. x-axis: month; y-axis: Oil.]

[Figure: Histogram of Market Returns. x-axis: Market.]

[Figure: Histogram of Oil City Returns. x-axis: Oil.]

[Figure: Plot of Oil City vs. Market Returns, points labeled by observation number. x-axis: Market; y-axis: Oil City.]

[Figure: Plot of Oil City vs. Market Returns without observation 94, points labeled by observation number. x-axis: Market; y-axis: Oil City.]

> summary(oilcity)
      Oil                    Market
 Min.   :-0.55667260   Min.   :-0.27857020
 1st Qu.:-0.23968330   1st Qu.:-0.10557534
 Median :-0.10049000   Median :-0.07277544
 Mean   :-0.07221215   Mean   :-0.07689209
 3rd Qu.:-0.05821000   3rd Qu.:-0.03973828
 Max.   : 5.19292000   Max.   : 0.07131940

Summary of oil.lm

Call: lm(formula = Oil ~ Market, data = oilcity)

Residuals:
     Min      1Q   Median      3Q   Max
 -0.6952 -0.1732 -0.05444 0.08407 4.842

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  0.1474     0.0707  2.0849   0.0391
     Market  2.8567     0.7318  3.9040   0.0002

Residual standard error: 0.4867 on 127 degrees of freedom
Multiple R-Squared: 0.1071
F-statistic: 15.24 on 1 and 127 degrees of freedom, the p-value is 0.0001528

Correlation of Coefficients:
       (Intercept)
Market      0.7956
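
The entries in this summary are tied together by simple identities for a one-predictor regression. The sketch below rechecks them in Python from the rounded values printed above; the variable names are chosen only for this illustration, and small discrepancies in the last digit come from rounding in the printed output.

```python
# Values copied from the summary of oil.lm.
coef_market = 2.8567
se_market = 0.7318
f_stat = 15.24
df_resid = 127

t_value = coef_market / se_market         # t = estimate / standard error
f_from_t = t_value ** 2                   # with one predictor, F = t^2
r_squared = f_stat / (f_stat + df_resid)  # R^2 = F / (F + residual df) with one predictor

print(round(t_value, 3), round(f_from_t, 2), round(r_squared, 4))
```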

[Figure: Plot of residual vs. fit for oil.lm. x-axis: Fitted : Market; y-axis: Residuals. Labeled points: 94, 79, 65.]

[Figure: Plot of Cook's Distance vs. Index. y-axis: Cook's Distance. Labeled points: 94, 43, 65.]

[Figure: Plot of hat matrix diagonals for oil.lm, points labeled by observation number. x-axis: month; y-axis: hat(model.matrix(oil.lm)).]
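
The two diagnostics plotted above, hat matrix diagonals and Cook's distance, have closed forms for a straight-line fit. The following is a minimal Python sketch under that assumption; the function name `leverage_and_cooks` and the five data points are hypothetical and are not taken from the oilcity data.

```python
def leverage_and_cooks(x, y):
    """Hat diagonals h_i and Cook's distances D_i for the fit y = a + b*x."""
    n, p = len(x), 2                       # p = number of coefficients
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # For simple regression the hat diagonal is 1/n + (x_i - x_bar)^2 / Sxx.
    hat = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: D_i = e_i^2 * h_i / (p * MSE * (1 - h_i)^2).
    cooks = [e ** 2 * h / (p * mse * (1.0 - h) ** 2) for e, h in zip(resid, hat)]
    return hat, cooks

# Hypothetical data: the last point sits far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 20.0]
hat, cooks = leverage_and_cooks(x, y)
most_influential = max(range(len(cooks)), key=cooks.__getitem__)
print([round(h, 3) for h in hat], most_influential)
```

The hat diagonals always sum to the number of coefficients (2 here), which gives a quick sanity check on any implementation.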

Summary of model without observation 94

Call: lm(formula = Oil ~ Market, data = oilcity94)

Residuals:
     Min      1Q   Median      3Q   Max
 -0.5169 -0.1174 -0.01959 0.06864 0.859

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) -0.0247     0.0304 -0.8139   0.4173
     Market  1.1355     0.3137  3.6202   0.0004

Residual standard error: 0.2033 on 126 degrees of freedom
Multiple R-Squared: 0.09422
F-statistic: 13.11 on 1 and 126 degrees of freedom, the p-value is 0.0004249

Correlation of Coefficients:
       (Intercept)
Market      0.8061

[Figure: Plot of residual vs. fit for model without observation 94. x-axis: Fitted : Market; y-axis: Residuals. Labeled points: 79, 105, 8.]

Weighted Least Squares

Used when the observations $y_i$ have unequal variances.

$y = X\beta + \varepsilon$

$E(\varepsilon) = 0, \quad \mathrm{Var}(\varepsilon) = \sigma^2 V$

$V$ is non-singular and positive definite.

$V$ is diagonal if the errors are uncorrelated; $V$ is always symmetric.

There exists an $n \times n$ non-singular symmetric matrix $R$ such that $R'R = RR = V$.

$R$ is sometimes called the square root of $V$.

Weighted least squares (continued)

Define new variables: $y^* = R^{-1}y$, $X^* = R^{-1}X$, $\varepsilon^* = R^{-1}\varepsilon$

$y = X\beta + \varepsilon$ becomes $R^{-1}y = R^{-1}X\beta + R^{-1}\varepsilon$, or $y^* = X^*\beta + \varepsilon^*$

$E(\varepsilon^*) = E(R^{-1}\varepsilon) = 0$

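Weighted least squares (continued)

$\mathrm{Var}(\varepsilon^*) = E\{[\varepsilon^* - E(\varepsilon^*)][\varepsilon^* - E(\varepsilon^*)]'\}$

$= E(\varepsilon^* \varepsilon^{*\prime}) = E(R^{-1}\varepsilon\varepsilon' R^{-1})$

$= R^{-1} E(\varepsilon\varepsilon') R^{-1}$

$= \sigma^2 R^{-1} V R^{-1} = \sigma^2 R^{-1} R R R^{-1} = \sigma^2 I$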
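
A worked special case may make the transformation concrete. Assuming the errors are uncorrelated, so that $V$ is diagonal with entries $v_1, \dots, v_n$ (the symbols $v_i$ are introduced only for this illustration), the square root and the transformed model reduce to:

```latex
\begin{aligned}
V &= \mathrm{diag}(v_1, \dots, v_n), \qquad
R = \mathrm{diag}(\sqrt{v_1}, \dots, \sqrt{v_n}), \qquad RR = V, \\
y_i^* &= \frac{y_i}{\sqrt{v_i}}, \qquad
\varepsilon_i^* = \frac{\varepsilon_i}{\sqrt{v_i}}, \\
\mathrm{Var}(\varepsilon_i^*) &= \frac{\mathrm{Var}(\varepsilon_i)}{v_i}
  = \frac{\sigma^2 v_i}{v_i} = \sigma^2 .
\end{aligned}
```

Each observation is divided by a quantity proportional to its own error standard deviation, so noisier observations are scaled down before ordinary least squares is applied.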

Weighted Least Squares (continued)

$Q(\beta) = \varepsilon^{*\prime}\varepsilon^* = \varepsilon' V^{-1} \varepsilon = \varepsilon' W \varepsilon = (y - X\beta)' W (y - X\beta)$, where $W = V^{-1}$ is the matrix of weights.

Least squares normal equations are: $(X'WX)\hat\beta = X'Wy$

The solution is: $\hat\beta = (X'WX)^{-1} X'Wy$

$\mathrm{Var}(\hat\beta) = (X'WX)^{-1} X'W \,\mathrm{Var}(y)\, WX (X'WX)^{-1}$

$= \sigma^2 (X'WX)^{-1} X'W W^{-1} W X (X'WX)^{-1}$

$= \sigma^2 (X'WX)^{-1}$
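
The two routes to the weighted least squares estimate, solving the weighted normal equations directly and running ordinary least squares on the transformed variables, can be compared numerically. The sketch below does this in Python for a straight-line model with diagonal $V$; the helper names (`solve_2x2`, `wls_line`, `ols_on_transformed`) and the data are hypothetical.

```python
import math

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

def wls_line(x, y, w):
    """Solve (X'WX) beta = X'Wy for y = b0 + b1*x with W = diag(w)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return solve_2x2(sw, swx, swx, swxx, swy, swxy)

def ols_on_transformed(x, y, v):
    """OLS of y* on X*, where y* = R^-1 y, X* = R^-1 X, R = diag(sqrt(v))."""
    r_inv = [1.0 / math.sqrt(vi) for vi in v]
    c0 = r_inv                                  # transformed intercept column
    c1 = [ri * xi for ri, xi in zip(r_inv, x)]  # transformed x column
    ys = [ri * yi for ri, yi in zip(r_inv, y)]
    a11 = sum(c * c for c in c0)
    a12 = sum(p * q for p, q in zip(c0, c1))
    a22 = sum(c * c for c in c1)
    b1 = sum(p * q for p, q in zip(c0, ys))
    b2 = sum(p * q for p, q in zip(c1, ys))
    return solve_2x2(a11, a12, a12, a22, b1, b2)

# Hypothetical data; v holds the diagonal of V (relative error variances).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]
v = [1.0, 1.0, 4.0, 4.0, 9.0]
w = [1.0 / vi for vi in v]                      # W = V^-1

beta_direct = wls_line(x, y, w)
beta_transformed = ols_on_transformed(x, y, v)
print(beta_direct, beta_transformed)
```

Both calls should agree up to floating-point error, and setting every weight to 1 recovers ordinary least squares.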

Robust Regression

Used to reduce the influence of outliers.

LAR (least absolute residuals) regression: minimize $L_1 = \sum_{i=1}^{n} |y_i - x_i\beta| = \sum_{i=1}^{n} |e_i|$

LMS (least median of squares) regression: minimize $\mathrm{median}\{[y_i - x_i\beta]^2\} = \mathrm{median}\{e_i^2\}$

M-estimators: minimize $\sum_{i=1}^{n} g(y_i - x_i\beta) = \sum_{i=1}^{n} g(e_i)$, where $g$ is a function of the residuals.
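
To see how these criteria treat an outlier differently, the sketch below evaluates each one on the same set of residuals. It is written in Python for illustration; the residual values are invented, and Huber's function with cutoff 1.345 is used as one assumed choice of $g$ (the slides do not fix a particular $g$), with the residuals treated as already scaled.

```python
import statistics

def ols_objective(resid):
    """Sum of squared residuals (ordinary least squares)."""
    return sum(e * e for e in resid)

def lar_objective(resid):
    """Sum of absolute residuals (LAR, the L1 criterion)."""
    return sum(abs(e) for e in resid)

def lms_objective(resid):
    """Median of the squared residuals (LMS)."""
    return statistics.median([e * e for e in resid])

def huber_objective(resid, c=1.345):
    """M-estimator criterion with Huber's g: quadratic near zero, linear in the tails."""
    total = 0.0
    for e in resid:
        a = abs(e)
        total += 0.5 * e * e if a <= c else c * a - 0.5 * c * c
    return total

# Hypothetical residuals with one gross outlier.
resid = [0.1, -0.2, 0.05, -0.1, 5.0]
print(ols_objective(resid), lar_objective(resid),
      lms_objective(resid), huber_objective(resid))
```

The outlier dominates the sum of squares, contributes only linearly to the LAR and Huber criteria, and does not affect the median of squares at all.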

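Robust Regression (continued)

IRLS, iteratively reweighted least squares

Minimize $e'We$, where $W$ is a diagonal matrix of weights, inversely proportional to the magnitude of the scaled residuals $u_i$.

$u_i = e_i / s$, $s = \mathrm{MAD} = \mathrm{median}\{|e_i - \mathrm{median}(e_i)|\}$

In practice the MAD is usually multiplied by 1.4826 (about 1/0.6745) so that it estimates $\sigma$ under normal errors, as in the coefficient-table code for oil.rreg.

Procedure:
1. Obtain initial coefficient estimates from OLS.
2. Obtain weights from the scaled residuals.
3. Obtain coefficient estimates from WLS.
4. Return to 2.

Convergence is usually rapid.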
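
The four-step procedure can be sketched as follows. This is a minimal Python illustration for a straight-line model, not the S-PLUS routine behind oil.rreg; the Huber weight function with cutoff 1.345, the rescaled MAD, the stopping rule, and the data are all assumptions made for the example.

```python
import statistics

def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x with weights w."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    b0 = (swy - b1 * swx) / sw
    return b0, b1

def huber_weight(u, c=1.345):
    """Huber weight (assumed choice): 1 for small |u|, c/|u| beyond the cutoff."""
    return 1.0 if abs(u) <= c else c / abs(u)

def irls_line(x, y, max_iter=50, tol=1e-10):
    w = [1.0] * len(x)
    b0, b1 = wls_line(x, y, w)                      # 1. initial estimates from OLS
    for _ in range(max_iter):
        e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        med = statistics.median(e)
        s = 1.4826 * statistics.median([abs(ei - med) for ei in e])  # rescaled MAD
        if s == 0.0:
            break                                   # degenerate scale: stop
        w = [huber_weight(ei / s) for ei in e]      # 2. weights from scaled residuals
        new_b0, new_b1 = wls_line(x, y, w)          # 3. coefficients from WLS
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break                                   # 4. otherwise return to step 2
    return b0, b1, w

# Hypothetical data close to y = 1 + 2x; the last value is an outlier.
x = [float(i) for i in range(10)]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8, 17.1, 40.0]
ols_b0, ols_b1 = wls_line(x, y, [1.0] * len(x))
rob_b0, rob_b1, weights = irls_line(x, y)
print(round(ols_b1, 3), round(rob_b1, 3), round(weights[-1], 3))
```

The OLS slope is pulled well above 2 by the single outlier, while the reweighted fit assigns that point a small weight and stays close to the slope of the remaining points.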

(See Figure 10.4, and Equations 10.44 and 10.45 in Neter et al. Applied Linear Statistical Models.)


[Figure: Plot of residuals in oil.rreg. y-axis: oil.rreg$resid.]

[Figure: Plot of weights in robust regression for oil city data set, points labeled by observation number. x-axis: Month; y-axis: Weights.]

[Figure: Plot of sqrt(weights)*resid/s in oil.rreg.]

Coefficient table for oil.rreg

> # Assumes Oil and Market are available (e.g. via attach(oilcity)) and that
> # w holds the final robust weights (e.g. oil.rreg$w).
> x <- cbind(1, Market)                          # design matrix with intercept column
> beta <- solve(t(x) %*% diag(w) %*% x) %*% t(x) %*% diag(w) %*% Oil   # WLS estimate
> r <- Oil - x %*% beta                          # residuals
> s <- median(abs(r - median(r))) * 1.4826       # rescaled MAD scale estimate
> covm <- solve(t(x) %*% diag(w) %*% x) * s^2    # approximate covariance matrix
> se <- sqrt(diag(covm))                         # standard errors
> tvalue <- beta / se
> prob <- 2 * (1 - pt(abs(tvalue), 127))         # two-sided p-value on 127 df
> cbind(beta, se, tvalue, prob)
                   beta         se    tvalue         prob
(Intercept) -0.06779903 0.02451469 -2.765649 0.0065285939
          x  0.89895511 0.24902845  3.609849 0.0004394276

Covariance matrix is approximate.

[Figure: Plots of fitted regression lines for oil city data, points labeled by observation number. x-axis: Market; y-axis: Oil. Legend: oil.lm, oil.lm94, oil.rreg.]

Least Trimmed Squares Regression

Minimizes: $\sum_{i=1}^{q} e_{(i)}^2$, where $e_{(1)}^2 \le \dots \le e_{(n)}^2$ are the ordered squared residuals and $q$ is chosen to be between $n/2$ and $n$.

Based on a genetic algorithm for finding a subset of the data with minimum SSE.

High breakdown point: fits the bulk of the data well, even if the bulk is only a little more than half the data.

Resulting weights are 1 or 0.
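
For very small data sets the LTS criterion can be minimized exactly by fitting OLS to every subset of size $q$ and keeping the one with the smallest SSE. The Python sketch below does that; it is an exhaustive search swapped in for illustration, not the genetic algorithm mentioned above, and the function names and data are hypothetical.

```python
from itertools import combinations

def ols_fit(x, y):
    """Return (intercept, slope, SSE) of the OLS line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse

def lts_line(x, y, q):
    """Exhaustive LTS: OLS on every subset of size q, keep the smallest SSE."""
    best = None
    for idx in combinations(range(len(x)), q):
        a, b, sse = ols_fit([x[i] for i in idx], [y[i] for i in idx])
        if best is None or sse < best[2]:
            best = (a, b, sse, idx)
    a, b, sse, idx = best
    weights = [1 if i in idx else 0 for i in range(len(x))]  # 1 = kept, 0 = trimmed
    return a, b, weights

# Hypothetical data close to y = 1 + 2x; the last value is an outlier.
x = [float(i) for i in range(10)]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8, 17.1, 40.0]
a, b, weights = lts_line(x, y, q=8)
print(round(a, 3), round(b, 3), weights)
```

The number of subsets grows combinatorially with the sample size, which is why practical implementations rely on search heuristics instead.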

> summary(oil.lts)

Method:
[1] "Least Trimmed Squares Robust Regression."

Call:
ltsreg(formula = Oil ~ Market)

Coefficients:
 Intercept  Market
   -0.0864  0.7907

Scale estimate of residuals: 0.1468

Robust Multiple R-Squared: 0.09863

Total number of observations: 129

Number of observations that determine the LTS estimate: 116

Residuals:
   Min. 1st Qu. Median 3rd Qu.  Max.
 -0.454  -0.088  0.032   0.097 5.223

Weights:
   0   1
  10 119
