Lecture 8: Regression Analysis
EE290H F05 (Spanos)

Topics: Simple Regression, Multivariate Regression, Stepwise Regression, Replication and Prediction Error

Regression Analysis

• In general, we "fit" a model by minimizing a metric that represents the error:

      min Σ (yᵢ - ŷᵢ)²,   i = 1, …, n

• The sum of squares gives closed-form solutions and minimum variance for linear models.

The Simplest Regression Model

Line through the origin:  y = bx

[Figure: scatter of y vs. x with the fitted line ŷ = bx.]

      yᵤ = βxᵤ + εᵤ,   u = 1, 2, …, n,   εᵤ ~ N(0, σ_R²)

      min S = min Σᵤ₌₁ⁿ (yᵤ - βxᵤ)²

      ŷ = bx,   ηᵤ = βxᵤ

b: estimate of β.  ŷ: estimate of ηᵤ, the true value of the model.  The minimized residual sum of squares gives the estimate of σ_R² (next slide).

Using the Normal Equation to fit the “line through the origin” model

Our model has only one degree of freedom; this is why our choices are confined to this line…

[Figure: in the (y1, y2) observation space, the data vector y is projected onto the one-dimensional model line ŷ = bx so as to minimize Σ(y - ŷ)².]

Using the Normal Equation (cont.) (fitting the “line through the origin” model)

Choose b so that the residual vector is perpendicular to the model vector:

      Σ (y - ŷ)·x = 0  ⇒  Σ (y - bx)·x = 0  ⇒

      b = Σxy / Σx²          (estimate of β)
      s² = S_R / (n - 1)     (estimate of σ_R²)
      V(b) = s² / Σx²

67% confidence interval:  b ± √(s² / Σx²)

Significance test:  t = (b - β*) / √(s² / Σx²)  ~  t_(n-1)
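
A minimal numerical sketch of these formulas (not part of the original slides), in Python with hypothetical data; the variable names are illustrative only:

    import numpy as np
    from scipy import stats

    # hypothetical measurements: x = etch time (s), y = removed material (nm)
    x = np.array([100., 200., 300., 400., 500., 600., 700., 800., 900.])
    y = np.array([ 55., 105., 148., 195., 252., 300., 355., 398., 452.])

    n = len(x)
    b = np.sum(x * y) / np.sum(x**2)        # estimate of beta
    SR = np.sum((y - b * x)**2)             # residual sum of squares
    s2 = SR / (n - 1)                       # estimate of sigma_R^2
    Vb = s2 / np.sum(x**2)                  # variance of b
    t = b / np.sqrt(Vb)                     # test of H0: beta* = 0
    p_value = 2 * stats.t.sf(abs(t), df=n - 1)
    print(b, np.sqrt(Vb), t, p_value)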

Etch time vs. removed material:  y = bx

[Figure: removed material (nm, 0–500) vs. etch time (sec ×10³, 0.0–1.0), with the fitted line through the origin.]

Variable Name     Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Etch Time (sec)   0.501         0.0162               30.9          0.000

Model Validation through ANOVA

The idea is to decompose the sum of squares into orthogonal components. Assuming that there is no need for a model at all* (always a good null hypothesis!):  H0: β* = 0

      Σ yᵤ²       =   Σ ŷᵤ²       +   Σ (yᵤ - ŷᵤ)²
      n d.f.          p d.f.          n - p d.f.
      (total)         (model)         (residual)

* This is equivalent to saying that y ~ N(μ, σ²), where μ and σ are constants, independent of x.

Model Validation through ANOVA (cont.)

Assuming a specific model:  H0: β* = b

      Σ (yᵤ - β*xᵤ)²   =   Σ (ŷᵤ - β*xᵤ)²   +   Σ (yᵤ - ŷᵤ)²
      n d.f.                p d.f.                n - p d.f.
      (total)               (model)               (residual)

The ANOVA table will answer the question: is there a relationship between x and y?
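
A hedged sketch (Python, same hypothetical data as the earlier example) of how such an ANOVA table can be tabulated for the line-through-origin fit:

    import numpy as np
    from scipy import stats

    # hypothetical data and fit, as in the earlier sketch
    x = np.array([100., 200., 300., 400., 500., 600., 700., 800., 900.])
    y = np.array([ 55., 105., 148., 195., 252., 300., 355., 398., 452.])
    n, p = len(y), 1
    b = np.sum(x * y) / np.sum(x**2)

    SS_model = np.sum((b * x)**2)            # p degrees of freedom
    SS_resid = np.sum((y - b * x)**2)        # n - p degrees of freedom
    SS_total = np.sum(y**2)                  # n degrees of freedom
    F = (SS_model / p) / (SS_resid / (n - p))
    print(SS_model, SS_resid, SS_total, F, stats.f.sf(F, p, n - p))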

ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio    Prob > F
Model    1.83e+5          1                 1.83e+5        1.98e+2    2.17e-6
Error    6.47e+3          7                 9.24e+2
Total    1.89e+5          8

[Figure: residuals (-60 to +60) vs. etch time (sec ×10³, 0.0–1.0).]

A More Complex Regression Equation: a straight line with two parameters

      actual:     η = α + β(x - x̄)
      estimated:  ŷ = a + b(x - x̄)

      yᵢ ~ N(ηᵢ, σ²)

Minimize R = Σ (yᵢ - ŷᵢ)² to estimate α and β:

      a = ȳ
      b = Σ(xᵢ - x̄)yᵢ / Σ(xᵢ - x̄)²  =  Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²

Are a and b good estimators of α and β?

      E[a] = α
      E[b] = Σ(xᵢ - x̄)E[yᵢ] / Σ(xᵢ - x̄)²  =  β

Variance Estimation

Note that all the variability comes from the yᵢ!

      V[a] = V[ Σyᵢ / k ]  =  (1/k²) Σ V[yᵢ]  =  σ²/k

      V[b] = V[ Σ(xᵢ - x̄)yᵢ / Σ(xᵢ - x̄)² ]  =  σ² / Σ(xᵢ - x̄)²

Minimum variance, thanks to least squares!
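
A small Python sketch (hypothetical data, illustrative names only) of the two-parameter fit and its variance estimates:

    import numpy as np

    # hypothetical data: x = deposition time, y = LTO thickness
    x = np.array([1000., 1250., 1500., 1750., 2000., 2500., 3000., 3500.])
    y = np.array([1030., 1270., 1540., 1780., 2010., 2470., 2990., 3460.])

    k = len(y)
    xbar, ybar = x.mean(), y.mean()
    a = ybar                                              # estimate of alpha
    b = np.sum((x - xbar) * y) / np.sum((x - xbar)**2)    # estimate of beta
    yhat = a + b * (x - xbar)

    s2 = np.sum((y - yhat)**2) / (k - 2)   # residual mean square (k - 2 d.f.)
    Va = s2 / k                            # V[a]
    Vb = s2 / np.sum((x - xbar)**2)        # V[b]
    print(a, b, np.sqrt(Va), np.sqrt(Vb))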

LTO thickness vs. deposition time:  y = a + bx

[Figure: LTO thickness (Å ×10³, 1–4) vs. deposition time (×10³, 1.0–3.5), with the fitted line.]

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        6.04e+1       5.61e+1              1.08e+0       0.030
Dep time        9.75e-1       2.52e-2              3.87e+1       0.000

ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio    Prob > F
Model    4.77e+6          1                 4.77e+6        1.50e+3    0.000
Error    5.09e+4          16                3.18e+3
Total    4.82e+6          17

[Figure: residuals (-100 to +100) vs. deposition time (×10³, 1.0–3.5).]

ANOVA Representation

[Figure: for a point (xᵢ, yᵢ), the estimated line ŷᵢ = a + b(xᵢ - x̄) and the true line ηᵢ = α + β(xᵢ - x̄), showing the residual (yᵢ - ŷᵢ), the deviation (ŷᵢ - ηᵢ), the offset (a - α), and the slope components b(xᵢ - x̄) and β(xᵢ - x̄).]

Note the differences between the "true" and the "estimated" model.

ANOVA Representation (cont.)

      (yᵢ - ηᵢ)  =  (a - α)  +  (b - β)(xᵢ - x̄)  +  (yᵢ - ŷᵢ)

      Σ(yᵢ - ηᵢ)²   =   k(a - α)²   +   (b - β)² Σ(xᵢ - x̄)²   +   Σ(yᵢ - ŷᵢ)²
      k d.f.            1 d.f.           1 d.f.                    k - 2 d.f.
      ~ σ²χ²(k)         ~ σ²χ²(1)        ~ σ²χ²(1)                 ~ σ²χ²(k-2)

In this way, the significance of the model can be analyzed in detail.

Confidence Limits of an Estimate

      ŷ₀ = ȳ + b(x₀ - x̄)

      V(ŷ₀) = V(ȳ) + (x₀ - x̄)² V(b)

      V(ŷ₀) = [ 1/n + (x₀ - x̄)² / Σ(x - x̄)² ] s²

Prediction interval:  ŷ₀ ± t_(α/2) √V(ŷ₀)
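
A hedged Python sketch of this interval, following the V(ŷ₀) formula above and reusing the hypothetical centered fit from the earlier sketch (x₀ and the 95% level are arbitrary choices):

    import numpy as np
    from scipy import stats

    # hypothetical centered fit, as in the earlier two-parameter sketch
    x = np.array([1000., 1250., 1500., 1750., 2000., 2500., 3000., 3500.])
    y = np.array([1030., 1270., 1540., 1780., 2010., 2470., 2990., 3460.])
    n = len(y)
    xbar, ybar = x.mean(), y.mean()
    b = np.sum((x - xbar) * y) / np.sum((x - xbar)**2)
    s2 = np.sum((y - (ybar + b * (x - xbar)))**2) / (n - 2)

    x0 = 2200.0                                   # new setting of interest
    y0 = ybar + b * (x0 - xbar)
    Vy0 = (1.0 / n + (x0 - xbar)**2 / np.sum((x - xbar)**2)) * s2
    t = stats.t.ppf(0.975, df=n - 2)              # 95%, two-sided
    print(y0 - t * np.sqrt(Vy0), y0 + t * np.sqrt(Vy0))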

Confidence Interval of Prediction (all points)

[Figure: LTO thickness (1000–3000) vs. deposition time (1000–3000) leverage plot, fitted with all the points and showing the confidence band of the prediction.]

Confidence Interval of Prediction (half the points)

[Figure: the same leverage plot, fitted using half of the points.]

Confidence Interval of Prediction (1/4 of the points)

[Figure: the same leverage plot, fitted using a quarter of the points.]

Prediction Error vs. Experimental Error

[Figure: true model and estimated model in the (x, y) plane, with the experimental error and the prediction error indicated.]

• Experimental error: does not depend on location or sample size.
• Prediction error: depends on location, and gets smaller as the sample size increases.

Multivariate Regression

      η = β₁x₁ + β₂x₂

[Figure: the observation vector y projected onto the plane spanned by x₁ and x₂; the residual R is perpendicular to ŷ, x₁ and x₂.]

Coefficient estimation:

      Σ (y - ŷ)x₁ = 0         Σ (y - ŷ)x₂ = 0

      Σ yx₁ - b₁Σ x₁² - b₂Σ x₁x₂ = 0
      Σ yx₂ - b₂Σ x₂² - b₁Σ x₁x₂ = 0

Variance Estimation

      s² = S_R / (n - p)

      V(b₁) = [1 / (1 - ρ²)] · s² / Σx₁²
      V(b₂) = [1 / (1 - ρ²)] · s² / Σx₂²

      ρ = -Σ x₁x₂ / √(Σx₁² Σx₂²)
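
A hedged Python sketch of the two-regressor normal equations and the variance formulas above, with made-up (no-intercept) data:

    import numpy as np

    # hypothetical data for eta = beta1*x1 + beta2*x2
    x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
    x2 = np.array([2., 1., 4., 3., 6., 5., 8., 7.])
    y  = np.array([ 5.1,  4.2, 11.0,  9.8, 16.9, 15.8, 22.7, 21.9])

    # normal equations:  sum(y*x1) = b1*sum(x1^2) + b2*sum(x1*x2), etc.
    A = np.array([[np.sum(x1 * x1), np.sum(x1 * x2)],
                  [np.sum(x1 * x2), np.sum(x2 * x2)]])
    c = np.array([np.sum(y * x1), np.sum(y * x2)])
    b1, b2 = np.linalg.solve(A, c)

    n, p = len(y), 2
    s2 = np.sum((y - b1 * x1 - b2 * x2)**2) / (n - p)
    rho = -np.sum(x1 * x2) / np.sqrt(np.sum(x1**2) * np.sum(x2**2))
    Vb1 = s2 / np.sum(x1**2) / (1 - rho**2)
    Vb2 = s2 / np.sum(x2**2) / (1 - rho**2)
    print(b1, b2, s2, rho, Vb1, Vb2)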

Thickness vs. time and temperature:  y = a + b₁x₁ + b₂x₂

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        -7.04e+2      7.18e+1              -9.80e+0      0.000
temp            7.14e-1       7.00e-2              1.02e+1       0.000
time min        8.69e-1       3.89e-2              2.23e+1       0.000

ANOVA table and Correlation of Estimates

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio    Prob > F
Model    2.58e+4          2                 1.29e+4        3.01e+2    0.000
Error    7.71e+2          18                4.28e+1
Total    2.66e+4          20

Data file: tox nm regression

            Tox     Temp    Time
tox nm      1.000   0.410   0.896
temp        0.410   1.000   0.000
time min    0.896   0.000   1.000

Multiple Regression in General

      X = [x₁ x₂ … xₙ],   y = Xb + e

Minimize  ||Xb - y||² = ||e||² = (y - Xb)ᵀ(y - Xb),

which is equivalent to:

      (y - Xb)ᵀ Xb = 0
      XᵀX b = Xᵀy
      b = (XᵀX)⁻¹ Xᵀy

      V(b) = (XᵀX)⁻¹ σ²
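
A hedged Python sketch of the matrix form (hypothetical data; in practice a least-squares solver is preferred over forming the inverse explicitly):

    import numpy as np

    rng = np.random.default_rng(0)
    temp = rng.uniform(600., 650., 20)
    X = np.column_stack([np.ones(20), temp])            # columns: constant, regressor
    y = -1850.0 + 3.24 * temp + rng.normal(0., 4., 20)  # hypothetical response

    b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X^T X)^-1 X^T y via the normal equations
    n, p = X.shape
    s2 = np.sum((y - X @ b)**2) / (n - p)
    Vb = s2 * np.linalg.inv(X.T @ X)        # estimated covariance of b
    print(b, np.sqrt(np.diag(Vb)))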

Joint Confidence Region for β₁, β₂

      S = S_R [ 1 + (p / (n - p)) F_α(p, n - p) ]

      (β₁ - b₁)² Σx₁²  +  2(β₁ - b₁)(β₂ - b₂) Σx₁x₂  +  (β₂ - b₂)² Σx₂²  =  S - S_R
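
A hedged sketch of how this region could be used to screen candidate (β₁, β₂) pairs, reusing the hypothetical two-regressor data from the earlier sketch:

    import numpy as np
    from scipy import stats

    x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
    x2 = np.array([2., 1., 4., 3., 6., 5., 8., 7.])
    y  = np.array([ 5.1,  4.2, 11.0,  9.8, 16.9, 15.8, 22.7, 21.9])

    A = np.array([[np.sum(x1 * x1), np.sum(x1 * x2)],
                  [np.sum(x1 * x2), np.sum(x2 * x2)]])
    b1, b2 = np.linalg.solve(A, np.array([np.sum(y * x1), np.sum(y * x2)]))

    n, p, alpha = len(y), 2, 0.05
    SR = np.sum((y - b1 * x1 - b2 * x2)**2)
    S = SR * (1 + p / (n - p) * stats.f.ppf(1 - alpha, p, n - p))

    def inside(beta1, beta2):
        # boundary of the joint region: quadratic form equal to S - SR
        q = ((beta1 - b1)**2 * np.sum(x1**2)
             + 2 * (beta1 - b1) * (beta2 - b2) * np.sum(x1 * x2)
             + (beta2 - b2)**2 * np.sum(x2**2))
        return q <= S - SR

    print(inside(b1, b2), inside(b1 + 1.0, b2 - 1.0))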

What if a “linear” model is not enough?

[Figure: deposition rate (100–300) vs. inlet temperature (600–650), with the straight-line fit.]

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        -1.85e+3      4.64e+1              -3.99e+1      0.000
inlet temp      3.24e+0       7.46e-2              4.35e+1       0.000

ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio    Prob > F
Model    3.65e+4          1                 3.65e+4        1.89e+3    0.000
Error    4.06e+2          21                1.93e+1
Total    3.69e+4          22

[Figure: residuals (-20 to +20) vs. inlet temperature (600–650).]

Multiple Regression with Replication

Duplicate runs allow the residual sum of squares to be split into pure error and lack of fit:

      S_E = (1/2) Σᵢ (yᵢ₁ - yᵢ₂)²        (pure error, for duplicated points)
      S_LF = S_R - S_E                    (lack of fit)

With nᵢ replicates at each of the k settings xᵢ (ȳᵢ. is the average of the replicates at xᵢ):

      Σᵢ Σᵥ (yᵢᵥ - ηᵢ)²  =  (a - α)² Σᵢ nᵢ  +  (b - β)² Σᵢ nᵢ(xᵢ - x̄)²
                            +  Σᵢ nᵢ(ȳᵢ. - ŷᵢ)²  +  Σᵢ Σᵥ (yᵢᵥ - ȳᵢ.)²
      with degrees of freedom  1,  1,  k - 2 (lack of fit),  Σᵢ nᵢ - k (pure error).

      Σᵢ Σᵥ (yᵢᵥ - ȳ)²  =  Σᵢ Σᵥ (yᵢᵥ - ȳᵢ.)²  +  Σᵢ nᵢ(ȳᵢ. - ŷᵢ)²  +  Σᵢ nᵢ(ŷᵢ - ȳ)²
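
A hedged Python sketch of the pure-error / lack-of-fit split for a straight-line fit with duplicated settings (hypothetical data):

    import numpy as np
    from scipy import stats

    # hypothetical duplicated runs: two observations at each setting
    x = np.array([600., 600., 610., 610., 620., 620., 630., 630., 640., 640., 650., 650.])
    y = np.array([ 95.,  97., 128., 125., 160., 163., 200., 197., 242., 246., 291., 288.])

    # ordinary straight-line fit
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    SR = np.sum((y - X @ b)**2)                     # total residual sum of squares

    # pure error from the duplicates, lack of fit as the remainder
    levels, inv = np.unique(x, return_inverse=True)
    group_means = np.array([y[inv == i].mean() for i in range(len(levels))])
    SE = np.sum((y - group_means[inv])**2)          # pure error, sum(n_i) - k d.f.
    SLF = SR - SE                                   # lack of fit, k - 2 d.f.

    k, N, p = len(levels), len(y), 2
    F = (SLF / (k - p)) / (SE / (N - k))
    print(SE, SLF, F, stats.f.sf(F, k - p, N - k))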

Pure Error vs. Lack of Fit Example

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   17   401.01           23.59         21.04     0.005
Pure Error    4    4.49             1.12
Total Error   21   405.50

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob > |t|
Intercept    -1850.16   46.42       -39.85    0.000
inlet temp   3.24       0.07        43.47     0.000

Model Test
Source       DF   Sum of Squares   F Ratio   Prob > F
inlet temp   1    36489.55         999.99    0.000

Dep. rate vs. temperature:  y = a + bx + cx²

[Figure: deposition rate (100–300) vs. inlet temperature (600–650), with the fitted quadratic.]

Variable Name    Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant         8.34e+3       1.80e+3              4.66e+0       0.000
inlet temp       -2.94e+1      5.74e+0              -5.13e+0      0.000
inlet temp ^2    2.62e-2       4.60e-3              5.69e+0       0.000
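
A hedged Python sketch of fitting such a quadratic in one regressor (hypothetical data; a centered or orthogonal polynomial basis is often preferred numerically):

    import numpy as np

    # hypothetical deposition-rate data over the 600-650 range
    temp = np.array([600., 605., 610., 615., 620., 625., 630., 635., 640., 645., 650.])
    rate = np.array([100., 112., 126., 141., 158., 176., 196., 217., 240., 264., 290.])

    X = np.column_stack([np.ones_like(temp), temp, temp**2])   # columns: 1, x, x^2
    b, *_ = np.linalg.lstsq(X, rate, rcond=None)               # a, b, c
    resid = rate - X @ b
    s2 = np.sum(resid**2) / (len(rate) - X.shape[1])
    print(b, s2)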

Pure Error vs. Lack of Fit Example (cont.)

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   16   150.24           9.39          8.37      0.026
Pure Error    4    4.49             1.12
Total Error   20   154.73

Parameter Estimates
Term           Estimate   Std Error   t Ratio   Prob > |t|
Intercept      8339.05    1789.92     4.66      0.0002
inlet temp^1   -29.45     5.74        -5.13     0.0001
inlet temp^2   0.03       0.005       5.69      0.0000

Model Test
Source               DF   Sum of Squares   F Ratio   Prob > F
Poly(inlet temp,2)   2    36740.32         999.99    0.0000

ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio    Prob > F
Model    3.67e+4          2                 1.84e+4        2.37e+3    0.000
Error    1.55e+2          20                7.74e+0
Total    3.69e+4          22

[Figure: residuals (-6 to +6) vs. inlet temperature (600–650).]

Use the regression line to predict LTO thickness…

      y = 60.352 + 0.97456 x,   R² = 0.989
      y = -38.440 + 1.0153 x,   R² = 0.989

[Figure: the two fitted lines with their 90% low/high limits; left panel plotted against Dep Time (sec), right panel against LTO Thick (Å).]

Response Surface Methodology

Objectives:
• Get a feel for the I/O relationships.
• Find setting(s) that satisfy multiple constraints.
• Find settings that lead to optimum performance.

Observations:
• The function is nearly linear away from the peak.
• The function is nearly quadratic at the peak.

Building the planar model

A factorial experiment with center points is enough to build and confirm a planar model.

      b₁, b₂, b₁₂ = -0.65 ± 0.75
      b₁₁ + b₂₂ = (1/4)Σp + (1/3)Σc = -0.50 ± 1.15

Quadratic Model and Confirmation Run

Close to the peak, a quadratic model can be built and confirmed by an expanded two-phase experiment.

Response Surface Methodology

• RSM consists of creating models that lead to visual images of a response. The models are usually linear or quadratic in nature.
• Either expanded factorial experiments or regression analysis can be used.
• All empirical models have a random prediction error. In RSM, the average variance of the model is (see the sketch below):

      V(ŷ) = (1/n) Σᵢ₌₁ⁿ V(ŷᵢ) = pσ²/n

• where “p” is the number of model parameters and “n” is the number of experiments.
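
A hedged numerical check of this identity (hypothetical design; the average of V(ŷᵢ) = σ²·hᵢᵢ over the n runs equals pσ²/n because the hat matrix has trace p):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 12, 3
    X = np.column_stack([np.ones(n), rng.uniform(-1., 1., (n, p - 1))])  # hypothetical design
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix; V(yhat_i) = sigma^2 * H[i, i]
    sigma2 = 2.0                              # assumed error variance
    print(np.mean(np.diag(H)) * sigma2, p * sigma2 / n)   # both equal p*sigma^2/n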

Response Surface Exploration

[Figure: exploration of the fitted response surface.]

"Popular" RSM

• Use single-stage Box-B or Box-W designs
• Use computer (simulated) experiments
• Rely on "goodness of fit" measures
• Automate model structure generation
• Problems?
