Spanos
EE290H F05
Regression Analysis
• Simple Regression
• Multivariate Regression
• Stepwise Regression
• Replication and Prediction Error
Lecture 8: Regression Analysis
Regression Analysis
• In general, we "fit" a model by minimizing a metric that represents the error:
$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• The sum of squares gives closed-form solutions and minimum variance for linear models.
The Simplest Regression Model
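Line through the origin:
$y_u = \beta x_u + \varepsilon_u, \quad u = 1, 2, \ldots, n, \quad \varepsilon_u \sim N(0, \sigma_R^2)$
$\eta_u = \beta x_u$ (true model), $\hat{y} = b x$ (fitted model)
$\min S = \min \sum_{u=1}^{n} (y_u - \beta x_u)^2$; the minimized S provides the estimate of $\sigma_R^2$
b: estimate of β; $\hat{y}$: estimate of $\eta_u$, the true value of the model.
[Figure: data in the (x, y) plane with the fitted line $\hat{y} = bx$.]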
Using the Normal Equation to fit “line through the origin” model
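Our model has only one degree of freedom, which is why our choices are confined to this line: $\hat{y} = bx$ (1 d.f.).
Least squares picks the point on the line that achieves $\min \sum (y - \hat{y})^2$.
[Figure: observation space with axes $y_1$ and $y_2$, showing the data vector $y$ and the fitted vector $\hat{y}$ on the model line.]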
Using the Normal Equation (cont) (fitting “line through the origin” model)
Choose b so that the residual vector is perpendicular to the model vector...
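$\sum (y - \hat{y}) \cdot x = 0 \;\Rightarrow\; \sum (y - bx) \cdot x = 0 \;\Rightarrow\; b = \frac{\sum xy}{\sum x^2}$ (est. of β)
$s^2 = \frac{S_R}{n-1}$ (est. of $\sigma_R^2$)
$V(b) = \frac{s^2}{\sum x^2}$
67% conf: $b \pm \sqrt{s^2 / \sum x^2}$
Significance test: $t = \frac{b - \beta^*}{\sqrt{s^2 / \sum x^2}} \sim t_{n-1}$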
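As a minimal sketch of the estimates above, the following Python computes b, s², the standard error of b and the t statistic for a line through the origin. The data values and the function name `fit_through_origin` are hypothetical, chosen only for illustration; they are not the lecture's etch data.

```python
import math

def fit_through_origin(x, y):
    """Least-squares fit of y = b*x: returns b, s^2 and the standard error of b."""
    n = len(x)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    b = sum_xy / sum_xx                      # normal equation: b = sum(xy) / sum(x^2)
    s_r = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))   # residual sum of squares
    s2 = s_r / (n - 1)                       # one fitted parameter leaves n-1 d.f.
    se_b = math.sqrt(s2 / sum_xx)            # square root of V(b) = s^2 / sum(x^2)
    return b, s2, se_b

# Hypothetical data, not the lecture's etch measurements
etch_time = [100.0, 200.0, 400.0, 600.0, 800.0]      # seconds
removed = [55.0, 95.0, 210.0, 290.0, 405.0]          # nm

b, s2, se_b = fit_through_origin(etch_time, removed)
t_stat = (b - 0.0) / se_b                    # test beta* = 0 against t with n-1 d.f.
# The residual vector should be perpendicular to x (up to rounding)
residual_dot_x = sum((yi - b * xi) * xi for xi, yi in zip(etch_time, removed))
print(f"b = {b:.4f}, std. err. = {se_b:.4f}, t = {t_stat:.1f}")
```

The `residual_dot_x` check mirrors the geometric argument: at the least-squares solution the residual vector has no component along x.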
Etch time vs. removed material: y = bx
[Figure: removed material (nm) vs. etch time (sec x 10^3), data with the fitted line through the origin.]

Variable Name      Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Etch Time (sec)    0.501         0.0162               30.9          0.000
Model Validation through ANOVA
The idea is to decompose the sum of squares into orthogonal components. Assuming that there is no need for a model at all* (always a good null hypothesis!): $H_0: \beta^* = 0$

$\sum y_u^2 = \sum \hat{y}_u^2 + \sum (y_u - \hat{y}_u)^2$

Sum of squares:       total   =   model   +   residual
Degrees of freedom:   n       =   p       +   n - p

* This is equivalent to saying that y ~ N(μ, σ²), where μ and σ are constants, independent of x.
Model Validation through ANOVA (cont)
Assuming a specific model: $H_0: \beta^* = b$

$\sum (y_u - \beta^* x_u)^2 = \sum (\hat{y}_u - \beta^* x_u)^2 + \sum (y_u - \hat{y}_u)^2$

Sum of squares:       total   =   model   +   residual
Degrees of freedom:   n       =   p       +   n - p

The ANOVA table will answer the question: Is there a relationship between x and y?
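The decomposition for the case β* = 0 can be checked numerically. The sketch below assumes the line-through-the-origin model with p = 1 and uses made-up data; `anova_through_origin` is an illustrative name, not something from the lecture.

```python
def anova_through_origin(x, y):
    """Split sum(y^2) into model and residual parts for the fit y = b*x."""
    n, p = len(x), 1
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    y_hat = [b * xi for xi in x]
    ss_total = sum(yi ** 2 for yi in y)                              # n d.f.
    ss_model = sum(yh ** 2 for yh in y_hat)                          # p d.f.
    ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))       # n - p d.f.
    f_ratio = (ss_model / p) / (ss_resid / (n - p))                  # ratio of mean squares
    return {"total": ss_total, "model": ss_model, "residual": ss_resid, "F": f_ratio}

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
table = anova_through_origin(x, y)
print(table)
```

A large F ratio relative to the F(p, n-p) distribution is evidence against the "no model needed" hypothesis.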
ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    1.83e+5          1                 1.83e+5        1.98e+2   2.17e-6
Error    6.47e+3          7                 9.24e+2
Total    1.89e+5          8

[Figure: residuals (-60 to 60) vs. etch time (sec x 10^3).]
A More Complex Regression Equation - a straight line with two parameters
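actual: $\eta = \alpha + \beta (x - \bar{x})$
estimated: $\hat{y} = a + b (x - \bar{x})$
$y_i \sim N(\eta_i, \sigma^2)$
Minimize $R = \sum (y_i - \hat{y}_i)^2$ to estimate α and β:
$a = \bar{y}, \quad b = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Are a and b good estimators of α and β?
$E[a] = \alpha, \quad E[b] = \frac{\sum (x_i - \bar{x}) E[y_i]}{\sum (x_i - \bar{x})^2} = \beta$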
Variance Estimation:
Note that all variability comes from $y_i$!
$V[a] = V\left[\frac{\sum y_i}{k}\right] = \frac{1}{k^2} \sum V[y_i] = \frac{\sigma^2}{k}$
$V[b] = V\left[\frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}\right] = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$
Min. var. thanks to least squares!
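A short sketch of the centered two-parameter fit and its variance estimates follows. It assumes σ² is replaced by the residual estimate s² with k - 2 degrees of freedom; the data and the name `fit_centered_line` are hypothetical.

```python
def fit_centered_line(x, y):
    """Fit y = a + b*(x - x_bar); return estimates and their estimated variances."""
    k = len(x)
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = y_bar                                            # a is simply the mean of y
    b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    residuals = [yi - (a + b * (xi - x_bar)) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (k - 2)         # two fitted parameters
    return {"a": a, "b": b, "s2": s2,
            "var_a": s2 / k,                             # V[a] = sigma^2 / k
            "var_b": s2 / sxx,                           # V[b] = sigma^2 / sum((x - x_bar)^2)
            "residuals": residuals}

# Hypothetical data
dep_time = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 3500.0]
thickness = [1040.0, 1510.0, 2030.0, 2480.0, 3010.0, 3460.0]
fit = fit_centered_line(dep_time, thickness)
print(fit["a"], fit["b"], fit["var_a"], fit["var_b"])
```

Centering x makes a and b uncorrelated, which is why their variances can be written separately.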
LTO thickness vs deposition time: y = a + bx
[Figure: LTO thickness (A x 10^3) vs. dep time (x 10^3), data with the fitted line.]

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        6.04e+1       5.61e+1              1.08e+0       0.030
Dep time        9.75e-1       2.52e-2              3.87e+1       0.000
ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    4.77e+6          1                 4.77e+6        1.50e+3   0.000
Error    5.09e+4          16                3.18e+3
Total    4.82e+6          17

[Figure: residuals (-100 to 100) vs. dep time (x 10^3).]
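ANOVA Representation
[Figure: a data point $(x_i, y_i)$ shown against the estimated line $\hat{y}_i = a + b(x_i - \bar{x})$ and the true line $\eta_i = \alpha + \beta(x_i - \bar{x})$, with the segments $(y_i - \hat{y}_i)$, $(y_i - \eta_i)$, $b(x_i - \bar{x})$, $\beta(x_i - \bar{x})$ and $(a - \alpha)$ marked.]
Note the differences between the "true" and "estimated" models.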
ANOVA Representation (cont)
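$(y_i - \eta_i) = (a - \alpha) + (b - \beta)(x_i - \bar{x}) + (y_i - \hat{y}_i)$
$\sum (y_i - \eta_i)^2 = k(a - \alpha)^2 + (b - \beta)^2 \sum (x_i - \bar{x})^2 + \sum (y_i - \hat{y}_i)^2$

Term                                       d.f.    Distribution
$\sum (y_i - \eta_i)^2$                    k       $\sim \sigma^2 \chi^2(k)$
$k(a - \alpha)^2$                          1       $\sim \sigma^2 \chi^2(1)$
$(b - \beta)^2 \sum (x_i - \bar{x})^2$     1       $\sim \sigma^2 \chi^2(1)$
$\sum (y_i - \hat{y}_i)^2$                 k-2     $\sim \sigma^2 \chi^2(k-2)$

In this way, the significance of the model can be analyzed in detail.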
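The identity can be verified on simulated data, where the "true" α and β are known. This is only an illustrative sketch: the parameter values, the noise level and the random seed are assumptions made for the check.

```python
import random

random.seed(0)
alpha, beta, sigma = 5.0, 2.0, 0.5       # hypothetical "true" parameters
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
k = len(x)
x_bar = sum(x) / k
eta = [alpha + beta * (xi - x_bar) for xi in x]          # true model values
y = [e + random.gauss(0.0, sigma) for e in eta]          # simulated observations

sxx = sum((xi - x_bar) ** 2 for xi in x)
a = sum(y) / k
b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
y_hat = [a + b * (xi - x_bar) for xi in x]

lhs = sum((yi - ei) ** 2 for yi, ei in zip(y, eta))              # k d.f.
term_a = k * (a - alpha) ** 2                                    # 1 d.f.
term_b = (b - beta) ** 2 * sxx                                   # 1 d.f.
term_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # k - 2 d.f.
print(lhs, term_a + term_b + term_resid)
```

The cross terms vanish because the residuals sum to zero and are orthogonal to $(x_i - \bar{x})$, so the two printed numbers agree up to rounding.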
Confidence Limits of an Estimate
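$\hat{y}_0 = \bar{y} + b(x_0 - \bar{x})$
$V(\hat{y}_0) = V(\bar{y}) + (x_0 - \bar{x})^2 V(b)$
$V(\hat{y}_0) = \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] s^2$
prediction interval: $\hat{y}_0 \pm t_{\alpha/2} \sqrt{V(\hat{y}_0)}$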
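The sketch below evaluates these limits at two locations to show how the interval widens away from $\bar{x}$. It assumes s² is computed with n - 2 degrees of freedom and that the critical t value is supplied by the caller (here 2.306 for 8 d.f. at the 95% level, read from a t table); the data and the name `interval_at` are hypothetical.

```python
import math

def interval_at(x, y, x0, t_crit):
    """Limits for the fitted value at x0; t_crit is t(alpha/2) with n-2 d.f."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    s2 = sum((yi - y_bar - b * (xi - x_bar)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0 = y_bar + b * (x0 - x_bar)
    var_y0 = (1.0 / n + (x0 - x_bar) ** 2 / sxx) * s2    # V(y0_hat)
    half_width = t_crit * math.sqrt(var_y0)
    return y0 - half_width, y0, y0 + half_width

# Hypothetical data: 10 runs, so n - 2 = 8 d.f.
x = [1000.0, 1250.0, 1500.0, 1750.0, 2000.0, 2250.0, 2500.0, 2750.0, 3000.0, 3250.0]
y = [1030.0, 1290.0, 1500.0, 1790.0, 2010.0, 2240.0, 2520.0, 2730.0, 3020.0, 3230.0]
lo_mid, fit_mid, hi_mid = interval_at(x, y, 2125.0, 2.306)   # at x_bar
lo_end, fit_end, hi_end = interval_at(x, y, 3250.0, 2.306)   # at the edge of the data
print(hi_mid - lo_mid, hi_end - lo_end)
```

The interval is narrowest at $x_0 = \bar{x}$, where only the $1/n$ term contributes.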
Confidence Interval of Prediction (all points)
[Figure: LTO thickness vs. dep time (1000 to 3000) with confidence limits; leverage plot.]
Confidence Interval of Prediction (half the points)
[Figure: LTO thickness vs. dep time (1000 to 3000) with confidence limits; leverage plot.]
Confidence Interval of Prediction (1/4 of points)
[Figure: LTO thickness vs. dep time (1000 to 3000) with confidence limits; leverage plot.]
Prediction Error vs Experimental Error
[Figure: y vs. x showing the true model and the estimated model, with the experimental error and the prediction error marked.]
• Experimental error does not depend on location or sample size.
• Prediction error depends on location and gets smaller as sample size increases.
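Multivariate Regression
$\eta = \beta_1 x_1 + \beta_2 x_2$
[Figure: the vectors $y$, $\hat{y}$, $x_1$ and $x_2$, with the components $\beta_1$, $\beta_2$ and the residual R.]
The residual is ⊥ to $\hat{y}$, $x_1$, $x_2$.
Coefficient Estimation:
$\sum (y - \hat{y}) x_1 = 0, \quad \sum (y - \hat{y}) x_2 = 0$
$\sum y x_1 - b_1 \sum x_1^2 - b_2 \sum x_1 x_2 = 0$
$\sum y x_2 - b_2 \sum x_2^2 - b_1 \sum x_1 x_2 = 0$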
Variance Estimation:
$s^2 = \frac{S_R}{n-p}$
$V(b_1) = \frac{1}{1 - \rho^2} \cdot \frac{s^2}{\sum x_1^2}$
$V(b_2) = \frac{1}{1 - \rho^2} \cdot \frac{s^2}{\sum x_2^2}$
$\rho = \frac{-\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}}$
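For the two-regressor model, the normal equations form a 2x2 linear system that can be solved directly. The sketch below does so with Cramer's rule and then evaluates s², ρ and the two variances; the data and the function name `fit_two_regressors` are hypothetical.

```python
import math

def fit_two_regressors(x1, x2, y):
    """Solve the two normal equations for y = b1*x1 + b2*x2 (no intercept)."""
    n, p = len(y), 2
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det                   # Cramer's rule
    b2 = (s2y * s11 - s1y * s12) / det
    resid = [w - b1 * u - b2 * v for u, v, w in zip(x1, x2, y)]
    var_s = sum(r * r for r in resid) / (n - p)          # s^2 = S_R / (n - p)
    rho = -s12 / math.sqrt(s11 * s22)
    var_b1 = var_s / ((1.0 - rho ** 2) * s11)
    var_b2 = var_s / ((1.0 - rho ** 2) * s22)
    return b1, b2, resid, rho, var_b1, var_b2

# Hypothetical data
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [5.1, 3.9, 11.2, 9.8, 17.1]
b1, b2, resid, rho, var_b1, var_b2 = fit_two_regressors(x1, x2, y)
print(b1, b2, rho, var_b1, var_b2)
```

When the regressors are strongly related, ρ² approaches 1 and the factor 1/(1 - ρ²) inflates both variances.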
Thickness vs time, temp: y = a + b1 x1 + b2 x2

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        -7.04e+2      7.18e+1              -9.80e+0      0.000
temp            7.14e-1       7.00e-2              1.02e+1       0.000
time min        8.69e-1       3.89e-2              2.23e+1       0.000
ANOVA table and Correlation of Estimates

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    2.58e+4          2                 1.29e+4        3.01e+2   0.000
Error    7.71e+2          18                4.28e+1
Total    2.66e+4          20

Data File: tox nm temp time min regression

        Tox     Temp    Time
Tox     1.000   0.410   0.896
Temp    0.410   1.000   0.000
Time    0.896   0.000   1.000
Multiple Regression in General
$X b + e = y$, with $X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_n \,]$
minimize $\|Xb - y\|^2 = \|e\|^2 = (y - Xb)^T (y - Xb)$
or, $\min\; -e^T X b + e^T y$
which is equiv. to: $(y - Xb)^T X b = 0$
$X^T X b = X^T y$
$b = (X^T X)^{-1} X^T y$
$V(b) = (X^T X)^{-1} \sigma^2$
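In matrix form the whole procedure takes a few lines of NumPy. This is a sketch under stated assumptions: the data are simulated, σ² is replaced by the residual estimate s², and `least_squares` is an illustrative helper name.

```python
import numpy as np

def least_squares(X, y):
    """Return b = (X^T X)^-1 X^T y, the residuals, and s^2 (X^T X)^-1."""
    n, p = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)        # solves X^T X b = X^T y
    resid = y - X @ b
    s2 = resid @ resid / (n - p)             # estimate of sigma^2
    cov_b = s2 * np.linalg.inv(XtX)          # estimated V(b)
    return b, resid, cov_b

# Hypothetical data: intercept column plus two regressors
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 10.0, size=12)
x2 = rng.uniform(0.0, 5.0, size=12)
X = np.column_stack([np.ones(12), x1, x2])
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0.0, 0.3, size=12)

b, resid, cov_b = least_squares(X, y)
print(b)
print(np.sqrt(np.diag(cov_b)))               # standard errors of the estimates
```

Solving the normal equations directly is fine for small, well-conditioned problems; for poorly conditioned X, `np.linalg.lstsq` is the numerically safer route to the same estimate.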
Joint Confidence Region for x1 x2
$S = S_R \left[ 1 + \frac{p}{n-p} F_\alpha(p, n-p) \right]$
$(\beta_1 - b_1)^2 \sum x_1^2 + 2 (\beta_1 - b_1)(\beta_2 - b_2) \sum x_1 x_2 + (\beta_2 - b_2)^2 \sum x_2^2 = S - S_R$
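The boundary equation describes an ellipse in the (β1, β2) plane; points inside it are jointly plausible. The following sketch tests whether a candidate pair lies inside the region. All numbers in `summary` are hypothetical, and the critical value F(0.05; 2, 18) = 3.55 is assumed to be read from an F table rather than computed.

```python
def in_joint_region(beta1, beta2, b1, b2, s11, s12, s22, s_r, n, f_crit, p=2):
    """True if (beta1, beta2) lies inside the joint confidence region."""
    s_limit = s_r * (1.0 + p / (n - p) * f_crit)         # S on the boundary
    d1, d2 = beta1 - b1, beta2 - b2
    quad = d1 * d1 * s11 + 2.0 * d1 * d2 * s12 + d2 * d2 * s22
    return quad <= s_limit - s_r

# Hypothetical fit summary: s11, s12, s22 are the sums of x1^2, x1*x2, x2^2
summary = dict(b1=1.0, b2=2.0, s11=55.0, s12=10.0, s22=66.0,
               s_r=4.0, n=20, f_crit=3.55)
inside_at_estimate = in_joint_region(1.0, 2.0, **summary)
inside_far_away = in_joint_region(2.0, 3.0, **summary)
print(inside_at_estimate, inside_far_away)
```

The cross term with Σx1x2 tilts the ellipse, so two coefficients can each look acceptable individually while the pair is jointly implausible.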
What if a “linear” model is not enough?
[Figure: dep rate (100 to 300) vs. inlet temp (600 to 650), data with the fitted straight line.]

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        -1.85e+3      4.64e+1              -3.99e+1      0.000
inlet temp      3.24e+0       7.46e-2              4.35e+1       0.000
ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    3.65e+4          1                 3.65e+4        1.89e+3   0.000
Error    4.06e+2          21                1.93e+1
Total    3.69e+4          22

[Figure: residuals (-20 to 20) vs. inlet temp (600 to 650).]
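Multiple Regression with Replication
$S_E = \frac{1}{2} \sum_{i=1}^{k} (y_{i1} - y_{i2})^2$
$S_{LF} = S_R - S_E$

With $n_i$ replicates at each of the k settings:
$\sum_{i}^{k} \sum_{v}^{n_i} (y_{iv} - \eta_i)^2 = (a - \alpha)^2 \sum_{i}^{k} n_i + (b - \beta)^2 \sum_{i}^{k} n_i (x_i - \bar{x})^2 + \sum_{i}^{k} n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_{i}^{k} \sum_{v}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2$
d.f.: $\sum_{i}^{k} n_i = 1 + 1 + (k - 2) + \left( \sum_{i}^{k} n_i - k \right)$

$\sum_{i}^{k} \sum_{v}^{n_i} (y_{iv} - \bar{y})^2 = \sum_{i}^{k} \sum_{v}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2 + \sum_{i}^{k} n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_{i}^{k} n_i (\bar{y} - \hat{y}_i)^2$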
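With replicated runs, the residual sum of squares can be split into pure error and lack of fit. The sketch below does this for a straight-line fit; the duplicated runs in `runs` are made-up values and `lack_of_fit` is an illustrative name.

```python
def lack_of_fit(settings):
    """settings: list of (x, [replicate y values]); straight-line fit to all runs."""
    xs = [x for x, ys in settings for _ in ys]
    ys_all = [y for _, ys in settings for y in ys]
    n_total, k = len(xs), len(settings)
    x_bar = sum(xs) / n_total
    y_bar = sum(ys_all) / n_total
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * y for x, y in zip(xs, ys_all)) / sxx
    predict = lambda x: y_bar + b * (x - x_bar)
    s_r = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys_all))      # residual SS
    # Pure error: scatter of replicates around their own setting mean
    s_e = sum((y - sum(ys) / len(ys)) ** 2 for _, ys in settings for y in ys)
    s_lf = s_r - s_e                                                  # lack of fit
    df_lf, df_e = k - 2, n_total - k
    f_ratio = (s_lf / df_lf) / (s_e / df_e)
    return s_r, s_e, s_lf, f_ratio

# Hypothetical duplicated runs at five settings
runs = [(600.0, [101.0, 103.0]), (610.0, [128.0, 127.0]), (620.0, [160.0, 163.0]),
        (630.0, [199.0, 197.0]), (640.0, [244.0, 246.0])]
s_r, s_e, s_lf, f_ratio = lack_of_fit(runs)
print(s_r, s_e, s_lf, f_ratio)
```

For duplicates, the pure-error term computed here coincides with the $\frac{1}{2} \sum (y_{i1} - y_{i2})^2$ form. A large F ratio suggests that the straight line is missing structure that the replicates can resolve.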
Pure Error vs. Lack of Fit Example

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   17   401.01           23.59         21.04     0.005
Pure Error    4    4.49             1.12
Total Error   21   405.50

Parameter Estimates
Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    -1850.16    46.42       -39.85    0.000
inlet temp   3.24        0.07        43.47     0.000

Model Test
Source       DF   Sum of Squares   F Ratio   Prob > F
inlet temp   1    36489.55         999.99    0.000
Dep. rate vs temperature: y = a + bx + cx^2
[Figure: dep rate (100 to 300) vs. inlet temp (600 to 650), data with the fitted quadratic.]

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        8.34e+3       1.80e+3              4.66e+0       0.000
inlet temp      -2.94e+1      5.74e+0              -5.13e+0      0.000
inlet temp ^2   2.62e-2       4.60e-3              5.69e+0       0.000
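A quadratic model is still linear in its parameters, so the same least-squares machinery applies with an extra column in X. The sketch below compares straight-line and quadratic fits on made-up data; it fits in the centered variable z = x - mean(x), an assumption made here for numerical conditioning, so the coefficients are not the a, b, c of the uncentered form.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial in z = x - mean(x); returns coefficients and residual SS."""
    z = x - x.mean()                                   # centering improves conditioning
    X = np.column_stack([z ** j for j in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

# Hypothetical data with visible curvature
inlet_temp = np.linspace(600.0, 650.0, 11)
dep_rate = 100.0 + 3.0 * (inlet_temp - 600.0) + 0.03 * (inlet_temp - 600.0) ** 2
dep_rate = dep_rate + 0.5 * np.sin(inlet_temp)         # deterministic stand-in for noise

coef_lin, ss_lin = fit_polynomial(inlet_temp, dep_rate, 1)
coef_quad, ss_quad = fit_polynomial(inlet_temp, dep_rate, 2)
print(ss_lin, ss_quad)
```

The drop in residual sum of squares from the linear to the quadratic fit is the quantity that the lack-of-fit comparison evaluates against pure error.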
Pure Error vs. Lack of Fit Example (cont)

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   16   150.24           9.39          8.37      0.026
Pure Error    4    4.49             1.12
Total Error   20   154.73

Parameter Estimates
Term           Estimate   Std Error   t Ratio   Prob>|t|
Intercept      8339.05    1789.92     4.66      0.0002
inlet temp^1   -29.45     5.74        -5.13     0.0001
inlet temp^2   0.03       0.005       5.69      0.0000

Model Test
Source               DF   Sum of Squares   F Ratio   Prob > F
Poly(inlet temp,2)   2    36740.32         999.99    0.0000
ANOVA table and Residual Plot

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio   Prob>F
Model    3.67e+4          2                 1.84e+4        2.37e+3   0.000
Error    1.55e+2          20                7.74e+0
Total    3.69e+4          22

[Figure: residuals (-6 to 6) vs. inlet temp (600 to 650).]
Use regression line to predict LTO thickness...
[Figure: two panels with fitted lines y = 60.352 + 0.97456 x (R^2 = 0.989) and y = -38.440 + 1.0153 x (R^2 = 0.989); one panel has Dep Time Sec on the x axis, the other LTO Thick A. Legend: LTO Thick A, 90%LimitLow, 90%LimitHigh.]
Response Surface Methodology
• Objectives:
  • get a feel for I/O relationships
  • find setting(s) that satisfy multiple constraints
  • find settings that lead to optimum performance
• Observations:
  • Function is nearly linear away from the peak
  • Function is nearly quadratic at the peak
Building the planar model
A factorial experiment with center points is enough to build and confirm a planar model.
b1, b2, b12 = -0.65 +/- 0.75
b11 + b22 = 1/4 Σp + 1/3 Σc = -0.50 +/- 1.15
Quadratic Model and Confirmation Run
Close to the peak, a quadratic model can be built and confirmed by an expanded two-phase experiment.
Response Surface Methodology
• RSM consists of creating models that lead to visual images of a response. The models are usually linear or quadratic in nature.
• Either expanded factorial experiments or regression analysis can be used.
• All empirical models have a random prediction error. In RSM, the average variance of the model is:
$\bar{V}(\hat{y}) = \frac{1}{n} \sum_{i=1}^{n} V(\hat{y}_i) = \frac{p \sigma^2}{n}$
where “p” is the number of model parameters and “n” is the number of experiments.
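The p σ²/n result follows from the fact that $V(\hat{y}_i) = \sigma^2 h_{ii}$, where $h_{ii}$ are the diagonal entries of the hat matrix $X (X^T X)^{-1} X^T$, whose trace equals p. The sketch below checks this for a planar model on a 2x2 factorial with one center point; the design and the value of σ² are assumptions chosen for illustration.

```python
import numpy as np

def average_prediction_variance(X, sigma2):
    """Mean of V(y_hat_i) = sigma^2 * h_ii over the n design points."""
    hat = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix: y_hat = hat @ y
    return sigma2 * float(np.mean(np.diag(hat)))

# 2^2 factorial plus one center point, planar model y = b0 + b1*x1 + b2*x2
design = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
X = np.column_stack([np.ones(len(design)), design])
n, p = X.shape
sigma2 = 2.0                                      # hypothetical error variance
avg_var = average_prediction_variance(X, sigma2)
print(avg_var, p * sigma2 / n)
```

Adding parameters raises the average prediction variance, and adding runs lowers it, independently of where the runs are placed.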
Response Surface Exploration
"Popular" RSM
• Use single-stage Box-B or Box-W designs
• Use computer (simulated) experiments
• Rely on "goodness of fit" measures
• Automate model structure generation
• Problems?