Optimized Curve Fitting
21/4/2006
Anuj Jain, Astt Prof, AMD, MNNIT, Allahabad
Case 1
Tabulated data (interest tables, steam tables, etc.)
Estimates are required at intermediate values from such tables.
Case 2: Experimentation
- Independent (predictor) variable X; dependent (response) variable Y
- Data available at discrete points or times
- Estimates are required at points between the discrete values (as it is impractical or expensive to actually measure them)
Case 3: Function substitution
- An implicit (complicated) function or program is known
- Results at all values are possible but time-consuming
- Further mathematical operations (integration, differentiation, finding maximum or minimum points) are difficult
Case 4: Hypothesis testing
- Alternative mathematical models are given
- Which is the best to use for a given situation?
Solution
- Graphically represent the data points
- Develop a mathematical relation (curve fitting) that describes the relationship between the variables
- Draw the curve for the developed mathematical relation that best represents the given data points
- Use the mathematical relation or the curve:
  to estimate intermediate values and extrapolate
  for further mathematical operations
  for hypothesis testing
Problem
Sketch a line that visually conforms to the data points, and obtain the y value for x = 8.

i    1      2      3      4      5
x    2.10   6.22   7.17   10.50  13.70
y    2.90   3.83   5.98   5.71   7.74
Curve Fitting
[Figure: four hand-sketched curves drawn through the same five data points (x from 0 to 16, y from 0 to 10); each panel fits the data plausibly, yet the curves differ.]
Curve Fitting
[Figure: yet another sketched curve through the same data points, this time with y ranging from -4 to 10; again a visually acceptable but different fit.]
General observations
- Sketched curves depend on a subjective viewpoint
- For the same data set, there should not be different curves
- There may be errors in reading values from the graph
Curve Fitting
Two methods, depending on the error in the data:

Interpolation
- Precise data
- Force the curve through each data point
[Figure: temperature (deg F) vs time (s), curve passing through every point.]

Regression
- X values are accurate; Y values are noisy (experimental)
- Represent the trend of the data without matching individual points
[Figure: f(x) vs x, a straight line through scattered points.]
Regression steps
- Model selection (e.g. Y = A*exp(-X/X0))
- Describing a merit function for closeness of fit
- Computing the values of the model parameters
- Interpretation of results & assessing goodness of fit
Right model selection
- Requires understanding of the basic principles of the problem
- The model should represent the data trends

Linear model: y = a0 + a1x
Polynomial model
Non-linear models: exponential law, power law, logarithmic law, Gaussian law
Multiple variables: y = b1x1 + b2x2 + ... + bnxn + c
Describing the Merit Function
- Method of least squares
- Outliers & weighting function

[Figure: data points scattered about the regression model y = a0 + a1x, with the residual of each point, e = y - (a0 + a1x), shown as a vertical offset from the line.]
Computing parameter values
Obtain the parameters (a0 & a1) that minimize the sum of squares of the error between the data points and the line.

Can be solved explicitly:
- Linear regression: y = a0 + a1x
- Polynomial regression
- Multiple regression: y = b1x1 + b2x2 + ... + bnxn + c
- Exponential, power, and logarithmic laws (after linearization)

Solved iteratively, using the Levenberg-Marquardt algorithm:
- Non-linear regression (e.g. Gaussian law)
Goodness-of-fit
- Visual inspection: random distribution of residuals around the data points
- Coefficient of determination r^2
- Standard errors of the parameters
- Confidence interval
- Prediction interval
Interpretation of results
- Curve fitting provides a correlation between the variables
- It means that 'X predicts Y', not 'X causes Y'
- Parameter values must also make sense
Linear Regression
Model selection: assume a linear model

y = a0 + a1x,   i.e.   yi = a0 + a1xi + ei

Merit function: the sum of squares of the residual errors

e_i = y_i - a_0 - a_1 x_i

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

[Figure: the data points about the regression model y = a0 + a1x, with residuals e = y - (a0 + a1x) shown as vertical offsets from the line.]
Linear Regression
Parameter computation: find the values of a0 and a1 that minimize Sr by setting its derivatives with respect to a0 and a1 to zero. First a0:

\frac{\partial S_r}{\partial a_0} = \frac{\partial}{\partial a_0} \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 = \sum_{i=1}^{n} 2 (y_i - a_0 - a_1 x_i)(-1) = 0

Finally:

n a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i,   i.e.   a_0 + \bar{x} a_1 = \bar{y}
Linear Regression
Second a1:

\frac{\partial S_r}{\partial a_1} = \frac{\partial}{\partial a_1} \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 = \sum_{i=1}^{n} 2 (y_i - a_0 - a_1 x_i)(-x_i) = 0

Finally:

\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i
Linear Regression
Equations:

n a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i

\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i

Solution:

a_0 = \frac{\frac{1}{n}\sum y_i \sum x_i^2 - \frac{1}{n}\sum x_i \sum x_i y_i}{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2}

a_1 = \frac{\sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2}
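As a quick check of these formulas, here is a minimal Python sketch (assuming NumPy) that evaluates the closed-form solution on the five-point data set of the worked example below:

```python
import numpy as np

# The five data points from the worked example
x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])
n = len(x)

# Closed-form solution of the two normal equations
a1 = (x @ y - x.sum() * y.sum() / n) / (x @ x - x.sum() ** 2 / n)
a0 = y.mean() - a1 * x.mean()

print(a0, a1)  # approximately 2.04 and 0.402
```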
Linear Regression
- Sums of squared values
- Variances & covariance of the estimates a0 and a1
Example

i     xi      yi      xi^2     xi*yi    yi^2
1     2.10    2.90    4.41     6.09     8.41
2     6.22    3.83    38.69    23.82    14.67
3     7.17    5.98    51.41    42.88    35.76
4     10.50   5.71    110.25   59.96    32.60
5     13.70   7.74    187.69   106.04   59.91
Sum   39.69   26.16   392.45   238.78   151.35

These sums are substituted into the formulas for a0 and a1 above.
Example

a_0 = \frac{\frac{1}{5}(26.16)(392.3) - \frac{1}{5}(39.69)(238.7)}{392.3 - \frac{1}{5}(39.69)^2} = 2.038

a_1 = \frac{238.7 - \frac{1}{5}(39.69)(26.16)}{392.3 - \frac{1}{5}(39.69)^2} = 0.4023

y = 2.038 + 0.4023x
Another Approach
Write the model in matrix form and solve the normal equations:

[Z]\{A\} = \{Y\}

[Z]^T [Z]\{A\} = [Z]^T \{Y\}

\{A\} = \left([Z]^T [Z]\right)^{-1} \left([Z]^T \{Y\}\right)
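A sketch of this matrix construction with NumPy; the column order [x, 1] matches the example on the next slide, so the parameter vector comes out as [a1, a0]:

```python
import numpy as np

x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])

# Design matrix Z: one row per data point, columns [x_i, 1]
Z = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations (Z^T Z){A} = (Z^T){Y}
A = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(A)  # approximately [0.402, 2.04]
```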
Example
With y = a0 + a1x, each data point gives one equation:

2.10 a1 + a0 = 2.90
6.22 a1 + a0 = 3.83
7.17 a1 + a0 = 5.98
10.50 a1 + a0 = 5.71
13.70 a1 + a0 = 7.74

In matrix form [Z]{A} = {Y}:

\begin{bmatrix} 2.10 & 1 \\ 6.22 & 1 \\ 7.17 & 1 \\ 10.50 & 1 \\ 13.70 & 1 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_0 \end{Bmatrix} = \begin{Bmatrix} 2.90 \\ 3.83 \\ 5.98 \\ 5.71 \\ 7.74 \end{Bmatrix}
Example
Forming the normal equations [Z]^T[Z]{A} = [Z]^T{Y}:

\begin{bmatrix} 392.45 & 39.69 \\ 39.69 & 5 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_0 \end{Bmatrix} = \begin{Bmatrix} 238.78 \\ 26.16 \end{Bmatrix}

\begin{Bmatrix} a_1 \\ a_0 \end{Bmatrix} = \begin{bmatrix} 0.012922 & -0.10257 \\ -0.10257 & 1.014232 \end{bmatrix} \begin{Bmatrix} 238.78 \\ 26.16 \end{Bmatrix} = \begin{Bmatrix} 0.4022 \\ 2.0395 \end{Bmatrix}

y = 2.038 + 0.4023x
Goodness-of-fit - I
Visual inspection: linear trend matching

[Figure: the five data points with the fitted line y = 2.038 + 0.4023x overlaid; the line follows the linear trend of the data.]
Goodness-of-fit - I
Predicted values and residuals for y = 2.038 + 0.4023x:

xi      yi      ŷ      e = yi - ŷ
2.10    2.90    2.88    0.02
6.22    3.83    4.54   -0.71
7.17    5.98    4.92    1.06
10.50   5.71    6.26   -0.55
13.70   7.74    7.55    0.19
Goodness-of-fit - I
Visual inspection: predicted vs measured

[Figure: y-predicted vs y-measured for the five points with the fit y = 2.038 + 0.4023x; the points fall within about +18.5% / -17.5% of the measured values.]
Goodness-of-fit - II
[Figure: y-predicted vs y-measured; regressing predicted on measured gives y = 0.8644x + 0.7097 with R^2 = 0.8644.]

This plot can be used to compare the mathematical models.
Goodness-of-fit - III
Residual Analysis
- If a fitted equation is representative of the data, then its residuals should not form a pattern when plotted against the experimental variables or the fitted values.
- Sometimes a normal probability plot is used to see whether the residuals form a pattern (the normal distribution is representative of random variation).
- These procedures allow us to investigate outliers, test assumptions, and compare fits.
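A minimal sketch of such a residual plot (assuming Matplotlib; the fitted values come from the worked example above):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])
y_hat = 2.038 + 0.4023 * x  # fitted values from the worked example

# A pattern-free scatter of residuals about zero suggests
# the model captures the trend in the data.
plt.scatter(y_hat, y - y_hat)
plt.axhline(0.0, linestyle="--")
plt.xlabel("fitted value")
plt.ylabel("residual")
plt.show()
```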
Goodness-of-fit - III
Residual Analysis
[Figure: a residual plot showing a clear pattern, indicating that the fitted equation is not representative of the data.]
Goodness-of-fit - III
Residual plot: e vs ŷ

[Figure: residuals e (ranging from about -1.0 to 1.5) plotted against the fitted values ŷ from 0 to 8.]

There is no pattern; the random distribution of e indicates that the fitted equation is representative of the data.
Goodness-of-fit - IV
[Figure: the data, the mean line, and the regression line, showing the partition of the total sum of squares St = SSyy into SSR (explained by the regression) and Sr = SSE (unexplained).]

Total sum of squares of residuals between the data points and the mean (the maximum possible residual):

S_{yy} = S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2

Unexplained residual after linear regression, the sum of squares of residuals between the data points and the y predicted by the linear model:

SSE = S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Error reduction due to describing the data by a straight line rather than by an average value, the sum of squares of residuals due to regression:

SSR = S_t - S_r
Goodness-of-fit - IV
Coefficient of Determination
The fraction of the total variation (residual) in y that is accounted for by the fitted equation:

r^2 = \frac{\text{sum of squares of residuals due to regression}}{\text{total sum of squares of residuals}} = \frac{S_t - S_r}{S_t}

- For a perfect fit, Sr = 0, so r^2 = 1
- For no improvement over the mean, Sr = St, so r^2 = 0

The magnitude of r^2 is a measure of the relative strength of the linear association between x and y.
Goodness-of-fit - IV
The correlation coefficient r is a signed number between -1 and 1 that measures the strength of the relationship between the variables:
- r = 0: there is no linear relationship between the variables.
- r = 1: there is a perfect positive relationship; the dependent variable y can be exactly predicted from the independent variable x by the equation of a straight line.
- r = -1: there is a perfect negative relationship; again, y can be exactly predicted from x by the equation of a straight line.
Goodness-of-fit - IV
In practice, the values of r are never exactly 1 or -1.
- Positive r: as x gets larger, y also increases.
- Negative r: the variables are inversely related; as x gets larger, y decreases, and vice versa.
- Just because r is close to 1 does not mean the fit is necessarily good. To confirm, always inspect a plot of the data along with the regression line.
Goodness-of-fit - IV
Spread of the dependent variable around its mean, ȳ = 5.232:

[Figure: the data points with the horizontal mean line y = 5.232.]

Total sum of squares of residuals between the data points and the mean:

S_{yy} = S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2

Standard deviation:

s_y = \sqrt{\frac{S_t}{n-1}}

Coefficient of variation:

c.v. = \frac{s_y}{\bar{y}}
Goodness-of-fit - IV
Spread of the dependent variable around the mean:

i     yi      yi - ȳ   (yi - ȳ)^2
1     2.90    -2.33     5.44
2     3.83    -1.40     1.97
3     5.98     0.75     0.56
4     5.71     0.48     0.23
5     7.74     2.51     6.29
Sum   26.16 (ȳ = 5.232)  St = 14.48

Sample standard deviation: sy = sqrt(St / (n - 1)) = 1.90
Coefficient of variation: c.v. = sy / ȳ = 0.364
Goodness-of-fit - IV
Spread of the dependent variable around the linear regression:

[Figure: the data points with the fitted regression line.]

Total sum of squares of residuals between the measured y and the y calculated with the linear model:

SSE = S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Standard error of estimate:

s_{y/x} = \sqrt{\frac{S_r}{n-2}}
Goodness-of-fit - IV
Spread of the dependent variable around the linear regression:

yi      ŷ       e = yi - ŷ   e^2
2.90    2.88     0.02        0.00
3.83    4.54    -0.71        0.51
5.98    4.92     1.06        1.12
5.71    6.26    -0.55        0.31
7.74    7.55     0.19        0.04
                 Sr =        1.96

Standard error of estimate: s_{y/x} = sqrt(Sr / (n - 2)) = 0.81
Goodness-of-fit - IV
With St = 14.48 and Sr = 1.96:

Coefficient of determination:

r^2 = \frac{S_t - S_r}{S_t} = 0.864

Correlation coefficient: r = 0.93
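These quantities are straightforward to reproduce; a small sketch (assuming NumPy) using the example's data and fitted line:

```python
import numpy as np

x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])
a0, a1 = 2.038, 0.4023  # fitted parameters from the example

St = np.sum((y - y.mean()) ** 2)       # total sum of squares, about 14.48
Sr = np.sum((y - (a0 + a1 * x)) ** 2)  # residual sum of squares, about 1.96
r2 = (St - Sr) / St                    # coefficient of determination, about 0.864
s_yx = np.sqrt(Sr / (len(x) - 2))      # standard error of estimate, about 0.81
print(St, Sr, r2, s_yx)
```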
Goodness-of-fit - V
Standard error of estimate:

s_{y/x} = \sqrt{\frac{S_r}{n-2}} = 0.81

Standard errors of the parameters:
S(a0) = 0.81492, so cv(a0) = 0.81492 / 2.038 = 0.40
S(a1) = 0.09198, so cv(a1) = 0.09198 / 0.4023 = 0.23
Goodness-of-fit - VI
If the measurements are normally distributed:
- The range ȳ - sy to ȳ + sy will encompass approximately 68% of the measurements
- The range ȳ - 2sy to ȳ + 2sy will encompass approximately 95% of the measurements
It is therefore possible to define an interval within which a measurement is likely to fall with a certain confidence (probability).
Goodness-of-fit - VI
Confidence Interval
The range around an estimated parameter within which the true value of the parameter is expected to lie with a given probability. The probability that the true value falls within the bounds L to U is 1 - α, where α is the significance level:

L = a_i - S(a_i)\, t_{\alpha/2,\nu}
U = a_i + S(a_i)\, t_{\alpha/2,\nu}

with ν the number of degrees of freedom (n - 2 for a two-parameter straight-line fit).
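A sketch of the interval computation (assuming SciPy), using the slope and its standard error from the worked example. With n - 2 degrees of freedom (two fitted parameters), this reproduces the 95% bounds in the Excel output shown later in the deck:

```python
from scipy import stats

a1, se_a1, n = 0.4023, 0.09198, 5  # slope, its standard error, sample size
alpha = 0.05                       # for a 95% confidence interval

t = stats.t.ppf(1 - alpha / 2, df=n - 2)  # n - 2 dof for a straight-line fit
L, U = a1 - t * se_a1, a1 + t * se_a1
print(L, U)  # approximately 0.11 and 0.69
```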
Regression Plot
[Figure not recovered.]
Polynomial Regression
Minimize the residual between the data points and the curve (least-squares regression).

Linear:      yi = a0 + a1xi
Quadratic:   yi = a0 + a1xi + a2xi^2
Cubic:       yi = a0 + a1xi + a2xi^2 + a3xi^3
General:     yi = a0 + a1xi + a2xi^2 + a3xi^3 + ... + am xi^m

Must find the values of a0, a1, a2, ..., am.
Polynomial Regression
Residual:

e_i = y_i - (a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \dots + a_m x_i^m)

Sum of squared residuals:

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - (a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_m x_i^m) \right]^2

Minimize by taking derivatives with respect to each coefficient.
Polynomial Regression
Normal equations:

\begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{m+1} \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{m+2} \\
\vdots & & & & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \sum x_i^{m+2} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{Bmatrix}
=
\begin{Bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \vdots \\ \sum x_i^m y_i \end{Bmatrix}
Polynomial Regression
Solution, as before:

[Z]^T [Z]\{A\} = [Z]^T \{Y\}

\{A\} = \left([Z]^T [Z]\right)^{-1} \left([Z]^T \{Y\}\right)
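A direct translation of this solution into NumPy (a sketch; np.vander builds the powers-of-x matrix Z). Note that forming Z^T Z explicitly is ill-conditioned for high degrees; library routines such as np.polyfit solve the least-squares problem more robustly:

```python
import numpy as np

def polyfit_normal(x, y, m):
    """Fit an m-th degree polynomial by solving [Z]^T[Z]{A} = [Z]^T{Y}.
    Returns the coefficients [a0, a1, ..., am]."""
    Z = np.vander(x, m + 1, increasing=True)  # columns: 1, x, x^2, ..., x^m
    return np.linalg.solve(Z.T @ Z, Z.T @ y)
```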
Example

x    0     1.0   1.5   2.3   2.5   4.0   5.1   6.0   6.5   7.0   8.1   9.0
y    0.2   0.8   2.5   2.5   3.5   4.3   3.0   5.0   3.5   2.4   1.3   2.0

x    9.3   11.0  11.3  12.1  13.1  14.0  15.5  16.0  17.5  17.8  19.0  20.0
y   -0.3  -1.3  -3.0  -4.0  -4.9  -4.0  -5.2  -3.0  -3.5  -1.6  -1.4  -0.1

For a cubic fit (m = 3), the 4x4 normal equations with this data become:

\begin{bmatrix}
24 & 229.6 & 3060.2 & 46342.8 \\
229.6 & 3060.2 & 46342.8 & 752835.2 \\
3060.2 & 46342.8 & 752835.2 & 12780147.7 \\
46342.8 & 752835.2 & 12780147.7 & 223518116.8
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{Bmatrix}
=
\begin{Bmatrix} -1.30 \\ -316.9 \\ -6037.2 \\ -9943.36 \end{Bmatrix}
Example
Solving gives:

a0 = -0.3593, a1 = 2.3051, a2 = -0.3532, a3 = 0.0121

Regression equation: y = -0.359 + 2.305x - 0.353x^2 + 0.012x^3

[Figure: the 24 data points with the fitted cubic curve, f(x) vs x from 0 to 25.]
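The cubic fit above can be reproduced with np.polyfit, which solves the same least-squares problem (coefficients are returned highest power first):

```python
import numpy as np

x = np.array([0.0, 1.0, 1.5, 2.3, 2.5, 4.0, 5.1, 6.0, 6.5, 7.0, 8.1, 9.0,
              9.3, 11.0, 11.3, 12.1, 13.1, 14.0, 15.5, 16.0, 17.5, 17.8,
              19.0, 20.0])
y = np.array([0.2, 0.8, 2.5, 2.5, 3.5, 4.3, 3.0, 5.0, 3.5, 2.4, 1.3, 2.0,
              -0.3, -1.3, -3.0, -4.0, -4.9, -4.0, -5.2, -3.0, -3.5, -1.6,
              -1.4, -0.1])

a3, a2, a1, a0 = np.polyfit(x, y, 3)
print(a0, a1, a2, a3)  # approximately -0.359, 2.305, -0.353, 0.012
```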
Exponential function
If the relationship is an exponential function:

y = a e^{bx}

To make it linear, take the logarithm of both sides:

ln(y) = ln(a) + bx

Now it is a linear relation between ln(y) and x, and linear regression gives ln(a) and b.
Exponential function
The logarithmic transformation gives greater weight to small y values. It is better to minimize a weighted merit function, weighting each residual in ln(y) by its yi; linear regression on the weighted problem then gives the parameters. A sketch of both variants follows.
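A sketch of both variants (assuming NumPy). The plain fit regresses ln(y) on x directly; the weighted fit passes w = y to np.polyfit, which multiplies each unsquared residual by its weight, exactly the compensation described above:

```python
import numpy as np

def fit_exponential(x, y, weighted=False):
    """Fit y = a*exp(b*x) via linear regression on ln(y); y must be positive.
    With weighted=True, each ln-residual is weighted by y_i to offset the
    extra weight the log transform gives to small y values."""
    w = y if weighted else None
    b, ln_a = np.polyfit(x, np.log(y), 1, w=w)
    return np.exp(ln_a), b
```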
Power Function
If the relationship is a power function:

y = a x^b

To make it linear, take the logarithm of both sides:

ln(y) = ln(a) + b ln(x)

Now it is a linear relation between ln(y) and ln(x), with intercept ln(a) and slope b.
Power Function

x     y      X = ln(x)   Y = ln(y)
1.2   2.1    0.18        0.74
2.8   11.5   1.03        2.44
4.3   28.1   1.46        3.34
5.4   41.9   1.69        3.74
6.8   72.3   1.92        4.28
7.9   91.4   2.07        4.52

[Figures: x vs y on arithmetic axes (a curved trend), and X = ln(x) vs Y = ln(y) (a straight-line trend).]
Power Function
Using the X's and Y's, not the original x's and y's:

\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
\begin{Bmatrix} \ln(a) \\ b \end{Bmatrix}
=
\begin{Bmatrix} \sum Y_i \\ \sum X_i Y_i \end{Bmatrix}

With \sum X_i = 8.34, \sum X_i^2 = 14.0, \sum Y_i = 19.1, and \sum X_i Y_i = 31.4 for the six points:

\begin{bmatrix} 6 & 8.34 \\ 8.34 & 14.0 \end{bmatrix}
\begin{Bmatrix} \ln(a) \\ b \end{Bmatrix}
=
\begin{Bmatrix} 19.1 \\ 31.4 \end{Bmatrix}
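The same example as a Python sketch; regressing ln(y) on ln(x) recovers the slope b and the intercept ln(a):

```python
import numpy as np

x = np.array([1.2, 2.8, 4.3, 5.4, 6.8, 7.9])
y = np.array([2.1, 11.5, 28.1, 41.9, 72.3, 91.4])

# Linear regression on the log-transformed data
b, ln_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(ln_a)
print(a, b)  # roughly a = 1.5 and b = 2.0 for this data
```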
Power Function
Example: Carbon Adsorption

q = K c^n,   so   log10(q) = log10(K) + n log10(c)

where:
q = pollutant mass sorbed per carbon mass
c = concentration of pollutant in solution
K = coefficient
n = measure of the energy of the reaction
Power Function
On logarithmic axes: log10(K) = 1.8733, so K = 10^1.8733 = 74.696 and n = 0.2289.

[Figure: Y = log10(q) vs X = log10(c); a straight line with intercept log10(K) and slope n.]
Power Function
On arithmetic axes, the fitted curve is q = K c^n with K = 74.702 and n = 0.2289.

[Figure: q vs C on arithmetic axes (C from 0 to 600, q from 0 to 350), with the fitted power curve through the data.]
Nonlinear Relation
- Define the residuals dβ for an initial guess of the parameters λ
- Obtain the parameter corrections dλ needed to reduce the dβ to zero
- In concise matrix form:

A^T d\beta = (A^T A)\, d\lambda

d\lambda = (A^T A)^{-1} (A^T d\beta)
Nonlinear Relation
Gaussian function: the same update applies, with A the matrix of partial derivatives of the model with respect to its parameters (the Jacobian), evaluated at the data points:

A^T d\beta = (A^T A)\, d\lambda

d\lambda = (A^T A)^{-1} (A^T d\beta)
Nonlinear Relation
Fitting a Gaussian with parameters (A, x0, σ):
- Initial guess: (0.8, 15, 4)
- Converged values: (1.03, 20.14, 4.86)
- Actual values: (1, 20, 5)
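In practice this iteration is rarely hand-coded. A sketch using SciPy's curve_fit, which applies Levenberg-Marquardt to unconstrained problems; the data here are synthetic, generated to mirror the slide's test case:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, A, x0, sigma):
    return A * np.exp(-(x - x0) ** 2 / (2.0 * sigma ** 2))

# Synthetic noisy data with true parameters (1, 20, 5)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 40.0, 81)
y = gaussian(x, 1.0, 20.0, 5.0) + 0.05 * rng.standard_normal(x.size)

# Iterate from the slide's initial guess (0.8, 15, 4)
params, cov = curve_fit(gaussian, x, y, p0=[0.8, 15.0, 4.0])
print(params)  # converges near (1, 20, 5)
```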
Software
- Although the method involves sophisticated mathematics, typical software only requires initializing the model and parameters and pressing a button to obtain the results along with the statistical values
- No software can pick a model; it can only help in differentiating between models
- Better programs allow users to specify their own function
EXCEL functions
Tools > Data Analysis > Regression form:
- Input X range, Input Y range
- Labels (column headings)
- Constant is zero
- Confidence level
- Output range
- Residuals & standardized residuals
- Residual plots, line fit plot, normal probability plot
EXCEL functions
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9297
R Square             0.8644
Adjusted R Square    0.8191
Standard Error       0.8092
Observations         5

ANOVA
            df    SS        MS        F         Significance F
Regression  1     12.5176   12.5176   19.1175   0.0221
Residual    3     1.9643    0.6548
Total       4     14.4819

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     2.0395         0.8149           2.5027   0.0875    -0.5540     4.6329
X Variable 1  0.4022         0.0920           4.3724   0.0221     0.1095     0.6949

RESIDUAL OUTPUT
Observation   Predicted Y   Residuals   Standard Residuals
1             2.8841         0.0159      0.0227
2             4.5411        -0.7111     -1.0147
3             4.9231         1.0569      1.5082
4             6.2624        -0.5524     -0.7883
5             7.5494         0.1906      0.2720

PROBABILITY OUTPUT
Percentile   Y
10           2.90
30           3.83
50           5.71
70           5.98
90           7.74
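The same summary can be reproduced outside Excel; a sketch using the statsmodels package (assumed available):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])

# Ordinary least squares with an intercept column
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

print(results.summary())             # R^2, F test, t stats, ...
print(results.conf_int(alpha=0.05))  # the Lower/Upper 95% bounds above
```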
EXCEL functions
[Figures: the X Variable 1 residual plot (residuals vs x), the X Variable 1 line fit plot (Y and Predicted Y vs x), and the normal probability plot (Y vs sample percentile).]
EXCEL functions
INTERCEPT(Xdata, Ydata): intercept with the y axis of the best-fit straight line
SLOPE(Xdata, Ydata): slope of the best-fit straight line
LINEST(Xdata, Ydata, stats): best-fit straight line
TREND(Xdata, Ydata, newXdata, const): y values along the linear trend
LOGEST(Xdata, Ydata, stats): best-fit exponential curve
CORREL(array1, array2): correlation coefficient
PEARSON(array1, array2): Pearson correlation coefficient
RSQ(array1, array2): square of the Pearson correlation coefficient
EXCEL functions
DEVSQ(array): sum of squares of deviations of data points from the sample mean
STEYX(Xdata, Ydata): standard error of the predicted y for each x in the regression
TINV(probability, dof): Student's t-distribution
CONFIDENCE(alpha, std dev, size): confidence interval for a population mean
CHITEST(actual range, expected range): test for independence
FTEST(array1, array2): tests whether the variances of the two arrays are significantly different
Example 1
[Slide content (figures) not recovered.]
Example 2
General model: ms/ma = a1 [Mh/(A CA DC ρs)]^a2 [v^2 Ci/(g DC)]^a3 [dp/DC]^a4

Single-group correlations, with the resulting SSE:

S.No.  Correlation                              SSE
1.     ms/ma = 0.0129 [Mh/(A CA DC ρs)]^0.70    0.0003105
2.     ms/ma = 0.0014 [v^2 Ci/(g DC)]^0.24      0.0011567
3.     ms/ma = 0.04 [dp/DC]^(-0.36)             0.0009327
Example 2
Two- and three-group correlations, with the resulting SSE:

S.No.  Correlation                                                                    SSE
1.     ms/ma = 0.013 [Mh/(A CA DC ρs)]^0.70 [dp/DC]^0.004                             0.00031046
2.     ms/ma = 0.00096 [Mh/(A CA DC ρs)]^1.00 [v^2 Ci/(g DC)]^0.54                    0.00001030
3.     ms/ma = 0.00069 [Mh/(A CA DC ρs)]^1.02 [v^2 Ci/(g DC)]^0.54 [dp/DC]^(-0.055)   0.00000892
Example 2
Standard errors for the chosen correlation, Eq. (5.4), with R^2 = 0.987:

S.No.  Parameter in Eq. (5.4)   Value of the parameter   Standard error   Coefficient of variation (%)
1.     a1                       9.5739E-4                0.5185E-4        5.416
2.     a2                       1.0046                   0.0142           1.412
3.     a3                       0.5365                   0.0105           1.958
Example 3
[Slide content (figures) not recovered.]
Thanks
Example
Often it is difficult to determine which model is best simply by looking at the scatter plot. In these cases, find the regression equations for the two or three most appropriate models, then plot the data and graph each of the regression models in the same viewing window. Decide which model is the best fit by determining which one follows more of the data points.