Probe

  • November 2019
REGRESSION AND CORRELATION

Regression

A major objective of many statistical investigations is to establish relationships that make it possible to predict one or more variables in terms of others. Thus studies are made to predict the potential sales of a new product in terms of the money spent on advertising, the patient’s weight in terms of the number of weeks he/she has been on a diet, the marks obtained by a student in terms of the number of classes he/she attended, etc.

Although it is desirable to predict the quantity exactly in terms of the others, this is seldom possible, and in most cases we have to be satisfied with predicting average or expected values. Thus we would like to predict the average sales in terms of the money spent on advertising, or the average income of a college graduate in terms of the number of years he/she has been out of college.

Thus, given two random variables X and Y, and given that X takes the value x, the basic problem of bivariate regression is to determine the conditional expected value $E(Y \mid x)$ as a function of x. In most cases we may find that $E(Y \mid x)$ is a linear function of x:

$$E(Y \mid x) = \alpha + \beta x,$$

where the constants α and β are called the regression coefficients. Denoting $E(X) = \mu_1$, $E(Y) = \mu_2$, $\operatorname{Var}(X) = \sigma_1^2$, $\operatorname{Var}(Y) = \sigma_2^2$, $\operatorname{cov}(X, Y) = \sigma_{12}$ and $\rho = \dfrac{\sigma_{12}}{\sigma_1 \sigma_2}$, we can show:

Theorem:
(a) If the regression of Y on X is linear, then
$$E(Y \mid x) = \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x - \mu_1).$$
(b) If the regression of X on Y is linear, then
$$E(X \mid y) = \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (y - \mu_2).$$
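Expanding part (a) of the theorem shows how the population regression coefficients depend on the moments: $E(Y \mid x) = \left(\mu_2 - \rho \frac{\sigma_2}{\sigma_1}\mu_1\right) + \rho \frac{\sigma_2}{\sigma_1} x$, so $\beta = \rho \sigma_2 / \sigma_1$ and $\alpha = \mu_2 - \beta \mu_1$. The sketch below simply evaluates these two expressions; the helper name and the parameter values are made up for illustration.

```python
def population_regression_line(mu1, mu2, sigma1, sigma2, rho):
    """Return (alpha, beta) for E(Y|x) = alpha + beta * x, per part (a) of the theorem.

    sigma1 and sigma2 are standard deviations (square roots of the variances).
    Hypothetical helper, for illustration only.
    """
    beta = rho * sigma2 / sigma1   # slope rho * sigma2 / sigma1
    alpha = mu2 - beta * mu1       # the line passes through the point (mu1, mu2)
    return alpha, beta


# Made-up parameter values, not taken from any data set in these notes.
alpha, beta = population_regression_line(mu1=10.0, mu2=50.0, sigma1=2.0, sigma2=5.0, rho=0.8)
print(alpha, beta)  # 30.0 2.0
```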

Note: ρ is called the correlation coefficient between X and Y.

In actual situations, we have to “estimate” the regression coefficients α and β from a random sample $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ of size n from the two-dimensional random variable (X, Y). We now “fit” a straight line $y = a + bx$ to these data by the method of “least squares”. The method of least squares says that we should choose the constants a and b for which the sum of the squares of the “vertical deviations” of the sample points $(x_i, y_i)$ from the line $y = a + bx$ is a minimum, i.e. find a and b so that

$$T = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2$$

is a minimum. Using two-variable calculus, we determine a and b so that $\partial T / \partial a = 0$ and $\partial T / \partial b = 0$. Thus we get the following two equations:

$$\sum_{i=1}^{n} (-2) \left[ y_i - (a + b x_i) \right] = 0 \quad \text{and} \quad \sum_{i=1}^{n} (-2 x_i) \left[ y_i - (a + b x_i) \right] = 0.$$

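Simplifying, we get the so-called “normal equations”:

$$n a + \Big( \sum_{i=1}^{n} x_i \Big) b = \sum_{i=1}^{n} y_i, \qquad \Big( \sum_{i=1}^{n} x_i \Big) a + \Big( \sum_{i=1}^{n} x_i^2 \Big) b = \sum_{i=1}^{n} x_i y_i.$$

Solving, we get

$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \Big( \sum_{i=1}^{n} x_i \Big) \Big( \sum_{i=1}^{n} y_i \Big)}{n \sum_{i=1}^{n} x_i^2 - \Big( \sum_{i=1}^{n} x_i \Big)^2}, \qquad a = \frac{\sum_{i=1}^{n} y_i - \Big( \sum_{i=1}^{n} x_i \Big) b}{n}.$$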

These constants a and b are used to estimate the unknown regression coefficients α and β. Now, if $x = x_g$, we predict y as $y_g = a + b x_g$.
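The closed-form solution of the normal equations translates directly into a few lines of plain Python. This is a minimal sketch; the function names `least_squares_fit` and `predict` are illustrative and do not come from any library. It accumulates the four sums that appear in the normal equations and then applies the formulas for b and a.

```python
def least_squares_fit(xs, ys):
    """Fit y = a + b*x by least squares, using the closed-form solution of the normal equations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - sum_x * b) / n
    return a, b


def predict(a, b, x_g):
    """Predicted value y_g = a + b * x_g."""
    return a + b * x_g


# These points lie exactly on y = 1 + 2x, so the fit should recover a = 1 and b = 2.
a, b = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b, predict(a, b, 10))  # 1.0 2.0 21.0
```

Because the test points lie exactly on a line, every vertical deviation is zero and T attains its smallest possible value, 0.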

Problem 1. Various doses of a poisonous substance were given to groups of 25 mice and the following results were observed:

Dose x (mg):           4    6    8   10   12   14   16
Number of deaths y:    1    3    6    8   14   16   20

(a) Find the equation of the least-squares line fitted to these data.
(b) Estimate the number of deaths in a group of 25 mice who receive a 7 mg dose of this poison.

Solution: (a) n = number of sample pairs $(x_i, y_i)$ = 7, $\sum x_i = 70$, $\sum y_i = 68$, $\sum x_i^2 = 812$, $\sum x_i y_i = 862$. Hence

$$b = \frac{7 \times 862 - 70 \times 68}{7 \times 812 - (70)^2} = \frac{1274}{784} = 1.625, \qquad a = \frac{68 - 70 \times 1.625}{7} = -6.536.$$

Thus the least-squares line that fits the given data is $y = -6.536 + 1.625x$.

(b) If $x = 7$, $y = -6.536 + 1.625 \times 7 = 4.839$.
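The arithmetic of Problem 1 can be checked in Python. This sketch uses exact rational arithmetic (`fractions.Fraction`) so nothing is rounded until the final print; the variable names are illustrative.

```python
from fractions import Fraction

x = [4, 6, 8, 10, 12, 14, 16]   # dose (mg)
y = [1, 3, 6, 8, 14, 16, 20]    # number of deaths
n = len(x)

sum_x, sum_y = sum(x), sum(y)                  # 70, 68
sum_x2 = sum(xi * xi for xi in x)              # 812
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # 862

# Exact fractions: b = 1274/784 = 13/8, and a follows from the first normal equation.
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
a = (sum_y - sum_x * b) / n

print(float(b), round(float(a), 3), round(float(a + b * 7), 3))  # 1.625 -6.536 4.839
```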

Problem 2: The following are the scores that 12 students obtained in the midterm and final examinations in a course in Statistics:

Midterm examination x:   71  49  80  73  93  85  58  82  64  32  87  80
Final examination y:     83  62  76  77  89  74  48  78  76  51  73  89

(a) Fit a straight line to the above data.
(b) Hence predict the final examination score of a student who received a score of 84 in the midterm examination.

Solution: (a) n = number of sample pairs $(x_i, y_i)$ = 12, $\sum x_i = 854$, $\sum y_i = 876$, $\sum x_i^2 = 64222$, $\sum x_i y_i = 64346$. Hence

$$b = \frac{12 \times 64346 - 854 \times 876}{12 \times 64222 - (854)^2} = \frac{24048}{41348} = 0.5816, \qquad a = \frac{876 - 854 \times 0.5816}{12} = 31.609.$$

Thus the least-squares line that fits the given data is $y = 31.609 + 0.5816x$.

(b) If $x = 84$, $y = 31.609 + 0.5816 \times 84 = 80.46$.

Correlation

If X and Y are two random variables, the correlation coefficient ρ between X and Y is defined as

$$\rho = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{Var}(X) \operatorname{Var}(Y)}}.$$

It can be shown that:
(a) $-1 \le \rho \le 1$;
(b) if Y is a linear function of X (with nonzero slope), then $\rho = \pm 1$;
(c) if X and Y are independent, then $\rho = 0$;
(d) if X and Y have a bivariate normal distribution and $\rho = 0$, then X and Y are independent.
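Properties (b) and (c) can be illustrated with small discrete distributions, computed exactly with `fractions.Fraction`. The helper names below are hypothetical. Working with $\rho^2 = \operatorname{cov}(X,Y)^2 / (\operatorname{Var}(X)\operatorname{Var}(Y))$ keeps everything rational; the sign of ρ is the sign of the covariance. The second example, X uniform on $\{-1, 0, 1\}$ with $Y = X^2$, is a standard illustration that $\rho = 0$ does not imply independence in general, which is why (d) needs the bivariate normal assumption.

```python
from fractions import Fraction


def moments(dist):
    """Return (cov, var_x, var_y) for a finite joint distribution {(x, y): probability}."""
    ex = sum(p * x for (x, _), p in dist.items())
    ey = sum(p * y for (_, y), p in dist.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in dist.items())
    var_x = sum(p * (x - ex) ** 2 for (x, _), p in dist.items())
    var_y = sum(p * (y - ey) ** 2 for (_, y), p in dist.items())
    return cov, var_x, var_y


def rho_squared(dist):
    """rho**2 = cov**2 / (Var(X) * Var(Y)), kept exact with Fractions."""
    cov, var_x, var_y = moments(dist)
    return cov ** 2 / (var_x * var_y)


third = Fraction(1, 3)
xs = [-1, 0, 1]  # X takes each value with probability 1/3

# (b): Y = 3 - 2X is linear in X with negative slope, so rho**2 = 1 and cov < 0 (rho = -1).
linear = {(x, 3 - 2 * x): third for x in xs}

# Y = X**2 is completely determined by X, yet cov(X, Y) = E(X**3) - E(X)E(X**2) = 0.
square = {(x, x * x): third for x in xs}

print(moments(linear)[0], rho_squared(linear))  # -4/3 1
print(moments(square)[0], rho_squared(square))  # 0 0
```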

Sample Correlation Coefficient

If $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a random sample of size n from the two-dimensional random variable (X, Y), then the sample correlation coefficient r is defined by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$

We shall use r to estimate the (unknown) population correlation coefficient ρ. If (X, Y) has a bivariate normal distribution, we can show that the random variable

$$Z = \frac{1}{2} \ln \frac{1 + r}{1 - r}$$

is approximately normal with mean $\dfrac{1}{2} \ln \dfrac{1 + \rho}{1 - \rho}$ and variance $\dfrac{1}{n - 3}$.
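The statistic Z above is commonly called Fisher's Z transformation, and $\frac{1}{2}\ln\frac{1+r}{1-r}$ is the inverse hyperbolic tangent of r. One common use of the normal approximation, shown here as an illustration rather than something derived in these notes, is an approximate interval for ρ: take $z \pm 1.96/\sqrt{n-3}$ and map the endpoints back with tanh. The function names and the critical value 1.96 (roughly 95% coverage under the normal approximation) are assumptions of this sketch.

```python
import math


def fisher_z(r):
    """Z = (1/2) ln((1 + r) / (1 - r)), which equals math.atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))


def approx_rho_interval(r, n, z_crit=1.96):
    """Approximate interval for rho, treating Z as normal with variance 1 / (n - 3).

    The endpoints are computed on the Z scale and mapped back with tanh,
    the inverse of the transformation above. z_crit = 1.96 is an assumed choice.
    """
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    z = fisher_z(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Example: r = 0.9515 from n = 10 pairs (the values found in Problem 4).
low, high = approx_rho_interval(0.9515, 10)
print(round(low, 3), round(high, 3))
```

Because tanh is increasing, the interval on the ρ scale is not symmetric about r; it is narrower on the side closer to ±1.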

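Note: A computational formula for r is given by

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}},$$

where

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\Big( \sum_{i=1}^{n} x_i \Big)^2}{n},$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\Big( \sum_{i=1}^{n} y_i \Big)^2}{n},$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\Big( \sum_{i=1}^{n} x_i \Big) \Big( \sum_{i=1}^{n} y_i \Big)}{n}.$$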
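The computational formula needs only the raw sums, so it can be written in a single pass over the data. This is a minimal sketch with an illustrative function name, `sample_r`; the usage lines check it against property (b), since exactly linear data should give $r = \pm 1$.

```python
import math


def sample_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy), using the computational formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    return s_xy / math.sqrt(s_xx * s_yy)


# Exactly linear data: increasing gives r = 1, decreasing gives r = -1.
print(sample_r([1, 2, 3, 4], [2, 4, 6, 8]), sample_r([1, 2, 3, 4], [8, 6, 4, 2]))  # 1.0 -1.0
```

Subtracting two large, nearly equal sums can lose precision in floating point when the data have a large mean; for hand calculation the computational form is convenient, while the deviation form is numerically safer on a computer.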

Problem 3. Calculate r for the data { (8, 3), (1, 4), (5, 0), (4, 2), (7, 1) }.

Solution: $\bar{x} = 25/5 = 5$, $\bar{y} = 10/5 = 2$.

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 3 \times 1 + (-4) \times 2 + 0 \times (-2) + (-1) \times 0 + 2 \times (-1) = -7,$$

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 9 + 16 + 0 + 1 + 4 = 30, \qquad \sum_{i=1}^{n} (y_i - \bar{y})^2 = 1 + 4 + 4 + 0 + 1 = 10.$$

Hence

$$r = \frac{-7}{\sqrt{(30)(10)}} = -0.404.$$
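The hand calculation of Problem 3 follows the definition of r directly: deviations from the means, then three sums. The sketch below mirrors those steps so each intermediate value can be compared with the lines above; variable names are illustrative.

```python
import math

points = [(8, 3), (1, 4), (5, 0), (4, 2), (7, 1)]
n = len(points)
x_bar = sum(x for x, _ in points) / n   # 25 / 5 = 5
y_bar = sum(y for _, y in points) / n   # 10 / 5 = 2

# Deviations from the means, as in the hand calculation.
dx = [x - x_bar for x, _ in points]     # 3, -4, 0, -1, 2
dy = [y - y_bar for _, y in points]     # 1, 2, -2, 0, -1

s_xy = sum(u * v for u, v in zip(dx, dy))  # -7
s_xx = sum(u * u for u in dx)              # 30
s_yy = sum(v * v for v in dy)              # 10

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))  # -0.404
```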


Problem 4. The following are the measurements of the air velocity and evaporation coefficient of burning fuel droplets in an impulse engine:

Air velocity x:                20    60   100   140   180   220   260   300   340   380
Evaporation coefficient y:   0.18  0.37  0.35  0.78  0.56  0.75  1.18  1.36  1.17  1.65

Find the sample correlation coefficient, r.

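Solution.

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\Big( \sum_{i=1}^{n} x_i \Big)^2}{n} = 532000 - \frac{(2000)^2}{10} = 132000,$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\Big( \sum_{i=1}^{n} y_i \Big)^2}{n} = 9.1097 - \frac{(8.35)^2}{10} = 2.13745,$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\Big( \sum_{i=1}^{n} x_i \Big) \Big( \sum_{i=1}^{n} y_i \Big)}{n} = 2175.4 - \frac{(2000)(8.35)}{10} = 505.4.$$

Hence

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{505.4}{\sqrt{(132000)(2.13745)}} = 0.9515.$$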
