Analysis Of Variance

Biostatistics
Xuezhong Shi, M.D.
Professor of Epidemiology & Biostatistics
Phone: 0371-66940840, 66911486
E-mail: [email protected]

Review

Comparison of one sample mean

Is σ known?
  Yes: z-test, $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
  No: Is the sample size larger than 30?
    Yes: z-test, $z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
    No: t-test, $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
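The one-sample decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_statistic` and the sample values are made up for the example and are not part of the lecture.

```python
import math
import statistics

def one_sample_statistic(xs, mu0, sigma=None):
    """Return (test name, statistic) following the one-sample flowchart."""
    n = len(xs)
    mean = statistics.fmean(xs)
    if sigma is not None:
        # sigma known: z-test with the population standard deviation
        return "z", (mean - mu0) / (sigma / math.sqrt(n))
    s = statistics.stdev(xs)  # sample standard deviation S
    stat = (mean - mu0) / (s / math.sqrt(n))
    # sigma unknown: z-test for n > 30, otherwise t-test
    return ("z" if n > 30 else "t"), stat

# Hypothetical sample, used only to show the call.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
name, value = one_sample_statistic(sample, mu0=5.0)
print(name, round(value, 3))
```

With a small sample and unknown σ the function reports a t statistic, which would then be compared with the t distribution with n - 1 degrees of freedom.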

Comparison of two samples

Are the two samples dependent?
  Yes: paired t test (samples must come from normal populations), $t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$ with df = n - 1
  No: Do n1 and n2 both exceed 30?
    Yes: z test (normal distribution), $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
    No: Are both populations normally distributed?
      No: transform the data, or use nonparametric tests
      Yes: test $H_0: \sigma_1^2 = \sigma_2^2$
        Not reject H0 ($\sigma_1^2 = \sigma_2^2$): t test, $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
        Reject H0 ($\sigma_1^2 \neq \sigma_2^2$): t' test
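The two-sample flowchart can also be written as a small decision function. This is a sketch only: the function name, its arguments and the returned labels are illustrative choices, and the questions about normality and equal variances are assumed to have been answered beforehand.

```python
def choose_two_sample_test(dependent, n1, n2, normal=True, equal_var=True):
    """Pick a two-sample test by walking through the review flowchart."""
    if dependent:
        return "paired t test"
    if n1 > 30 and n2 > 30:
        return "z test"
    if not normal:
        return "transform data or use a nonparametric test"
    # Small normal samples: the choice depends on the variance test.
    return "pooled t test" if equal_var else "t' test"

print(choose_two_sample_test(dependent=False, n1=5, n2=6))
```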

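But when there are more than two samples (three or more treatment groups), the t-test or u-test cannot be used. With three treatments (Treat 1, Treat 2, Treat 3) there are three pairwise comparisons (1-2, 1-3, 2-3). If each comparison is tested at α = 0.05, the chance of at least one false positive across the three tests is larger than 0.05.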
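How quickly the overall error rate grows can be illustrated with a short calculation. The sketch below assumes the pairwise tests are independent, which is a simplification (comparisons that share a group are not independent), so the result is only a rough guide. The function name is illustrative.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Number of pairwise tests among k groups and the chance of at least
    one false positive, assuming independent tests (a simplification)."""
    m = comb(k, 2)
    return m, 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    m, fwe = familywise_error(k)
    print(f"k = {k}: {m} comparisons, overall error rate about {fwe:.3f}")
```

For three groups the overall rate is already well above the nominal 0.05, which is the motivation for a single overall test.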

When there are more than two samples, which method should be used?

Analysis of Variance (ANOVA)

ANOVA is a technique used to test a hypothesis concerning the means of three or more populations.

ANOVA includes:
  • One-way ANOVA
  • Two-way ANOVA (randomized block design ANOVA)
  • Repeated measurement ANOVA
  • ……

Main contents: one-way analysis of variance

Ⅰ Model assumptions
Ⅱ Basic ideas of ANOVA
Ⅲ Basic steps of ANOVA
Ⅳ Relationship between ANOVA and t-test

Teaching aims
  • Master the applicable conditions and basic ideas of ANOVA
  • Be familiar with the steps of ANOVA

ANOVA is a hypothesis test for numerical variables. It was developed by R. A. Fisher (a British statistician), so it is also called the F-test.

Ⅰ Model assumptions

1 The k samples are completely independent random samples drawn from k specific populations.
2 Each of the k populations is normal.
3 The k populations have the same variance.

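Ⅱ Basic ideas of ANOVA

The total variation (SS) is decomposed into several components. The corresponding degrees of freedom are also decomposed into several components.

Decomposition of total variation

$SS_T = SS_B + SS_W$

Here $SS_T$ is the total variation, $SS_B$ the variation between groups and $SS_W$ the variation within groups.

Decomposition of total degrees of freedom

$\nu_T = N - 1$
$\nu_B = k - 1$ (between groups)
$\nu_W = N - k$ (within groups)
$\nu_T = \nu_B + \nu_W$

When the group means differ, the between-group mean square is generally larger than the within-group mean square. The two are compared through the F statistic:

$MS_B = SS_B/\nu_B$
$MS_W = SS_W/\nu_W$
$F = \frac{SS_B/\nu_B}{SS_W/\nu_W} = \frac{MS_B}{MS_W}$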
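The decomposition of the total variation can be checked with a short derivation. The notation below ($\bar{x}_i$ for the mean of group i, $\bar{x}$ for the grand mean) is the usual one and is assumed here to match the symbols used in the slides.

```latex
\begin{aligned}
SS_T &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
      = \sum_{i=1}^{k}\sum_{j=1}^{n_i} \bigl[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\bigr]^2 \\
     &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
      + \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2
      + 2\sum_{i=1}^{k} (\bar{x}_i - \bar{x}) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) \\
     &= SS_W + SS_B + 0
\end{aligned}
```

The cross term vanishes because the deviations from a group's own mean sum to zero, $\sum_{j}(x_{ij} - \bar{x}_i) = 0$, within every group.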

Ⅲ Basic steps of ANOVA

Statisticians have laid down a set of steps for ANOVA that is as fixed as a legal procedure, together with formulas for calculating the test statistic (T.S.). There are many formulas, but the steps are always the same. You only need to remember the steps; the formulas will be given to you when you need them.

STEPS

1 Set up the hypotheses and confirm α
2 Compute the test statistic
3 Find the P value
4 Make a conclusion: if P ≤ α, reject H0; if P > α, don't reject H0

Example 1

A gerontologist investigating various aspects of the aging process wanted to see whether staying “lean and mean,” that is, being under normal body weight, would lengthen life span. She randomly assigned newborn rats from a highly inbred line to one of three diets (Table 1). She maintained the rats on the three diets throughout their lives and recorded their life spans. Is there evidence that diet affects life span in this study?

Table 1 Life spans of different groups

Unlimited   90% diet   80% diet
2.5         2.7        3.1
3.1         3.1        2.9
2.3         2.9        3.8
1.9         3.7        3.9
2.4         3.5        4.0

STEPS Set up hypothesis and confirm α

H0: μ1 = μ2 = μ3
H1: At least two of them are different
α = 0.05

Here, the null hypothesis is that all population means are equal, and the alternative hypothesis is that at least one mean is different.

STEPS Compute test statistics

(1) $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{N} = 5.597$

$\nu_T = 3 \times 5 - 1 = 14$

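General data layout (treatment groups i = 1, ..., k)

groups (i)   1                          2                          3                          …   k
x_ij         $x_{11}$                   $x_{21}$                   $x_{31}$                   …   $x_{k1}$
             $x_{12}$                   $x_{22}$                   $x_{32}$                   …   $x_{k2}$
             ⋮                          ⋮                          ⋮                              ⋮
             $x_{1n_1}$                 $x_{2n_2}$                 $x_{3n_3}$                 …   $x_{kn_k}$
n_i          $n_1$                      $n_2$                      $n_3$                      …   $n_k$
total        $\sum_{j=1}^{n_1} x_{1j}$  $\sum_{j=1}^{n_2} x_{2j}$  $\sum_{j=1}^{n_3} x_{3j}$  …   $\sum_{j=1}^{n_k} x_{kj}$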

Group totals, sizes and means for the data in Table 1:

                          Unlimited   90% diet   80% diet
$\sum_{j=1}^{5} x_{ij}$   12.2        15.9       17.7
$n_i$                     5           5          5
$\bar{x}_i$               2.44        3.18       3.54

(2) $SS_B = \sum_i n_i (\bar{x}_i - \bar{x})^2 = 5(2.44 - 3.0533)^2 + 5(3.18 - 3.0533)^2 + 5(3.54 - 3.0533)^2 = 3.145$

$\nu_B = k - 1 = 3 - 1 = 2$
$MS_B = \frac{SS_B}{\nu_B} = \frac{3.145}{2} = 1.573$

(3) $SS_W = SS_T - SS_B = 5.597 - 3.145 = 2.452$

$\nu_W = \nu_T - \nu_B = 14 - 2 = 12$
$MS_W = \frac{SS_W}{\nu_W} = \frac{2.452}{12} = 0.204$
$F = \frac{MS_B}{MS_W} = \frac{1.573}{0.204} = 7.697$

Summary table

Source   SS      df   MS      F
SSB      3.145   2    1.573   7.697
SSW      2.452   12   0.204
SST      5.597   14
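The hand calculation for Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The function name `one_way_anova` and the dictionary keys are illustrative choices, and the data are the life spans from Table 1.

```python
import statistics

# Life spans from Table 1, one list per diet group.
groups = {
    "Unlimited": [2.5, 3.1, 2.3, 1.9, 2.4],
    "90% diet": [2.7, 3.1, 2.9, 3.7, 3.5],
    "80% diet": [3.1, 2.9, 3.8, 3.9, 4.0],
}

def one_way_anova(samples):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    all_values = [x for sample in samples for x in sample]
    n_total, k = len(all_values), len(samples)
    grand_mean = statistics.fmean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(s) * (statistics.fmean(s) - grand_mean) ** 2 for s in samples
    )
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }

result = one_way_anova(list(groups.values()))
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The code works from unrounded mean squares, so its F value can differ in the last digit from a ratio formed with the rounded values in the table.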

STEPS Find p value and make conclusion

Look up the F critical values table: $F_{0.05,(2,12)} = 3.88$.
$F = 7.697 > F_{0.05,(2,12)}$, so P < 0.05 and H0 is rejected. At least two of the population means are different.
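The table lookup can be cross-checked without a statistics library in this particular case. When the numerator has 2 degrees of freedom, the upper tail of the F distribution has the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$. The sketch below relies on that special case only; it does not apply to other numerator degrees of freedom, and the function names are illustrative.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Critical value with upper-tail area alpha, for (2, df2) df."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

p_value = f_upper_tail_df1_2(7.697, 12)
critical = f_critical_df1_2(0.05, 12)
print(f"P value: {p_value:.4f}")
print(f"critical value at alpha = 0.05: {critical:.3f}")
```

The critical value obtained this way is close to the tabulated one, and the P value is far below 0.05, in line with the decision to reject H0.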

[Table 4: F critical values]

The outcome of ANOVA only shows that, on the whole, the population means are not all equal. It does not show which population means are different. If you want to know which two population means are different, you should do multiple comparisons (also called post hoc tests).

Multiple comparisons

There are many methods of multiple comparison. Among them, the SNK-q test and the LSD-t test are often used.

[Figure: input data]

Tests of Normality

                        Kolmogorov-Smirnov (a)        Shapiro-Wilk
            groups      Statistic   df   Sig.         Statistic   df   Sig.
lifespans   unlimited   .245        5    .200*        .951        5    .747
            90% diet    .180        5    .200*        .952        5    .754
            80% diet    .297        5    .170         .844        5    .176

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

ANOVA

Test of Homogeneity of Variances (lifespans)

Levene Statistic   df1   df2   Sig.
.598               2     12    .566
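Levene's statistic can be understood as a one-way ANOVA F computed on the absolute deviations of each observation from its own group mean. The sketch below implements that mean-centred form with the standard library. That the software output above used exactly this variant is an assumption, and the function name is illustrative.

```python
import statistics

# Life spans from Table 1, one list per diet group.
samples = [
    [2.5, 3.1, 2.3, 1.9, 2.4],   # unlimited
    [2.7, 3.1, 2.9, 3.7, 3.5],   # 90% diet
    [3.1, 2.9, 3.8, 3.9, 4.0],   # 80% diet
]

def levene_statistic(samples):
    """Mean-centred Levene statistic: ANOVA F on |x - group mean|."""
    z = [[abs(x - statistics.fmean(s)) for x in s] for s in samples]
    all_z = [v for g in z for v in g]
    n_total, k = len(all_z), len(z)
    grand = statistics.fmean(all_z)
    between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in z)
    within = sum((v - statistics.fmean(g)) ** 2 for g in z for v in g)
    return (n_total - k) / (k - 1) * between / within

w = levene_statistic(samples)
print(f"Levene statistic: {w:.3f} with df1 = 2, df2 = 12")
```

A small statistic with a large Sig. value gives no evidence against equal variances, so the third model assumption is not contradicted by these data.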

ANOVA (lifespans)

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   3.145            2    1.573         7.697   .007
Within Groups    2.452            12   .204
Total            5.597            14

Multiple Comparisons

Dependent Variable: lifespans
LSD

                          Mean Difference                        95% Confidence Interval
(I) groups   (J) groups   (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
unlimited    90% diet     -.7400*           .2859        .024    -1.363        -.117
             80% diet     -1.1000*          .2859        .002    -1.723        -.477
90% diet     unlimited    .7400*            .2859        .024    .117          1.363
             80% diet     -.3600            .2859        .232    -.983         .263
80% diet     unlimited    1.1000*           .2859        .002    .477          1.723
             90% diet     .3600             .2859        .232    -.263         .983

*. The mean difference is significant at the .05 level.
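The standard errors and confidence intervals in the LSD table can be reconstructed from the ANOVA results. The sketch below uses the within-group mean square from Example 1 and a critical t value of 2.179 (two-sided, α = 0.05, 12 degrees of freedom), which is taken from a t table and hard-coded as an assumption. The function name is illustrative.

```python
import math
from itertools import combinations

means = {"unlimited": 2.44, "90% diet": 3.18, "80% diet": 3.54}
n_per_group = 5
ms_within = 2.452 / 12   # within-group mean square from the ANOVA table
t_crit = 2.179           # assumed table value: t(0.05/2, df = 12)

def lsd_comparisons(means, n, ms_within, t_crit):
    """Pairwise LSD-t comparisons for equal group sizes n."""
    se = math.sqrt(ms_within * (1 / n + 1 / n))
    rows = []
    for a, b in combinations(means, 2):
        diff = means[a] - means[b]
        rows.append({
            "pair": (a, b),
            "diff": diff,
            "se": se,
            "t": diff / se,
            "lower": diff - t_crit * se,
            "upper": diff + t_crit * se,
            "significant": abs(diff / se) > t_crit,
        })
    return rows

for row in lsd_comparisons(means, n_per_group, ms_within, t_crit):
    a, b = row["pair"]
    print(f"{a} vs {b}: diff = {row['diff']:.2f}, se = {row['se']:.4f}, "
          f"CI = ({row['lower']:.3f}, {row['upper']:.3f}), "
          f"significant = {row['significant']}")
```

Each comparison reuses the pooled within-group mean square from all three groups, which is what distinguishes the LSD-t test from three separate two-sample t tests.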

Ⅳ Relationship between ANOVA and t-test

Example 2

Survival days after taking some drug

Experiment   5    10   14   21   17
Control      18   21   30   23   22   22

STEPS Set up hypothesis and confirm α

H0: µ1 = µ2
H1: µ1 ≠ µ2
α = 0.05

STEPS Compute test statistics

Use ANOVA

(1) $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{N} = 466.727$

$\nu_T = 11 - 1 = 10$

Survival days after taking some drug (summary)

               Experiment   Control   Total
$n$            5            6         11
$\bar{x}$      13.4         22.7      18.45
$\sum x$       67           136       203
$\sum x^2$     1051         3162      4213

(2) $SS_B = \sum_i n_i (\bar{x}_i - \bar{x})^2 = 234.194$

$\nu_B = 2 - 1 = 1$
$MS_B = \frac{SS_B}{\nu_B} = 234.194$

(3) $SS_W = SS_T - SS_B = 466.727 - 234.194 = 232.533$

$\nu_W = \nu_T - \nu_B = 10 - 1 = 9$
$MS_W = \frac{SS_W}{\nu_W} = \frac{232.533}{9} = 25.837$
$F = \frac{MS_B}{MS_W} = \frac{234.194}{25.837} = 9.064$

Summary table of ANOVA

Source   SS        df   MS        F
SST      466.727   10
SSB      234.194   1    234.194   9.064
SSW      232.533   9    25.837

Use t-test

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, which gives $|t| = 3.011$

STEPS Find p value and make conclusion

$F_{0.05,(1,9)} = 5.12$; $F = 9.064 > F_{0.05,(1,9)}$, so P < 0.05
$|t| = 3.011$, P < 0.05

$F = t^2$

So reject H0.

When there are two treatment groups, the F-test and the t-test are equivalent ($F = t^2$). But the t-test is easier to carry out than the F-test, so with two groups we had better choose the t-test. The F-test is needed when there are three or more treatment groups.
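The equivalence for two groups can be verified numerically on the data of Example 2. The sketch below computes the pooled-variance t statistic and the one-way ANOVA F with the standard library and prints both. The function names are illustrative.

```python
import math
import statistics

experiment = [5, 10, 14, 21, 17]
control = [18, 21, 30, 23, 22, 22]

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))

def anova_f(samples):
    """One-way ANOVA F statistic."""
    all_values = [x for s in samples for x in s]
    grand = statistics.fmean(all_values)
    ss_between = sum(len(s) * (statistics.fmean(s) - grand) ** 2 for s in samples)
    ss_within = sum((x - statistics.fmean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)

t = pooled_t(experiment, control)
f = anova_f([experiment, control])
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
```

The sign of t depends only on which group is listed first; its square agrees with F up to floating-point error.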
