Biostatistics Xuezhong Shi M.D. Professor of Epidemiology & Biostatistics Phone:0371-66940840,66911486 E-mail:
[email protected]
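Review
Comparison of one sample mean
Is σ known?
• Yes: z-test, $z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
• No: Is the sample size larger than 30?
    • Yes: z-test, $z = \dfrac{\bar{X} - \mu_0}{S / \sqrt{n}}$
    • No: t-test, $t = \dfrac{\bar{X} - \mu_0}{S / \sqrt{n}}$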
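The choice of statistic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_sample_statistic` and the sample values are hypothetical and do not come from the slides.

```python
import math
from statistics import mean, stdev

def one_sample_statistic(xs, mu0, sigma=None):
    """Follow the review flowchart and return (test_name, statistic)."""
    n = len(xs)
    xbar = mean(xs)
    if sigma is not None:
        # sigma known: z-test using the population standard deviation
        return "z", (xbar - mu0) / (sigma / math.sqrt(n))
    s = stdev(xs)  # sample standard deviation S (n - 1 in the denominator)
    if n > 30:
        # sigma unknown but large sample: z-test using S
        return "z", (xbar - mu0) / (s / math.sqrt(n))
    # sigma unknown and small sample: t-test
    return "t", (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample, for illustration only
sample = [4.8, 5.1, 5.3, 4.9, 5.4]
name, stat = one_sample_statistic(sample, mu0=5.0)
print(name, round(stat, 3))
```

The two branches for unknown σ use the same formula; only the reference distribution (normal or t) used to find the P value differs.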
Comparison of two samples
Are the two samples dependent?
• Yes: Paired t test (samples must come from normal populations): $t = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}}$, with df = n - 1
• No: Do n1 and n2 both exceed 30?
    • Yes: z test (normal distribution): $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
    • No: Are both populations normally distributed?
        • No: Data transform, or use nonparametric tests
        • Yes: See if $\sigma_1^2 = \sigma_2^2$
            • Not reject H0 ($\sigma_1^2 = \sigma_2^2$): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$
            • Reject H0 ($\sigma_1^2 \neq \sigma_2^2$): t' test
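But when there are more than two samples (three or more treatment groups), the t-test or u-test cannot be used directly. With Treat 1, Treat 2 and Treat 3, three pairwise comparisons are needed (1-2, 1-3, 2-3). Each comparison is tested at α=0.05, so the chance of at least one false positive across the three tests is larger than 0.05.
When there are more than two samples, which method should be used?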
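To see how quickly the error rate grows, here is a small sketch. It assumes the pairwise tests are independent, which is only an approximation because the comparisons share data; the function name `familywise_error` is hypothetical.

```python
from itertools import combinations

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive in n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

treatments = ["Treat 1", "Treat 2", "Treat 3"]
pairs = list(combinations(treatments, 2))  # the comparisons 1-2, 1-3, 2-3

rate = familywise_error(0.05, len(pairs))
print(len(pairs), "comparisons, overall error rate about", round(rate, 3))
```

Under this assumption, three comparisons at α=0.05 give an overall rate of roughly 0.14, and the rate keeps rising as more groups are added.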
Analysis of Variance (ANOVA)
ANOVA is a technique used to test a hypothesis concerning the means of three or more populations.
ANOVA:
• One-way ANOVA
• Two-way ANOVA (randomized block design ANOVA)
• Repeated measurement ANOVA
• ……
Main contents: one-way analysis of variance
Ⅰ Model assumptions
Ⅱ Basic ideas of ANOVA
Ⅲ Basic steps of ANOVA
Ⅳ Relationship between ANOVA and t-test
Teaching aims
• Master the applicable conditions and basic ideas of ANOVA
• Be familiar with the steps of ANOVA
ANOVA is a hypothesis test for numerical variables. It was developed by R. A. Fisher, a British statistician, so it is also called the F-test.
Ⅰ Model assumptions
1 The k samples are completely independent random samples drawn from k specific populations.
2 Each of the k populations is normal.
3 The k populations have the same variance.
Ⅱ Basic ideas of ANOVA
The total variation (SS) is decomposed into several components. The corresponding degrees of freedom are decomposed in the same way.
Decomposition of total variation: total (SST) = between groups (SSB) + within groups (SSW)
$SS_T = SS_B + SS_W$
Decomposition of total degrees of freedom:
$\nu_T = N - 1$ (total)
$\nu_B = k - 1$ (between groups)
$\nu_W = N - k$ (within groups)
$\nu_T = \nu_B + \nu_W$
Generally, SSB>SSW
$MS_B = SS_B / \nu_B$
$MS_W = SS_W / \nu_W$
$F = \dfrac{MS_B}{MS_W} = \dfrac{SS_B / \nu_B}{SS_W / \nu_W}$
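Why the total variation splits into exactly these two parts can be sketched as follows. The notation is an assumption where the slides do not spell it out: $\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the grand mean.

```latex
\begin{aligned}
SS_T &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2
      = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[\left(x_{ij}-\bar{x}_i\right)+\left(\bar{x}_i-\bar{x}\right)\right]^2 \\
     &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}_{SS_W}
      + \underbrace{\sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^2}_{SS_B}
      + 2\sum_{i=1}^{k}\left(\bar{x}_i-\bar{x}\right)\underbrace{\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)}_{=\,0}
\end{aligned}
```

The cross term vanishes because deviations from a group's own mean sum to zero, which leaves $SS_T = SS_W + SS_B$.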
Ⅲ Basic steps of ANOVA
STEPS
Statisticians have laid out a procedure for ANOVA that is as fixed as a legal procedure, together with formulas for calculating the test statistic (T.S.). There are many formulas, but the steps are always the same. You only need to remember the steps; the formulas will be given to you when you need them.
1 Set up the hypotheses and confirm α
2 Compute the test statistic
3 Find the P value
4 Make a conclusion: if P≤α, reject H0; if P>α, don't reject H0
Example 1
A gerontologist investigating various aspects of the aging process wanted to see whether staying “lean and mean,” that is, being under normal body weight, would lengthen life span. She randomly assigned newborn rats from a highly inbred line to one of three diets (Table 1). She maintained the rats on the three diets throughout their lives and recorded their life spans. Is there evidence that diet affects life span in this study?
Table 1 Life spans of different groups
Unlimited   90% diet   80% diet
2.5         2.7        3.1
3.1         3.1        2.9
2.3         2.9        3.8
1.9         3.7        3.9
2.4         3.5        4.0
STEPS: Set up hypothesis and confirm α
H0: μ1=μ2=μ3
H1: At least two of them are different
α=0.05
Here, the null hypothesis will be that all population means are equal, and the alternative hypothesis is that at least one mean is different.
STEPS: Compute test statistics
(1) $SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{N} = 5.597$
$\nu_T = 3 \times 5 - 1 = 14$
Data layout for k groups
groups (i)   1                            2                            3                            …   k
x_ij         x11                          x21                          x31                          …   xk1
             x12                          x22                          x32                          …   xk2
             …                            …                            …                            …   …
             x1n1                         x2n2                         x3n3                         …   xknk
n_i          n1                           n2                           n3                           …   nk
total        $\sum_{j=1}^{n_1} x_{1j}$    $\sum_{j=1}^{n_2} x_{2j}$    $\sum_{j=1}^{n_3} x_{3j}$    …   $\sum_{j=1}^{n_k} x_{kj}$
                            Unlimited   90% diet   80% diet
                            2.5         2.7        3.1
                            3.1         3.1        2.9
                            2.3         2.9        3.8
                            1.9         3.7        3.9
                            2.4         3.5        4.0
$\sum_{j=1}^{5} x_{ij}$     12.2        15.9       17.7
$n_i$                       5           5          5
$\bar{x}_i$                 2.44        3.18       3.54
(2) $SS_B = \sum_i n_i (\bar{x}_i - \bar{x})^2 = 5(2.44 - 3.0533)^2 + 5(3.18 - 3.0533)^2 + 5(3.54 - 3.0533)^2 = 3.145$
$\nu_B = k - 1 = 3 - 1 = 2$
$MS_B = \dfrac{SS_B}{\nu_B} = \dfrac{3.145}{2} = 1.573$
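(3) $SS_W = SS_T - SS_B = 5.597 - 3.145 = 2.452$
$\nu_W = \nu_T - \nu_B = 14 - 2 = 12$
$MS_W = \dfrac{SS_W}{\nu_W} = \dfrac{2.452}{12} = 0.204$
$F = \dfrac{MS_B}{MS_W} = \dfrac{1.573}{0.204} = 7.697$ (computed from the unrounded mean squares)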
Summary table
Source   SS      df   MS      F
SSB      3.145   2    1.573   7.697
SSW      2.452   12   0.204
SST      5.597   14
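The hand calculation can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the dictionary layout are hypothetical choices, and the data are the life spans from Table 1.

```python
def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way ANOVA on a list of samples."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total

    # Total, between-group and within-group sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS": (ss_between, ss_within, ss_total),
        "df": (df_between, df_within, n_total - 1),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }

# Life spans from Table 1
unlimited = [2.5, 3.1, 2.3, 1.9, 2.4]
diet_90 = [2.7, 3.1, 2.9, 3.7, 3.5]
diet_80 = [3.1, 2.9, 3.8, 3.9, 4.0]

result = one_way_anova([unlimited, diet_90, diet_80])
for name, value in result.items():
    print(name, value)
```

Because the script keeps full precision, its F agrees with the summary table to about three decimal places, while dividing the rounded mean squares by hand gives a slightly larger value.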
STEPS: Find P value and make conclusion
Look up the F critical values table: F(0.05,2,12) = 3.88
F = 7.697 > F(0.05,2,12), so P<0.05
So reject H0: at least two of the population means are different.
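When a table is not at hand, the tail area can be computed directly in this particular example. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here; it is not a general F-distribution routine, and both function names are hypothetical.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with (2, df2) degrees of freedom.

    Special case only: this closed form is valid when the numerator df is 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Critical value with upper-tail area alpha, again only for numerator df = 2."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

p_value = f_upper_tail_df1_2(7.697, 12)
critical = f_critical_df1_2(0.05, 12)
print("P =", round(p_value, 4), " critical value =", round(critical, 3))
```

For other numerator degrees of freedom a statistics library or a printed table is needed instead.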
[Table 4: F critical values]
A significant ANOVA result only shows that, on the whole, the population means are not all equal. It does not show which population means differ. To find out which pairs of population means differ, you should do multiple comparisons (also called post hoc tests).
Multiple comparisons
There are many methods for multiple comparisons. Among them, the SNK-q test and the LSD-t test are often used.
Input data
Tests of normality
Tests of Normality (life spans)
             Kolmogorov-Smirnov(a)          Shapiro-Wilk
groups       Statistic   df   Sig.          Statistic   df   Sig.
unlimited    .245        5    .200*         .951        5    .747
90% diet     .180        5    .200*         .952        5    .754
80% diet     .297        5    .170          .844        5    .176
* This is a lower bound of the true significance.
a. Lilliefors Significance Correction
ANOVA
Test of Homogeneity of Variances (life spans)
Levene Statistic   df1   df2   Sig.
.598               2     12    .566
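The Levene statistic can be reproduced by hand. The sketch below assumes the mean-centred version of the test (a one-way ANOVA on the absolute deviations from each group mean); that this is the version behind the output above is an assumption, and the function name `levene_mean_centered` is hypothetical.

```python
def levene_mean_centered(groups):
    """Levene statistic: one-way ANOVA F computed on |x - group mean|."""
    # Absolute deviation of every observation from its own group mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    n_total, k = len(all_z), len(z)
    grand = sum(all_z) / n_total

    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Life spans from Table 1
unlimited = [2.5, 3.1, 2.3, 1.9, 2.4]
diet_90 = [2.7, 3.1, 2.9, 3.7, 3.5]
diet_80 = [3.1, 2.9, 3.8, 3.9, 4.0]

w = levene_mean_centered([unlimited, diet_90, diet_80])
print("Levene statistic:", round(w, 3), "with df1 = 2, df2 = 12")
```

A large Sig. value here means the equal-variance assumption is not rejected, so the ordinary one-way ANOVA can proceed.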
ANOVA (life spans)
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   3.145            2    1.573         7.697   .007
Within Groups    2.452            12   .204
Total            5.597            14
Multiple Comparisons
Dependent Variable: life spans
LSD
                                                               95% Confidence Interval
(I) groups   (J) groups   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
unlimited    90% diet     -.7400*                 .2859        .024   -1.363        -.117
             80% diet     -1.1000*                .2859        .002   -1.723        -.477
90% diet     unlimited    .7400*                  .2859        .024   .117          1.363
             80% diet     -.3600                  .2859        .232   -.983         .263
80% diet     unlimited    1.1000*                 .2859        .002   .477          1.723
             90% diet     .3600                   .2859        .232   -.263         .983
* The mean difference is significant at the .05 level.
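The mean differences and standard errors of the LSD-t test can be recomputed as a check. This is a sketch under stated assumptions: the within-group mean square 2.452/12 is taken from the ANOVA table, the two-sided critical value 2.179 for α = 0.05 and df = 12 is taken from a t table, and the function name `lsd_comparisons` is hypothetical.

```python
import math
from itertools import combinations

groups = {
    "unlimited": [2.5, 3.1, 2.3, 1.9, 2.4],
    "90% diet": [2.7, 3.1, 2.9, 3.7, 3.5],
    "80% diet": [3.1, 2.9, 3.8, 3.9, 4.0],
}
MS_WITHIN = 2.452 / 12  # within-group mean square from the ANOVA table
T_CRIT = 2.179          # two-sided t critical value, alpha = 0.05, df = 12

def lsd_comparisons(groups, ms_within, t_crit):
    """Return (I, J, mean difference, std. error, t, significant?) for each pair."""
    rows = []
    for (a, xa), (b, xb) in combinations(groups.items(), 2):
        diff = sum(xa) / len(xa) - sum(xb) / len(xb)
        se = math.sqrt(ms_within * (1 / len(xa) + 1 / len(xb)))
        t = diff / se
        rows.append((a, b, diff, se, t, abs(t) > t_crit))
    return rows

rows = lsd_comparisons(groups, MS_WITHIN, T_CRIT)
for a, b, diff, se, t, significant in rows:
    print(f"{a} vs {b}: diff={diff:.4f} se={se:.4f} t={t:.3f} significant={significant}")
```

Each pair appears once here, whereas the table above lists every pair in both directions with the sign of the difference reversed.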
Ⅳ Relationship between ANOVA and t-test
Example 2
Survival days after taking some drug
Experiments: 5 10 14 21 17
Control: 18 21 30 23 22 22
STEPS: Set up hypothesis and confirm α
H0: µ1 = µ2
H1: µ1 ≠ µ2
α = 0.05
STEPS: Compute test statistics
Use ANOVA
(1) $SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{N} = 466.727$
$\nu_T = 11 - 1 = 10$
Survival days after taking some drug
              Experiments   Control        Total
x_ij          5, …, 17      18, …, 22, 22
n             5             6              11
$\bar{x}$     13.4          22.7           18.45
$\sum x$      67            136            203
$\sum x^2$    1051          3162           4213
(2) $SS_B = \sum_i n_i (\bar{x}_i - \bar{x})^2 = 234.194$
$\nu_B = 2 - 1 = 1$
$MS_B = \dfrac{SS_B}{\nu_B} = 234.194$
(3) $SS_W = SS_T - SS_B = 466.727 - 234.194 = 232.533$
$\nu_W = \nu_T - \nu_B = 10 - 1 = 9$
$MS_W = \dfrac{SS_W}{\nu_W} = \dfrac{232.533}{9} = 25.837$
$F = \dfrac{MS_B}{MS_W} = \dfrac{234.194}{25.837} = 9.064$
Summary table of ANOVA
Source   SS        df   MS        F
SST      466.727   10
SSB      234.194   1    234.194   9.064
SSW      232.533   9    25.837
Use t-test
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$, $|t| = 3.011$
STEPS: Find P value and make conclusion
$F_{0.05,1,9} = 5.12$; $F = 9.064 > F_{0.05,1,9}$, P<0.05
$|t| = 3.011$, P<0.05
$F = t^2$ (up to rounding, $3.011^2 \approx 9.06$)
So reject H0
When there are two treatment groups, the F-test and the t-test are equivalent (F = t²). But the t-test is easier to carry out than the F-test, so with two groups we had better choose the t-test. The F-test is needed only when there are three or more treatment groups.
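The equivalence can be confirmed numerically on the data of Example 2. This is a minimal sketch with the standard library only; the helper names `pooled_t` and `anova_f` are hypothetical.

```python
import math

experiment = [5, 10, 14, 21, 17]
control = [18, 21, 30, 23, 22, 22]

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)  # equals s_a^2 * (na - 1)
    ss_b = sum((x - mb) ** 2 for x in b)  # equals s_b^2 * (nb - 1)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def anova_f(groups):
    """One-way ANOVA F statistic."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))

t = pooled_t(control, experiment)
f = anova_f([experiment, control])
print("t =", round(t, 3), " t^2 =", round(t ** 2, 3), " F =", round(f, 3))
```

With two groups the within-group mean square is the pooled variance, which is why the two statistics agree to floating-point precision.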