Handy Reference II
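HANDY REFERENCE SHEET 2 – HRP 259

Calculation Formulas for Sample Data:

1. Univariate

Sample proportion: $\hat{p} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where $x_i = 1$ if success and $x_i = 0$ if failure

Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Sum of squares of x: $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ [to ease computation: $SS_x = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$]

Sample variance: $s_x^2 = \dfrac{SS_x}{n-1} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Sample standard deviation: $s_x = \sqrt{\dfrac{SS_x}{n-1}} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

Standard error of the sample mean: $\dfrac{s_x}{\sqrt{n}} = \dfrac{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}}{\sqrt{n}}$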
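As a quick check on the univariate formulas, the following Python sketch computes each quantity for a small made-up sample. The function name `univariate_summary` and the data are illustrative assumptions, not part of the original sheet.

```python
import math

def univariate_summary(xs):
    """Return the sample mean, SS_x, variance, SD and standard error of xs."""
    n = len(xs)
    mean = sum(xs) / n
    # Computational shortcut: SS_x = sum(x_i^2) - n * mean^2
    ss_x = sum(x * x for x in xs) - n * mean ** 2
    variance = ss_x / (n - 1)          # s_x^2
    sd = math.sqrt(variance)           # s_x
    se = sd / math.sqrt(n)             # standard error of the sample mean
    return {"n": n, "mean": mean, "ss_x": ss_x,
            "variance": variance, "sd": sd, "se": se}

# Hypothetical data, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = univariate_summary(sample)

# A sample proportion is the mean of 0/1 outcomes (1 = success, 0 = failure)
outcomes = [1, 0, 1, 1, 0]
p_hat = sum(outcomes) / len(outcomes)
```

For this sample the mean is 5 and SS_x is 32, so the variance is 32/7. The shortcut form can lose precision when the mean is large relative to the spread, in which case the definitional form (summing squared deviations) is the safer choice.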
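2. Bivariate

Sum of squares of xy: $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ [to ease computation: $SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$]

Sample covariance: $s_{xy} = \dfrac{SS_{xy}}{n-1} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$

Sample correlation: $\hat{r} = \dfrac{s_{xy}}{\sqrt{s_x^2 \, s_y^2}} = \dfrac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$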
Variance rules for correlated random variables: Var (x+y)=Var(x)+Var(y)+2Cov(x,y); Var (x-y)=Var(x)+Var(y)-2Cov(x,y)
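The bivariate formulas and the variance rules can be checked numerically. The sketch below is one way to do so in Python; the helper names and the data are hypothetical and chosen only for illustration.

```python
import math

def sum_of_products(xs, ys):
    """SS_xy via the shortcut sum(x_i * y_i) - n * x_bar * y_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar

def covariance(xs, ys):
    """Sample covariance: SS_xy / (n - 1)."""
    return sum_of_products(xs, ys) / (len(xs) - 1)

def variance(xs):
    """Sample variance, i.e. the covariance of xs with itself."""
    return covariance(xs, xs)

def correlation(xs, ys):
    """Sample correlation: SS_xy / sqrt(SS_x * SS_y)."""
    return sum_of_products(xs, ys) / math.sqrt(
        sum_of_products(xs, xs) * sum_of_products(ys, ys))

# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_hat = correlation(xs, ys)
var_sum = variance([x + y for x, y in zip(xs, ys)])
var_diff = variance([x - y for x, y in zip(xs, ys)])
rule_sum = variance(xs) + variance(ys) + 2 * covariance(xs, ys)
rule_diff = variance(xs) + variance(ys) - 2 * covariance(xs, ys)
```

Here `var_sum` matches `rule_sum` and `var_diff` matches `rule_diff`, which is what the two variance rules predict for sample variances and covariances as well.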
Hypothesis Testing

The Steps:
1. Define your hypotheses (null, alternative)
2. Specify your null distribution
3. Do an experiment
4. Calculate the p-value of what you observed
5. Reject or fail to reject (~accept) the null hypothesis

The Errors

| Your Statistical Decision | True state of null hypothesis: H0 True | True state of null hypothesis: H0 False |
|---|---|---|
| Reject H0 | Type I error ($\alpha$) | Correct |
| Do not reject H0 | Correct | Type II error ($\beta$) |

Power $= 1 - \beta$
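The five steps can be traced in a few lines of code. The sketch below assumes a two-sided, large-sample Z test for a single proportion as the example test; the function name `proportion_z_test`, the counts, and the 0.05 significance level are illustrative assumptions.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.05):
    """Walk through the five hypothesis-testing steps for H0: p = p0."""
    # Step 1: hypotheses are H0: p = p0 versus Ha: p != p0 (two-sided).
    # Step 2: under H0, p_hat is approximately Normal(p0, p0 * (1 - p0) / n).
    se_null = math.sqrt(p0 * (1 - p0) / n)
    # Step 3: the experiment yields the observed sample proportion.
    p_hat = successes / n
    # Step 4: two-sided p-value of the observed result under the null.
    z = (p_hat - p0) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: reject or fail to reject H0 at level alpha.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical experiment: 60 successes in 100 trials, null value 0.5
z, p_value, decision = proportion_z_test(successes=60, n=100, p0=0.5)
```

With these made-up numbers Z is 2.0 and the two-sided p-value is a little under 0.05, so the null is rejected at the 5% level. Rejecting a null that is actually true would be a Type I error.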
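Confidence intervals (estimation)

For a mean (σ² unknown): $\bar{x} \pm t_{n-1,\alpha/2} \dfrac{s_x}{\sqrt{n}}$ [if variance known or large sample size, use $Z_{\alpha/2}$ in place of $t_{df,\alpha/2}$]

For a paired difference (σ² unknown): $\bar{d} \pm t_{n-1,\alpha/2} \dfrac{s_d}{\sqrt{n}}$ [where $d_i$ = the within-pair difference]

For a difference in means, 2 independent samples (σ²'s unknown but roughly equal): $(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}$

where $s_p^2 = \dfrac{SS_x + SS_y}{n-2}$ or $\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n-2}$, with $n = n_x + n_y$

For a proportion: $\hat{p} \pm Z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$

For a difference in proportions, 2 independent samples: $(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

For a correlation coefficient: $\hat{r} \pm t_{n-2,\alpha/2} \sqrt{\dfrac{1 - \hat{r}^2}{n-2}}$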
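For a regression coefficient: $\hat{\beta} \pm t_{n-2,\alpha/2} \sqrt{\dfrac{s^2}{SS_x}}$ [where $\hat{\beta} = \dfrac{SS_{xy}}{SS_x}$; $s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$]

Common values of t and Z

| Confidence level | $t_{10,\alpha/2}$ | $t_{20,\alpha/2}$ | $t_{30,\alpha/2}$ | $t_{50,\alpha/2}$ | $t_{100,\alpha/2}$ | $Z_{\alpha/2}$ |
|---|---|---|---|---|---|---|
| 90% | 1.81 | 1.73 | 1.70 | 1.68 | 1.66 | 1.64 |
| 95% | 2.23 | 2.09 | 2.04 | 2.01 | 1.98 | 1.96 |
| 99% | 3.17 | 2.85 | 2.75 | 2.68 | 2.63 | 2.58 |

For an odds ratio: 95% confidence limits:

$OR \times \exp\left(-1.96\sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}\right)$, $OR \times \exp\left(+1.96\sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}\right)$

For a risk ratio: 95% confidence limits:

$RR \times \exp\left(-1.96\sqrt{\dfrac{1 - a/(a+b)}{a} + \dfrac{1 - c/(c+d)}{c}}\right)$, $RR \times \exp\left(+1.96\sqrt{\dfrac{1 - a/(a+b)}{a} + \dfrac{1 - c/(c+d)}{c}}\right)$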
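The odds ratio and risk ratio limits are easy to get wrong by hand, so a short Python sketch may help. It assumes the usual 2x2 layout, which the sheet does not spell out: a and b are the exposed subjects with and without the outcome, and c and d are the unexposed subjects with and without the outcome. The function names and counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with confidence limits OR * exp(-/+ z * SE of ln OR)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            odds_ratio * math.exp(-z * se_log),
            odds_ratio * math.exp(z * se_log))

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with confidence limits RR * exp(-/+ z * SE of ln RR)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    se_log = math.sqrt((1 - risk_exposed) / a + (1 - risk_unexposed) / c)
    return (risk_ratio,
            risk_ratio * math.exp(-z * se_log),
            risk_ratio * math.exp(z * se_log))

# Hypothetical 2x2 table: 20/100 exposed and 10/100 unexposed have the outcome
or_hat, or_low, or_high = odds_ratio_ci(20, 80, 10, 90)
rr_hat, rr_low, rr_high = risk_ratio_ci(20, 80, 10, 90)
```

Because the limits are built on the log scale, they are symmetric around the point estimate as ratios rather than as differences: the upper limit divided by the estimate equals the estimate divided by the lower limit.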
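Corresponding hypothesis tests

Test for Ho: μ = μo (σ² unknown): $t_{n-1} = \dfrac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$

Test for Ho: μd = 0 (σ² unknown): $t_{n-1} = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}}$

Test for Ho: μx − μy = 0 (σ² unknown, but roughly equal): $t_{n-2} = \dfrac{(\bar{x} - \bar{y}) - 0}{\sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}}$

Test for Ho: p = po: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$

Test for Ho: p1 − p2 = 0: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$; $\bar{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$

Test for Ho: r = 0: $t_{n-2} = \dfrac{\hat{r} - 0}{\sqrt{\dfrac{1 - \hat{r}^2}{n-2}}}$

Test for Ho: β = 0: $t_{n-2} = \dfrac{\hat{\beta} - 0}{\sqrt{\dfrac{s^2}{SS_x}}}$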
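To illustrate the last two tests, the sketch below computes the slope test statistic for a small made-up data set and, separately, the correlation test statistic. The function names and data are hypothetical, and the intercept is taken from the standard least-squares result (y-bar minus slope times x-bar), which the sheet does not list.

```python
import math

def slope_t_test(xs, ys):
    """Return (beta_hat, s2, t statistic, degrees of freedom) for H0: beta = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = ss_xy / ss_x
    # Assumption: least-squares intercept, needed to form the fitted values
    intercept = y_bar - beta_hat * x_bar
    residual_ss = sum((y - (intercept + beta_hat * x)) ** 2
                      for x, y in zip(xs, ys))
    s2 = residual_ss / (n - 2)
    t_stat = (beta_hat - 0) / math.sqrt(s2 / ss_x)
    return beta_hat, s2, t_stat, n - 2

def correlation_t_test(xs, ys):
    """Return the t statistic for H0: r = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r_hat = ss_xy / math.sqrt(ss_x * ss_y)
    return (r_hat - 0) / math.sqrt((1 - r_hat ** 2) / (n - 2))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta_hat, s2, t_slope, df = slope_t_test(xs, ys)
t_corr = correlation_t_test(xs, ys)
```

The two statistics agree for the same data, which reflects the equivalence between testing a zero correlation and testing a zero slope in simple linear regression. The result would then be compared against a t critical value with n − 2 degrees of freedom.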
Corresponding sample size/power

Sample size required to test Ho: μd = 0 (paired difference t-test):

$n = \dfrac{\sigma_d^2 (Z_{power} + Z_{\alpha/2})^2}{\mu_d^2}$

Corresponding power for a given n: $Z_{power} = \dfrac{\mu_d}{\sigma_d}\sqrt{n} - Z_{\alpha/2}$

Smaller group sample size required to test Ho: μx − μy = 0 (two-sample t-test), where r = ratio of larger group to smaller group:

$n_{smaller} = \dfrac{(r + 1)\,\sigma^2 (Z_{power} + Z_{\alpha/2})^2}{r\,(\mu_x - \mu_y)^2}$

Corresponding power for a given n: $Z_{power} = \dfrac{\mu_x - \mu_y}{\sigma}\sqrt{\dfrac{nr}{r + 1}} - Z_{\alpha/2}$

Smaller group sample size required to test Ho: p1 − p2 = 0 (difference in two proportions), where r = ratio of larger group to smaller group:

$n_{smaller} = \dfrac{(r + 1)\,\bar{p}(1 - \bar{p})(Z_{power} + Z_{\alpha/2})^2}{r\,(p_1 - p_2)^2}$

Corresponding power for a given n: $Z_{power} = \dfrac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})}}\sqrt{\dfrac{nr}{r + 1}} - Z_{\alpha/2}$

Sample size required to test Ho: r = 0 (correlation/equivalent to simple linear regression), where r = the anticipated correlation coefficient:

$n = \dfrac{(1 - r^2)(Z_{power} + Z_{\alpha/2})^2}{r^2} + 2$

Corresponding power for a given n: $Z_{power} = \dfrac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} - Z_{\alpha/2}$
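As a worked illustration of the two-sample formulas, the Python sketch below computes the smaller-group sample size and then the power actually achieved at that size. It takes Z values from the standard normal quantile function in the `statistics` module; the function names and the planning inputs (standard deviation 10, difference 5) are hypothetical.

```python
import math
from statistics import NormalDist

def n_smaller_two_means(sigma, diff, r=1.0, power=0.80, alpha=0.05):
    """Smaller-group sample size for a two-sample comparison of means."""
    z_power = NormalDist().inv_cdf(power)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    n = (r + 1) * sigma ** 2 * (z_power + z_alpha) ** 2 / (r * diff ** 2)
    return math.ceil(n)  # round up to a whole subject

def power_two_means(n_smaller, sigma, diff, r=1.0, alpha=0.05):
    """Power for a given smaller-group size, from the Z_power formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = (diff / sigma) * math.sqrt(n_smaller * r / (r + 1)) - z_alpha
    return NormalDist().cdf(z_power)

# Hypothetical planning values: SD 10, difference 5, equal group sizes (r = 1)
n_needed = n_smaller_two_means(sigma=10, diff=5, r=1.0, power=0.80)
achieved_power = power_two_means(n_needed, sigma=10, diff=5, r=1.0)
```

With these inputs the formula gives about 62.8, which rounds up to 63 per group, and the achieved power is just over 80% because of the rounding.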
Common values of $Z_{power}$

| $Z_{power}$ | .25 | .52 | .84 | 1.28 | 1.64 | 2.33 |
|---|---|---|---|---|---|---|
| Power | 60% | 70% | 80% | 90% | 95% | 99% |
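These values are standard normal quantiles, so they can be reproduced rather than memorized. A minimal sketch using Python's `statistics.NormalDist` follows; the variable names are illustrative.

```python
from statistics import NormalDist

# Z_power is the standard normal quantile at the desired power
powers = [0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
z_power_table = {p: round(NormalDist().inv_cdf(p), 2) for p in powers}
```

The same call with 1 − α/2 in place of the power gives $Z_{\alpha/2}$, for example `NormalDist().inv_cdf(0.975)` for a two-sided 5% test.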
Linear regression

Assumptions of Linear Regression

Linear regression assumes that…
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)
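ANOVA TABLE

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (k groups) | k−1 | $SSB = n\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2$ | $\dfrac{SSB}{k-1}$ | $\dfrac{SSB/(k-1)}{SSW/(nk-k)} \sim F_{k-1,\,nk-k}$ | Go to $F_{k-1,\,nk-k}$ chart |
| Within | nk−k | $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$ | $s^2 = \dfrac{SSW}{nk-k}$ | | |
| Total variation | nk−1 | $TSS = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2 = SS_y$ | | | |

Coefficient of Determination: $r^2 = R^2 = \dfrac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \dfrac{SSB}{TSS} = 1 - \dfrac{SSW}{TSS}$

ANOVA TABLE FOR linear regression (more general) case

| Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Model (k levels of X) | k−1 | $SSM = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ | $\dfrac{SSM}{k-1}$ | $\dfrac{SSM/(k-1)}{SSE/(N-k)} \sim F_{k-1,\,N-k}$ | Go to $F_{k-1,\,N-k}$ chart |
| Error | N−k | $SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ | $s^2 = \dfrac{SSE}{N-k}$ | | |
| Total variation | N−1 | $TSS = \sum_{i=1}^{N} (y_i - \bar{y})^2 = SS_y$ | | | |

Coefficient of Determination: $r^2 = R^2 = \dfrac{\text{variation explained by the predictor}}{\text{total variation in the outcome}} = \dfrac{SSM}{TSS} = 1 - \dfrac{SSE}{TSS}$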
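The one-way ANOVA quantities can be computed directly from their definitions. The Python sketch below assumes a balanced design (k groups of n observations each, matching the nk − k degrees of freedom); the function name and the data are hypothetical.

```python
def one_way_anova(groups):
    """Return SSB, SSW, TSS, the F statistic and R^2 for k balanced groups."""
    k = len(groups)
    n = len(groups[0])  # assumption: every group has the same size n
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / n for g in groups]
    # Between-group sum of squares
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Within-group sum of squares
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Total sum of squares (equals SSB + SSW)
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    f_stat = (ssb / (k - 1)) / (ssw / (n * k - k))
    r_squared = ssb / tss
    return {"ssb": ssb, "ssw": ssw, "tss": tss,
            "f": f_stat, "df": (k - 1, n * k - k), "r_squared": r_squared}

# Hypothetical data: 3 groups of 3 observations
anova = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For this data SSB is 26 and SSW is 6, giving F = 13 on (2, 6) degrees of freedom and R² = 26/32. The p-value would then be read from an F table or computed with a statistics package.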
Probability distributions often used in statistics:

T-distribution

Given n independent observations $x_i$: $t_{n-1} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

The Chi-Square Distribution

$\chi_n^2 = \sum_{i=1}^{n} Z_i^2$; where Z ~ Normal(0,1)

$E(\chi_n^2) = n$

$Var(\chi_n^2) = 2n$

The F-Distribution

$F_{n,m} = \dfrac{\chi_n^2 / n}{\chi_m^2 / m}$
Summary of common statistical tests for epidemiology/clinical research: Choice of appropriate statistical test or measure of association for various types of data by study design.

Cross-sectional/case-control studies

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Continuous | T-test* |
| Categorical | Continuous | ANOVA* |
| Continuous | Continuous | Simple linear regression |
| Multivariate (categorical and continuous) | Continuous | Multiple linear regression |
| Categorical | Categorical | Chi-square test§ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR |
| Multivariate (categorical and continuous) | Binary | Logistic regression |

Cohort Studies/Clinical Trials

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Binary | Relative risk |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test |
| Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model |
| Categorical | Continuous, repeated | Repeated-measures ANOVA |
| Multivariate (categorical and continuous) | Continuous, repeated | Mixed models for repeated measures |

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cells contain fewer than 5 subjects.
Course coverage in the HRP statistics sequence: Choice of appropriate statistical test or measure of association for various types of data by study design.

Cross-sectional/case-control studies (HRP259, HRP261)

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Continuous | T-test* |
| Categorical | Continuous | ANOVA* |
| Continuous | Continuous | Simple linear regression |
| Multivariate (categorical and continuous) | Continuous | Multiple linear regression |
| Categorical | Categorical | Chi-square test§ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR |
| Multivariate (categorical and continuous) | Binary | Logistic regression |

Cohort Studies/Clinical Trials (HRP262)

| Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association |
|---|---|---|
| Binary | Binary | Risk ratio |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test |
| Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model (hazard ratios) |
| Categorical | Continuous, repeated | Repeated-measures ANOVA |
| Multivariate (categorical and continuous) | Continuous, repeated | Mixed models for repeated measures |

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cells contain fewer than 5 subjects.
Corresponding SAS PROCs: Choice of appropriate statistical test or measure of association for various types of data by study design.

Cross-sectional/case-control studies

| Predictor | Outcome | Statistical procedure or measure of association | SAS PROC |
|---|---|---|---|
| Binary | Continuous | T-test* | PROC TTEST |
| Categorical | Continuous | ANOVA* | PROC ANOVA |
| Continuous | Continuous | Simple linear regression | PROC REG |
| Multivariate (categorical/continuous) | Continuous | Multiple linear regression | PROC GLM |
| Categorical | Categorical | Chi-square test§ | PROC FREQ |
| Binary | Binary | Odds ratio, Mantel-Haenszel OR | PROC FREQ |
| Multivariate (categorical/continuous) | Binary | Logistic regression | PROC LOGISTIC |

Cohort Studies/Clinical Trials

| Predictor | Outcome | Statistical procedure or measure of association | SAS PROC |
|---|---|---|---|
| Binary | Binary | Risk ratio | PROC FREQ |
| Categorical | Time-to-event | Kaplan-Meier curve/log-rank test | PROC LIFETEST |
| Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model (hazard ratios) | PROC PHREG |
| Categorical | Continuous, repeated | Repeated-measures ANOVA | PROC GLM |
| Multivariate (categorical and continuous) | Continuous, repeated | Mixed models for repeated measures | PROC MIXED |

*Non-parametric equivalents: PROC NPAR1WAY
§Fisher's exact test: PROC FREQ, option: exact