Handyref2.doc

  • Uploaded by: Rosetta Renner
  • October 2019
  • PDF

Handy Reference II

HANDY REFERENCE SHEET 2 – HRP 259

Calculation Formulas for Sample Data:

1. Univariate

Sample proportion: $\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i = 1$ if success and $x_i = 0$ if failure

Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Sum of squares of x: $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ [to ease computation: $SS_x = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$]

Sample variance: $s_x^2 = \frac{SS_x}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Sample standard deviation: $s_x = \sqrt{\frac{SS_x}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

Standard error of the sample mean: $\frac{s_x}{\sqrt{n}} = \frac{\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}}{\sqrt{n}}$
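The univariate formulas above can be checked with a short Python sketch. The function names and the data values are made up for illustration; only the formulas come from the sheet.

```python
import math

def sample_proportion(outcomes):
    """p-hat: the mean of 0/1 indicators (1 = success, 0 = failure)."""
    return sum(outcomes) / len(outcomes)

def sample_summary(xs):
    """Univariate summaries, following the sheet's formulas."""
    n = len(xs)
    mean = sum(xs) / n
    ss_x = sum((x - mean) ** 2 for x in xs)               # definitional form
    ss_x_short = sum(x * x for x in xs) - n * mean ** 2   # computational shortcut
    variance = ss_x / (n - 1)
    sd = math.sqrt(variance)
    se_mean = sd / math.sqrt(n)
    return {
        "n": n, "mean": mean, "ss_x": ss_x, "ss_x_short": ss_x_short,
        "variance": variance, "sd": sd, "se_mean": se_mean,
    }

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
summary = sample_summary(data)
p_hat = sample_proportion([1, 0, 1, 1, 0])        # hypothetical 0/1 outcomes

print(summary)
print(p_hat)
```

Both forms of the sum of squares should agree up to floating-point rounding; the shortcut form can lose precision when the mean is large relative to the spread.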

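2. Bivariate

Sum of squares of xy: $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ [to ease computation: $SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$]

Sample covariance: $s_{xy} = \frac{SS_{xy}}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$

Sample correlation: $\hat{r} = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \frac{SS_{xy}}{\sqrt{SS_x SS_y}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$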

Variance rules for correlated random variables: Var (x+y)=Var(x)+Var(y)+2Cov(x,y); Var (x-y)=Var(x)+Var(y)-2Cov(x,y)
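As a rough numerical check of the bivariate formulas and the variance rules, the following Python sketch computes the covariance and correlation from sums of squares and confirms both rules on a small sample. The helper names and data are illustrative assumptions; the identities also hold exactly for sample variances and covariances.

```python
import math

def ss_xy(xs, ys):
    """Sum of cross-products of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

def covariance(xs, ys):
    return ss_xy(xs, ys) / (len(xs) - 1)

def variance(xs):
    # The variance is the covariance of a variable with itself.
    return covariance(xs, xs)

def correlation(xs, ys):
    return ss_xy(xs, ys) / math.sqrt(ss_xy(xs, xs) * ss_xy(ys, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical paired data
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

sums = [x + y for x, y in zip(xs, ys)]
diffs = [x - y for x, y in zip(xs, ys)]

var_sum_rule = variance(xs) + variance(ys) + 2 * covariance(xs, ys)
var_diff_rule = variance(xs) + variance(ys) - 2 * covariance(xs, ys)

print(correlation(xs, ys))
print(variance(sums), var_sum_rule)
print(variance(diffs), var_diff_rule)
```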


Hypothesis Testing

The Steps:
1. Define your hypotheses (null, alternative)
2. Specify your null distribution
3. Do an experiment
4. Calculate the p-value of what you observed
5. Reject or fail to reject (~accept) the null hypothesis

The Errors

Columns give the true state of the null hypothesis; rows give your statistical decision.

Your Statistical Decision | H0 True | H0 False
Reject H0 | Type I error (α) | Correct
Do not reject H0 | Correct | Type II Error (β)

Power = 1 - β
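The five steps can be traced in a small Python sketch. It assumes a two-sided test of a single proportion with a normal null distribution (the Z statistic for Ho: p = po listed later in this sheet); the counts, the 0.05 threshold, and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha=0.05):
    # Step 1: H0: p = p0 versus the two-sided alternative p != p0.
    # Step 2: under H0, Z is approximately standard normal.
    # Step 3: the "experiment" is the observed count of successes out of n.
    p_hat = successes / n
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    # Step 4: two-sided p-value of the observed Z.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: reject or fail to reject.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

z, p_value, decision = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(z, p_value, decision)
```

Here alpha is the Type I error rate the analyst is willing to accept; the function says nothing about power, which depends on the true value of p.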


Confidence intervals (estimation)

For a mean (σ² unknown): $\bar{x} \pm t_{n-1,\alpha/2} \times \frac{s_x}{\sqrt{n}}$ [if variance known or large sample size, $t_{df,\alpha/2} \approx Z_{\alpha/2}$]

For a paired difference (σ² unknown): $\bar{d} \pm t_{n-1,\alpha/2} \times \frac{s_d}{\sqrt{n}}$ [where $d_i$ = the within-pair difference]

For a difference in means, 2 independent samples (σ²'s unknown but roughly equal): $(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$, where $s_p^2 = \frac{SS_x + SS_y}{n-2}$ or $\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n-2}$, and $n = n_x + n_y$

For a proportion: $\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

For a difference in proportions, 2 independent samples: $(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

For a correlation coefficient: $\hat{r} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{1 - \hat{r}^2}{n-2}}$

For a regression coefficient: $\hat{\beta} \pm t_{n-2,\alpha/2} \times \sqrt{\frac{s^2}{SS_x}}$ [where $\hat{\beta} = \frac{SS_{xy}}{SS_x}$; $s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$]

Common values of t and Z

Confidence level | $t_{10,\alpha/2}$ | $t_{20,\alpha/2}$ | $t_{30,\alpha/2}$ | $t_{50,\alpha/2}$ | $t_{100,\alpha/2}$ | $Z_{\alpha/2}$
90% | 1.81 | 1.73 | 1.70 | 1.68 | 1.66 | 1.64
95% | 2.23 | 2.09 | 2.04 | 2.01 | 1.98 | 1.96
99% | 3.17 | 2.85 | 2.75 | 2.68 | 2.63 | 2.58

For an odds ratio: 95% confidence limits: $OR \times \exp\left(-1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$, $OR \times \exp\left(+1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$

For a risk ratio: 95% confidence limits: $RR \times \exp\left(-1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$, $RR \times \exp\left(+1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)$

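A few of the confidence-interval formulas can be sketched in Python as follows. The sheet does not define the 2x2 cell labels, so the layout here is an assumption: a = exposed with the outcome, b = exposed without, c = unexposed with, d = unexposed without, giving OR = ad/bc and RR = [a/(a+b)] / [c/(c+d)]. Function names and counts are hypothetical, and the critical values are taken from the table of common values rather than computed.

```python
import math

def ci_mean(x_bar, s, n, t_crit):
    """x_bar +/- t * s / sqrt(n); t_crit comes from a t table (df = n - 1)."""
    half = t_crit * s / math.sqrt(n)
    return x_bar - half, x_bar + half

def ci_proportion(p_hat, n, z=1.96):
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def ci_odds_ratio(a, b, c, d, z=1.96):
    """Assumed layout: a, b = exposed with/without outcome; c, d = unexposed."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se_log), odds_ratio * math.exp(z * se_log)

def ci_risk_ratio(a, b, c, d, z=1.96):
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    se_log = math.sqrt((1 - risk_exposed) / a + (1 - risk_unexposed) / c)
    return risk_ratio, risk_ratio * math.exp(-z * se_log), risk_ratio * math.exp(z * se_log)

# Hypothetical inputs: n = 11 gives df = 10, so t = 2.23 at the 95% level.
mean_lo, mean_hi = ci_mean(x_bar=5.0, s=2.0, n=11, t_crit=2.23)
prop_lo, prop_hi = ci_proportion(p_hat=0.5, n=100)
or_est, or_lo, or_hi = ci_odds_ratio(20, 80, 10, 90)
rr_est, rr_lo, rr_hi = ci_risk_ratio(20, 80, 10, 90)

print(mean_lo, mean_hi)
print(prop_lo, prop_hi)
print(or_est, or_lo, or_hi)
print(rr_est, rr_lo, rr_hi)
```

The ratio intervals are symmetric on the log scale, not the original scale, which is why the limits are formed by multiplying rather than adding.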

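Corresponding hypothesis tests

Test for Ho: μ = μo (σ² unknown): $t_{n-1} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$

Test for Ho: μd = 0 (σ² unknown): $t_{n-1} = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$

Test for Ho: μx - μy = 0 (σ² unknown, but roughly equal): $t_{n-2} = \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}$

Test for Ho: p = po: $Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$

Test for Ho: p1 - p2 = 0: $Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}}$; $\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$

Test for Ho: r = 0: $t_{n-2} = \frac{\hat{r} - 0}{\sqrt{\frac{1 - \hat{r}^2}{n-2}}}$

Test for Ho: β = 0: $t_{n-2} = \frac{\hat{\beta} - 0}{\sqrt{\frac{s^2}{SS_x}}}$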

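Three of the test statistics from this sheet (one-sample t, pooled two-sample t, and the two-proportion Z) might be computed as in the sketch below. The function names and sample values are hypothetical; each function returns only the statistic (and degrees of freedom where relevant), leaving the p-value lookup to a table or a statistics package.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _ss(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs)

def one_sample_t(xs, mu0):
    """t statistic and degrees of freedom for Ho: mu = mu0."""
    n = len(xs)
    s = math.sqrt(_ss(xs) / (n - 1))
    return (_mean(xs) - mu0) / (s / math.sqrt(n)), n - 1

def pooled_two_sample_t(xs, ys):
    """t statistic and degrees of freedom for Ho: mu_x - mu_y = 0 (pooled variance)."""
    nx, ny = len(xs), len(ys)
    df = nx + ny - 2
    sp2 = (_ss(xs) + _ss(ys)) / df          # pooled variance
    se = math.sqrt(sp2 / nx + sp2 / ny)
    return (_mean(xs) - _mean(ys)) / se, df

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for Ho: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return (p1 - p2) / se

t1, df1 = one_sample_t([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], mu0=4.0)
t2, df2 = pooled_two_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0])
z = two_proportion_z(30, 100, 20, 100)

print(t1, df1)
print(t2, df2)
print(z)
```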

Corresponding sample size/power

Sample size required to test Ho: μd = 0 (paired difference t-test): $n = \frac{\sigma_d^2 (Z_{power} + Z_{\alpha/2})^2}{d^2}$

Corresponding power for a given n: $Z_{power} = \frac{d}{\sigma_d}\sqrt{n} - Z_{\alpha/2}$

Smaller group sample size required to test Ho: μx - μy = 0 (two-sample t-test), where r = ratio of larger group to smaller group: $n_{smaller} = \frac{(r + 1)}{r} \times \frac{\sigma^2 (Z_{power} + Z_{\alpha/2})^2}{(\mu_x - \mu_y)^2}$

Corresponding power for a given n: $Z_{power} = \frac{\mu_x - \mu_y}{\sigma}\sqrt{\frac{nr}{r + 1}} - Z_{\alpha/2}$

Smaller group sample size required to test Ho: p1 - p2 = 0 (difference in two proportions), where r = ratio of larger group to smaller group: $n_{smaller} = \frac{(r + 1)}{r} \times \frac{\bar{p}(1 - \bar{p})(Z_{power} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$

Corresponding power for a given n: $Z_{power} = \frac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})}}\sqrt{\frac{nr}{r + 1}} - Z_{\alpha/2}$

Sample size required to test Ho: r = 0 (correlation/equivalent to simple linear regression), where r here is the expected correlation coefficient: $n = \frac{(1 - r^2)(Z_{power} + Z_{\alpha/2})^2}{r^2} + 2$

Corresponding power for a given n: $Z_{power} = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} - Z_{\alpha/2}$
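The sample-size and power formulas invert one another, which a short Python sketch can confirm. The function names and inputs are hypothetical, the default of 1.96 assumes a two-sided 5% significance level, and the results are left unrounded (in practice n would be rounded up).

```python
import math
from statistics import NormalDist

def n_paired(sd_d, d, z_power, z_alpha=1.96):
    """Pairs needed to detect a mean within-pair difference d."""
    return sd_d ** 2 * (z_power + z_alpha) ** 2 / d ** 2

def z_power_paired(sd_d, d, n, z_alpha=1.96):
    return (d / sd_d) * math.sqrt(n) - z_alpha

def n_smaller_two_means(sigma, diff, r, z_power, z_alpha=1.96):
    """Smaller-group size; r = ratio of larger group to smaller group."""
    return (r + 1) / r * sigma ** 2 * (z_power + z_alpha) ** 2 / diff ** 2

def z_power_two_means(sigma, diff, n_smaller, r, z_alpha=1.96):
    return (diff / sigma) * math.sqrt(n_smaller * r / (r + 1)) - z_alpha

def n_correlation(r, z_power, z_alpha=1.96):
    """Sample size to detect a correlation of r."""
    return (1 - r ** 2) * (z_power + z_alpha) ** 2 / r ** 2 + 2

def z_to_power(z_power):
    """Convert Z_power back to a power between 0 and 1."""
    return NormalDist().cdf(z_power)

n1 = n_paired(sd_d=4.0, d=2.0, z_power=0.84)               # 80% power
n2 = n_smaller_two_means(sigma=10.0, diff=5.0, r=1, z_power=0.84)
n3 = n_correlation(r=0.3, z_power=0.84)

print(n1, math.ceil(n1))
print(n2, math.ceil(n2))
print(n3, math.ceil(n3))
print(z_to_power(z_power_paired(4.0, 2.0, n1)))
```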


Common values of Zpower

Power: | 60% | 70% | 80% | 90% | 95% | 99%
Zpower: | .25 | .52 | .84 | 1.28 | 1.64 | 2.33
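These values are quantiles of the standard normal distribution, so they can be regenerated for any desired power. A minimal Python sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

powers = [0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

# Z_power is the standard normal quantile at the desired power.
z_power = {p: round(NormalDist().inv_cdf(p), 2) for p in powers}

for p, z in z_power.items():
    print(f"{p:.0%}: {z}")
```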

Linear regression

Assumptions of Linear Regression

Linear regression assumes that…
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)
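A simple linear regression can be fitted by hand with the sheet's formulas, and the residuals it returns are what one would plot against X to judge linearity and constant variance. This is a sketch with made-up data; the intercept formula (ȳ minus slope times x̄) is the standard least-squares result and is not listed on the sheet.

```python
import math

def fit_simple_regression(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = ss_xy / ss_x                       # beta-hat = SS_xy / SS_x
    intercept = y_bar - slope * x_bar          # standard result, not on the sheet
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - 2)   # residual variance
    se_slope = math.sqrt(s2 / ss_x)
    r = ss_xy / math.sqrt(ss_x * ss_y)
    return {
        "slope": slope, "intercept": intercept, "residuals": residuals,
        "s2": s2, "se_slope": se_slope, "t_slope": slope / se_slope, "r": r,
    }

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
fit = fit_simple_regression(xs, ys)

# The t statistic for Ho: r = 0 should match the t statistic for Ho: beta = 0.
t_from_r = fit["r"] / math.sqrt((1 - fit["r"] ** 2) / (len(xs) - 2))

print(fit["slope"], fit["intercept"], fit["s2"])
print(fit["t_slope"], t_from_r)
```

The agreement of the two t statistics illustrates why the sheet treats the correlation test as equivalent to simple linear regression.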

ANOVA TABLE

Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Between (k groups) | k-1 | $SSB = n\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2$ | $\frac{SSB}{k-1}$ | $F_{k-1,nk-k} = \frac{SSB/(k-1)}{SSW/(nk-k)}$ | Go to F chart
Within | nk-k | $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$ | $s^2 = \frac{SSW}{nk-k}$ | |
Total variation | nk-1 | $TSS = SS_y = \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2$ | | |

Coefficient of Determination: $r^2 = R^2 = \frac{SSB}{TSS} = 1 - \frac{SSW}{TSS} = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}}$

ANOVA TABLE FOR linear regression (more general) case

Source of variation | d.f. | Sum of squares | Mean Sum of Squares | F-statistic | p-value
Model (k levels of X) | k-1 | $SSM = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ | $\frac{SSM}{k-1}$ | $F_{k-1,N-k} = \frac{SSM/(k-1)}{SSE/(N-k)}$ | Go to F chart
Error | N-k | $SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ | $s^2 = \frac{SSE}{N-k}$ | |
Total variation | N-1 | $TSS = SS_y = \sum_{i=1}^{N} (y_i - \bar{y})^2$ | | |

Coefficient of Determination: $r^2 = R^2 = \frac{SSM}{TSS} = 1 - \frac{SSE}{TSS} = \frac{\text{variation explained by the predictor}}{\text{total variation in the outcome}}$
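The one-way ANOVA quantities can be computed directly from their definitions, as in this Python sketch. It assumes a balanced design (k groups of equal size n, as in the table's nk-k degrees of freedom); the function name and data are hypothetical, and the p-value still has to be read from an F table or a statistics package.

```python
def one_way_anova(groups):
    """Sums of squares, F statistic and R^2 for k groups of equal size n."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / n for g in groups]

    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, n * k - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return {
        "ssb": ssb, "ssw": ssw, "tss": tss,
        "df": (df_between, df_within),
        "f": f_stat, "r2": ssb / tss,
    }

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # hypothetical data
table = one_way_anova(groups)
print(table)
```

Note that SSB and SSW add up to TSS, which is why the two expressions for the coefficient of determination agree.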


Probability distributions often used in statistics:

T-distribution

Given n independent observations $x_i$: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

The Chi-Square Distribution

$\chi_n^2 = \sum_{i=1}^{n} Z_i^2$; where Z ~ Normal(0,1)

$E(\chi_n^2) = n$

$Var(\chi_n^2) = 2n$

The F-Distribution

$F_{n,m} = \frac{\chi_n^2 / n}{\chi_m^2 / m}$

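The chi-square and F definitions can be illustrated by simulation: build each draw from squared standard normal variates and compare the simulated mean and variance of the chi-square draws with n and 2n. The seed, sample sizes, and function names in this Python sketch are arbitrary choices.

```python
import random
from statistics import mean, variance

def chi_square_draw(df, rng):
    """One chi-square variate: the sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(n, m, rng):
    """One F variate: ratio of two chi-squares, each divided by its d.f."""
    return (chi_square_draw(n, rng) / n) / (chi_square_draw(m, rng) / m)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5
draws = [chi_square_draw(n, rng) for _ in range(20000)]
sim_mean = mean(draws)
sim_var = variance(draws)
f_draws = [f_draw(3, 20, rng) for _ in range(2000)]

print(sim_mean, "vs expected", n)
print(sim_var, "vs expected", 2 * n)
print(min(f_draws), max(f_draws))
```

The simulated moments should be close to n and 2n but will not match exactly, since they are estimates from a finite number of draws.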

Summary of common statistical tests for epidemiology/clinical research:

Choice of appropriate statistical test or measure of association for various types of data by study design.

Types of variables to be analyzed:

Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association

Cross-sectional/case-control studies
Binary | Continuous | T-test*
Categorical | Continuous | ANOVA*
Continuous | Continuous | Simple linear regression
Multivariate (categorical and continuous) | Continuous | Multiple linear regression
Categorical | Categorical | Chi-square test§
Binary | Binary | Odds ratio, Mantel-Haenszel OR
Multivariate (categorical and continuous) | Binary | Logistic regression

Cohort Studies/Clinical Trials
Binary | Binary | Relative risk
Categorical | Time-to-event | Kaplan-Meier curve/log-rank test
Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model
Categorical | Continuous (repeated) | Repeated-measures ANOVA
Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cells contain fewer than 5 subjects.

Course coverage in the HRP statistics sequence:

Choice of appropriate statistical test or measure of association for various types of data by study design.

Courses in the sequence: HRP259, HRP261, HRP262

Types of variables to be analyzed:

Predictor (independent) variable/s | Outcome (dependent) variable | Statistical procedure or measure of association

Cross-sectional/case-control studies
Binary | Continuous | T-test*
Categorical | Continuous | ANOVA*
Continuous | Continuous | Simple linear regression
Multivariate (categorical and continuous) | Continuous | Multiple linear regression
Categorical | Categorical | Chi-square test§
Binary | Binary | Odds ratio, Mantel-Haenszel OR
Multivariate (categorical and continuous) | Binary | Logistic regression

Cohort Studies/Clinical Trials
Binary | Binary | Risk ratio
Categorical | Time-to-event | Kaplan-Meier curve/log-rank test
Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model (hazard ratios)
Categorical | Continuous (repeated) | Repeated-measures ANOVA
Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures

*Non-parametric tests are used when the outcome variable is clearly non-normal and sample size is small.
§Fisher's exact test is used when the expected cells contain fewer than 5 subjects.

Corresponding SAS PROCs:

Choice of appropriate statistical test or measure of association for various types of data by study design.

Types of variables to be analyzed:

Predictor | Outcome | Statistical procedure or measure of association | SAS PROC

Cross-sectional/case-control studies
Binary | Continuous | T-test* | PROC TTEST
Categorical | Continuous | ANOVA* | PROC ANOVA
Continuous | Continuous | Simple linear regression | PROC REG
Multivariate (categorical/continuous) | Continuous | Multiple linear regression | PROC GLM
Categorical | Categorical | Chi-square test§ | PROC FREQ
Binary | Binary | Odds ratio, Mantel-Haenszel OR | PROC FREQ
Multivariate (categorical/continuous) | Binary | Logistic regression | PROC LOGISTIC

Cohort Studies/Clinical Trials
Binary | Binary | Risk ratio | PROC FREQ
Categorical | Time-to-event | Kaplan-Meier curve/log-rank test | PROC LIFETEST
Multivariate (categorical and continuous) | Time-to-event | Cox proportional hazards model (hazard ratios) | PROC PHREG
Categorical | Continuous (repeated) | Repeated-measures ANOVA | PROC GLM
Multivariate (categorical and continuous) | Continuous (repeated) | Mixed models for repeated measures | PROC MIXED

*Non-parametric equivalents: PROC NPAR1WAY
§Fisher's exact test: PROC FREQ, option: exact
