Estimating and Comparing Rates
♦ Incidence Density
♦ Incidence Rate Difference and Ratio
♦ Confidence Intervals
♦ Standardized Rates and Their Comparison

A Definition
♦ Kleinbaum, Kupper and Morgenstern, Epidemiologic Research: Principles and Quantitative Methods (1982), p. 97: “A true rate is a potential for change in one quantity per unit change in another quantity, where the latter quantity is usually time. (…) Thus, a rate is not dimensionless and has no finite upper bound – i.e., theoretically, a rate can approach infinity.”

Rates
♦ A well-known example of a rate is velocity, i.e., change of distance per unit of time (given, e.g., in km/h).
  • In practice, it does (should?) have an upper bound.
♦ We can talk about instantaneous and average rates.
  • Example (instantaneous): the velocity of your car at a particular time-point (it can depend on the time-point, e.g., city vs. highway driving).
  • Example (average): your average speed after travelling a particular distance (treated as constant across the whole trip).
♦ In epidemiology, we usually talk about average rates.

Incidence/Mortality Rate
♦ Kleinbaum, Kupper and Morgenstern, Epidemiologic Research: Principles and Quantitative Methods (1982), p. 100: “The incidence rate of disease occurrence is the instantaneous potential for change in disease status (i.e., the occurrence of new cases) per unit of time at time t, or the occurrence of disease per unit of time, relative to the size of the candidate (i.e., disease-free) population at time t”.
♦ We could similarly define the mortality rate.

Incidence Rate
♦ Other terms:
  • an “instantaneous risk” (or probability);
  • a “hazard” (especially for mortality rates);
  • a “person-time incidence rate”;
  • a “force of morbidity”.
♦ It is expressed in units of 1/time.
♦ It is sometimes confused with risk.

Rates and Risks
♦ Assume that the incidence rate is constant over time (= λ) and the same for all individuals.
♦ The risk (probability) of developing disease within time T will then be equal to $1 - e^{-\lambda T}$.
  • Risk is sometimes called cumulative incidence.
  • In a cohort of N individuals who are disease-free at time 0, you would thus expect $N(1 - e^{-\lambda T})$ new cases by time T.
  • Similarly, we could talk about the risk of death.
♦ Thus, formally, rate and risk are two different quantities.

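The relation between a constant rate and risk can be checked numerically. The sketch below is only an illustration: the function names and the input values (a rate of 0.02 per person-year, 5 years, 1000 subjects) are made up for the example and do not come from the slides.

```python
import math

def risk_from_rate(rate: float, duration: float) -> float:
    """Risk (cumulative incidence) over `duration` under a constant rate."""
    return 1.0 - math.exp(-rate * duration)

def expected_cases(n: int, rate: float, duration: float) -> float:
    """Expected new cases in a disease-free cohort of n subjects."""
    return n * risk_from_rate(rate, duration)

# Hypothetical inputs: 0.02 per person-year, 5 years, 1000 subjects.
risk = risk_from_rate(0.02, 5.0)
cases = expected_cases(1000, 0.02, 5.0)
print(f"risk = {risk:.4f}, expected cases = {cases:.1f}")
```

For small values of λT the risk is close to λT itself, which is one reason the two quantities are easily confused.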
Estimating Rates
♦ Rates require observations of incidence over time. Thus, they are estimated from cohort studies.
♦ Instantaneous rates are seldom obtained. Rather, average rates are computed.
♦ The most basic estimator is the incidence density (ID):
  $ID = \frac{I}{PT} = \frac{\text{no. of new cases in the calendar period } (t_0, t_1)}{\text{accrued population time}}$
♦ PT is expressed in person-years, person-days, etc.

Incidence Density
♦ A hypothetical cohort of 12 subjects.
♦ Followed for a period of 5.5 years.
♦ 7 withdrawals among non-cases:
  • three (subjects 7, 8, 12) lost to follow-up;
  • two (3, 4) due to death;
  • two (5, 10) due to study termination.
♦ PT = 2.5 + 3.5 + … + 1.5 = 26 person-years.
♦ ID = 5/26 = 0.192 per person-year, or 1.92 per 10 person-years.

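A minimal sketch of the incidence density calculation, using only the totals quoted for the hypothetical cohort (5 cases, 26 person-years). The individual follow-up times are not listed in full, so they are not reproduced here; the function name is an assumption made for illustration.

```python
def incidence_density(cases: int, person_time: float) -> float:
    """Incidence density: new cases divided by accrued person-time."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return cases / person_time

id_per_py = incidence_density(5, 26.0)   # per person-year
id_per_10py = 10 * id_per_py             # same rate per 10 person-years
print(f"{id_per_py:.3f} per person-year = {id_per_10py:.2f} per 10 person-years")
```

Changing the unit of person-time only rescales the number, so the unit should always be reported alongside it.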
Population-Time Without Individual Data
♦ E.g., population-based registries.
♦ Person-years are computed using the mid-year population.
♦ For rare events, periods of several years may be used.
  • Ideally, one would like to use mid-year populations for each year.
  • Alternatively, one can use information for several time-points, or the mid-period population (these are less accurate solutions).
♦ One may face the problem of removing those not at risk (e.g., women for prostate cancer incidence).

Population-Time Without Individual Data: Example
Incidence Density: Remarks
♦ It is an estimate of an average rate.
  • So we will sometimes refer to it as an “incidence rate”.
♦ Any fluctuations in the instantaneous rate are obscured, which can lead to misleading conclusions. E.g.,
  • 1000 persons followed for 1 year and
  • 100 persons followed for 10 years
  produce the same number of person-years. If the average time to disease onset is 5 years, the ID in the first cohort will be lower.

Incidence Density: Remarks
♦ If applied to the whole cohort/population, it is sometimes called the crude rate.
♦ However, sex, age, race, etc. can have a substantial influence on the incidence of disease.
♦ Comparing crude rates for two populations that differ w.r.t., e.g., age can be misleading (confounding!).
♦ Therefore, usually standardized rates are compared.
  • E.g., for cancer, age- and sex-standardized rates are used.
  • They will be discussed later.

Confidence Interval for Incidence Density
♦ Using a Poisson model, the standard error of ID = I / PT can be estimated by:
  $\mathrm{SE}(ID) = \sqrt{\frac{I}{PT^2}}$
♦ Thus, an approximate 95% CI for ID is given by: ID ± 1.96∙SE(ID).
  • 99% CI: ID ± 2.58∙SE(ID).

Estimating Incidence Densities: Example
♦ Postmenopausal Hormone and Coronary Heart Disease Cohort Study:
  • Stampfer et al., NEJM (1985).
  • Involving female nurses:

| Hormone use | CHD cases | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

  • ID1 = 30/54308.7 = 0.00055; SE(ID1) = $\sqrt{30/54308.7^2}$ = 0.00010
  • 95% CI for ID1 = 0.00055 ± 1.96∙0.0001 = (0.00035, 0.00075)
  • ID0 = 60/51477.5 = 0.00116; SE(ID0) = $\sqrt{60/51477.5^2}$ = 0.00015
  • 95% CI for ID0 = 0.00116 ± 1.96∙0.00015 = (0.00086, 0.00145)

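The interval ID ± z∙SE(ID) with SE(ID) = √I / PT can be sketched as follows, using the hormone-use data. The function name is hypothetical. Because the code does not round intermediate values, its results can differ in the last printed digit from hand calculations.

```python
import math

def id_wald_ci(cases: int, person_time: float, z: float = 1.96):
    """Incidence density, its Poisson-based SE, and an approximate CI."""
    rate = cases / person_time
    se = math.sqrt(cases) / person_time      # equals sqrt(I / PT^2)
    return rate, se, (rate - z * se, rate + z * se)

rate1, se1, ci1 = id_wald_ci(30, 54308.7)    # hormone users
rate0, se0, ci0 = id_wald_ci(60, 51477.5)    # non-users
for label, r, ci in (("users", rate1, ci1), ("non-users", rate0, ci0)):
    print(f"{label}: ID = {r:.5f}, 95% CI = ({ci[0]:.5f}, {ci[1]:.5f})")
```

Passing `z=2.58` gives the 99% interval.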
Comparing Two Incidence Densities
♦ Assume data from a cohort study:

| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

♦ We get two estimates, for unexposed and exposed subjects: ID0 = I0/PT0 and ID1 = I1/PT1.
♦ To compare them, we can look at
  • Incidence rate difference: IRD = ID1 - ID0.
  • Incidence rate ratio: IRR = ID1 / ID0.

Comparing Two Incidence Densities: Example
♦ Postmenopausal Hormone and Coronary Heart Disease Cohort Study:
  • Stampfer et al., NEJM (1985).
  • Involving female nurses:

| Hormone use | CHD cases | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

  • ID1 = 30/54308.7 = 0.00055; ID0 = 60/51477.5 = 0.00116
  • IRD = ID1 - ID0 = -0.00061
  • IRR = ID1 / ID0 = 0.474

Comparing Two Incidence Densities: Poisson Model Method

| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

♦ Using a Poisson model, the standard error of IRD can be estimated by:
  $\mathrm{SE}(IRD) = \sqrt{\frac{I_0}{PT_0^2} + \frac{I_1}{PT_1^2}}$
♦ Thus, an approximate 95% CI for IRD is given by: IRD ± 1.96∙SE(IRD).
  • 99% CI: IRD ± 2.58∙SE(IRD)
♦ The standard error of ln IRR can be estimated by:
  $\mathrm{SE}(\ln IRR) = \sqrt{\frac{1}{I_0} + \frac{1}{I_1}}$
♦ Thus, an approximate 95% CI for IRR is given by: exp{ ln IRR ± 1.96∙SE(ln IRR) }
  • 99% CI: exp{ ln IRR ± 2.58∙SE(ln IRR) }

Comparing Two Incidence Densities: Example

| Hormone use | CHD cases | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

♦ $\mathrm{SE}(IRD) = \sqrt{\frac{60}{51477.5^2} + \frac{30}{54308.7^2}} = 0.00018$
♦ $\mathrm{SE}(\ln IRR) = \sqrt{\frac{1}{60} + \frac{1}{30}} = 0.224$
♦ 95% CI for:
  • IRD: -0.00061 ± 1.96∙0.00018 = (-0.00096, -0.00025)
  • ln IRR: ln(0.474) ± 1.96∙0.22 = (-1.178, -0.315)
  • IRR: $(e^{-1.178}, e^{-0.315})$ = (0.308, 0.729)
♦ Both CIs allow us to reject the null hypothesis of no difference.

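The Poisson-model intervals for IRD and IRR can be sketched in a few lines. The function and key names are assumptions made for this illustration. The code keeps full precision throughout, so its limits can differ in the third decimal from hand calculations that round the standard error first.

```python
import math

def compare_rates(i1, pt1, i0, pt0, z=1.96):
    """IRD and IRR with Poisson-model CIs (1 = exposed, 0 = unexposed)."""
    id1, id0 = i1 / pt1, i0 / pt0
    ird = id1 - id0
    se_ird = math.sqrt(i1 / pt1**2 + i0 / pt0**2)
    irr = id1 / id0
    se_ln_irr = math.sqrt(1 / i1 + 1 / i0)
    return {
        "ird": ird,
        "ird_ci": (ird - z * se_ird, ird + z * se_ird),
        "irr": irr,
        # exp{ln IRR -/+ z*SE} written as IRR * exp(-/+ z*SE)
        "irr_ci": (irr * math.exp(-z * se_ln_irr), irr * math.exp(z * se_ln_irr)),
    }

res = compare_rates(30, 54308.7, 60, 51477.5)
print(res)
```

An IRD interval that excludes 0 and an IRR interval that excludes 1 point to the same conclusion.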
Comparing Two Incidence Densities: “Test-Based” Method

| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

♦ A 95% “test-based” CI for IRD can be computed as IRD ± 1.96 ∙ SE(IRD), where SE(IRD) = IRD / χ and
  $\chi = \frac{I_1 - I \cdot \frac{PT_1}{PT}}{\sqrt{\frac{I \cdot PT_0 \cdot PT_1}{PT^2}}}$
♦ It can be re-expressed as (1 ± 1.96 / χ) ∙ IRD.
  • 99% CI: (1 ± 2.58 / χ) ∙ IRD
♦ Similarly, SE(ln IRR) = (ln IRR) / χ.
♦ The 95% “test-based” CI for ln IRR is ln IRR ± 1.96 ∙ (ln IRR) / χ.
♦ It can be written as (1 ± 1.96 / χ) ∙ ln IRR.
♦ The 95% CI for IRR is thus exp{ (1 ± 1.96 / χ) ∙ ln IRR }.

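Comparing Two Incidence Densities: Example

| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

| Hormone use | Yes | No | Total |
|---|---|---|---|
| CHD cases | 30 | 60 | 90 |
| Person-years | 54308.7 | 51477.5 | 105786.2 |

♦ $\chi = \frac{I_1 - I \cdot \frac{PT_1}{PT}}{\sqrt{\frac{I \cdot PT_0 \cdot PT_1}{PT^2}}}$, so that
  $|\chi| = \left| \frac{30 - 90 \cdot \frac{54308.7}{105786.2}}{\sqrt{\frac{90 \cdot 54308.7 \cdot 51477.5}{105786.2^2}}} \right| = 3.41$
  (the numerator is negative here; the sign of χ does not change the interval).
♦ 95% “test-based” CI for
  • IRD: (1 ± 1.96/3.41) ∙ (-0.00061) = (-0.001, -0.0002). This is close to the interval based on the Poisson model (this need not hold in general).
  • ln IRR: (1 ± 1.96/3.41) ∙ ln(0.474) = (-1.176, -0.317)
  • IRR: $(e^{-1.176}, e^{-0.317})$ = (0.309, 0.728)
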
“Exact” Confidence Interval for IRR
♦ The presented CIs for ln IRR (and IRD) assume that the estimates of ln IRR vary according to the normal distribution.
  • Hence their form, e.g., ln(IRR) ± 1.96 ∙ SE(ln IRR).
♦ The use of the normal distribution is an approximation.
  • It can be problematic, especially in small samples.
♦ It is possible to construct a CI for ln IRR using the “exact” distribution (i.e., without approximating it by the normal).
  • The CI is valid in all samples; in large samples, it is close to the approximate CIs.
  • Computation is a bit more difficult (but easily handled by computers).

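The slides do not say which “exact” method is meant. One common choice, assumed here, conditions on the total number of cases: the number of exposed cases is then binomial with success probability π = IRR∙PT1 / (IRR∙PT1 + PT0), and Clopper-Pearson limits for π are transformed back to the IRR scale. All function names below are hypothetical; only the Python standard library is used.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection: find p in (lo, hi) with f(p) = target, f decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_irr_ci(i1, pt1, i0, pt0, alpha=0.05):
    """Conditional (Clopper-Pearson type) CI for the IRR; an assumed method."""
    n = i1 + i0
    # Lower limit: P(X >= i1) = alpha/2, i.e. P(X <= i1 - 1) = 1 - alpha/2.
    p_lo = 0.0 if i1 == 0 else solve_decreasing(
        lambda p: binom_cdf(i1 - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= i1) = alpha/2.
    p_hi = 1.0 if i1 == n else solve_decreasing(
        lambda p: binom_cdf(i1, n, p), alpha / 2)

    def to_irr(p):
        return math.inf if p >= 1 else (p / (1 - p)) * (pt0 / pt1)

    return to_irr(p_lo), to_irr(p_hi)

lo, hi = exact_irr_ci(30, 54308.7, 60, 51477.5)
print(f"exact 95% CI for IRR: ({lo:.3f}, {hi:.3f})")
```

With 90 events the result should be close to the approximate intervals; larger differences are expected when the case counts are small.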
Standardized Rates
♦ We will introduce standardization w.r.t. age.
♦ We will assume that our population is stratified by age (i.e., subdivided into age-groups).
  • One needs to define age-groups (e.g., 0-4, 5-9, …).
♦ One needs to compute age-specific rates (ID).
  • Population-time and no. of cases for each age-group are required.
♦ There are two methods of standardization:
  • Direct;
  • Indirect.

Standardization
♦ Direct method
  • Age-specific rates of the study population are applied to the age distribution of the standard population (rates: study → age: standard).
  • The result is the theoretical rate that would have occurred if the rates observed in the study population applied to the standard population.
♦ Indirect method
  • Age-specific rates from the standard population are applied to the age distribution of the study population (rates: standard → age: study).

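Direct Standardization

| Age group | Study population: observed | Study population: person-years | Study population: rate | Standard population (e.g., USA 1990): observed | Standard population: population | Standard population: rate |
|---|---|---|---|---|---|---|
| <40 | I1 | PT1 | I1/PT1 | B1 | N1 | B1/N1 |
| 40-64 | I2 | PT2 | I2/PT2 | B2 | N2 | B2/N2 |
| 65+ | I3 | PT3 | I3/PT3 | B3 | N3 | B3/N3 |
| Total | It | PTt | It/PTt | Bt | Nt | Bt/Nt |

♦ Crude rate in the study population = It / PTt.
♦ Directly Standardized Rate (DSR):
  $DSR = \frac{\frac{I_1}{PT_1} N_1 + \frac{I_2}{PT_2} N_2 + \frac{I_3}{PT_3} N_3}{N_t} = \frac{I_1}{PT_1} \cdot \frac{N_1}{N_t} + \frac{I_2}{PT_2} \cdot \frac{N_2}{N_t} + \frac{I_3}{PT_3} \cdot \frac{N_3}{N_t}$
♦ Make sure units are consistent!
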
Direct Standardization
♦ If there is no confounding, the crude rate is adequate.
♦ DSR by itself is not meaningful; it makes sense only when comparing two or more populations.
  • If possible, compare age-specific rates.
  • The rates should exhibit more or less similar trends (also in the standard).
♦ DSR depends on the choice of the standard population.
  • The age distribution of the latter should not be radically different from that of the compared populations.
  • There are several standard populations (e.g., for the world, continents, etc.).

Indirect Standardization
♦ Direct standardization requires age-specific rates for all compared populations.
♦ If these are not available, or they are imprecise, the indirect method is preferred.
♦ Both should lead to similar conclusions; if they do not, the reason should be investigated.

Indirect Standardization

| Age group | Study population: obs. | Study population: person-years | Study population: rate | Standard population (e.g., USA 1990): obs. | Standard population: population | Standard population: rate | Expected |
|---|---|---|---|---|---|---|---|
| <40 | I1 | PT1 | I1/PT1 | B1 | N1 | B1/N1 | E1 = PT1 ∙ (B1/N1) |
| 40-64 | I2 | PT2 | I2/PT2 | B2 | N2 | B2/N2 | E2 = PT2 ∙ (B2/N2) |
| 65+ | I3 | PT3 | I3/PT3 | B3 | N3 | B3/N3 | E3 = PT3 ∙ (B3/N3) |
| Total | It | | | | | | E1 + E2 + E3 = ∑Ej |

♦ Standardized (Incidence or Mortality) Ratio (SIR or SMR):
  SIR = It / ∑Ej = Observed / Expected.
♦ Take the Indirectly Standardized Rate (ISR) as:
  ISR = SIR ∙ (crude rate for the standard population).
♦ Make sure units are consistent!

Standardization of Rates: Example
♦ Infant deaths (for children less than 1 year of age) in Colorado and Louisiana in 1987.
  • Colorado: 527 deaths out of 53808 live births; crude rate = 9.8 per 1000.
  • Louisiana: 872 deaths out of 73967 live births; crude rate = 11.8 per 1000.
♦ The crude infant mortality rate for Colorado is lower than for Louisiana.
♦ In the US, infant mortality depends on race.

USA, 1987:

| Race | Live births | % live births | Infant deaths | Rate (×1000) |
|---|---|---|---|---|
| Black | 641567 | 16.8 | 11461 | 17.9 |
| White | 2992488 | 78.6 | 25810 | 8.6 |
| Other | 175339 | 4.6 | 1137 | 6.5 |
| Total | 3809394 | 100 | 38408 | 10.1 |

Standardization of Rates: Example
♦ The distribution of race of newborn children is different in the two states.

| Race | Colorado live births | Colorado % | Louisiana live births | Louisiana % |
|---|---|---|---|---|
| Black | 3166 | 5.9 | 29670 | 40.1 |
| White | 48805 | 90.7 | 42749 | 57.8 |
| Other | 1837 | 3.4 | 1548 | 2.1 |
| Total | 53808 | 100 | 73967 | 100 |

♦ Infant mortality rates depend on race.
  • Race is a confounder.
♦ Compare race-specific infant mortality rates.
  • Unclear (differences in various directions).

| Race | Colorado rate (×1000) | Louisiana rate (×1000) |
|---|---|---|
| Black | 16.4 | 17.7 |
| White | 9.6 | 8.0 |
| Other | 3.3 | 1.9 |
| Total | 9.8 | 11.8 |

Standardization of Rates: Example
♦ Direct standardization: apply state- and race-specific rates to the standard race distribution (US, 1987).

| Race | US 1987: Ni (births) | US 1987: Ni/Nt | Colorado: rate (×1000) | Colorado: Rate∙Ni | Colorado: Rate∙Ni/Nt (×1000) | Louisiana: rate (×1000) | Louisiana: Rate∙Ni | Louisiana: Rate∙Ni/Nt (×1000) |
|---|---|---|---|---|---|---|---|---|
| Black | 641567 | 0.168 | 16.4 | 10521.7 | 2.76 | 17.7 | 11355.7 | 2.98 |
| White | 2992488 | 0.786 | 9.6 | 28727.9 | 7.54 | 8.0 | 23939.9 | 6.28 |
| Other | 175339 | 0.046 | 3.3 | 578.6 | 0.15 | 1.9 | 333.1 | 0.09 |
| Total | 3809394 | 1 | | 39828.2 | 10.45 | | 35628.7 | 9.35 |

♦ DSR for Colorado: 10.45 (per 1000 live births; crude: 9.8).
♦ DSR for Louisiana: 9.35 (per 1000 live births; crude: 11.8).

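The direct method is a weighted average of stratum-specific rates, with weights Ni/Nt taken from the standard population. A minimal sketch, using the rounded race-specific rates (per 1000 live births) and the US 1987 birth counts from this example; the function and variable names are assumptions.

```python
def directly_standardized_rate(stratum_rates, standard_sizes):
    """Weighted average of stratum rates with weights N_i / N_t."""
    total = sum(standard_sizes)
    return sum(r * n / total for r, n in zip(stratum_rates, standard_sizes))

# Strata in the order Black, White, Other.
us_births = [641567, 2992488, 175339]    # standard population sizes N_i
colorado_rates = [16.4, 9.6, 3.3]        # per 1000 live births
louisiana_rates = [17.7, 8.0, 1.9]       # per 1000 live births

dsr_co = directly_standardized_rate(colorado_rates, us_births)
dsr_la = directly_standardized_rate(louisiana_rates, us_births)
print(f"DSR Colorado = {dsr_co:.2f}, DSR Louisiana = {dsr_la:.2f} per 1000")
```

The result inherits the units of the input rates, here deaths per 1000 live births.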
Standardization of Rates: Example
♦ Indirect standardization: apply race-specific rates of a standard population (US, 1987) to the race distribution of the states.

| Race | US rate (×1000) | Colorado: live births (PTi) | Colorado: deaths (obs.) | Colorado: Rate∙PTi (exp. deaths) | Louisiana: live births (PTi) | Louisiana: deaths (obs.) | Louisiana: Rate∙PTi (exp. deaths) |
|---|---|---|---|---|---|---|---|
| Black | 17.9 | 3166 | 52 | 56.7 | 29670 | 525 | 531.1 |
| White | 8.6 | 48805 | 469 | 419.7 | 42749 | 344 | 367.6 |
| Other | 6.5 | 1837 | 6 | 11.9 | 1548 | 3 | 10.1 |
| Total | 10.1 | 53808 | 527 | 488.3 | 73967 | 872 | 908.8 |

♦ SMR for Colorado: 527/488.3 = 1.08 (8% higher than the US).
  • ISR = SMR × 10.1 = 10.9 (race-adjusted infant mortality rate).
♦ SMR for Louisiana: 872/908.8 = 0.96 (4% lower than the US).
  • ISR = SMR × 10.1 = 9.7 (race-adjusted infant mortality rate).

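The indirect method can be sketched in the same way: expected deaths are obtained by applying the standard (US) race-specific rates to each state's births, and the SMR is observed over expected. Function and variable names are assumptions made for illustration.

```python
def indirect_standardization(observed, study_pt, standard_rates, standard_crude):
    """Return (expected events, SMR, ISR) for one study population."""
    expected = sum(pt * r for pt, r in zip(study_pt, standard_rates))
    smr = observed / expected
    return expected, smr, smr * standard_crude

# Strata in the order Black, White, Other.
us_rates = [17.9 / 1000, 8.6 / 1000, 6.5 / 1000]   # deaths per live birth
us_crude = 10.1                                    # per 1000 live births

exp_co, smr_co, isr_co = indirect_standardization(
    527, [3166, 48805, 1837], us_rates, us_crude)
exp_la, smr_la, isr_la = indirect_standardization(
    872, [29670, 42749, 1548], us_rates, us_crude)
print(f"Colorado:  expected = {exp_co:.1f}, SMR = {smr_co:.2f}, ISR = {isr_co:.1f}")
print(f"Louisiana: expected = {exp_la:.1f}, SMR = {smr_la:.2f}, ISR = {isr_la:.1f}")
```

The stratum rates are entered per birth and the crude rate per 1000 births, so the ISR comes out per 1000 live births.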
Standardization of Rates: Example
♦ Is it reasonable to use the adjusted rates?
  • The plot of race-specific rates shows a similar trend (black > white > other).
  • The distribution of race in the US is similar to that in the two states (white > black > other).
  • Results for both standardization methods are similar.

Comparison of Directly Standardized Rates
♦ If we have two standardized rates, we may want to compare them.
♦ For the direct method, assume we have DSR1 and DSR2.
♦ A 95% CI can then be obtained using the normal approximation: (DSR1 - DSR2) ± 1.96 ∙ SE(DSR1 - DSR2).
  • 99% CI: (DSR1 - DSR2) ± 2.58 ∙ SE(DSR1 - DSR2).
♦ The standard error is given by
  $\mathrm{SE}(DSR_1 - DSR_2) = \sqrt{\sum_k \left( \frac{N_k}{N_t} \cdot \mathrm{SE}(IRD_k) \right)^2}$
  where IRDk is the stratum-specific incidence rate difference.

Comparison of Directly Standardized Rates
♦ Alternatively, we might look at the standardized rate ratio: SRR = DSR1/DSR2.
♦ The 95% CI for SRR can be written as $SRR^{\,1 \pm (1.96 / Z)}$, where
  $Z = \frac{DSR_1 - DSR_2}{\mathrm{SE}(DSR_1 - DSR_2)}$
  • The 99% CI can be written as $SRR^{\,1 \pm (2.58 / Z)}$.

Comparison of Directly Standardized Rates: Example
♦ DSR1 (Colorado): 0.01045 (10.45 per 1000 live births).
♦ DSR2 (Louisiana): 0.00935 (9.35 per 1000 live births).

| Race | US % births (Ni/Nt) | Colorado: births (PTi) | Colorado: deaths (Ii) | Colorado: rate (IDi, ×1000) | Louisiana: births (PTi) | Louisiana: deaths (Ii) | Louisiana: rate (IDi, ×1000) | IRDi (×1000) | SEi (×1000) |
|---|---|---|---|---|---|---|---|---|---|
| Black | 16.8 | 3166 | 52 | 16.4 | 29670 | 525 | 17.7 | -1.3 | 2.4 |
| White | 78.6 | 48805 | 469 | 9.6 | 42749 | 344 | 8.0 | 1.6 | 0.6 |
| Other | 4.6 | 1837 | 6 | 3.3 | 1548 | 3 | 1.9 | 1.4 | 1.7 |
| Total | 100 | 53808 | 527 | 9.8 | 73967 | 872 | 11.8 | | |

♦ $\mathrm{SE}(DSR_1 - DSR_2) = \sqrt{(0.168 \cdot 0.0024)^2 + (0.786 \cdot 0.0006)^2 + (0.046 \cdot 0.0017)^2} = 0.0006$

Comparison of Directly Standardized Rates: Example
♦ $\mathrm{SE}(DSR_1 - DSR_2) = \sqrt{(0.168 \cdot 0.0024)^2 + (0.786 \cdot 0.0006)^2 + (0.046 \cdot 0.0017)^2} = 0.0006$
♦ DSR1 = 0.01045; DSR2 = 0.00935.
♦ DSR1 - DSR2 = 0.0011.
  • 95% CI: 0.0011 ± 1.96∙0.0006 = (-0.0002, 0.002).
  • The CI includes 0, so we cannot reject H0 of no difference.
♦ SRR = DSR1 / DSR2 = 1.12.
  • Z = (DSR1 - DSR2) / SE = 1.83.
  • 95% CI: $1.12^{\,1 \pm (1.96 / 1.83)}$ = (0.99, 1.26).

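The comparison can be sketched from the raw stratum counts. The function and key names are hypothetical, and the stratum standard errors SE(IRDk) are taken from the Poisson formula used earlier for a rate difference. Because the hand calculation rounds the stratum rates and standard errors, the unrounded results (Z in particular) can differ somewhat from the values on the slide, although the conclusion is the same.

```python
import math

def compare_dsr(cases1, pt1, cases2, pt2, standard_sizes, z=1.96):
    """Difference and ratio of two directly standardized rates with CIs.

    Assumes the two DSRs differ (Z != 0); otherwise the SRR interval
    SRR^(1 +/- z/Z) is undefined.
    """
    total = sum(standard_sizes)
    weights = [n / total for n in standard_sizes]
    dsr1 = sum(w * i / pt for w, i, pt in zip(weights, cases1, pt1))
    dsr2 = sum(w * i / pt for w, i, pt in zip(weights, cases2, pt2))
    # Stratum-specific SE(IRD_k) from the Poisson model.
    se_k = [math.sqrt(a / p**2 + b / q**2)
            for a, p, b, q in zip(cases1, pt1, cases2, pt2)]
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, se_k)))
    diff = dsr1 - dsr2
    z_stat = diff / se
    srr = dsr1 / dsr2
    srr_ci = tuple(sorted((srr ** (1 - z / z_stat), srr ** (1 + z / z_stat))))
    return {"dsr1": dsr1, "dsr2": dsr2, "se": se, "z": z_stat,
            "diff_ci": (diff - z * se, diff + z * se),
            "srr": srr, "srr_ci": srr_ci}

# Strata in the order Black, White, Other; 1 = Colorado, 2 = Louisiana.
r = compare_dsr([52, 469, 6], [3166, 48805, 1837],
                [525, 344, 3], [29670, 42749, 1548],
                [641567, 2992488, 175339])
print(r)
```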
Comparison of Indirectly Standardized Rates
♦ In directly standardized rates, stratum-specific rates for different study populations are combined using the same weights (relative stratum sizes in the standard population).
♦ In indirectly standardized rates, the weights (PTi / “expected Ii”) differ.
♦ Thus, technically speaking, ISRs (SIRs) should not be compared.
♦ On the other hand, it is valid to ask whether SIR (or SMR) is different from 1.
♦ To do that, one can construct a 95% CI, e.g., as follows:
  $SIR \pm 1.96 \cdot \frac{\sqrt{\text{observed events}}}{\text{expected events}}$

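A sketch of this interval, applied to the observed and expected infant deaths from the Colorado and Louisiana example (527 vs. 488.3 and 872 vs. 908.8). The function name is an assumption.

```python
import math

def sir_ci(observed: float, expected: float, z: float = 1.96):
    """SIR (or SMR) with the approximate CI: SIR +/- z * sqrt(observed) / expected."""
    sir = observed / expected
    se = math.sqrt(observed) / expected
    return sir, (sir - z * se, sir + z * se)

smr_co, ci_co = sir_ci(527, 488.3)    # Colorado
smr_la, ci_la = sir_ci(872, 908.8)    # Louisiana
print(f"Colorado:  SMR = {smr_co:.2f}, 95% CI = ({ci_co[0]:.2f}, {ci_co[1]:.2f})")
print(f"Louisiana: SMR = {smr_la:.2f}, 95% CI = ({ci_la[0]:.2f}, {ci_la[1]:.2f})")
```

An interval that contains 1 gives no evidence that the population differs from the standard.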
Standardization of Rates
♦ Standardization is a simple way to remove the effect of confounding.
♦ It can be extended to more than one confounder.
♦ Similar techniques can be used for differences or ratios of rates.
♦ An alternative is a stratified analysis (later).