EPID 600; Class 11 Screening University of Michigan School of Public Health
1
The New York Times Sunday, October 31, 1999; pg. 5
Bedtime stories. Telephone bills. Life as usual. It ends quickly with the trauma of a breast biopsy – even though most breast biopsies turn out to be benign. This fact has inspired clinical trials of an adjunctive breast screening device designed to distinguish benign from malignant lesions without a breast biopsy. So life can return to normal a little sooner for everyone. We invite you to help us. If you’re scheduled for a breast biopsy, ask your doctor about participating in our clinical trials.
2
Why screen?
To find people with the disease (or at risk of the disease) who don’t know it
In other words…to find people who are pre-symptomatic
3
Why try to find asymptomatic diseased people?
To treat disease
To cure disease
To prevent disease spread
To slow down disease progress
To study disease natural history
4
Different from identifying people at risk but without disease
Identifying people at risk of disease but without disease is done to prevent the disease altogether, to delay disease onset, or to study the “precondition” state
5
Additional thought...
Why else might we encourage screening or promote a specific screening test?
Because we want to do something
Because we can
For money, fame, and glory
7
A digression (1)...
What is a disease? Colon cancer; myocardial infarction
What is a condition? High blood pressure; high cholesterol
What is a marker? High prostate-specific antigen (PSA)
8
A digression (2)...
Binary tests: Yes vs No
Continuous measures: Multiple values; may require the choice of a cutoff point
9
A note: primary vs. secondary prevention
Primary prevention: Screening that aims to identify risk factors or etiologic factors for disease so that disease occurrence can be prevented
Secondary prevention: The early detection of disease in the hope of improving prognosis
10
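Natural history of a disease
[Figure: timeline running from onset, to the point where disease is detectable by screening, to symptomatic disease, to death. The Detectable Preclinical Phase (DPCP) is the interval between detectability by screening and the onset of symptoms.]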
11
Screening
[Figure: people whose disease status is unknown are given the test; the test sorts them into POSITIVES (test pos) and NEGATIVES (test neg).]
12
Screening
            DISEASE
TEST        Yes     No
Pos         TP      FP
Neg         FN      TN
13
Sensitivity (Sn)
Probability of a positive test if disease is present
Sn = true positives / everyone with disease = TP / (TP + FN)
14
Specificity (Sp)
Probability of a negative test if disease is not present
Sp = true negatives / everyone without disease = TN / (TN + FP)
15
Sensitivity and specificity
Sensitivity and specificity are characteristics of the test itself, i.e., how good the test is
Changing the cutoff generally increases one at the expense of the other
16
Changing cutoffs
[Figure: test values on an axis marked 0, 6, 8, 10, 12, 14 for the disease and no-disease groups, with a cutoff separating positive from negative results.]
            Disease
Test        Yes     No
Pos (+)     5       3
Neg (–)     1       9
Sn = 5/(5+1) = 0.83 Sp = 9/(9+3) = 0.75
Changing cutoffs
[Figure: test values on an axis marked 0, 6, 8, 10, 12, 14 for the disease and no-disease groups, with the cutoff moved so that fewer results are called positive.]
            Disease
Test        Yes     No
Pos (+)     3       1
Neg (–)     3       11
Sn = 3/(3+3) = 0.50 Sp = 11/(11+1) = 0.92
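The two tables can be checked with a few lines of code. This is a minimal sketch: the function names and the labels "first cutoff" and "second cutoff" are illustrative choices, and the counts are the ones shown in the two tables.

```python
def sensitivity(tp, fn):
    """Proportion of people with disease who test positive: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of people without disease who test negative: TN / (TN + FP)."""
    return tn / (tn + fp)


# Counts from the two "Changing cutoffs" tables (6 with disease, 12 without).
cutoffs = {
    "first cutoff": {"tp": 5, "fp": 3, "fn": 1, "tn": 9},
    "second cutoff": {"tp": 3, "fp": 1, "fn": 3, "tn": 11},
}

results = {}
for name, c in cutoffs.items():
    sn = sensitivity(c["tp"], c["fn"])
    sp = specificity(c["tn"], c["fp"])
    results[name] = (round(sn, 2), round(sp, 2))
    print(f"{name}: Sn = {sn:.2f}, Sp = {sp:.2f}")
```

Moving the cutoff changes only how the same 18 people are split into positives and negatives, which is why sensitivity falls as specificity rises.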
[Figure: distributions of non-cases and cases by score on the screen (x-axis, 1 to 16) against number screened (y-axis); the two distributions share an overlapping area.]
25
[Figure: distributions of non-cases and cases by score on the screen (x-axis, 1 to 16) against number screened (y-axis), with the screening level set at >5 and at >7.]
26
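The effect of setting the screening level at >5 versus >7 can be sketched with made-up scores. The score lists below are hypothetical and are not read from the figure; only the two screening levels come from the slide, and the function name is an illustrative choice.

```python
# Hypothetical screening scores on the 1-16 scale (NOT data from the figure).
non_cases = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]
cases = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]


def sn_sp_at(cutoff, cases, non_cases):
    """Return (Sn, Sp) when a score strictly above the cutoff counts as positive."""
    tp = sum(score > cutoff for score in cases)
    fn = len(cases) - tp
    fp = sum(score > cutoff for score in non_cases)
    tn = len(non_cases) - fp
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (5, 7):
    sn, sp = sn_sp_at(cutoff, cases, non_cases)
    print(f"positive if score > {cutoff}: Sn = {sn:.2f}, Sp = {sp:.2f}")
```

With these assumed scores, raising the level from >5 to >7 misses more cases in the overlapping area but wrongly flags fewer non-cases.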
Issues about sensitivity vs. specificity
What is the “gold standard” that actually determines whether disease is present or not?
Cost of false positives and false negatives: anxiety/emotional distress; inconvenience; subsequent testing and mortality
27
Classification of test results
            Disease
Test        yes     no
pos         TP      FP
neg         FN      TN
Sensitivity = TP / (TP + FN)
Specificity = TN / (FP + TN)
28
Characteristics of tests
Validity (accuracy): How close does the test result get to the correct (true) value?
Reliability (precision): How close are repeat measurements on the same sample?
29
Validity vs Reliability
Baby scale examples (truth = 8lbs)
• Well calibrated scale, allowed to settle before measurement recorded: valid and reliable
• Well calibrated scale, not allowed to settle before measurement recorded: valid but not reliable
• Scale 6oz off, allowed to settle before measurement recorded: not valid but reliable
• Scale 6oz off, not allowed to settle before measurement recorded: not valid and not reliable
[Figure: four panels of repeated measurements (X) plotted against the truth (8lbs); on the miscalibrated scale the measurements center on a biased value (7lbs 6oz).]
30
Four sources of variability
Biological variation
Test method itself
Intra-observer
Inter-observer
31
Example...blood pressure variability
BP          Patient A   Patient C
Lowest      86/47       123/78
Highest     126/79      153/107
Casual      108/64      137/103
32
Question addressed so far... If we screen a population, what percent of people with the disease, and without the disease, will be correctly identified by our test? How well does the test work in a population?
33
The clinical question, however, is: If a specific patient has a positive test, what is the probability that this patient really has the disease?
34
Screening...
            DISEASE
TEST        Yes     No
Pos         TP      FP
Neg         FN      TN
35
Positive predictive value
Likelihood that disease is present IF test is positive
PPV = true positives / all positives = TP / (TP + FP)
36
Negative predictive value
Likelihood that disease is NOT present IF test is negative
NPV = true negatives / all negatives = TN / (TN + FN)
37
Classification of screening test results
                          Disease
TEST (Screening Survey)   yes     no
pos                       TP      FP      Predictive Value (positive) = TP / (TP + FP)
neg                       FN      TN      Predictive Value (negative) = TN / (FN + TN)
38
PPV and NPV
PPV and NPV depend on both the test and the disease prevalence
PPV is influenced by disease prevalence and, more strongly, by the specificity of the test.* The greater the prevalence and the specificity, the greater the PPV
NPV is influenced by disease prevalence and, more strongly, by the sensitivity of the test.* The lower the prevalence and the greater the sensitivity, the greater the NPV
*when disease is rare
39
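The dependence on prevalence can be written out explicitly. As a sketch, let p stand for disease prevalence (a symbol introduced here, not used on the slides). Dividing each cell of the 2x2 table by the number screened and substituting the definitions of Sn and Sp gives:

```latex
\begin{align*}
\mathrm{PPV} &= \frac{TP}{TP + FP}
  = \frac{Sn \cdot p}{Sn \cdot p + (1 - Sp)(1 - p)} \\[4pt]
\mathrm{NPV} &= \frac{TN}{TN + FN}
  = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Sn)\,p}
\end{align*}
```

When p is small, the term (1 - Sp)(1 - p) dominates the denominator of the PPV, which is why specificity matters so much for the PPV of a rare disease.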
PPV, example 1
Disease Prevalence = 1%
Test Sensitivity = 99% Test Specificity = 95%
                True Status
Test Result     Sick    Not-Sick    Total
+               99      495         594
-               1       9405        9406
Total           100     9900        10,000
Positive Predictive Value = 99 / (99 + 495) = 17%
40
40
PPV, example 2
Disease Prevalence = 5%
Test Sensitivity = 99% Test Specificity = 95%
                True Status
Test Result     Sick    Not-Sick    Total
+               495     475         970
-               5       9025        9030
Total           500     9500        10,000
Positive Predictive Value = 495 / (495 + 475) = 51%
41
41
Relationship of disease prevalence to predictive value of a positive test
Test Sensitivity = 99% Test Specificity = 95%
                True Status
Test Result     Case    Non-Case    Total
+               TP      FP
-               FN      TN
Total                               10,000
Prevalence Rate = 1%: Predictive Value (positive) = 17%
Prevalence Rate = 5%: Predictive Value (positive) = 51%
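The two predictive values can be reproduced by filling in the expected 2x2 table from the prevalence, sensitivity, and specificity. This is a minimal sketch; the function names are illustrative, and the inputs (10,000 screened, Sn = 99%, Sp = 95%, prevalence 1% and 5%) are the ones stated on the slides.

```python
def expected_table(n, prevalence, sn, sp):
    """Expected 2x2 cell counts when n people are screened."""
    sick = n * prevalence
    not_sick = n - sick
    tp = sick * sn          # sick people correctly testing positive
    fn = sick - tp          # sick people missed by the test
    tn = not_sick * sp      # non-sick people correctly testing negative
    fp = not_sick - tn      # non-sick people wrongly testing positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def ppv(table):
    """Positive predictive value: TP / (TP + FP)."""
    return table["TP"] / (table["TP"] + table["FP"])


def npv(table):
    """Negative predictive value: TN / (TN + FN)."""
    return table["TN"] / (table["TN"] + table["FN"])


for prevalence in (0.01, 0.05):
    table = expected_table(10_000, prevalence, sn=0.99, sp=0.95)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv(table):.0%}, NPV = {npv(table):.2%}")
```

The test itself is unchanged between the two runs; only the prevalence differs, and the PPV moves from about 17% to about 51%.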
42
Classification of screening test results
                          Disease
TEST (Screening Survey)   yes     no
pos                       TP      FP      Predictive Value (positive) = TP / (TP + FP)
neg                       FN      TN      Predictive Value (negative) = TN / (FN + TN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (FP + TN)
43
Epidemiologic approach to the evaluation of screening programs
Key question: do patients benefit from early detection of disease?
1. Can the disease be detected early?
2. What are the sensitivity and specificity of the test?
3. What is the predictive value of the test?
4. How serious is the problem of false-positive results?
5. What is the cost of early detection in terms of funds, resources, and emotional impact?
6. Are the subjects harmed by the screening tests?
7. Do the individuals in whom disease is detected early benefit from the early detection, and is there an overall benefit to those who are screened?
44
Mammography and mortality reduction
The US recommends annual screening for breast cancer for women above age 40.
From a public health perspective, it may be argued that this is justifiable only if screening reduces breast cancer mortality.
If screening is offered to all women in the target group, no well-defined control group is available.
A study was done in Denmark to examine how estimates of breast cancer mortality reduction vary with the choice of control group.
45 Olsen et al. Estimating the benefits of mammography screening: the impact of study design. Epidemiology. 2007; 18: 487-492
Mammography and mortality reduction
The study population included all women invited to screening in Copenhagen from April 1991 to March 2001.
The women were followed for breast cancer mortality.
Person-years at risk were counted from the date of first invitation until the date of death, emigration from Denmark, or end of follow-up (March 2001).
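The person-years rule can be sketched in code. Everything specific here is an assumption for illustration: the three women are hypothetical, the end of follow-up is taken as 31 March 2001 (the slide gives only the month), and a year is approximated as 365.25 days.

```python
from datetime import date

# End of follow-up; the slide says March 2001, the exact day is an assumption.
STUDY_END = date(2001, 3, 31)


def person_years(first_invitation, exit_date=None):
    """Years at risk from first invitation until death, emigration, or study end."""
    end = min(exit_date, STUDY_END) if exit_date else STUDY_END
    return (end - first_invitation).days / 365.25


# Hypothetical women: (date of first invitation, date of death or emigration if any).
women = [
    (date(1991, 4, 1), None),                # followed to the end of the study
    (date(1993, 6, 15), date(1998, 6, 15)),  # died or emigrated after 5 years
    (date(1995, 1, 1), date(1999, 12, 31)),  # died or emigrated after about 5 years
]

total_person_years = sum(person_years(start, stop) for start, stop in women)
print(f"total person-years at risk: {total_person_years:.1f}")
```

A mortality rate is then the number of breast cancer deaths divided by the total person-years, which lets groups with different follow-up lengths be compared.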
46
Mammography and mortality reduction
Control group 1: Concurrent regional. Women in the same age group living in Denmark from April 1991 to March 2001, outside the regions with organized screening programs.
Control group 2: Local historical. Women in the same age group living in Copenhagen at any time between April 1981 and March 1991 (the 10 years before the program).
Control group 3: Historical regional. Women in the same age group living in Denmark from April 1981 to March 1991, outside the regions that later implemented organized screening programs.
47
Mammography and mortality reduction
1. Local historical. This analysis showed a reduction of 20%; the “lesser benefit” was probably due to the increase in breast cancer incidence over time.
2. Concurrent regional. This analysis yielded a reduction in breast cancer mortality of 9%. Breast cancer incidence and mortality were higher in Copenhagen than in the rest of Denmark before screening.
3. Historical regional. This analysis estimated a 25% decrease in breast cancer mortality. It controlled for both time and region and is probably the best method.
48
Factors influencing epidemiologic approach to the evaluation of screening programs
1. Natural history of disease
2. Pattern of disease progression
3. Methodologic issues
4. Study designs for evaluation of screening
5. Problems in assessing sensitivity and specificity of tests
6. Interpreting study results that show no benefit of screening
7. Cost benefit analysis of screening
49
Natural history To discuss methodologic issues involved in evaluating the benefit of screening, we need to understand natural history of disease
50
Natural history
51
Pattern of disease progression
52
Methodologic issues
Some concerns are particular to screening. Understanding why decisions about whether to use screening tests are controversial requires considering the biases that can arise with screening:
Detection
Lead-time bias
Length-time bias
53
Detection Screening appears to have a positive effect since disease precursor is detected in persons who would not ultimately develop symptoms or die from the disease
[Figure: timeline from initiation, to disease detectable by screening, to screening dx; the person then has NO clinical symptoms and NO complications from disease, and dies from other causes.]
54
Detection Screening appears to have a positive effect since disease precursor is detected in persons who would not ultimately develop symptoms or die from the disease Example: Blood pressure screening leads to people with high blood pressure being told that they have hypertension. While people with hypertension are more likely to develop diseases such as stroke, not all of them will.
55
Lead-time bias Survival appears to be increased among screen-detected cases because diagnosis was made earlier in the disease
[Figure: timeline from initiation, to disease detectable by screening, to clinical symptoms, to complications from the disease, to death. Screening dx is marked earlier on the timeline than usual dx.]
56
Lead-time bias Screening for lung cancer with chest X-rays is an example of lead time bias. When tumors can be detected earlier, screening will seem to prolong life compared to persons who are not screened and in whom disease is detected later
Lead Time Bias Positive Screening Outcomes
57
Lead-time bias and 5 year survival
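A small worked example shows how lead time alone can change 5-year survival. All ages below are hypothetical, and the example assumes that early diagnosis does not change the age at death at all.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient; the age at death is the same with or without screening.
AGE_AT_DEATH = 70
AGE_USUAL_DX = 67    # diagnosed after symptoms appear
AGE_SCREEN_DX = 63   # diagnosed earlier, by screening

usual_survival = survival_from_diagnosis(AGE_USUAL_DX, AGE_AT_DEATH)
screened_survival = survival_from_diagnosis(AGE_SCREEN_DX, AGE_AT_DEATH)
lead_time = AGE_USUAL_DX - AGE_SCREEN_DX

print(f"survival after usual dx: {usual_survival} years")
print(f"survival after screening dx: {screened_survival} years")
print(f"lead time: {lead_time} years")
print(f"alive 5 years after usual dx: {usual_survival >= 5}")
print(f"alive 5 years after screening dx: {screened_survival >= 5}")
```

Under these assumptions the screened patient counts as a 5-year survivor and the unscreened patient does not, even though both die at the same age; the whole difference is the lead time.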
58
Length-time bias People with a more protracted preclinical phase have a greater probability of coming to screening. If a protracted preclinical phase is associated with a better prognosis or survivorship, then screening may actually look better than it is because of its affiliation with a protracted preclinical phase.
[Figure: two timelines, each running from initiation, to disease detectable by screening, to clinical symptoms, to complications from the disease, to death; the timelines differ in how long the phases last.]
59
Length-time bias example Example: Length-time bias may occur when carcinomas-in-situ are picked up with breast screening. These may be slow-growing precursors to cancer. Their early detection and treatment may appear to improve mortality from the disease.
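The size of the effect can be sketched with simple arithmetic. The model is a deliberate simplification and every number is hypothetical: a single screen is held at a random time, cases arise at a steady rate, and a case is picked up only if the screen falls inside its detectable preclinical phase, so the chance of being screen-detected is proportional to the length of that phase.

```python
def share_among_screen_detected(groups):
    """groups maps name -> (fraction of all cases, preclinical phase length in years).

    Returns each group's share of screen-detected cases, assuming detection
    probability is proportional to the length of the detectable preclinical phase.
    """
    weights = {name: frac * dpcp for name, (frac, dpcp) in groups.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical mix: half of all cases are slow-growing, half are fast-growing.
groups = {
    "slow-growing": (0.5, 4.0),  # detectable for 4 years before symptoms
    "fast-growing": (0.5, 1.0),  # detectable for 1 year before symptoms
}

shares = share_among_screen_detected(groups)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of screen-detected cases")
```

Slow-growing cases make up half of all cases but most of the screen-detected ones in this sketch, so screen-detected cases would show better outcomes even if screening changed nothing.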
60
Epidemiologic study designs to evaluate screening
Non-randomized studies
Case-control: Individuals with and without disease are compared; controls should be representative of the population from which disease cases emerged
Cohort: Compare the rate of disease in those who choose to be screened vs. those who choose not to be screened
Randomized studies
Randomized trials: Randomize to screening vs. no screening and compare rates of disease
Most evidence about the efficacy of screening comes from nonexperimental designs
61
Problems in assessing the Sensitivity and Specificity of tests
New screening programs are frequently initiated after a screening test becomes available for the first time. Usually claims are made (by manufacturers of test kits, investigators, etc.) that the test has high Sn and Sp. However, this is not always easy to demonstrate.
62
Interpreting study results that show no benefit of screening The apparent lack of benefit may be inherent in the natural history of the disease (e.g., the disease has no detectable preclinical phase or an extremely short detectable preclinical phase). The therapeutic intervention currently available may not be any more effective when it is provided earlier than when it is provided at the time of usual diagnosis. The natural history and currently available therapies may have the potential for enhanced benefit, but inadequacies of the care provided to those who screen positive may account for the observed lack of benefit (that is, there is efficacy, but poor effectiveness).
63
Cost-benefit analysis of screening
Cost issues when evaluating screening include financial but also non-financial issues.
1. There must be good evidence that each test or procedure recommended is medically effective in reducing morbidity and mortality
2. The medical benefits must outweigh risks
3. The costs of each test or procedure must be reasonable compared to expected benefits
4. The recommended actions must be practical and feasible
Source: American Cancer Society
64
Screening conclusions
Screening assumes that we can do something with the positive screen
There are real costs of false negatives and false positives
We should not be screening “just because we can”
65