Discus – A software program to assess judgment of glaucomatous damage in optic disc photographs.
Short title
Discus
Words, Figures, Tables
3600, 3, 2
Codes & Presentations
GL, Poster at ARVO meeting in May 2008 (program # 3625)
Keywords
glaucoma, optic disc, sensitivity, specificity, diagnostic performance
Authors
Jonathan Denniss, MCOptom1,2; Damian Echendu, OD, MSc1; David B Henson, PhD, FCOptom1,2; Paul H Artes, PhD1,2,3 (corresponding author)
Affiliations & Correspondence
1 Research Group for Eye and Vision Sciences, University of Manchester, England
2 Manchester Royal Eye Hospital, Manchester, England
3 Ophthalmology and Visual Sciences, Dalhousie University, Rm 2035, West Victoria, 1276 South Park St, Halifax, Nova Scotia B3H 2Y9, Canada
[email protected]
Commercial Relationships
None
Support
College of Optometrists PhD studentship (JD) Nova Scotia Health Research Foundation Grant Med-727 (PHA)
Abstract

Aim

To describe a software package (Discus) for evaluating clinicians’ assessment of optic disc damage, and to provide reference data from a group of expert observers.

Methods

Optic disc images were selected from patients with manifest or suspected glaucoma or ocular hypertension who attended the Manchester Royal Eye Hospital. Eighty images came from eyes without evidence of visual field (VF) loss in at least 4 consecutive tests (VF-negatives), and 20 images from eyes with repeatable VF loss (VF-positives). Software was written to display these images in randomized order, for up to 60 seconds. Expert observers (n=12) rated optic disc damage on a 5-point scale (definitely healthy, probably healthy, not sure, probably damaged, definitely damaged).

Results

Optic disc damage as determined by the expert observers predicted VF loss with less than perfect accuracy (mean area under the receiver operating characteristic curve [AUROC], 0.78; range, 0.72 to 0.85). When the responses were combined across the panel of experts, the AUROC reached 0.87, corresponding to a sensitivity of ~60% at 90% specificity. While the observers’ performances were similar, there were large differences between the criteria they adopted (p<0.001), even though all observers had been given identical instructions.

Conclusion

Discus provides a simple and rapid means of assessing important aspects of optic disc interpretation. The data from the panel of expert observers provide a reference against which students, trainees, and clinicians may compare themselves. The program and the analyses described in this paper are freely accessible from http://discusproject.blogspot.com/.
Introduction

The detection of early damage to the optic disc is an important yet difficult task.1, 2

In many patients with glaucoma, optic disc damage is the first clinically detectable sign of disease. In the Ocular Hypertension Treatment Study, for example, almost 60% of patients who converted to glaucoma developed optic disc changes before exhibiting reproducible visual field damage.3, 4 Broadly similar findings were obtained in the European Glaucoma Prevention Study; in approximately 40% of those participants who developed glaucoma, optic disc changes were recognised before visual field changes.5 However, the diverse range of optic disc appearances in a healthy population, combined with the many ways in which glaucomatous damage may affect the appearance of the disc, makes it difficult to detect features of early damage.6, 7

While several imaging technologies have been developed in recent decades (confocal scanning laser tomography, nerve fibre layer polarimetry, and optical coherence tomography) which provide reproducible assessment of the optic disc and retinal nerve fibre layer, the diagnostic performance of these technologies has not been consistently better than that achieved by clinicians.8-11 Subjective assessment of the optic disc, either by slitlamp biomicroscopy or by inspection of photographs, therefore still plays a pivotal role in the clinical care of patients at risk from glaucoma.8

Many papers describe the optic disc changes in glaucoma,6, 7, 12-14 and several authors have examined the agreement between clinicians in diagnosing glaucoma, in differentiating between types of optic disc damage, or in estimating specific parameters such as cup/disc ratios.15-25 However, because there is no objective reference standard for optic disc damage, it is difficult for students, trainees, or clinicians to assess their judgments against an external reference.

In this paper, we describe a software package (“Discus”) which observers can use to view and interpret a set of selected optic disc images under controlled conditions. We further present reference data from 12 expert observers against which future observers can be evaluated, or evaluate themselves.
Methods

Selection of Images

To obtain a set of optic disc images with a wide spectrum of early glaucomatous damage, data were selected from patients who had attended the Optometrist-led Glaucoma Assessment (OLGA) clinics at the Royal Eye Hospital (Manchester, UK) between June 2003 and May 2007. This clinic sees patients who are deemed at risk of developing glaucoma, for example owing to ocular hypertension, or who have glaucoma but are thought to be at low risk of progression and are well controlled on medical therapy. Patients undergo regular examinations (normally at intervals of 6 months) by specifically trained optometrists. During each visit, visual field examinations (Humphrey Field Analyzer program 24-2, SITA-Standard) and non-stereoscopic fundus photography are performed (Topcon TRC-50EX, field-of-view 20 degrees, resolution 2000×1312 pixels, 24-bit colour).

For this study, images were considered for inclusion if the patient had undergone at least 4 visual field tests on each eye (n=665). The 4 most recent visual fields were then analysed to establish two distinct groups, visual field (VF-) positive and VF-negative (Table 1). Images from patients who did not meet the criteria of either group were excluded.

Table 1: Inclusion criteria for the VF-positive and VF-negative groups. For inclusion in the VF-negative group, the criteria had to be met in both eyes; in addition, the between-eye differences in MD and PSD had to be less than 1.0 dB.

             MD                         PSD
VF-positive  between -2.5 and -10.0 dB  between 3.0 and 15.0 dB
VF-negative  better than [>] -1.5 dB    better than [<] 2.0 dB
If both eyes of a patient met these criteria, a single eye was randomly selected. A small number of eyes (n=17) were excluded owing to clearly non-glaucomatous visual field loss (for example, hemianopia) or non-glaucomatous lesions visible on the fundus photographs (eg chorioretinal scars). There were 155 eyes in the VF-positive and 144 eyes in the VF-negative group.

To eliminate any potential clues other than glaucomatous optic disc damage, we matched the image quality in the VF-negative and VF-positive groups. One of the authors (DE) viewed the images on a computer monitor in random order and graded each one on a five-point scale for focus and uniformity of illumination. During grading, the observer was unaware of the status of the image (VF-positive or -negative), and the area of the disc had been masked from view. A final set of 20 VF-positive images and 80 VF-negative images was then created such that the distribution of image quality was similar in both groups (Table 2). The total size of the image set (100), and the ratio of VF-positive to VF-negative images (20:80), had been decided beforehand to limit the duration of the experiments and to keep the emphasis on discs with early damage.

Table 2: Characteristics of the VF-positive and VF-negative groups. Image quality was scored subjectively on a scale from 1 to 5. Differences between groups were tested for statistical significance by Mann-Whitney U (MWU) tests.

                    Image Quality  Age, y       MD, dB        PSD, dB
VF-positive (n=20)  1.82 (1.20)    66.0 (13.1)  -6.20 (1.76)  5.58 (2.15)
VF-negative (n=80)  1.68 (1.33)    61.3 (9.3)   +0.60 (0.4)   1.50 (0.16)
p-value (MWU)       0.67           0.35         <0.001        <0.001
Expert Observers

For the present study, 12 expert observers (either glaucoma fellowship-trained ophthalmologists working in glaucoma sub-speciality clinics (n=10) or scientists involved in research on the optic disc in glaucoma (n=2)) were selected. Observers were approached ad hoc during scientific meetings or contacted by e-mail or letter with a request for participation.

Prior to the experiments, the observers were given written instructions detailing the selection of the image set. The instructions also stipulated that responses should be given on the basis of apparent optic disc damage rather than the perceived likelihood of visual field damage.
Experiments

In order to present images under controlled conditions, and to collect the observers’ responses, a software package, Discus (version 3.0E, Figure 1), was developed in Delphi (CodeGear, San Francisco, CA). Details on the availability and configuration of the software are provided in the Appendix.

The software displayed the images, in random order, on a computer monitor. After the observer had triggered a new presentation by hitting the “Next” button, an image was displayed until the observer responded by clicking one of 5 buttons (definitely healthy, probably healthy, not sure, probably damaged, definitely damaged). After a time-out period of 60 seconds the image would disappear, but observers were allowed unlimited time to give a response. To guard against occasional finger errors, observers were also allowed to change their response, as long as this occurred before the “Next” button was hit.

To assess the consistency of the observers, 26 images were presented twice (2 in the VF-positive group, 24 in the VF-negative group). No feedback was provided during the sessions.

Fig 1: Screenshot of the Discus software. Images remained on display for up to 60 seconds, or until the observer clicked on one of the 5 response categories. A new presentation was triggered by hitting the “Next” button.
Analysis

The responses were transformed to a numerical scale ranging from -2 (“definitely healthy”) to +2 (“definitely damaged”). The proportion of repeated images in which the responses differed by one or more categories was calculated for each observer. For all subsequent analyses, however, only the last of the two responses was used. All analyses were carried out in the freely available open-source environment R, and the ROCR library was used to plot the ROC curves.26, 27
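The numeric transformation and the AUROC computation can be sketched compactly. The paper’s analyses were carried out in R with the ROCR package; the snippet below is an illustrative Python stand-in (stdlib only), computing the AUROC as the Mann-Whitney probability that a randomly chosen VF-positive image receives a higher rating than a randomly chosen VF-negative one, with ties counted as 0.5. The example data are hypothetical.

```python
# Five response categories mapped to the -2..+2 numeric scale used in the paper.
SCALE = {"definitely healthy": -2, "probably healthy": -1, "not sure": 0,
         "probably damaged": 1, "definitely damaged": 2}

def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (rank-sum statistic).

    Counts the proportion of (positive, negative) pairs in which the
    positive image was rated higher, with ties contributing 0.5.
    """
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical ratings for 3 VF-positive and 4 VF-negative images.
pos = [SCALE["definitely damaged"], SCALE["probably damaged"], SCALE["not sure"]]
neg = [SCALE["probably healthy"], SCALE["definitely healthy"],
      SCALE["not sure"], SCALE["probably damaged"]]
print(auroc(pos, neg))
```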
Individual observers’ ROC curves

To obtain an objective measure of individual observers’ performance at discriminating between eyes with and without visual field damage, ROC curves were derived from each set of responses. For this analysis, the visual field status was the reference standard, and responses in the “not sure” category were interpreted as lying between “probably healthy” and “probably damaged”. If an observer had used all five response categories, the ROC curve would contain 4 points (A–D). Point A, the most conservative criterion (most specific but least sensitive), gave the sensitivity and specificity to visual field damage when only the “definitely damaged” responses were treated as test positives, while all other responses (“probably damaged”, “not sure”, “probably healthy”, “definitely healthy”) were interpreted as test negatives. For point D, the least conservative criterion (most sensitive but least specific), only “definitely healthy” responses were interpreted as test negatives, and all other responses as test positives.
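The four operating points can be obtained directly from the numeric responses by sweeping a cut-off across the category boundaries. A minimal Python sketch (illustrative only; the example data are hypothetical):

```python
def operating_points(pos_responses, neg_responses):
    """Sensitivity and false-positive rate at the four criteria A-D.

    Responses are on the -2..+2 scale. At cut-off +2 (point A) only
    "definitely damaged" counts as test-positive; at cut-off -1 (point D)
    everything except "definitely healthy" counts as test-positive.
    """
    points = {}
    for label, cut in zip("ABCD", (2, 1, 0, -1)):
        sens = sum(r >= cut for r in pos_responses) / len(pos_responses)
        fpr = sum(r >= cut for r in neg_responses) / len(neg_responses)
        points[label] = (sens, fpr)  # specificity = 1 - fpr
    return points

# Hypothetical example: 4 VF-positive and 6 VF-negative responses.
print(operating_points([2, 2, 1, 0], [-2, -1, -1, 0, 1, 2]))
```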
Individual observers’ criteria

When using a subjective scale, as in the current study, the responses depend on the observer’s interpretation of the categories and their individual inclination to respond with “probably damaged” or “definitely damaged” (response criterion). A cautious observer, for example, might regard a particular optic nerve head (ONH) as “probably damaged” whilst an equally skilled but less cautious observer might respond with “not sure” or “probably healthy”. To investigate the variation in criteria within our group, we compared the observers’ mean responses across the entire image set.
Combining responses of expert observers

To estimate the performance of a panel of experts, and to obtain a reference other than visual field damage for judging current as well as future observers’ responses, the mean response of the 12 expert observers was calculated for each of the 100 images.

To estimate whether the expert group (n=12) was sufficiently large, we investigated how the performance of the combined panel changed depending on the number of included observers. Areas under the ROC curve were calculated for all possible combinations of 2, 3, 4, …, 11 observers to derive the mean performance, as well as the minimum and maximum.
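The panel-size analysis enumerates every subset of observers, pools each subset by its mean response per image, and computes the AUROC of the pooled score. A Python sketch of this procedure (illustrative; the AUROC helper and example data are hypothetical stand-ins for the R analysis):

```python
from itertools import combinations
from statistics import mean

def auroc(pos, neg):
    # Pairwise (rank-sum) AUROC; ties counted as 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def panel_performance(responses, labels, k):
    """Min, mean, and max AUROC over all panels of k observers.

    responses: one list of numeric ratings per observer (same image order);
    labels: 1 for VF-positive images, 0 for VF-negative.
    """
    aucs = []
    for panel in combinations(responses, k):
        pooled = [mean(obs[i] for obs in panel) for i in range(len(labels))]
        pos = [s for s, y in zip(pooled, labels) if y == 1]
        neg = [s for s, y in zip(pooled, labels) if y == 0]
        aucs.append(auroc(pos, neg))
    return min(aucs), mean(aucs), max(aucs)

# Hypothetical example: 3 observers rating 4 images (first two VF-positive).
responses = [[2, 1, -2, -1], [1, 0, -1, -2], [2, 2, -2, -2]]
labels = [1, 1, 0, 0]
print(panel_performance(responses, labels, 2))
```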
Relationship between responses of individual observers and expert panel

As a measure of overall agreement between the expert observers, independent of their individual response criteria, the Spearman rank correlation coefficient between the 12 sets of responses was computed. The underlying rationale of this analysis is that, by assigning each image to one of five ordinal categories, each observer had in fact ranked the 100 images. If two observers had performed identical rankings, the Spearman coefficient would be 1, regardless of the actual responses assigned.
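Because the five categories are ordinal, the Spearman coefficient is simply the Pearson correlation of the two rank vectors, with tied ratings assigned average ranks. A stdlib Python sketch (illustrative; the paper’s computation was done in R):

```python
def average_ranks(xs):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Two observers who differ only in criterion (a one-category shift) still
# rank the images identically, so rho is 1.
print(spearman([1, -1, -2], [2, 0, -1]))  # 1.0
```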
Results

The experiments took between 13 and 46 minutes (mean, 29 min) to complete. On average, the observers responded 7 seconds after the images were first presented on the screen, and the median response latencies of individual observers ranged from 4 to 16 seconds. The reproducibility of individual observers’ responses was moderate: on average, discrepancies of one category were seen in 44% (12) of the 26 repeated images (range, 23–62%).
Individual observers’ results are shown in Fig. 2A-L. The points labelled A, B, C, and D represent the trade-off between the positive rates in the VF-positive (vertical axis) and VF-negative groups (horizontal axis) achieved with the four possible classification criteria. Point A, for example, shows the trade-off when only discs in the “definitely damaged” category are regarded as test-positives. Point B gives the trade-off when discs in both the “definitely damaged” and “probably damaged” categories are regarded as test-positives. For D, the least conservative criterion, only responses of “definitely healthy” were interpreted as negatives. To indicate the precision of these estimates, 95% confidence intervals were added to point B.

Areas under the curve (AUROC) ranged from 0.71 (95% CI, 0.58-0.85) to 0.88 (95% CI, 0.82-0.96), with a mean of 0.79. There was no relationship between observers’ overall performance and their median response latency (Spearman’s rho = 0.34, p = 0.29).
In contrast to their similar overall performance, the observers’ response criteria differed substantially (p<0.001, Friedman test). For example, the proportion of discs in the VF-positive category which were classified as “definitely damaged” ranged from 15% to 90%, while the proportion of discs in the VF-negative category classified as “definitely healthy” ranged from 8% to 68%. In Fig 2A-L, the response criterion is represented by the inclination of the red line with its origin in the bottom right corner. If the responses had been exactly balanced between the “damaged” and “healthy” categories, the inclination of the line would be 45 degrees. A more horizontal line represents a more conservative criterion (less likely to respond with “probably damaged” or “definitely damaged”), while a more vertical line represents a less conservative criterion. There was no relationship between the observers’ performance (AUROC) and their response criterion (Spearman’s rho = 0.41, p = 0.18).

To derive the “best possible” performance as a reference for future observers, the responses of the expert panel were combined by calculating the mean response obtained for each image. The ROC curve for the combined responses (grey curve in Fig. 2A-L) enclosed an area of 0.87.
Fig. 2. Receiver operating characteristic (ROC) curves for the classification of optic disc photographs by the 12 expert observers (A-L), with a reference standard of visual field damage. The x-axis (positive rate in the VF-negative group) measures specificity to visual field damage, while the y-axis (positive rate in the VF-positive group) gives the sensitivity. Point A (the most conservative criterion) shows the trade-off between sensitivity and specificity when only “definitely damaged” responses are interpreted as test positives. Point D (the least conservative criterion) shows the trade-off when all but “definitely healthy” responses are interpreted as test positives. Boxplots (right) give the distributions of response latencies, and the number of times each response was selected.

Fig. 2 (cont). To facilitate comparison, the grey ROC curve and the dotted grey line represent the performance and the criterion of the group as a whole, respectively. Results provided in numerical format are the area under the ROC curve (AUC); the percentage of the AUC relative to that of the entire group, (individual ROC area - 0.5) / (expert panel ROC area - 0.5); the Spearman rank correlation of the individual’s responses with those of the entire group; the mean difference between repeated responses; and the average response as a measure of criterion (-2 = “definitely healthy”, -1 = “probably healthy”, 0 = “not sure”, 1 = “probably damaged”, and 2 = “definitely damaged”).
To investigate how the performance of an expert panel varies with the number of contributing observers, the area under the ROC curve was derived for all possible combinations of 2, 3, 4, etc, up to 11 observers (Fig. 3). The limit of the ROC area was approached with 6 or more observers, and it appeared that a further increase in the number of observers would not have had a substantial effect on the performance of the panel.

Fig. 3: Performance (area under ROC curve) of the combined expert panel as a function of the number of included observers. All possible combinations of 2 to 11 observers were evaluated. The mean area under the ROC curve approaches its limit with approximately 6 observers.
Individual observers’ Spearman rank correlation coefficients with the combined expert panel ranged from 0.62 to 0.86, with a median of 0.79. There was no relationship between the Spearman coefficient and the area under the ROC curve (r = 0.09, p = 0.78).
Discussion

The objective of this work was to establish an easy-to-use tool for clinicians, trainees, and students to assess their skill at interpreting optic discs for signs of glaucoma-related damage, and to provide data from a panel of experts as a reference for future observers. The study also showed that meaningful experiments with Discus can be performed within a relatively short time.

All observers in this study had ROC areas significantly smaller than 1, and even when the judgments of the observers were averaged, the combined responses of the panel failed to discriminate perfectly between optic discs in the VF-positive and VF-negative groups. These findings are not surprising, given the lack of a strong association between structure and function in early glaucoma that has been reported by many previous studies.28-33 However, the experiments provide a powerful illustration of how difficult it is to make diagnostic decisions in glaucoma based solely on the optic disc.
At a specificity fixed at 90%, the combined panel’s sensitivity to visual field loss was 60%. This is within the range of performances previously reported for clinical observers and objective imaging tools.9, 34-37 Unfortunately, objective imaging data are not available for the patients in the current dataset and we are therefore unable to perform a direct comparison. However, the methodology developed in this paper may prove useful for future studies that compare diagnostic performance between clinicians and imaging tools in different clinical settings. A potential weakness of our study was the relatively small size of the expert group (n=12). However, by averaging every possible combination of 2 to 11 observers within the group, we demonstrated that our panel was likely to have attained near-maximum performance, and that a larger group of observers would have been unlikely to change our findings substantially.
One challenging issue is how to derive complete and easily interpretable summary measures of performance in the absence of a reference standard of optic disc damage. Such summary measures would be useful for giving feedback and for establishing targets for students and trainees. We used visual field data as the criterion to separate optic disc images into VF-positive and VF-negative groups, and there was no selection based on the presence or type of optic disc damage which would have biased our sample.38-40 The ROC area therefore measures the statistical separation between an observer’s responses to optic discs in eyes with and without visual field damage.41, 42 However, owing to the lack of a strong correlation between structure and function, visual field loss is not an ideal metric for optic disc damage in early glaucoma. For example, it is likely that a substantial proportion of the VF-negative images show early structural damage, whereas some optic discs in the VF-positive group may still appear healthy.
We have attempted to address the problem of a lacking reference standard in two complementary ways. First, a new observer’s ROC area can be compared to that of the expert panel, such that the statistic is re-scaled to cover a potential range from near zero (corresponding to chance performance, AUROC = 0.5) to around 100% (AUROC = 0.87, the performance of the expert panel).
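This re-scaling is a simple linear transform of the AUROC; a small worked sketch in Python (the helper name is ours, not part of Discus):

```python
def rescaled_score(individual_auc, panel_auc=0.87):
    """Express an observer's AUROC as a percentage of the expert panel's:
    0.5 (chance) maps to 0%, the panel's 0.87 maps to 100%."""
    return 100.0 * (individual_auc - 0.5) / (panel_auc - 0.5)

# The mean individual expert (AUROC 0.78) scores about 76% of the panel.
print(round(rescaled_score(0.78), 1))
```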
Second, we suggest that the Spearman rank correlation coefficient may be useful as a measure of agreement between a future observer’s responses and those of the expert panel.43 Because this coefficient takes into account the relative ranking of the responses, and not their overall magnitude, it is independent of the observer’s response criterion. Consider, for example, three images graded as “probably damaged”, “probably healthy”, and “definitely healthy” by the expert group. An observer responding with “definitely damaged”, “not sure”, and “probably healthy” would differ in criterion but agree on the relative ranking of damage, and their rank correlation with the expert panel would be 1.0 (perfect). Our data suggest that observers may achieve similar ROC areas with rather different responses (consider observers D and F as an example), and the lack of association between the ROC area and the rank correlation means that these statistics measure somewhat independent aspects of decision-making.
A surprising finding was that individual observers in our study adopted very different response criteria, even though they had been provided with identical written instructions and identical information on the source of the images and the distribution of visual field damage in the sample (compare observers A and E, for example). It is possible that we might have been able to control the criteria more closely, for example by instructing observers to use the “probably damaged” category only if they believed that the chance of the eye being healthy was less than, say, 10%. More importantly, however, our findings underscore the need to distinguish between differences in diagnostic performance and differences in diagnostic criterion whenever subjective ratings of optic disc damage are involved. This is the principal reason why we avoided the use of kappa statistics, which measure overall agreement but do not isolate differences in criterion.44, 45
The outpatient clinic from which our images were obtained sees a relatively high proportion of patients suspected of having glaucoma who do not have visual field loss. Because our image sample is not representative of an unselected population, the ROC curves are likely to underestimate clinicians’ true performance at detecting glaucoma by ophthalmoscopy. However, the use of a “difficult” data set may also be seen as an advantage, as it allows observers’ performance to be assessed on the type of optic disc more likely to cause diagnostic problems in clinical practice.
In addition to the source of our images, there are several other reasons why performance on Discus should not be regarded as a truly representative measure of an observer’s real-world diagnostic capability. First, we used non-stereoscopic images. Stereoscopic images would have been more representative of slitlamp biomicroscopy, the current standard of care, and there is evidence that many features of glaucomatous damage may be more clearly apparent in stereoscopic images.46 However, the gain over monoscopic images is probably not large.47-50 Second, Discus does not permit a comparison of fellow eyes, which often provides important clues in patients with early damage.51 Third, through the display of photographic images on a computer monitor we cannot assess an observer’s aptitude at obtaining an adequate view of the optic disc in real patients.
Notwithstanding these limitations, we believe that Discus provides a useful assessment of some important aspects of recognising glaucomatous optic disc damage. Further studies with Discus are now being undertaken to examine the performance of ophthalmology residents and other trainees as compared to our expert group. These studies will also provide insight into which features of glaucomatous optic disc damage are least well recognised, and how clinicians use information on prior probability in their clinical decision-making.
Conclusions

The Discus software may be useful in the assessment and training of clinicians involved in the detection of glaucoma. It is freely available from http://discusproject.blogspot.com, and interested users may analyse their results using an automated web server on this site.
Acknowledgements

Robert Harper, Amanda Harding, and Jo Marcks of the OLGA clinic at the Manchester Royal Eye Hospital supported this project and contributed ideas. Jonathan Layes (Medicine) and Bijan Farhoudi (Computer Science) of Dalhousie University helped to improve the software and to implement an automated analysis on our server. We are most grateful to all 12 anonymous observers for their participation.
Appendix

At present, Discus is available only for Windows operating systems. The software can be called with different start-up parameters. These parameters (and their defaults) are:

1) Duration of image presentations, in ms (10000)
2) Rate of repetitions in the visual field positive group (0.1)
3) Rate of repetitions in the visual field negative group (0.3)
4) Save-To-Desktop status (1)

If the Save-To-Desktop status is set to 1, a tab-delimited file will be saved to the desktop. The user can then upload this file to our server and retrieve their results after a few seconds.
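Assuming the four parameters are passed positionally in the order listed above (the executable name and exact invocation syntax are assumptions, not confirmed in this paper), a command line reproducing the conditions of the present study (60-second presentations, repetition rates of 0.1 and 0.3, results saved to the desktop) might look like:

```
Discus.exe 60000 0.1 0.3 1
```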
References

1. Weinreb RN, Khaw PT. Primary open-angle glaucoma. Lancet 2004;363:1711-1720.
2. Garway-Heath DF. Early diagnosis in glaucoma. In: Nucci C, Cerulli L, Osborne NN, Bagetta G (eds), Progress in Brain Research; 2008:47-57.
3. Gordon MO, Beiser JA, Brandt JD, et al. The Ocular Hypertension Treatment Study: baseline factors that predict the onset of primary open-angle glaucoma. Archives of Ophthalmology 2002;120:714.
4. Keltner JL, Johnson CA, Anderson DR, et al. The association between glaucomatous visual fields and optic nerve head features in the Ocular Hypertension Treatment Study. Ophthalmology 2006;113:1603-1612.
5. Predictive factors for open-angle glaucoma among patients with ocular hypertension in the European Glaucoma Prevention Study. Ophthalmology 2007;114:3-9.
6. Broadway DC, Nicolela MT, Drance SM. Optic disk appearances in primary open-angle glaucoma. Survey of Ophthalmology 1999;43:223-243.
7. Jonas JB, Budde WM, Panda-Jonas S. Ophthalmoscopic evaluation of the optic nerve head. Survey of Ophthalmology 1999;43:293-320.
8. Lin SC, Singh K, Jampel HD, et al. Optic nerve head and retinal nerve fiber layer analysis: a report by the American Academy of Ophthalmology. Ophthalmology 2007;114:1937-1949.
9. Sharma P, Sample PA, Zangwill LM, Schuman JS. Diagnostic tools for glaucoma detection and management. Survey of Ophthalmology 2008;53.
10. Zangwill LM, Bowd C, Weinreb RN. Evaluating the optic disc and retinal nerve fiber layer in glaucoma II: optical image analysis. Seminars in Ophthalmology 2000;15:206-220.
11. Mowatt G, Burr JM, Cook JA, et al. Screening tests for detecting open-angle glaucoma: systematic review and meta-analysis. Invest Ophthalmol Vis Sci 2008;49:5373-5385.
12. Fingeret M, Medeiros FA, Susanna Jr R, Weinreb RN. Five rules to evaluate the optic disc and retinal nerve fiber layer for glaucoma. Optometry 2005;76:661-668.
13. Susanna Jr R, Vessani RM. New findings in the evaluation of the optic disc in glaucoma diagnosis. Current Opinion in Ophthalmology 2007;18:122-128.
14. Caprioli J. Clinical evaluation of the optic nerve in glaucoma. Transactions of the American Ophthalmological Society 1994;92:589.
15. Lichter PR. Variability of expert observers in evaluating the optic disc. Transactions of the American Ophthalmological Society 1976;74:532.
16. Tielsch JM, Katz J, Quigley HA, Miller NR, Sommer A. Intraobserver and interobserver agreement in measurement of optic disc characteristics. Ophthalmology 1988;95:350-356.
17. Nicolela MT, Drance SM, Broadway DC, Chauhan BC, McCormick TA, LeBlanc RP. Agreement among clinicians in the recognition of patterns of optic disk damage in glaucoma. American Journal of Ophthalmology 2001;132:836-844.
18. Spalding JM, Litwak AB, Shufelt CL. Optic nerve evaluation among optometrists. Optom Vis Sci 2000;77:446-452.
19. Harper R, Reeves B, Smith G. Observer variability in optic disc assessment: implications for glaucoma shared care. Ophthalmic Physiol Opt 2000;20:265-273.
20. Harper R, Radi N, Reeves BC, Fenerty C, Spencer AF, Batterbury M. Agreement between ophthalmologists and optometrists in optic disc assessment: training implications for glaucoma co-management. Graefes Archive Clin Exp Ophthalmol 2001;239:342-350.
21. Spry PG, Spencer IC, Sparrow JM, et al. The Bristol Shared Care Glaucoma Study: reliability of community optometric and hospital eye service test measures. British Journal of Ophthalmology 1999;83:707-712.
22. Abrams LS, Scott IU, Spaeth GL, Quigley HA, Varma R. Agreement among optometrists, ophthalmologists, and residents in evaluating the optic disc for glaucoma. Ophthalmology 1994;101:1662-1667.
23. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology 1992;99:215-221.
24. Azuara-Blanco A, Katz LJ, Spaeth GL, Vernon SA, Spencer F, Lanzl IM. Clinical agreement among glaucoma experts in the detection of glaucomatous changes of the optic disk using simultaneous stereoscopic photographs. American Journal of Ophthalmology 2003;136:949-950.
25. Sung VCT, Bhan A, Vernon SA. Agreement in assessing optic discs with a digital stereoscopic optic disc camera (Discam) and Heidelberg retina tomograph. BMJ 2002:196-202.
26. Ihaka R, Gentleman R. R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 1996;5:299-314.
27. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics 2005;21:3940-3941.
28. Anderson RS. The psychophysics of glaucoma: Improving the structure/function relationship. Progress in Retinal and Eye Research 2006;25:79-97.
29. Garway-Heath DF, Holder GE, Fitzke FW, Hitchings RA. Relationship between electrophysiological, psychophysical, and anatomical measurements in glaucoma. Investigative Ophthalmology and Visual Science 2002;43:2213-2220.
30. Johnson CA, Cioffi GA, Liebmann JR, Sample PA, Zangwill LM, Weinreb RN. The relationship between structural and functional alterations in glaucoma: A review. Seminars in Ophthalmology 2000;15:221-233.
31. Harwerth RS, Quigley HA. Visual field defects and retinal ganglion cell losses in patients with glaucoma. Archives of Ophthalmology 2006;124:853-859.
32. Caprioli J. Correlation of visual function with optic nerve and nerve fiber layer structure in glaucoma. Survey of Ophthalmology 1989;33:319-330.
33. Caprioli J, Miller JM. Correlation of structure and function in glaucoma. Quantitative measurements of disc and field. Ophthalmology 1988;95:723-727.
34. Deleon-Ortega JE, Arthur SN, McGwin Jr G, Xie A, Monheit BE, Girkin CA. Discrimination between glaucomatous and nonglaucomatous eyes using quantitative imaging devices and subjective optic nerve head assessment. Invest Ophthalmol Vis Sci 2006;47:3374-3380.
35. Mardin CY, Jünemann AGM. The diagnostic value of optic nerve imaging in early glaucoma. Current Opinion in Ophthalmology 2001;12:100-104.
36. Greaney MJ, Hoffman DC, Garway-Heath DF, Nakla M, Coleman AL, Caprioli J. Comparison of optic nerve imaging methods to distinguish normal eyes from those with glaucoma. Investigative Ophthalmology and Visual Science 2002;43:140-145.
37. Harper R, Reeves B. The sensitivity and specificity of direct ophthalmoscopic optic disc assessment in screening for glaucoma: a multivariate analysis. Graefe's Archive for Clinical and Experimental Ophthalmology 2000;238:949-955.
38. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of Variation and Bias in Studies of Diagnostic Accuracy: A Systematic Review. Annals of Internal Medicine 2004;140:189-202.
39. Medeiros FA, Ng D, Zangwill LM, Sample PA, Bowd C, Weinreb RN. The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma. Investigative Ophthalmology and Visual Science 2007;48:214-222.
40. Harper R, Henson D, Reeves BC. Appraising evaluations of screening/diagnostic tests: the importance of the study populations. British Journal of Ophthalmology 2000;84:1198.
41. Hanley JA. Receiver operating characteristic (ROC) methodology: The state of the art. Critical Reviews in Diagnostic Imaging 1989;29:307-335.
42. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
43. Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biometrical Journal 1997;39:643-657.
44. Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971;76:378-382.
45. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990;43:543-549.
46. Morgan JE, Sheen NJL, North RV, Choong Y, Ansari E. Digital imaging of the optic nerve head: Monoscopic and stereoscopic analysis. British Journal of Ophthalmology 2005;89:879-884.
47. Hrynchak P, Hutchings N, Jones D, Simpson T. A comparison of cup-to-disc ratio measurement in normal subjects using optical coherence tomography image analysis of the optic nerve head and stereo fundus biomicroscopy. Ophthalmic and Physiological Optics 2004;24:543-550.
48. Parkin B, Shuttleworth G, Costen M, Davison C. A comparison of stereoscopic and monoscopic evaluation of optic disc topography using a digital optic disc stereo camera. BMJ 2001:1347-1351.
49. Vingrys AJ, Helfrich KA, Smith G. The role that binocular vision and stereopsis have in evaluating fundus features. Optom Vis Sci 1994;71:508-515.
50. Rumsey KE, Rumsey JM, Leach NE. Monocular vs. stereospecific measurement of cup-to-disc ratios. Optometry and Vision Science 1990;67:546-550.
51. Harasymowycz P, Davis B, Xu G, Myers J, Bayer A, Spaeth GL. The use of RADAAR (ratio of rim area to disc area asymmetry) in detecting glaucoma and its severity. Canadian Journal of Ophthalmology 2004;39:240-244.