Dissections
DIAGNOSIS 23 May 2009
Evidence-based Medicine for Surgeons
Diagnosing ruptured appendicitis preoperatively in pediatric patients
Authors: Williams RF, Blakely ML, Fischer PE, et al.
Journal: J American College of Surgeons 2009; 208:819–828
Centre: Division of Pediatric Surgery, University of Tennessee Health Science Center, Memphis, TN, USA
BACKGROUND
The rate of ruptured appendicitis (RA) is higher in children (30% to 74%) than in adults. While treatment for acute appendicitis (AA) consists predominantly of urgent appendectomy, treatment for RA varies much more. Whether urgent appendectomy or initial antibiotics with interval appendectomy should be the preferred treatment for children with RA remains controversial. Distinguishing ruptured from acute appendicitis is very important if the treatment differs for the two conditions. The pediatric surgeon’s ability to distinguish these two conditions preoperatively has not been prospectively studied.
RESEARCH QUESTION
Population: All patients younger than 18 years referred for abdominal pain at a regional children’s hospital.
Indicator variable: Patients diagnosed as having acute appendicitis (AA), ruptured appendicitis (RA) or no appendicitis.
Outcome variable: Sensitivity, specificity, positive and negative likelihood ratios for the diagnosis of ruptured appendicitis.
Authors' claim(s): “...Pediatric surgeons differentiate AA from RA and not appendicitis preoperatively with high accuracy and sensitivity [ability to rule in], but the specificity [ability to rule out] for diagnosing ruptured appendicitis is lower. The scoring system improved the specificity of the preoperative diagnosis.”
IN SUMMARY
Diagnostic performance of pediatric surgeons
Comparison: as seen in the study

                              Acute appendicitis (AA)   Ruptured appendicitis (RA)   Not appendicitis (NA)
Number                        98                        53                           96
Sensitivity                   92.60%                    96.40%                       98.70%
Specificity                   94.90%                    83.00%                       93.80%
Negative likelihood ratio *   0.05                      0.21                         0.06

* = Measure of true negatives (ruling out)

Diagnostic accuracy of a derived scoring system applied to the data from the same cohort.
Comparison: applying scoring system (details on following page)

Sensitivity                   47.00%
Specificity                   98.00%
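The sensitivity and specificity quoted for the scoring system can be turned into likelihood ratios with the standard textbook definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below is only an illustration of that arithmetic; the function name is an assumption, and nothing here is taken from the paper's own calculations, so its results need not match any likelihood ratio the authors report.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions.

    Standard definitions (not taken from the paper under review):
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
    """
    if not (0 < sensitivity <= 1 and 0 < specificity <= 1):
        raise ValueError("sensitivity and specificity must lie in (0, 1]")
    # A perfectly specific test has no false positives, so LR+ is unbounded.
    lr_pos = float("inf") if specificity == 1 else sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Scoring-system figures quoted in this review: sensitivity 47%, specificity 98%.
lr_pos, lr_neg = likelihood_ratios(0.47, 0.98)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

With these inputs the positive likelihood ratio is large (about 23.5) while the negative likelihood ratio stays above 0.5: a high score argues strongly for RA, but a low score does little to exclude it.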
THE TISSUE REPORT
The literature is rife with reports of scoring systems that are extracted from existing data, put through the magic black box of a linear regression analysis, and retrospectively revalidated on the cohort from which they were extracted. Bad science. In addition, the authors of this paper do a lot of mumbling and beating around the bush when discussing the results of applying the scoring system. From the data published, contradictory ideas emerge regarding the overall accuracy (see the table above). A lot of fuzzy pie-splitting is indulged in. The only way to demonstrate the validity of this scoring system would have been to take two groups of equally competent pediatric surgeons, randomly allocate patients either to one group that relied only on conventional diagnostic skills or to the other that used only the score, and show that the latter did better. Moreover, why do we need an elaborate study to be told that generalized abdominal tenderness and an abscess seen on CT scan are the strongest predictors of a ruptured appendix?!
EBM-O-METER
Evidence level (scale): Double blind RCT; Randomized controlled trial (RCT); Prospective cohort study - not randomized; Case controlled study; Case series - retrospective
Overall rating (scale): Trash (life's too short for this); Swiss cheese (full of holes); Safe (holds water); Newsworthy (“Just do it”)
Bias levels: Sampling; Comparison; Measurement
Interesting | Novel | Feasible | Ethical | Resource saving
The devil is in the details (more on the paper) ...
© Dr Arjun Rajagopalan
SAMPLING
Sample type (options): Simple random; Consecutive; Convenience; Judgmental; Stratified random; Cluster
Inclusion criteria: All patients < 18 yrs of age referred for evaluation of abdominal pain
Exclusion criteria: Not stated

Final score card
              AA    RA    NA
Target        ?     ?     ?
Accessible    ?     ?     ?
Intended      ?     ?     ?
Drop outs     ?     ?     ?
Study         98    53    96

Legend: [symbol lost] = Reasonable | ? = Arguable | [symbol lost] = Questionable
Duration of the study: February 2007 to October 2007
Sampling bias: The authors have not provided any details of the sampling process. The study was done in a referral pediatric hospital.
COMPARISON
Comparison type (options): Randomized; Case-control; Non-random; Historical; None
Controls - details; allocation details:
The pediatric surgical team recorded an agreed initial (preoperative) diagnosis using all data available. The use of advanced imaging (CT or ultrasonography) was decided by emergency department physicians, referral physicians, or pediatric surgeons. Using the predictors identified with multivariable analysis, a scoring system was constructed to evaluate whether an objective score based on available data might improve the ability to accurately diagnose RA.
Comparability: -
Disparity: -
Comparison bias: The scoring system was derived from existing data, put through a linear regression analysis and retrospectively revalidated on the cohort from which it was extracted. This is not a valid comparison.
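One common way to avoid testing a score on the very patients it was built from is to set aside part of the cohort before the score is derived. The sketch below illustrates that idea only; it is not something the paper or this review describes, and the function name, the 30% hold-out fraction and the fixed seed are all assumptions made for the example. The cohort size of 247 is simply 98 + 53 + 96 from the score card.

```python
import random


def split_cohort(patient_ids, validation_fraction=0.3, seed=0):
    """Shuffle patient identifiers and split them into two disjoint sets.

    Returns (derivation, validation). The score would be built on the
    derivation set and evaluated only on the validation set.
    The 30% default and the seed are illustrative assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_validation = round(len(ids) * validation_fraction)
    return ids[n_validation:], ids[:n_validation]


# 247 = 98 (AA) + 53 (RA) + 96 (NA) patients in the study.
derivation, validation = split_cohort(range(247))
print(len(derivation), len(validation))
```

A hold-out split of this kind addresses only the reuse of the derivation data; it does not replace the randomized comparison of surgeons with and without the score that the review argues for.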
MEASUREMENT
Measurement error checklist (device error, observer error): Blinding; Scoring; Protocols; Training; Device suited to task; Gold std.; Repetition
Ratings (Y / ? / N) for each device used, listed in the order in which they appear:
1. Final diagnosis - clinical team: ?, N, ?, N, N, N, N
2. Scoring system for RA (see table below): ?, N, N, N, Y, Y, N
Final diagnosis was determined using operative findings, pathology reports, or discharge diagnosis in those not undergoing operation. Final diagnosis in patients who did not undergo an operation was confirmed with follow-up telephone contact and follow-up review of the electronic medical record aimed at identifying care received after the initial discharge.

Variable                    Points
Generalized tenderness      4
Abscess on CT               3
Duration > 48 hrs           3
WBC count > 19,400/µL       2
Fecalith on CT              1
Univariable analysis was performed on all preoperative variables, comparing patients with a discharge diagnosis of RA to those with AA. Using the predictors identified with multivariable analysis, a scoring system was constructed as shown in the table above. The patient’s score was calculated by adding the points for each significant preoperative variable present. The results of this scoring system, as reported in the study, are difficult to interpret. No detailed explanation is given for why the authors believe that specificity is improved after applying the score.
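The additive score described above can be sketched in a few lines. The point values are copied from the table in this review; the dictionary keys and the function name are hypothetical labels chosen for the example. No cut-off for calling a case RA is given here, so the sketch stops at the raw total.

```python
# Points per predictor, copied from the scoring table in this review.
# The key names are hypothetical labels, not identifiers from the paper.
RA_SCORE_POINTS = {
    "generalized_tenderness": 4,
    "abscess_on_ct": 3,
    "duration_over_48h": 3,
    "wbc_over_19400": 2,
    "fecalith_on_ct": 1,
}


def ra_score(findings):
    """Add the points for every predictor marked present (truthy).

    Predictors missing from `findings` count as absent; unknown keys
    raise an error rather than being silently ignored.
    """
    unknown = set(findings) - set(RA_SCORE_POINTS)
    if unknown:
        raise KeyError(f"unknown predictor(s): {sorted(unknown)}")
    return sum(
        points for name, points in RA_SCORE_POINTS.items() if findings.get(name)
    )


patient = {
    "generalized_tenderness": True,
    "duration_over_48h": True,
    "fecalith_on_ct": False,
}
print(ra_score(patient))  # 4 + 3 = 7
```

The maximum possible total is 13, of which 4 points depend on CT findings, so a patient without a CT scan can score at most 9.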
Measurement bias: Only 79% of patients in the study had a CT scan. No attempt was made to measure observer variability.