Utilizing Text Mining Techniques to Identify Fall Related Injuries
Dr. Monica Chiarini Tremblay Florida International University
VISN 8 Patient Safety Center of Inquiry Copyright © 2007, SAS Institute Inc. All rights reserved.
Research Team
Researchers:
Dr. Monica Chiarini Tremblay – Florida International University
Dr. Donald J. Berndt – University of South Florida
Dr. Stephen Luther – VA
Dr. Philip Foulis – VA
Dr. Gail Powell-Cope – VA
Dr. Dustin French – VA
VISN 8 Patient Safety Center of Inquiry
Goals:
1. To promote personal freedom and safety for frail elderly and persons with disabilities, across the continuum of care.
2. To build a "culture of safety" to support clinicians in providing safe patient care and safe working environments.
Outline of Presentation
Motivation
Data Acquisition
Chart Review
Text Mining
Unsupervised Learning – Cluster Analysis
Supervised Learning – Cluster/NN/Decision Tree
Conclusion/Future Directions
Questions
Motivation
Burden of fall related injuries
Unintentional injury due to falls is a serious problem among the elderly: $20.2 billion/year
Veterans 65 and over served by VHA expected to increase from 43% in 2010 to 51% in 2020
Veterans 85 and over
• Highest risk for serious injurious falls
• Projected to increase from 154,000 in 1990 to 1.3 million in 2010
Fall related injuries represent a large volume of service in the VA system
Motivation
E-codes
ICD-9-CM “external causes of injury”
• Assigned to administrative data to supplement primary diagnosis codes
• Only way to determine if an injury was due to an adverse event
Underutilized and/or inappropriately used
Assigned directly by clinicians in VA ambulatory setting
• Not a medical or administrative high priority
Electronic Medical Record
Over the past decade, the Veterans Health Administration (VA) has invested extensively in the implementation of an electronic medical record (EMR) system.
The written medical record presumably contains specific references to falls
• Text-based notes are not easily searched.
Need to develop and validate new techniques to identify and characterize injuries.
Approach
Challenges
Large amount of available data
Normally an ideal condition, but it introduces many challenges:
• Data selection
• Combining disparate data sources
• Manipulation and processing of very large text fields (such as progress notes)
• Extraction of keywords and concepts for the purpose of building predictive models
Approach
Data Staging/Preprocessing/Transformation
Administrative records for outpatient treatment of any injury during FY 200
Abstracted complete medical record for patients treated for injury
Removed dates of service and other sensitive data elements in response to IRB and Privacy Board request
Approach
Extract Patient Information
[Data flow diagram. Labels: Ambulatory event data (Austin); Clinical Health Summary from VISTA; Create an ASCII file; Relational Database; Local VISTA; Move VISTA clinical data into Relational Database; Electronic Medical Record]
Approach
Database Structure
E-code: Yes/No, PatientID
Demographic: PatientID (unique)
OutPatient: PatientID, OutPatientID (unique), VisitDate
ProgressNote: PatientID, NoteDate, ProgressNote
OutPatientDx: OutPatientID, DiagnosisCodes
Link OutPatient visit to ProgressNote by date
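The link between an outpatient visit and its progress notes can be sketched as a join on PatientID and date. This is a minimal sketch using Python's built-in sqlite3 module. It assumes the OutPatient and ProgressNote tables carry the fields listed on the slide; the sample rows, the ISO date strings, and the function name link_visits_to_notes are invented for illustration and are not the study's actual database.

```python
import sqlite3

def link_visits_to_notes(conn):
    """Return (OutPatientID, PatientID, VisitDate, ProgressNote) rows
    where a note was written for the same patient on the visit date."""
    query = """
        SELECT o.OutPatientID, o.PatientID, o.VisitDate, p.ProgressNote
        FROM OutPatient AS o
        JOIN ProgressNote AS p
          ON p.PatientID = o.PatientID
         AND p.NoteDate = o.VisitDate
        ORDER BY o.OutPatientID
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data (not from the study).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OutPatient (
        PatientID INTEGER, OutPatientID INTEGER PRIMARY KEY, VisitDate TEXT);
    CREATE TABLE ProgressNote (
        PatientID INTEGER, NoteDate TEXT, ProgressNote TEXT);
    INSERT INTO OutPatient VALUES (1, 100, '2004-03-01'), (2, 101, '2004-03-02');
    INSERT INTO ProgressNote VALUES
        (1, '2004-03-01', 'pt fell in bathroom'),
        (1, '2004-04-15', 'follow-up visit'),
        (2, '2004-03-02', 'knee pain after lifting');
""")
linked = link_visits_to_notes(conn)
```

An exact date match is the simplest rule; notes written a day before or after a visit would need a date-range condition instead.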
Approach
Challenges
Patient could have a series of progress notes for events
How to group records into an “episode of care”:
• Brute force approach
• Create a “sliding window”, allowing for some overlap and deciding on a time frame
• Clustering on other data, essentially creating “types” of progress notes, and mining them as separate inputs
Initial model defines the episode of care as all the notes and data collected during a 48-hour period
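One way to read the 48-hour rule is sketched below. It assumes each note is a (patient_id, timestamp, text) tuple and that an episode starts at a patient's first ungrouped note and absorbs later notes within 48 hours of that start. The tuple layout, the function name group_episodes, and the anchoring rule are assumptions for illustration; the slides do not say how the window was anchored.

```python
from datetime import datetime, timedelta

def group_episodes(notes, window_hours=48):
    """Group (patient_id, timestamp, text) notes into episodes of care.

    An episode starts at a patient's first ungrouped note and absorbs every
    later note written within `window_hours` of that starting note.
    """
    window = timedelta(hours=window_hours)
    episodes = []
    current = None  # the episode currently being filled
    for patient_id, ts, text in sorted(notes, key=lambda n: (n[0], n[1])):
        if (current is None
                or current["patient_id"] != patient_id
                or ts - current["start"] > window):
            current = {"patient_id": patient_id, "start": ts, "notes": []}
            episodes.append(current)
        current["notes"].append(text)
    return episodes

# Hypothetical notes (not from the study).
notes = [
    (1, datetime(2004, 3, 1, 9, 0), "pt fell at home"),
    (1, datetime(2004, 3, 2, 14, 0), "xray of wrist"),
    (1, datetime(2004, 3, 10, 8, 0), "follow-up"),
    (2, datetime(2004, 3, 1, 10, 0), "knee pain"),
]
episodes = group_episodes(notes)
```

The notes of one episode can then be concatenated into a single document for text mining.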
Approach
Identifying Fall Related Injuries
E-codes (E880-E888) signify “fall-related injuries” due to slips, trips, or falls unrelated to transportation.
Almost always correctly coded.
• Those that did not contain fall-related ICD-9 codes were in fact not coded and should have been identified as falls.
Providing Gold Standard
Chart reviews (of progress notes) conducted by trained data abstractors (registered nurses).
GUI front end built on the database.
Comparison of Results
                        Administrative Data
Chart Review        Fall injury   Non-fall injury   Total
Fall injury             138            135           273
Non-fall injury          54            333           387
Total                   192            468           660
Text Mining
Text Mining
Stem terms (for example BIG: BIG, BIGGER, BIGGEST) to provide initial synonym lists.
Modify by applying domain knowledge: combine terms with the same meaning.
A term-by-document frequency matrix is created, with the row dimension of the matrix limited to the 100 most frequent terms.
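The matrix construction can be illustrated with a small sketch. It assumes whitespace tokenization and a hand-written synonym map standing in for the stemming and domain-knowledge step; the SYNONYMS dictionary, the sample notes, and the function name term_document_matrix are hypothetical and do not reproduce the text mining software used in the study.

```python
from collections import Counter

# Hypothetical synonym/stem map: each surface form maps to a parent term.
SYNONYMS = {"fell": "fall", "falls": "fall", "falling": "fall",
            "meds": "medications"}

def term_document_matrix(documents, max_terms=100):
    """Return (terms, matrix) where matrix[i][j] is the count of terms[i]
    in documents[j], keeping only the `max_terms` most frequent terms."""
    counts_per_doc = []
    for doc in documents:
        tokens = [SYNONYMS.get(t, t) for t in doc.lower().split()]
        counts_per_doc.append(Counter(tokens))
    totals = Counter()
    for counts in counts_per_doc:
        totals.update(counts)
    terms = [term for term, _ in totals.most_common(max_terms)]
    matrix = [[counts[term] for counts in counts_per_doc] for term in terms]
    return terms, matrix

# Hypothetical notes (not from the study).
docs = ["patient fell on wet floor", "patient falls often", "refill meds"]
terms, matrix = term_document_matrix(docs, max_terms=3)
```

Rows are terms and columns are documents, so "fell" and "falls" both contribute to the single row for "fall".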
Text Mining
Text Mining
Reduce dimensionality of the term matrix using Latent Semantic Indexing (LSI).
LSI reduces dimensionality by using Singular Value Decomposition (SVD).
SVD places terms and documents in a reduced space where their positions correspond to their estimated similarity (Deerwester 1990).
Select weighting scheme:
• Information Gain (target-based weighting) for supervised learning.
• Entropy for unsupervised learning.
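The SVD step can be sketched with NumPy. This is a minimal sketch that assumes a small dense term-by-document matrix and keeps the top k singular dimensions; the toy matrix, the choice k=2, and the function name lsi_document_vectors are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

def lsi_document_vectors(term_doc, k):
    """Project documents onto the top-k SVD dimensions.

    term_doc: (n_terms, n_docs) weighted term-by-document matrix.
    Returns an (n_docs, k) array, one reduced vector per document.
    """
    matrix = np.asarray(term_doc, dtype=float)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the top-k right singular vectors by their singular values.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-term x 3-document matrix: documents 0 and 1 share terms,
# document 2 uses different terms.
term_doc = [[2, 1, 0],
            [1, 1, 0],
            [0, 0, 3],
            [0, 0, 1]]
doc_vectors = lsi_document_vectors(term_doc, k=2)
similar = cosine(doc_vectors[0], doc_vectors[1])
dissimilar = cosine(doc_vectors[0], doc_vectors[2])
```

In the reduced space, documents that share vocabulary end up close together, which is what the later clustering relies on.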
Unsupervised Learning
Cluster Analysis
Exploratory step: is it possible to identify clusters of terms that are indicative of a fall-related injury (FRI)?
Clusters are based on the SVD dimensions calculated with the entropy weighting scheme.
Entropy is a concept from communication theory (Shannon 1948) and is a measure of information content (disorder).
Entropy gives high weights to terms that are infrequent in all the data, but frequent in a few documents (Woodfield 2003).
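The behaviour described for entropy weighting can be sketched with the common log-entropy global weight, w = 1 + sum_j(p_j log2 p_j) / log2(n), where p_j is the share of a term's occurrences that fall in document j and n is the number of documents. Whether the study's software used exactly this form is an assumption; the function name entropy_weight and the sample counts are illustrative.

```python
import math

def entropy_weight(term_counts):
    """Global entropy weight for one term.

    term_counts: occurrences of the term in each document (one entry per doc).
    Returns a value in [0, 1]: 1 when the term is concentrated in a single
    document, 0 when it is spread evenly over all documents.
    """
    n_docs = len(term_counts)
    total = sum(term_counts)
    if n_docs < 2 or total == 0:
        return 0.0  # weight is undefined here; 0.0 is an arbitrary convention
    entropy_sum = 0.0
    for count in term_counts:
        if count > 0:
            p = count / total
            entropy_sum += p * math.log2(p)
    return 1.0 + entropy_sum / math.log2(n_docs)

concentrated = entropy_weight([5, 0, 0, 0])  # term appears in one document only
spread = entropy_weight([2, 2, 2, 2])        # term evenly spread over documents
```

A term concentrated in a few documents gets a weight near 1, while a term spread evenly across the collection gets a weight near 0.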
Text Mining
Frequency Matrix with Weights / Unsupervised
Using Entropy – Target Unknown

Term           Freq   # Documents   Keep   Weight   Role   Attribute
+ pain         1639       510        Y     0.151    Noun   Alpha
+ he           1314       364        Y     0.215    Pron   Alpha
active         1090       136        Y     0.343    Prop   Alpha
pain            893       407        Y     0.184    Prop   Alpha
+ fall          461       348        Y     0.167    Verb   Alpha
+ will          404       225        Y     0.249    Aux    Alpha
+ list          330       179        Y     0.277    Verb   Alpha
+ tablet        301        69        Y     0.430    Noun   Alpha
medications     293       157        Y     0.284    Prop   Alpha
Unsupervised Learning
Cluster 1: Fall 25%, NonFall 75%, Freq 136 (14%)
Descriptive terms: supplies, education, alcohol, active, medications, + year, screen, regular, education, + instruct, during, + tablet, understanding, + vital, past, meds, reason, mental, + referral, male

Cluster 2: Fall 48%, NonFall 52%, Freq 324 (32%)
Descriptive terms: + will, + make, + continue, treatment, reason, mental, + month, some, + complete, + loss, walker, + referral, + fall, + hand, + orient, education, + plan, + instruct, + movement, + comment

Cluster 3: Fall 61%, NonFall 38%, Freq 171 (17%)
Descriptive terms: + tablet, active, medications, supplies, + vital, male, reactions, + pain, pain, back, adverse, + comment, + fall, alcohol, + movement, + list, past, + month, education, understanding

Cluster 4: Fall 81%, NonFall 18%, Freq 65 (7%)
Descriptive terms: vital, xray, + list, past, meds, pain, xray, adverse, reactions, floor, water, down, + fall, male, past, mental, + vital, back, + swell, + knee

Cluster 5: Fall 31%, NonFall 69%, Freq 131 (13%)
Descriptive terms: + note, clinical, + symptom, xray, + call, + orient, + comment, + complete, understanding, during, education, floor, + injure, + swell, + plan, + make, treatment, screen, mental, + will

Cluster 6: Fall 65%, NonFall 31%, Freq 173 (17%)
Descriptive terms: + abrasion, wrist, + he, + leg, + knee, adverse, + swell, + list, some, reactions, + fall, + hand, past, + injury, understanding, + will, + instruct, + plan, + fall, + vital
Text Mining
Frequency Matrix with Weights / Supervised
Using Information Gain – Target is Known

Term           Freq   # Documents   Keep   Weight   Role
+ fall          880       682        Y     1.000    Verb
+ fall          550       394        Y     0.460    Noun
xray            202       152        Y     0.251    Prop
+ abrasion      176       113        Y     0.183    Noun
+ month         354       305        Y     0.170    Noun
+ he           4503      1132        Y     0.161    Pron
+ injury        328       268        Y     0.161    Noun
+ continue      467       373        Y     0.141    Verb
+ treatment     286       222        Y     0.135    Noun
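Information gain for a single term can be sketched as the drop in entropy of the fall/non-fall target once the presence or absence of the term is known. The sketch below assumes a binary term-present indicator and a binary target; the counts are invented, the function names are illustrative, and how the software scales the weights it reports is not stated in the slides.

```python
import math

def entropy(pos, neg):
    """Shannon entropy (bits) of a binary class distribution."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

def information_gain(present_fall, present_nonfall, absent_fall, absent_nonfall):
    """Reduction in target entropy from knowing whether a term is present.

    Assumes at least one document overall.
    """
    n_present = present_fall + present_nonfall
    n_absent = absent_fall + absent_nonfall
    n = n_present + n_absent
    before = entropy(present_fall + absent_fall,
                     present_nonfall + absent_nonfall)
    after = ((n_present / n) * entropy(present_fall, present_nonfall)
             + (n_absent / n) * entropy(absent_fall, absent_nonfall))
    return before - after

# Hypothetical counts: a term that separates the classes perfectly,
# and a term that carries no information about the target.
perfect = information_gain(50, 0, 0, 50)
useless = information_gain(25, 25, 25, 25)
```

A term such as "+ fall" that appears mostly in fall documents scores high, while a term spread equally over both classes scores near zero.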
Supervised Learning
Cluster Analysis

Cluster 1: Fall 9.83%, NonFall 90.17%, Freq 529 (53%)
Descriptive terms: + complete, + referral, + continue, + will, + comment, + note, education, screen, + treatment, + pain, + make, during, + month, pain, understanding, + symptom, + orient, + year, + instruct, + leg

Cluster 2: Fall 98.81%, NonFall 1.19%, Freq 84 (8%)
Descriptive terms: + fall, + fall, + abrasion, + knee, back, past, regular, + leg, mental, + vital, + injury, + month, during, + hand, + tablet, active, medications, some, + wrist, + orient

Cluster 3: Fall 89.43%, NonFall 10.57%, Freq 123 (12%)
Descriptive terms: + fall, loss, understanding, some, + instruct, during, + tablet, + year, supplies, + leg, education, alcohol, + call, regular, + injury, male, + symptom, + knee, + pain, active

Cluster 4: Fall 96.59%, NonFall 3.41%, Freq 264 (26%)
Descriptive terms: + fall, + list, adverse, reactions, xray, past, + swell, + wrist, + vital, meds, male, vital, + knee, + injury, some, supplies, pain, + tablet, + he, medications
Supervised Learning
Additional Predictive Models
Models utilized administrative data and text mining results for a binary prediction (falls).
Preliminary results.
Need to improve feature selection.
Supervised Learning
Neural Networks
An artificial neural network (ANN) is an information processing paradigm inspired by the way biological nervous systems process information.
Learn by example: trained for a specific application, such as pattern recognition or data classification.
Neural networks are used to extract patterns and detect complex trends.
The most commonly used NN architecture is the multilayer perceptron (MLP), which is a special type of feedforward network.
An MLP is composed of an input layer, a hidden layer composed of hidden units, and an output layer (Walsh 2002).
An MLP was used as an initial predictive model.
Supervised Learning
Neural Networks

Actual/Predicted      N      Y    Total
N                   149     45     194
Y                    23    171     194
Total               172    216     388

Sensitivity (Hit Rate) 88% - Specificity 77% - False Positive Rate 23% - False Negative Rate 12%
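The four rates follow directly from the confusion matrix counts. The sketch below recomputes them from the neural network counts on this slide, taking rows as actual and columns as predicted classes; the function name confusion_metrics is illustrative.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Compute standard rates from binary confusion-matrix counts."""
    actual_pos = tp + fn  # all actual falls
    actual_neg = tn + fp  # all actual non-falls
    return {
        "sensitivity": tp / actual_pos,
        "specificity": tn / actual_neg,
        "false_positive_rate": fp / actual_neg,
        "false_negative_rate": fn / actual_pos,
    }

# Counts from the neural network table (rows = actual, columns = predicted).
nn = confusion_metrics(tn=149, fp=45, fn=23, tp=171)
nn_percent = {name: round(100 * value) for name, value in nn.items()}
```

The same function applies unchanged to any other binary classifier's confusion matrix.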
Supervised Learning
Decision Trees
Many different algorithms are available for tree construction.
Among the most commonly used are CHAID (chi-squared automatic interaction detection), C4.5/5.0, and CART (classification and regression trees) (Han et al. 2001).
There are two main variations: regression trees and classification trees, the latter assigning class labels.
A decision tree induction approach based on the CART algorithm was employed as an alternative prediction strategy.
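A single split step of the kind a classification tree performs can be sketched as a search for the threshold that minimizes weighted Gini impurity, the criterion commonly associated with CART. This is an illustration on invented data for one numeric feature; the feature, the labels, and the function names are hypothetical, and the sketch is not the model built in the study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Find the split `value <= t` minimising weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical feature: how often the term "fall" occurs in each episode.
fall_term_counts = [0, 0, 1, 2, 3, 0]
is_fall = [0, 0, 1, 1, 1, 0]
threshold, impurity = best_threshold(fall_term_counts, is_fall)
```

A full tree repeats this search over every feature and then recurses on each resulting subset until a stopping rule is met.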
Supervised Learning
Decision Trees

Actual/Predicted      N      Y    Total
N                   183     11     194
Y                    28    166     194
Total               211    177     388

Sensitivity (Hit Rate) 86% - Specificity 94% - False Positive Rate 6% - False Negative Rate 14%
Conclusions and Future Directions
Future Directions
Investigate NLP algorithms
Embed in decision support tools, prompt/suggest E-codes
Frequency and nature of fall related injuries prevention programs.
Conclusions and Future Directions
Future Directions
Compare data mining results between gold standard and un-cleaned data.
Explore why text mining is more predictive than administrative data
• Better selection of attributes from the administrative data.
Run full study - including more VA hospitals
Questions
Questions?