Utilizing Text Mining Techniques to Identify Fall Related Injuries

Dr. Monica Chiarini Tremblay, Florida International University

VISN 8 Patient Safety Center of Inquiry

Copyright © 2007, SAS Institute Inc. All rights reserved.

Research Team

Researchers:
  • Dr. Monica Chiarini Tremblay - Florida International University
  • Dr. Donald J. Berndt - University of South Florida
  • Dr. Stephen Luther - VA
  • Dr. Philip Foulis - VA
  • Dr. Gail Powell-Cope - VA
  • Dr. Dustin French - VA


VISN 8 Patient Safety Center of Inquiry

Goals:
  1. To promote personal freedom and safety for frail elderly and persons with disabilities, across the continuum of care.
  2. To build a "culture of safety" to support clinicians in providing safe patient care and safe working environments.


Outline of Presentation
  • Motivation
  • Data Acquisition
  • Chart Review
  • Text Mining
  • Unsupervised Learning - Cluster Analysis
  • Supervised Learning - Cluster/NN/Decision Tree
  • Conclusion/Future Directions
  • Questions


Motivation

Burden of fall-related injuries
  • Unintentional injury due to falls is a serious problem among the elderly: $20.2 billion per year
  • Veterans 65 and over served by VHA are expected to increase from 43% in 2010 to 51% in 2020
  • Veterans 85 and over
      - Highest risk for serious injurious falls
      - Projected to increase from 154,000 in 1990 to 1.3 million in 2010
  • Fall-related injuries represent a large volume of service in the VA system


Motivation

E-codes
  • ICD-9-CM "external causes of injury" codes
      - Assigned to administrative data to supplement primary diagnosis codes
      - Only way to determine if an injury was due to an adverse event
  • Underutilized and/or inappropriately used
  • Assigned directly by clinicians in the VA ambulatory setting
      - Not a medical or administrative high priority


Electronic Medical Record
  • Over the past decade, the Veterans Health Administration (VHA) has invested extensively in implementing an electronic medical record (EMR) system.
  • The written medical record presumably contains specific references to falls.
      - Text-based notes are not easily searched.
  • New techniques to identify and characterize injuries need to be developed and validated.


Approach

Challenges
  • Large amount of available data
  • Normally an ideal condition, but it introduces many challenges:
      - Data selection
      - Combining disparate data sources
      - Manipulation and processing of very large text fields (such as progress notes)
      - Extraction of keywords and concepts for the purpose of building predictive models


Approach

Data Staging/Preprocessing/Transformation
  • Administrative records for outpatient treatment of any injury during FY 200
  • Abstracted the complete medical record for patients treated for injury
  • Removed dates of service and other sensitive data elements in response to IRB and Privacy Board request


Approach

Extract Patient Information

[Data flow diagram. Box labels: Ambulatory event data (Austin); Clinical Health Summary from VISTA; Create an ASCII file; Relational Database; Local VISTA; Move VISTA clinical data into Relational Database; Electronic Medical Record.]


Approach

Database Structure

  Table          Fields
  E-code         Yes/No, PatientID
  Demographic    PatientID (unique)
  OutPatient     PatientID, OutPatientID (unique), VisitDate
  ProgressNote   PatientID, NoteDate, ProgressNote
  OutPatientDx   OutPatientID, DiagnosisCodes

Link OutPatient visit to ProgressNote by date.


Approach

Challenges
  • A patient could have a series of progress notes for events
  • How to group records into an "episode of care":
      - Brute force approach
      - Create a "sliding window", allowing for some overlap and deciding on a time frame
      - Clustering on other data, essentially creating "types" of progress notes, and mining them as separate inputs
  • Initial model defines the episode of care as all the notes and data collected during a 48-hour period
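
The 48-hour episode definition can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name, the tuple layout, and the choice to anchor each window at the first note of an episode are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def group_episodes(notes, window_hours=48):
    """Group (patient_id, note_datetime, text) tuples into episodes of care.

    Assumption: an episode starts at a patient's first ungrouped note and
    collects every later note from the same patient within `window_hours`.
    """
    window = timedelta(hours=window_hours)
    episodes = []
    current = None  # (patient_id, episode_start, [note texts])
    # Sort by patient, then by time, so each patient's notes are contiguous.
    for patient_id, when, text in sorted(notes, key=lambda n: (n[0], n[1])):
        same_patient = current is not None and current[0] == patient_id
        if same_patient and when - current[1] <= window:
            current[2].append(text)
        else:
            current = (patient_id, when, [text])
            episodes.append(current)
    return episodes

# Hypothetical notes, for illustration only.
notes = [
    ("p1", datetime(2007, 1, 1, 8), "fell at home"),
    ("p1", datetime(2007, 1, 2, 9), "xray wrist"),
    ("p1", datetime(2007, 1, 5, 9), "follow-up"),
    ("p2", datetime(2007, 1, 1, 10), "knee pain"),
]
episodes = group_episodes(notes)
```

A sliding window with overlap, the alternative listed above, would differ only in how the window start is advanced.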


Approach

Identifying Fall-Related Injuries
  • E-codes (E880-E888) signify "fall-related injuries" due to slips, trips, or falls unrelated to transportation.
  • Almost always correctly coded.
      - Records that did not contain fall-related ICD-9 codes were in fact not coded, and should have been identified as falls.


Providing Gold Standard
  • Chart reviews (of progress notes) conducted by trained data abstractors (registered nurses).
  • GUI front end built on the database.


Comparison of Results

                      Administrative Data
  Chart Review        Fall injury   Non-fall injury   Total
  Fall injury                 138               135     273
  Non-fall injury              54               333     387
  Total                       192               468     660


Text Mining

Text Mining
  • Stem terms (for example BIG: BIG, BIGGER, BIGGEST) and provide initial synonym lists.
  • Modify by applying domain knowledge: combine terms with the same meaning.
  • A term-by-document frequency matrix is created, with the row dimension of the matrix limited to the 100 most frequent terms.
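
The stemming and matrix-building steps above can be sketched in plain Python. This is only an illustration: the study used SAS text mining tools, and the stem list, tokenizer, and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical stem/synonym list mapping surface forms to a parent term.
STEMS = {
    "bigger": "big", "biggest": "big",
    "falls": "fall", "fell": "fall", "falling": "fall",
}

def tokenize(text):
    """Lowercase, strip simple punctuation, and map words to parent terms."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [STEMS.get(w, w) for w in words if w]

def term_document_matrix(documents, max_terms=100):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in document j.

    Rows are limited to the `max_terms` most frequent terms overall.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    totals = Counter()
    for c in counts:
        totals.update(c)
    terms = [t for t, _ in totals.most_common(max_terms)]
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

# Made-up note fragments, for illustration only.
docs = [
    "Patient fell on wet floor.",
    "He falls often; fall risk noted.",
    "Knee pain after falling",
]
terms, matrix = term_document_matrix(docs, max_terms=3)
```

Applying domain knowledge, as the second bullet describes, amounts to adding entries to the stem list so that clinically equivalent terms share one row.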


Text Mining

Text Mining
  • Reduce dimensionality of the term matrix using Latent Semantic Indexing (LSI).
  • LSI reduces dimensionality by using Singular Value Decomposition (SVD).
  • In the reduced SVD space, the closeness of terms and documents corresponds to their estimated similarity (Deerwester 1990).
  • Select weighting scheme:
      - Information Gain (target-based weightings) for supervised learning.
      - Entropy for unsupervised learning.
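
For the target-based weighting, a minimal sketch of information gain for one term is shown here. It assumes a binary term indicator (present or absent in a document) and a binary fall target; the slides do not give the exact formula or normalization used by the text mining software, so the function names and this formulation are assumptions.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(term_present, is_fall):
    """Reduction in target entropy from knowing whether the term occurs.

    term_present[j] and is_fall[j] are booleans for document j.
    """
    n = len(is_fall)
    base = binary_entropy(sum(is_fall) / n)
    remainder = 0.0
    for flag in (True, False):
        subset = [y for t, y in zip(term_present, is_fall) if t == flag]
        if subset:
            remainder += len(subset) / n * binary_entropy(sum(subset) / len(subset))
    return base - remainder

# A term that perfectly separates falls from non-falls versus one that does not.
perfect = information_gain([True, True, False, False], [True, True, False, False])
useless = information_gain([True, False, True, False], [True, True, False, False])
```

A term that occurs only in fall documents gets the maximum gain, while a term spread evenly across both classes gets a gain of zero.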

Unsupervised Learning

Cluster Analysis
  • Exploratory step: is it possible to identify clusters of terms that are indicative of a fall-related injury (FRI)?
  • Clusters are based on the SVD dimensions calculated with the entropy weighting scheme.
  • Entropy is a concept from communication theory (Shannon, 1948) and is a measure of information content (disorder).
  • Entropy gives high weights to terms that are infrequent in all the data, but frequent in a few documents (Woodfield 2003).
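
The behavior described in the last bullet can be illustrated with the widely used log-entropy global weight. The slides do not state the formula, so treating it as the one applied in the study is an assumption, and the function name is hypothetical.

```python
import math

def entropy_weight(term_counts):
    """Log-entropy global weight for one term.

    term_counts[j] is the term's frequency in document j, including
    documents where the count is zero. The result lies between 0 (term
    spread evenly over all documents) and 1 (term confined to one document).
    """
    n = len(term_counts)
    total = sum(term_counts)
    if n < 2 or total == 0:
        return 0.0
    acc = 0.0
    for f in term_counts:
        if f:
            p = f / total          # share of the term's occurrences in this document
            acc += p * math.log2(p)
    return 1.0 + acc / math.log2(n)

spread_evenly = entropy_weight([1, 1, 1, 1])   # appears equally everywhere
concentrated = entropy_weight([4, 0, 0, 0])    # appears in a single document
in_half = entropy_weight([2, 2, 0, 0])
```

Under this formulation a term concentrated in few documents scores close to 1, which matches the pattern in the weights reported for the unsupervised frequency matrix (for example, "+ tablet" in 69 documents outweighs "+ pain" in 510).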


Text Mining

Frequency Matrix with Weights / Unsupervised

  Term           Freq   # Documents   Keep   Weight   Role   Attribute
  + pain         1639           510   Y       0.151   Noun   Alpha
  + he           1314           364   Y       0.215   Pron   Alpha
  active         1090           136   Y       0.343   Prop   Alpha
  pain            893           407   Y       0.184   Prop   Alpha
  + fall          461           348   Y       0.167   Verb   Alpha
  + will          404           225   Y       0.249   Aux    Alpha
  + list          330           179   Y       0.277   Verb   Alpha
  + tablet        301            69   Y       0.430   Noun   Alpha
  medications     293           157   Y       0.284   Prop   Alpha

Using Entropy - Target Unknown

Unsupervised Learning

Cluster 1 - Fall 25%, Non-Fall 75%, Freq 136 (14%)
  Descriptive terms: supplies, education, alcohol, active, medications, + year, screen, regular, education, + instruct, during, + tablet, understanding, + vital, past, meds, reason, mental, + referral, male

Cluster 2 - Fall 48%, Non-Fall 52%, Freq 324 (32%)
  Descriptive terms: + will, + make, + continue, treatment, reason, mental, + month, some, + complete, + loss, walker, + referral, + fall, + hand, + orient, education, + plan, + instruct, + movement, + comment

Cluster 3 - Fall 61%, Non-Fall 38%, Freq 171 (17%)
  Descriptive terms: + tablet, active, medications, supplies, + vital, male, reactions, + pain, pain, back, adverse, + comment, + fall, alcohol, + movement, + list, past, + month, education, understanding

Cluster 4 - Fall 81%, Non-Fall 18%, Freq 65 (7%)
  Descriptive terms: vital, xray, + list, past, meds, pain, xray, adverse, reactions, floor, water, down, + fall, male, past, mental, + vital, back, + swell, + knee

Cluster 5 - Fall 31%, Non-Fall 69%, Freq 131 (13%)
  Descriptive terms: + note, clinical, + symptom, xray, + call, + orient, + comment, + complete, understanding, during, education, floor, + injure, + swell, + plan, + make, treatment, screen, mental, + will

Cluster 6 - Fall 65%, Non-Fall 31%, Freq 173 (17%)
  Descriptive terms: + abrasion, wrist, + he, + leg, + knee, adverse, + swell, + list, some, reactions, + fall, + hand, past, + injury, understanding, + will, + instruct, + plan, + fall, + vital


Text Mining

Frequency Matrix with Weights / Supervised

  Term           Freq   # Documents   Keep   Weight   Role
  + fall          880           682   Y       1.000   Verb
  + fall          550           394   Y       0.460   Noun
  xray            202           152   Y       0.251   Prop
  + abrasion      176           113   Y       0.183   Noun
  + month         354           305   Y       0.170   Noun
  + he           4503          1132   Y       0.161   Pron
  + injury        328           268   Y       0.161   Noun
  + continue      467           373   Y       0.141   Verb
  + treatment     286           222   Y       0.135   Noun

Using Information Gain - Target is Known

Supervised Learning

Cluster Analysis

Cluster 1 - Fall 9.83%, Non-Fall 90.17%, Freq 529 (53%)
  Descriptive terms: + complete, + referal, + continue, + will, + comment, + note, education, screen, + treatment, + pain, + make, during, + month, pain, understanding, + symptom, + orient, + year, + instruct, + leg

Cluster 2 - Fall 98.81%, Non-Fall 1.19%, Freq 84 (8%)
  Descriptive terms: + fall, + fall, + abrasion, + knee, back, past, regular, + leg, mental, + vital, + injury, + month, during, + hand, + tablet, active, medications, some, + wrist, + orient

Cluster 3 - Fall 89.43%, Non-Fall 10.57%, Freq 123 (12%)
  Descriptive terms: + fall, loss, understanding, some, + instruct, during, + tablet, + year, supplies, + leg, education, alcohol, + call, regular, + injury, male, + symptom, + knee, + pain, active

Cluster 4 - Fall 96.59%, Non-Fall 3.41%, Freq 264 (26%)
  Descriptive terms: + fall, + list, adverse, reactions, xray, past, + swell, + wrist, + vital, meds, male, vital, + knee, + injury, some, supplies, pain, + tablet, + he, medications


Supervised Learning

Additional Predictive Models
  • Models utilized administrative data and text mining results for a binary prediction (falls).
  • Preliminary results.
  • Need to improve feature selection.


Supervised Learning

Neural Networks
  • An artificial neural network (ANN) is an information processing paradigm inspired by the way biological nervous systems process information.
  • Learn by example: trained for a specific application, such as pattern recognition or data classification.
  • Neural networks are used to extract patterns and detect complex trends.
  • The most commonly used NN architecture is the multilayer perceptron (MLP), which is a special type of feedforward network.
  • An MLP is composed of an input layer, a hidden layer composed of hidden units, and an output layer (Walsh 2002).
  • An MLP was used as an initial predictive model.


Supervised Learning

Neural Networks

  Actual/Predicted     N     Y   Total
  N                  149    45     194
  Y                   23   171     194
  Total              172   216     388

Sensitivity (Hit Rate) 88% - Specificity 77% - False Positive Rate 23% - False Negative Rate 12%

Supervised Learning

Decision Trees
  • Many different algorithms are available for tree construction.
  • Among the most commonly used: CHAID (chi-squared automatic interaction detection), C4.5/5.0, and CART (classification and regression trees) (Han et al. 2001).
  • There are two main variations: regression trees, and classification trees that assign class labels.
  • A decision tree induction approach based on the CART algorithm was employed as an alternative prediction strategy.


Supervised Learning

Decision Trees

  Actual/Predicted     N     Y   Total
  N                  183    11     194
  Y                   28   166     194
  Total              211   177     388

Sensitivity (Hit Rate) 86% - Specificity 94% - False Positive Rate 6% - False Negative Rate 14%
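
The rates reported for the neural network and the decision tree follow directly from the confusion matrix counts. A short sketch of that arithmetic, with a hypothetical helper name and with Y (fall) treated as the positive class:

```python
def rates(tn, fp, fn, tp):
    """Classification rates from confusion matrix counts (fall = positive)."""
    return {
        "sensitivity": tp / (tp + fn),          # actual falls predicted as falls
        "specificity": tn / (tn + fp),          # actual non-falls predicted as non-falls
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
    }

# Counts taken from the two Actual/Predicted tables in the slides.
neural_net = rates(tn=149, fp=45, fn=23, tp=171)
decision_tree = rates(tn=183, fp=11, fn=28, tp=166)

for name, result in (("Neural network", neural_net), ("Decision tree", decision_tree)):
    summary = ", ".join(f"{k} {v:.0%}" for k, v in result.items())
    print(f"{name}: {summary}")
```

On these counts the decision tree gives up about two points of sensitivity relative to the neural network in exchange for a much lower false positive rate.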

Conclusions and Future Directions

Future Directions
  • Investigate NLP algorithms
  • Embed in decision support tools to prompt/suggest E-codes
  • Use the frequency and nature of fall-related injuries to inform prevention programs


Conclusions and Future Directions

Future Directions
  • Compare data mining results between gold standard and un-cleaned data.
  • Explore why text mining is more predictive than administrative data.
      - Better selection of attributes from the administrative data.
  • Run full study, including more VA hospitals.


Questions

Questions?
