Print Potts Greg

  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Print Potts Greg as PDF for free.

More details

  • Words: 3,221
  • Pages: 16
SAS M2007 Data Mining Conference October 1-2, 2007

Data Preparation for Data Mining in Health Care using SAS

Las Vegas

S. Greg Potts, MBA 1 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Introduction

Data mining practitioners are well aware that most of the total effort required to complete a data mining project is not spent in the “trendier” aspects of the project such as problem definition or algorithm/technique selection, application, and interpretation of the results.

Data Preparation for Data Mining in Health Care using SAS

Unfortunately, most time – up to 80% often cited – is spent “in the trenches” acquiring the data (i.e., most business data today is stored in transactional data warehouses where data elements essential for mining are dispersed across multiple tables), getting to know the data (i.e., conducting exploratory data analysis), and preparing the data mining table (i.e., summarizing data to the “unit of analysis” and creating derived variables to be used as targets and inputs in the modeling analysis) in the form required by the data mining algorithm (i.e., most current algorithms require data in the form of a a one-row-per-subject data table).

Contents

This presentation will present two case studies in using SAS to extract and prepare data for data mining. The first case will explore how to extract and prepare transactional (Medicaid claims) data for directed data mining, where the goal is to explain or predict the value(s) of a particular target variable. The second case will explore data extraction and preparation for undirected data mining (cluster analysis) using hospital-level data supplied to Medicare Quality Improvement Organizations (QIOs). ©2007 Arkansas Foundation for Medical Care, Inc.

<< Previous | Next >> AFMC and QI in Medicare and Medicaid . . . . . . . . . 3 AFMC as QIO and EQRO . . . . . . . . . . . . . . .4 Data Preparation for Directed Data Mining. . . . 7 Data Preparation for Undirected Data Mining .17 References . . . . . . . . .29

2

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

1

AFMC and QI in Medicare and Medicaid • Medicare provides health insurance for people age 65 and over, those with permanent kidney failure and certain people with disabilities (more than 400,000 individuals in Arkansas)

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• Medicaid is a jointly-funded, Federal-State medical assistance program for certain low income and needy people (more than 600,000 individuals in Arkansas).

3 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

AFMC as QIO and EQRO • Centers for Medicare & Medicaid Services (CMS) designated Quality Improvement Organization (QIO) for the state of Arkansas. - Assist providers (Hospitals, Physicians, Nursing Homes, etc.) with measuring and reporting quality measures and redesigning care processes; provide statistical support and assistance in interpreting data results.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• External Quality Review Organization (EQRO-like) & Review Agent for the Arkansas Medicaid Program. - Prior Authorization Reviews, Retrospective Reviews, HEDIS measures, Patient Satisfaction Surveys, and Data Mining. • Multi-disciplinary team of clinicians, statisticians, and consultants • Goal: To ensure that everyone receives the right care, at the right place, at the right time – every time.

4 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

2

AFMC’s Data Mining Team Subject Matter/Policy Experts

Data Experts

Data Preparation for Data Mining in Health Care using SAS

William E. Golden, MD, FACP Vice President of Quality Improvement; UAMS Professor of Medicine and Public Health

Clayton Wells, MS

<< Previous | Next >>

Medicaid Data Analysis Manager S. Greg Potts, MBA

Nena Sanchez, MS Vice President, Medicare & Medicaid Operations

Data Mining Team Leader

AFMC’s Data Mining Team combines clinical, policy, and data experts.

Donna West, RPh, PhD Consultant UAMS Associate Professor of Pharmacy

5 SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

©2007 Arkansas Foundation for Medical Care, Inc.

Today’s Health Care Environment Data Preparation for Data Mining in Health Care using SAS

EMPLOYER/CONSUMER DEMANDS FOR ACCOUNTABILITY & TRANSPARENCY

RISING COSTS $$$$$$$$$

EXPLOSION IN CLINICAL KNOWLEDGE

QUALITY

WIDE VARIATIONS IN QUALITY

<< Previous | Next >>

Many challenges exist in today’s dynamic health care environment.

IMPROVEMENT

INDUSTRY FRAGMENTATION

PROVIDER/PAYER DATA SILOS

6 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

3

Data Preparation for Data Mining in Health Care using SAS

Data Preparation for Directed Data Mining

<< Previous | Next >>

7 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Directed Data Mining in Medicaid • AFMC Medicaid Data Mining Projects are often conducted with an eye toward identifying high cost drivers for utilization review and/or cost containment or to identify care coordination inefficiencies, which can be opportunities for quality improvement.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• Projects are client and/or literature-based. • In directed data mining, the goal is to explain or predict the value(s) of a target variable. • Recipient/Member is the often the “Unit of Analysis”. Target is (usually) total costs ($) per member while inputs are (usually) binary and represent diagnosis, procedure, drug, and provider type code classes. • Techniques used include decision trees and (infrequently) regression.

8 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

4

Medicaid Data Source • Arkansas Medicaid Decision Support System (DSS) Data Warehouse - contains over 6 years of historical medical claims data among 500+ data tables (some containing millions of rows of data) at granular levels of detail along with eligibility and demographic data.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

Claims Analysis Columns

Drug Detail Columns

NDC Code Drug Nam Drug Class ……..

Clm Num Dtl Num Recip ID Amt Paid NDC Code

Eligibility Columns

Recip ID Plan Code Elig Curr Elig Beg Elig End …….

Enrollment Columns

Recip ID County DOB Gender Race …… 9

©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Data Acquisition: Medicaid Data Queries • Access Oracle® Data Warehouse through BusinessObjects® to learn table relationships and build Oracle® SQL queries when necessary. • AFMC licenses SAS/ACCESS® Interface to ODBC. • Established a Windows® ODBC connection with Oracle® Data Warehouse. • Copy and Paste Oracle® SQL code into SAS program and execute.

• Advantages: 1) Pulls data back in SAS Data set and, 2) Bypasses BusinessObjects’® query size limitations. ©2007 Arkansas Foundation for Medical Care, Inc.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

Data is acquired via SQL queries.

10 SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

5

Medicaid Transactional Claims Data Example Recipient ID

Clm Num

DTL Num

Prov Type

Dx

Proc

Amt_ Paid

001

050600407

1

Phys

250.00

99212

$25.00



001

050600407

2

Phys

640.01

81000

$43.38



002

050600408

1

Pharm

002

050600409

1

Phys

250.10

99212

$45.13



003

050600410

1

Phys

427.0

A0426

$240.46



003

050600411

2

Phys

427.0

A0390

$225.72



$106.45

NDC Code

Data Preparation for Data Mining in Health Care using SAS

00406035705

Drug Class

280808



<< Previous | Next >>



• Multiple rows of claim detail data make up one claim per recipient/member. • Challenge is to summarize data to the recipient level and to create target and input variables to be used in modeling analysis. 11 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

SAS Procedures for Transactional Data Preparation – Summing Costs • Use PROC SQL to summarize paid amounts (AMT_PAID) to recipientlevel and to break claim costs out by medical vs. pharmacy.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• Total Costs (target) are calculated from the variables created in these tables using a DATA step later in the data prep program.

12 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

6

SAS Procedures for Transactional Data Preparation – Creating Binary Inputs • Use PROC SQL to create table of recipient IDs with non-missing Dx codes.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

As shown, a number of SAS procedures and language statements are needed to transform transactional data into a one-row-per unit-of-analysis format suitable for directed data mining techniques.

• Use FIRST. by-processing to create an enumeration variable.

• Sort Data Set • Convert sorted data from rows to columns using PROC TRANSPOSE

• Array to create binary input variables (0,1) for code classes.

• Repeat process for Procedure Class, Drug Class, and Provider Types

13

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

©2007 Arkansas Foundation for Medical Care, Inc.

Sample Directed Data Mining Modeling Data Set

Data Preparation for Data Mining in Health Care using SAS

Total Cost = f (Clinical Diagnoses, Procedures, Prescription Drugs, Providers Seen, Age, Gender) per Recipient

<< Previous | Next >> Input Variables

Target Variable

Total SFY 2004 Costs

Dx Group1

Dx Group2

Recipient ID

(001-139 Infectious & Parasitic Diseases)

(140-239 Neoplasms)



001

$15,232.84

1

1



1



1



002

$2,006.72

1

0



1



0



003

$8,354.89

0

1



1



1



.

.

.

.



.



.



.

.

.

.



.



.



Procedure Code Group1



(00100-01999 Anesthesiology)

Thera Class1



(04:00.00 Antihistamines)

Mining data set contains both Interval (i.e., Total Costs) and binary variables (i.e., DX_GROUP_1)

14 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

7

Why use Decision Trees?

Data Preparation for Data Mining in Health Care using SAS

• Excellent Tool for Data Mining - Profiling • Allow you to easily see patterns in data with respect to a target variable (i.e., Interval – total cost $ per recipient).

<< Previous | Next >>

• Decision Tree algorithm “reads” the data and determines the best variable on which to “split” the data. • Splits continue as long as they are statistically significant.

15 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Sample Decision Tree Results SFY 2004 Continuously-Enrolled Medicaid Recipients with a Diabetes Dx (250.0X-250.9X) (Partial Tree Results)

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

This group of recipients with a Diabetes Dx accrued average total costs more than three times other recipients with the same diagnosis. Why? Child nodes from decision tree may require more drill-down analysis or model tuning in the form of variable reduction and reapplication. ©2007 Arkansas Foundation for Medical Care, Inc.

16

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

8

Data Preparation for Data Mining in Health Care using SAS

Data Preparation for Undirected Data Mining

<< Previous | Next >>

17 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Undirected Data Mining in Medicare • AFMC Medicare Data Mining Projects are often conducted with an eye toward identifying opportunities for quality improvement and safeguarding Medicare program funds.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• In undirected data mining, the goal is to uncover the hidden structure in data without respect to a target variable. • Cluster analysis (PROC FASTCLUS) is technique often used.

18 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

9

Hospital Payment Monitoring Program (HPMP) • The Centers for Medicare and Medicaid Services (CMS) developed the Hospital Payment Monitoring Program (HPMP) primarily to calculate and monitor the Medicare fee-for-service paid claims error rate for inpatient acutecare hospital services.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

HPMP is a QIO Statement of Work (SOW) priority.

• Under contracts with CMS, several companies – including QIOs like AFMC – are responsible for operating the HPMP.

19 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

HPMP, QIOs, and Reporting • Each quarter, QIOs receive state hospital-level data containing the calculated paid claims rates and other summary measures for 14 different target areas identified by CMS as prone to payment errors. This quarterly report is known as the First-Look Analysis Tool for Hospital Outlier Monitoring (FATHOM).

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

Example target area paid claims rate: Target 12: Three Day Transfer to SNF Numerator: count of discharges to a SNF with a three-day length of stay Denominator: count of all discharges to a SNF or swing bed • From the FATHOM data, QIOs can generate and distribute the Program for Evaluating Payment Patterns Electronic Report (PEPPER). The PEPPERs are hospital-specific reports that allow Inpatient Prospective Payment System (IPPS) hospitals to compare their own billing practices in the 14 target areas with other IPPS hospitals within the state. • Each QIO uses these data tools to work with hospitals in their state to reduce improper admissions and Diagnosis-Related Group (DRG) payment errors. ©2007 Arkansas Foundation for Medical Care, Inc.

20

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

10

Resource Challenge and Answer • With 50+ IPPS Arkansas hospitals and limited staff, AFMC desired to find a way to identify hospitals that have been “extreme outliers” (95th percentile or above on two measures) over time, thus indicating a need for close monitoring and possible notification.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• Answer: Cluster Analysis to segment hospitals into 3 “like” groups based on 5 of the 11 hospital/target-level calculated measures in the data (see right), then graphing the results of two of the 5 measures to determine “extreme outliers”.

21 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Data Acquisition: Medicare Hospital-Level Data

Data Preparation for Data Mining in Health Care using SAS

• The FATHOM hospital-level data is provided to each QIO from another CMS/HPMP contractor in a Microsoft Access® database (*.mdb).

<< Previous | Next >>

• PROC IMPORT is used to import data table from the *.mdb file.

22 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

11

SAS Procedures for Data Preparation at UA-level – Standardization • Per SAS/STAT 9.1.3 Online Documentation regarding PROC FASTCLUS, Pg. 1382-1383: “Variables with larger variances exert a larger influence in calculating the clusters...Therefore it is necessary to standardize the variables before performing the cluster analysis.”

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• PROC STANDARD is used to standardize all variables used in the cluster analysis to mean=0, std=1 prior to invoking PROC FASTCLUS.

23 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Cluster Analysis • Run PROC FASTCLUS against output data set containing standardized variables.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

24 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

12

Plot to Produce Visual Representation • Run PROC GPLOT to plot two of the five variables used in cluster analysis to determine outliers (note: highlighted syntax produces red vertical and horizontal references to the 95th percentile value for each variable being plotted):

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

25 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Plot Results

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• (Standardized) Variables used in cluster analysis being plotted are: - Outlier Value: A single number from -10 to 10, describing how unusual a hospital is compared to all IPPS hospitals in the state. - Outlier Times Count: Outlier value weighted by number of discharges. This measure captures both the unusualness of the hospitals’ target outlier value and the volume of target discharges. • Outliers facilities are those facilities that consistently fall in the upper-right-most quadrant (>95th percentile on BOTH measures). 26 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

13

Sample Letter to Hospitals Regarding Outlier Status

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

27 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Conclusions • Most time spent in a Data Mining project is spent acquiring and preparing data, not on algorithm/technique selection and application.

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

• SAS has a vast arsenal of tools to help you acquire, prepare/transform, and mine your data.

28 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

14

References Data Preparation for Data Mining Using SAS Mamdouh Refaat (Morgan Kaufmann/Elsevier) http://books.elsevier.com/us/mk/us/subindex.asp?isbn=97 80123735775&country=United+States&community=mk&r ef=&mscssid=KHFKVKDF6HNU8HJND9V1RB81QR712R 2E or http://www.amazon.com/Preparation-Mining-KaufmannManagementSystems/dp/0123735777/ref=pd_bbs_sr_1/102-55832127763329?ie=UTF8&s=books&qid=1179937157&sr=1-1 Base SAS Procedures and syntax (PROC IMPORT, PROC SQL, PROC TRANSPOSE) http://support.sas.com/documentation/onlinedoc/91pdf/sas doc_913/base_proc_8977_new.pdf SAS Language Reference (Arrays) http://support.sas.com/documentation/onlinedoc/91pdf/sas doc_913/base_lrconcept_9196.pdf

Decision Trees in Enterprise Miner http://support.sas.com/documentation/onlin edoc/91pdf/sasdoc_91/em_gs_7281.pdf SAS/STAT Procedures (PROC FASTCLUS) http://support.sas.com/documentation/onlin edoc/91pdf/sasdoc_91/stat_ug_7313.pdf or http://www2.sas.com/proceedings/sugi24/ Stats/p270-24.pdf

Data Preparation for Data Mining in Health Care using SAS << Previous | Next >>

Hospital Payment Monitoring Program (HPMP) http://oig.hhs.gov/oas/reports/region3/3050 0007.pdf

29 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

Acknowledgements

Data Preparation for Data Mining in Health Care using SAS

• M2007 Co-Chairs: •Jerry Oglesby, SAS Institute

<< Previous | Next >>

•Goutam Chakraborty, Oklahoma State University • Rona Bellinger, AFMC Manager of Web & Graphic Services • Karen Gabel and Tori Gammill, AFMC HPMP Team Members • AFMC’s Data Mining Team

30 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

15

Questions??? Contact: S. Greg Potts, MBA Data Mining Team Leader Arkansas Foundation for Medical Care Office of Projects and Analysis 401 West Capitol, Suite 410 Little Rock, AR 72201 (501) 212-8734 Phone (501) 375-1201 Fax E-mail: [email protected]

31 ©2007 Arkansas Foundation for Medical Care, Inc.

SAS M2007 Data Mining Conference | October 1-2, 2007 | Las Vegas

16

Related Documents

Print Potts Greg
November 2019 7
Lionnel Potts
May 2020 6
Greg As
April 2020 27
Greg Morris
June 2020 21
Broome, Greg
June 2020 25
Paul Potts Helside Lav
October 2019 3