INDUCTIS DATA MINING SHOOTOUT
PRIYA JHANJI, FERY SURYADI, DEV KANNABIRAN, MOHAMMED DWIKAT, MALLIKARJUNA JAYANTY
Copyright © 2007, SAS Institute Inc. All rights reserved.
Executive summary/Requirements
M K Nurich (a subscription-based magazine company) wanted us to:
• Build a predictive model that rank-orders customers
• Build a model that predicts the revenue generated by new customers
We followed the CRISP-DM steps to meet the company's objectives.
Business Understanding
Task 1: Customer Acquisition Campaign
• The company is interested only in the top 40% of the population (top 2820 observations).
Task 2: Predict revenue generated by new customers
• Here too the company is interested only in the top 40% (top 3040 observations).
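Both tasks reduce to ranking observations by predicted revenue and keeping the top 40%. The sketch below illustrates that step in plain Python. The function name `top_fraction` and the choice to round the cutoff down are assumptions for illustration, not the procedure used in the project.

```python
import math


def top_fraction(scores, fraction=0.4):
    """Return the indices of the highest-scoring observations, best first.

    `scores` holds one predicted value per observation. The cutoff is
    rounded down (an assumption; the slides do not say how it was rounded).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = math.floor(len(scores) * fraction)
    # Sort indices by descending score; the sort is stable, so tied
    # observations keep their original relative order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    predicted_revenue = [10.0, 50.0, 30.0, 20.0, 40.0]
    print(top_fraction(predicted_revenue))
```

The returned indices can then be used to pull the selected customers out of the scored dataset.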
Data Understanding
Task1_Modeling_Pop
• 177 variables
• 7054 customers
• One dependent variable: REV_ALL
Task1_Score_Pop
• Used to score the model
Task2_Pop
• 7596 observations
• Used as the scoring dataset to predict the revenue generated by new customers
Data Preparation
• Role of the dependent variable REV_ALL was changed to Target
• Variables with the role ‘classification’ were changed to Input to reduce complexity
• All residual variables were rejected
Data Preparation
Data Partition
• Split the data into 2 parts using stratified sampling
• 70% of the data was used to build the model
• 30% of the data was used to validate the performance of the model
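A stratified 70/30 partition can be sketched as follows. This is a minimal illustration, not the partition node that was actually used: the helper name `stratified_split`, the caller-supplied `stratum_of` function, and the fixed seed are all assumptions. The slides do not say which variable defined the strata.

```python
import random
from collections import defaultdict


def stratified_split(rows, stratum_of, train_fraction=0.7, seed=42):
    """Split rows into (train, validate), preserving each stratum's share.

    `stratum_of` maps a row to its stratum label (hypothetical; for a
    revenue target this could be a revenue band).
    """
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[stratum_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validate = [], []
    for key in sorted(by_stratum):
        members = by_stratum[key]
        rng.shuffle(members)
        # Cut each stratum separately so both parts keep its proportion.
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validate.extend(members[cut:])
    return train, validate


if __name__ == "__main__":
    rows = [{"id": i, "band": "high" if i < 20 else "low"} for i in range(100)]
    train, validate = stratified_split(rows, lambda r: r["band"])
    print(len(train), len(validate))
```

Splitting within each stratum, rather than over the whole table, keeps rare groups represented in both the training and validation parts.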
Data Preparation
Transformation
• amp_coverage and a few other variables were right-skewed
• We transformed these variables using the Maximum Normal transformation to maximize normality and reduce skewness
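The idea of picking the transformation that brings a variable closest to normal can be sketched as below. This is not the Maximum Normal procedure itself: the sketch uses absolute sample skewness as a stand-in criterion, and the candidate set (identity, square root, log) is an assumption chosen for illustration. It also assumes non-negative input values.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Candidate transformations (assumed set; requires values >= 0).
CANDIDATES = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": lambda v: math.log(v + 1.0),
}


def least_skewed_transform(values):
    """Return (name, transformed values) with the smallest absolute skewness."""
    best_name, best_values, best_skew = None, None, math.inf
    for name, fn in CANDIDATES.items():
        transformed = [fn(v) for v in values]
        skew = abs(skewness(transformed))
        if skew < best_skew:
            best_name, best_values, best_skew = name, transformed, skew
    return best_name, best_values


if __name__ == "__main__":
    right_skewed = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
    name, transformed = least_skewed_transform(right_skewed)
    print(name, round(skewness(right_skewed), 3), round(skewness(transformed), 3))
```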
Transformation
Before Transformation
After Transformation
Imputation
Why?
• 15674 missing values for interval variables and no missing values for class variables
• To improve the performance of certain predictive models
We used a tree surrogate as our default imputation method for class variables.
IMPUTATION
VARIABLE SELECTION
Select input variables
Modeling
When building the models, we used several techniques:
- decision tree
- logistic regression
- artificial neural network
- an ensemble model
Modeling
Decision Tree
• Autonomous decision trees
• Autonomous decision tree with Gini criterion
• Autonomous decision tree with Entropy criterion
Linear Regression
• Stepwise selection
• We chose validation error as our selection criterion.
Modeling
Neural Network
• Neural network with 8 hidden nodes
• Auto neural network
• We chose to use the MLP architecture
Ensemble Model
- Average
- Maximum
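The two ensemble rules named above combine the component models' predictions observation by observation. The sketch below shows one plausible reading of "Average" and "Maximum" for an interval target. The function name `ensemble` and the model names in the example are hypothetical.

```python
def ensemble(predictions_by_model, method="average"):
    """Combine per-model predictions into one prediction per observation.

    `predictions_by_model` maps a model name to its list of predictions;
    all lists must cover the same observations in the same order.
    """
    lengths = {len(p) for p in predictions_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same observations")
    # zip(*...) regroups the values so each tuple holds one observation's
    # predictions across all models.
    per_observation = list(zip(*predictions_by_model.values()))
    if method == "average":
        return [sum(p) / len(p) for p in per_observation]
    if method == "maximum":
        return [max(p) for p in per_observation]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    preds = {
        "tree": [10.0, 0.0],
        "regression": [20.0, 4.0],
        "neural": [30.0, 2.0],
    }
    print(ensemble(preds, "average"))
    print(ensemble(preds, "maximum"))
```

Averaging tends to smooth out the errors of individual models, while taking the maximum always yields the most optimistic revenue estimate, so the two rules can rank customers differently.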
Evaluation
• Ran all 9 models
• Used the comparison node
• Criterion: Average Squared Error
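Average squared error is the mean of the squared differences between actual and predicted values, and the comparison picks the model with the smallest value. A minimal sketch, assuming validation-set predictions are available as plain lists (the model names and numbers in the example are made up):

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_model(actual, predictions_by_model):
    """Return (name of the lowest-error model, error of every model)."""
    errors = {
        name: average_squared_error(actual, predicted)
        for name, predicted in predictions_by_model.items()
    }
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    actual = [10.0, 20.0, 30.0]
    candidates = {
        "tree": [12.0, 18.0, 30.0],
        "regression": [10.0, 25.0, 35.0],
    }
    winner, errors = best_model(actual, candidates)
    print(winner, errors)
```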
Evaluation (Result)
Deployment
The best model is the decision tree model
• It has the smallest average squared error
Use the model to score:
- Task1_Score_Pop dataset
- Task2_Pop dataset
Autonomous DT
Conclusion
In general, we built 9 models.
Our best model: Autonomous Decision Tree
Total revenue:
• Task1: $1,092,705.4
• Task2: $473,552.7
Thank You for your time