Print Shootout Kannabiran Dev

  • November 2019
  • PDF

INDUCTIS DATA MINING SHOOTOUT

PRIYA JHANJI, FERY SURYADI, DEV KANNABIRAN, MOHAMMED DWIKAT, MALLIKARJUNA JAYANTY

Copyright © 2007, SAS Institute Inc. All rights reserved.

Executive summary/Requirements
• M K Nurich (a subscription-based magazine company) wanted us to:
  - Run a predictive model that rank-orders customers
  - Build a model that predicts the revenue generated by new customers
• We followed the CRISP-DM steps to meet the company's objectives

Business Understanding
• Task 1: Customer Acquisition Campaign
  - Top 40% of the population (top 2820 observations).
• Task 2: Predict revenue generated by new customers.
  - Here too the company is interested only in the top 40% (top 3040 observations).
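
Both tasks come down to ranking customers by a model score and keeping the top 40%. A minimal Python sketch of that selection step follows. The function name `top_fraction`, the customer IDs, and the score values are hypothetical, and rounding the cutoff down is an assumption (the slides do not say how the cutoff counts were rounded).

```python
import math

def top_fraction(scores, fraction=0.40):
    """Return (customer_id, score) pairs for the highest-scoring fraction."""
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Assumption: the cutoff count is rounded down.
    keep = math.floor(len(ranked) * fraction)
    return ranked[:keep]

# Hypothetical predicted revenue per customer (illustrative values only).
predicted = {"c1": 120.0, "c2": 15.5, "c3": 300.0, "c4": 0.0, "c5": 87.25}
selected = top_fraction(predicted)
print(selected)
```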

Data Understanding
• Task1_Modeling_Pop
  - 177 variables
  - 7054 customers
  - One dependent variable: REV_ALL
• Task1_Score_Pop
  - Used to score the model
• Task2_Pop
  - 7596 observations
  - Used as the scoring dataset to predict the revenue generated by new customers

Data Preparation
• Role of the dependent variable REV_ALL changed to Target
• Variables with the role ‘classification’ were changed to input to reduce complexity
• All residual variables were rejected

Data Preparation
Data Partition
• Split the data into 2 parts using stratified sampling
• 70% of the data was used to build the model
• 30% of the data was used to validate the performance of the model
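
The 70/30 stratified partition can be sketched in plain Python as below. The slides do not say which variable defined the strata, so stratifying on whether REV_ALL is positive is an assumption, as are the function name `stratified_split`, the seed, and the synthetic rows.

```python
import random
from collections import defaultdict

def stratified_split(rows, stratum_of, train_fraction=0.70, seed=42):
    """Split rows into train/validation sets, preserving stratum proportions."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[stratum_of(row)].append(row)
    train, validate = [], []
    for members in by_stratum.values():
        # Shuffle within each stratum, then cut at the training fraction.
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validate.extend(members[cut:])
    return train, validate

# Synthetic rows: 50 customers with zero revenue, 150 with positive revenue.
rows = [{"id": i, "REV_ALL": float(i % 4) * 10.0} for i in range(200)]
# Assumed stratum: zero versus positive revenue.
train, validate = stratified_split(rows, lambda r: r["REV_ALL"] > 0)
print(len(train), len(validate))
```

Because the cut is made inside each stratum, the share of positive-revenue customers is the same in both partitions.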

Data Preparation
Transformation
• amp_coverage and a few other variables were right-skewed
• These variables were transformed using the Maximum Normal transformation to maximize normality and reduce skewness
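
The idea of picking, per variable, the transformation that makes the distribution most normal can be illustrated with the sketch below. It is not the tool's actual Maximum Normal criterion, which the slides do not describe: absolute sample skewness is used here as a stand-in measure, and the candidate list (identity, log, square root) and the sample values are assumptions.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

# Assumed candidate transformations for non-negative inputs.
CANDIDATES = {
    "identity": lambda v: v,
    "log": lambda v: math.log(v + 1.0),
    "sqrt": lambda v: math.sqrt(v),
}

def least_skewed_transform(values):
    """Return the candidate name with the smallest absolute skewness."""
    results = {
        name: skewness([transform(v) for v in values])
        for name, transform in CANDIDATES.items()
    }
    best = min(results, key=lambda name: abs(results[name]))
    return best, results

# Synthetic right-skewed values (exponential growth, shifted to start at 0).
values = [math.exp(x / 2) - 1.0 for x in range(10)]
best, results = least_skewed_transform(values)
print(best)
```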

Transformation
[Figure: Before Transformation]
[Figure: After Transformation]

Imputation
• Why?
  - 15674 missing values for interval variables and no missing values for class variables
  - To improve the performance of certain predictive models
• We used a tree surrogate as our default impute method for class variables
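
As a rough illustration of the imputation step, the sketch below counts missing values per variable and fills the gaps in an interval variable. It does not implement tree-surrogate imputation: median filling is swapped in as a simpler stand-in, and the column name `interval_var`, the helper names, and the values are hypothetical.

```python
from statistics import median

def count_missing(rows, columns):
    """Count rows where each column is absent or None."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}

def impute_median(rows, column):
    """Return new rows with missing values in `column` set to the median.

    Stand-in method: the slides name a tree surrogate, not a median fill.
    """
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed)
    return [
        dict(r, **{column: fill if r.get(column) is None else r[column]})
        for r in rows
    ]

# Hypothetical data with one missing value.
rows = [
    {"interval_var": 1.0},
    {"interval_var": None},
    {"interval_var": 3.0},
    {"interval_var": 10.0},
]
missing = count_missing(rows, ["interval_var"])
imputed = impute_median(rows, "interval_var")
print(missing, imputed[1])
```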

[Figure: Imputation]

Variable Selection
• Select input variables

Modeling
• When building the models, we used several techniques:
  - Decision tree
  - Logistic regression
  - Artificial neural network
  - An ensemble model

Modeling
Decision Tree
• Autonomous decision trees
• Autonomous decision tree with Gini criterion
• Autonomous decision tree with Entropy criterion
Linear Regression
• Stepwise selection
• We chose validation error as our selection criterion.

Modeling
Neural Network
• Neural network with 8 hidden nodes
• Auto neural network
• We chose to use the MLP architecture
Ensemble Model
• Average
• Maximum
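
The two ensemble variants combine the component models' predictions row by row, either by averaging them or by taking the maximum. A minimal sketch, assuming three component models with made-up predictions (the function name `ensemble` and all values are hypothetical):

```python
def ensemble(predictions, method="average"):
    """Combine per-model prediction lists into one list, row by row."""
    combined = []
    for row in zip(*predictions):
        if method == "average":
            combined.append(sum(row) / len(row))
        elif method == "maximum":
            combined.append(max(row))
        else:
            raise ValueError(f"unknown method: {method}")
    return combined

# Illustrative predicted revenue from three hypothetical component models.
model_a = [10.0, 0.0, 55.0]
model_b = [14.0, 2.0, 40.0]
model_c = [12.0, 4.0, 70.0]

averaged = ensemble([model_a, model_b, model_c], "average")
maximum = ensemble([model_a, model_b, model_c], "maximum")
print(averaged, maximum)
```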

Evaluation
• Ran all 9 models
• Used the comparison node
• Criterion: Average Squared Error
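
Average squared error is the mean of the squared differences between actual and predicted values, and the model with the smallest value on the validation data is preferred. The sketch below shows the comparison for two hypothetical models; the names `model_a` and `model_b` and all numbers are illustrative and are not the presentation's results.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Illustrative validation targets and predictions (made-up values).
actual = [10.0, 0.0, 50.0, 20.0]
candidates = {
    "model_a": [12.0, 0.0, 46.0, 20.0],
    "model_b": [15.0, 5.0, 40.0, 25.0],
}

errors = {
    name: average_squared_error(actual, preds)
    for name, preds in candidates.items()
}
# Pick the model with the smallest average squared error.
best_model = min(errors, key=errors.get)
print(errors, best_model)
```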

[Figure: Evaluation (Result)]

Deployment
• The best model was the decision tree model
  - It had the smallest average squared error
• Use the model to score:
  - Task1_Score_Pop dataset
  - Task2_Pop dataset

[Figure: Autonomous DT]

Conclusion
• In total, we built 9 models.
• Our best model: Autonomous Decision Tree
• Total revenue:
  - Task 1: $1,092,705.4
  - Task 2: $473,552.7

Thank you for your time
