Solving Business Problems with Oracle Data Mining Purpose This tutorial shows you how to use Oracle Data Mining to solve business problems. Time to Complete Approximately 2 hours
Topics This tutorial covers the following topics: Overview Scenario Prerequisites Launching Oracle Data Miner Preparing Data for Data Mining Using the Mining Activity Guides Summary Viewing Screenshots Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.) Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
Overview Data Mining is sometimes called Knowledge Discovery-its goal is to provide actionable information, not found by other means. This information can improve the functioning of your business. For example, suppose a marketing campaign results in a 2% positive response. If Data Mining can help focus the campaign on the people most likely to respond, resulting in a 3% response, then the business outcome is a 50% increase in revenue. Data Mining can be divided into two types of "Learning". The goal is to predict a value for a particular characteristic, or attribute that describes some behavior. The attribute being predicted is called the Target Attribute. The goal is to discover relationships and patterns Unsupervised rather than to determine a particular value. Here, Learning there is no Target Attribute. Supervised Learning
Oracle Data Mining (ODM) is powerful data mining software embedded in the Oracle Database that enables you to discover new insights hidden in your data. Oracle Data
Mining helps businesses to target their best customers, find and prevent fraud, discover the most influential attributes that affect Key Performance Indicators (KPIs), and find valuable new information hidden in the data. Oracle Data Mining helps technical professionals find patterns in their data, identify key attributes, discover new clusters and associations, and uncover valuable insights. Oracle Data Mining enables companies to: KNOW MORE DO MORE
Leverage your data and discover valuable new information and insights that were previously hidden. Build applications that automate the extraction and dissemination of new information and insights. Oracle Data Mining is significantly less expensive SPEND compared to traditional approaches and, as a component LESS of your investment in Oracle technology, significantly reduces your total cost of ownership. Oracle Data Mining enables you to go beyond standard query and reporting tools and Online Analytical Processing (OLAP). Query and reporting and OLAP tools can tell you who are your top customers, what products have sold the most, and where you are incurring the highest costs. With Oracle Data Mining, you can implement strategies to: Anticipate and prevent customer attrition Acquire new customers and identify the most profitable customers Identify promising cross-sell opportunities Detect noncompliant and fraudulent activities Discover new clusters or segments Develop customer profiles Identify likely targets and promising leads in drug discovery Find association relationships of co-occurring items and/or events Mine unstructured data, that is, text. Traditional business intelligence (BI) tools such as reports, interactive query and reporting, and Online Analytical Processing (OLAP), only report on what has happened in the past. Oracle Data Mining (ODM) allows you to go beyond traditional BI and reporting to mine your data and build advanced data mining applications. ODM enables you to discover new insights, segments and associations, make more accurate predictions, find the variables that most influence your business, and in general, extract more information from your data. For example, by analyzing the profiles of your best customers, ODM enables you to build data mining models and integrated applications to identify customers who are likely to become your best customers in the future. These customers may not represent your most valuable customers today, but may match profiles of your current best customers. Moreover, with ODM you can do more and transform a predictive model into a regular production application that distributes lists of your most promising customers to your Sales force every Monday morning. Knowing the “strategic value” of your customers — which are likely to become profitable customers in the future and which are not, or predicting which customers are likely to
churn or likely to respond to a marketing offer — and integrating this information into your operations is the key to proactively managing your business. The phases of solving a business problem using Oracle Data Mining are as follows: Problem Definition in terms of Data Mining and Business Goals Data Acquisition and Preparation Building and Evaluation of Models Deployment. Note: This tutorial is not intended as a comprehensive course on Oracle Data Mining. It illustrates the techniques required to carry out some common data mining operations. If you want more background information about some of the topics, see the Oracle Data Mining Concepts Guide. Back to Topic List
Scenario An electronics store chain wants to distribute a discount card to its customers, but only to those customers who are expected to increase their buying (and thus the company’s revenue) because of this card. A test campaign was run on a sample of customers and the results were compiled into a table containing the customer demographics, purchasing patterns, and a measure of revenue produced by each customer. A dataset describing the customers and results in the test campaign is used to create a model that can be applied to all customers for the purpose of predicting revenue levels expected from each customer who uses a discount affinity card – the Target attribute to be predicted is AFFINITY_CARD with values 0 (= low revenue) and 1 (= high revenue). The distinct target values (0 and 1 in this case) are sometimes called the target “classes”, thus the prediction of a target value for each customer is called Classification. Back to Topic List
Prerequisites Before starting this tutorial, you should: 1. Perform the Installing Oracle Database 10g on Windows tutorial. 2. Perform the Installing Oracle Data Miner 10g Release 2 tutorial. Back to Topic List
Launching Oracle Data Miner To launch Oracle Data Miner perform the following. 1. Double-click \bin\odminerw.exe.
2. To connect as dmuser1, click OK.
3. You may want to confirm sample tables and views in your user schema. Expand Data Sources > dmuser1. This displays both tables and views. Click Tables to view tables; Click Views to have the list of views.
Back to Topic List Preparing Data for Data Mining The data used in the Data Mining process usually has to be collected from various locations. Also, some transformation of the data is usually required to prepare the data for Data Mining operations. Oracle Data Mining has powerful utilities for sampling, recoding, discretizing and creating new transformed attributes. In this section, you perform the following tasks: Display Data and Statistics Use Data Transformation wizard
Back to Topic List Display Data and Statistics Perform the following steps: 1. You can display the structure of a table or view. Click MINING_DATA_BUILD_V.
2. Click the Data tab to view a sample of the contents.
Scroll to the right of the window to view full details
Note: The default number of records shown is 100; Enter a different number in the Fetch Size window, then click Refresh to change the size of the display. Also, you can click Fetch Next to add more rows to the display. 3. Right-click MINING_DATA_BUILD_V to expose a menu with more options. You have access to Transformation wizards and other tools for data exploration.
4. To see a statistical summary, you can click one of the two selections depending on the data format type. Click Show Summary Single-Record.
Your output should match the image below:
For each numerical attribute, Maximum and Minimum values, as well as average and variance are shown. These statistics are calculated on a sample. The sample size in this example is 999. You can change the sample size by adjusting ODM Preferences, available on the Tools pulldown menu. 5. For any highlighted attribute, you can view a distribution of values. a. Highlight AFFINITY_CARD.
b.Click Histogram.
The default maximum number of bins is 10; this number can be changed by clicking Preferences. Numerical attributes are divided into the designated number of bins of equal width between the minimum and maximum. The bins are displayed in ascending order of attribute values. Categorical attributes are binned using the "Top N" method(N is the number of bins). The N values occurring most frequently have bins of their own; the remaining values are thrown into a bin labeled "Other". The bins are displayed in descending order of bin size. When done reviewing, click OK, then close the Data Summarization Viewer window. Back to Topic Use Data Transformation wizard The Mining Activity Guides assist you in joining data from disparate sources into one view or table. Also, they carry out transformations that are required by a particular algorithm. However, there are transforms that typically are completed on a standalone basis using one of the following Data Transformation wizards. Filter
Recode Derive field
In this tutorial, you learn how to use the Filter Single -Record Transformation wizard. Suppose you want to concentrate on customers between the ages of 21 and 35. You can filter the data to include only those people. Oracle Data Miner provides a filtering transformation to define a subset of the data based upon attribute values. Perform the following steps: 1. Click Data > Transform > Filter Single-Record.
2. Click Next on the Welcome page.
3. Identify the input data and click Next. In this example the view used is MINING_DATA_BUILD_V.
Note: If you accessed the wizard by right-clicking the table or view name, then the data is already known and this step is skipped. 4. Enter a name for the resultant view and click Next.
5. Enter the filter condition in the Filter text box. Alternatively, to construct the condition in a dialog box click the icon to the right of the Filter field.
6. In the Expression Editor page, peform the following steps: a. Double-click the attribute name AGE.
b. Click the >= button and type 21 to construct the first part of the condition shown.
c. Click AND to continue defining the full condition. Double Click AGE.
d. Click the <= button and type 35.
e. Click the Validate button to check that the condition is satisfied by a subset of the source data.
f. Click OK to display the condition in the Filter window.
7. Click Next.
8. Click Finish to complete the transformation.
Your output should match the image below:
Back to Topic Back to Topic List Using the Mining Activity Guides
When the data mining problem has been defined and the source data identified, there are two phases remaining in the Data Mining Process: Build/Evaluate models and deploy the results. Oracle Data Miner contains Activity Guides for the purpose of carrying out these phases with the minimum of required intervention. Moreover, the implicit and explicit choices and settings used in the Build activity can be passed on seamlessly to the Apply activity. Therefore, many operations usually required are hidden or eliminated. The Build Activity Wizard allows you to : Identify supplemental data to add to the case table(the basic source data) Select the Data Mining Functionality and Algorithm Adjust the Activity settings manually, rather than accept automatic settings
This example uses the Decision Tree Classification algorithm to create, evaluate, and apply a predictive model classifying customers as fitting the profile of either a highrevenue or low-revenue customer. You need to perform the following tasks: 1. Build a Predictive Model using a Mining Activity 2. Apply a Predictive Model using a Mining Activity Back to Topic List Build a Predictive Model using a Mining Activity 1. For this topic, you use the original view, not the view you just created. Click MINING_DATA_BUILD_V in the left navigator.
2. To launch the Activity Build Wizard, click Activity > Build.
3. The New Activity Wizard-Welcome page is displayed. Click Next.
4. On the Select Mining Type page, accept the default of Classification Function Type and the Decision Tree Algorithm. Click Next.
5. To select all the data to be mined from the case table or view, ensure that the box Join additional data with case table is not checked. If the box is checked, you can specify data from other tables and schemas to be joined with the case table. This example assumes that all the necessary data is in the case table MINING_DATA_BUILD_V. From the Unique Identifier drop down list, select CUST_ID. Then click Next.
6. You can specify the attribute to be predicted for a data mining problem. Click the AFFINITY_CARD radio button to indicate the target and then click Next. Note: The Target Attribute to be predicted in this tutorial is AFFINITY_CARD
7. Some evaluation results depend upon your choice of Preferred Target Value. In this example, you want to identify the high-revenue customers (Target Value = 1). Select 1 for the Preferred Target Value and click Next.
8. Enter a name for the activity that is descriptive and click Next.
9. The last step of the Activity Wizard is displayed. You can display the settings in the Activity. Click Advanced Settings.
10 The parameters have default settings that ensure a good result. The Build parameters determine . how the branches of the decision tree are constructed. Click OK.
11. Ensure that the Run upon finish check box is selected. Click Finish.
When the Activity Wizard is completed, the steps appropriate to the chosen activity are displayed. If you choose Run Upon Finish, the steps are executed to completion in sequence and a check appears on the right side of each step as it is completed. 12 To view the structure of the tree, click Result in the Build step. .
13 Your results might differ from the display due to the small size of the sample dataset. The default . view shows all nodes and the attribute values used to determine splits. You can highlight a node to show the rule for a record to be included in that node. The predicted value is the target value of the majority of records in that node. Confidence is the percentage of records in the node having the predicted taget value. Cases is the actual number of cases in the source data satisfying the rule for that node. Support is the percentage of cases in the source data satisfying the rule for that node.
14 You may want to view only the terminal nodes(known as Leaves) and eliminate the intermediate . nodes. Click the Show Leaves Only check box.
15 A decision tree is sensitive to missing values when applied to new data. However, if the splitting . attribute is missing, the ODM Decision Tree Algorithm provides an alternative attribute known as surrogate to be used in its place. If both the splitting attribute and its surrogate are missing, the predicted value is determined at the parent node of the tree. To view the surrogate, highlight a node and click the Surrogate radio button.
Close the Build Results window. 16 To evaluate the Decision Tree Model, click Result in the Test Metrics step. .
17 Click the Accuracy tab. .
18 Click the More Detail button to see the Confusion Matrix. .
19 The Confusion Matrix is calculated by applying the model to the hold-out sample from the test . campaign. The values of AFFINITY_CARD are known and are represented by the rows; the columns are the predictions made by the Classification Model. The upper right and lower left cells indicate potential prediction errors. Most business problem solutions attempt to minimize one type of error. For example, the lower left cell contains the number of “missed opportunities” – cases of high-revenue customers who were not correctly identified by the model. Click the ROC tab.
20 To explore possible changes in the model's parameters, you can slide the red line back and forth . by clicking the arrow buttons at the bottom right of the graph and observe the change in the Confusion Matrix until the values indicate a good solution to the problem. Click Lift.
21 You can view two graphs showing different interpretations of the lift calculations. With the . Cumulative Lift radio button selected, this indicates how much better the model does than the naive guess. In this example, the Decision Tree finds about 2.5 times more positive cases than chance in the first quantile. Select the Cumulative Positive Cases radio button.
21 The graph indicates how many cases must be analysed in order to find the desired number of . positive cases, that is high-revenue customers.
Back to Topic Apply a Predictive Model using a Mining Activity When a model is applied to new data, the input data must be prepared and transformed in exactly the same way that the source data for the Build activity was prepared. The Apply activity is based on a Build activity. This Build activity passes to the Apply activity whatever knowledge is required to prepare the input data appropriately. Perform the following steps: 1. Click Activity > Apply
2. Click Next.
3. Expand Classification and select the Build Activity that you just built. Then click Next.
4. Click the Select link to choose the input data for the Apply procedure.
5. Expand DMUSER1 and select MINING_DATA_APPLY_V. Then click OK.
6. Accept the default data source. Click Next.
7. A table will be produced containing the Prediction and Probability for each customer. You can select additional attributes for display in the results. Accept the defaults and click Next.
8. You have a choice of formats for the Predictions. Accept the default, which includes the Target value with the highest Probability for each case, then click Next.
9. Enter a Name for the activity and click Next.
10 Ensure that the Run upon finish check box is selected. Click Finish. .
11. Your Data Mining activity is running. When completed, click the Result link.
12 For each customer (case), the prediction of either 1 (high-revenue) or 0 (low-revenue) is . included, along with the Confidence in that Prediction (Probability). Cost is another measure of the value of the prediction – low Cost means high Probability. Since this model is a Tree, where each branch represents a rule, you can highlight a row and click Rule to see how the prediction was made.
13 The rule logic is explained. .
Back to Topic Summary In this tutorial, you learned how to: Launch Oracle Data Miner Prepare Data for Data Mining Use the Mining Activity Guides