Machine Learning For Predictive Data Analytics.pdf

What is Predictive Data Analytics?

What is ML?

How Does ML Work?

Underfitting/Overfitting

Lifecycle

Fundamentals of Machine Learning for Predictive Data Analytics

Machine Learning for Predictive Data Analytics

John Kelleher, Brian Mac Namee, and Aoife D’Arcy

1. What is Predictive Data Analytics?
2. What is Machine Learning?
3. How Does Machine Learning Work?
4. What Can Go Wrong With ML?
5. The Predictive Data Analytics Project Lifecycle: CRISP-DM
6. Summary

What is Predictive Data Analytics?

Predictive Data Analytics encompasses the business and data processes and computational models that enable a business to make data-driven decisions.

Figure: Predictive data analytics moving from data to insights to decisions.

Example Applications:

  • Price Prediction
  • Fraud Detection
  • Dosage Prediction
  • Risk Assessment
  • Propensity Modelling
  • Diagnosis
  • Document Classification
  • ...

What is Machine Learning?

(Supervised) Machine Learning techniques automatically learn a model of the relationship between a set of descriptive features and a target feature from a set of historical examples.

Figure: Using machine learning to induce a prediction model from a training dataset.

Figure: Using the model to make predictions for new query instances.

ID  OCCUPATION    AGE  LOAN-SALARY RATIO  OUTCOME
 1  industrial     34               2.96  repaid
 2  professional   41               4.64  default
 3  professional   36               3.22  default
 4  professional   41               3.11  default
 5  industrial     48               3.80  default
 6  industrial     61               2.52  repaid
 7  professional   37               1.50  repaid
 8  professional   40               1.93  repaid
 9  industrial     33               5.25  default
10  industrial     32               4.15  default

What is the relationship between the descriptive features (OCCUPATION, AGE, LOAN-SALARY RATIO) and the target feature (OUTCOME)?

if LOAN-SALARY RATIO > 3 then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'
end if

This is an example of a prediction model.

This is also an example of a consistent prediction model.

Notice that this model does not use all the features, and the feature that it uses is a derived feature (in this case a ratio): feature design and feature selection are two important topics that we will return to again and again.
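The rule above can be checked against the ten-row loan dataset with a short Python sketch. The names `LOANS`, `predict_outcome`, and `is_consistent` are illustrative assumptions, and the rule's 'repay' label is written as 'repaid' so that it matches the spelling used in the dataset.

```python
# Ten-row loan dataset from the slide:
# (ID, occupation, age, loan-salary ratio, outcome).
LOANS = [
    (1, "industrial", 34, 2.96, "repaid"),
    (2, "professional", 41, 4.64, "default"),
    (3, "professional", 36, 3.22, "default"),
    (4, "professional", 41, 3.11, "default"),
    (5, "industrial", 48, 3.80, "default"),
    (6, "industrial", 61, 2.52, "repaid"),
    (7, "professional", 37, 1.50, "repaid"),
    (8, "professional", 40, 1.93, "repaid"),
    (9, "industrial", 33, 5.25, "default"),
    (10, "industrial", 32, 4.15, "default"),
]


def predict_outcome(loan_salary_ratio: float) -> str:
    """One-feature rule from the slide ('repaid' stands in for 'repay')."""
    return "default" if loan_salary_ratio > 3 else "repaid"


def is_consistent(rows) -> bool:
    """A model is consistent if it reproduces every outcome in the dataset."""
    return all(
        predict_outcome(ratio) == outcome
        for _id, _occupation, _age, ratio, outcome in rows
    )


if __name__ == "__main__":
    print("Consistent with the dataset:", is_consistent(LOANS))
```

Occupation and age are carried in each row but never read, which mirrors the point that this model uses only one derived feature.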

What is the relationship between the descriptive features and the target feature (OUTCOME) in the following dataset?

ID  Amount   Salary  Loan-Salary Ratio  Age  Occupation    House      Type  Outcome
 1  245,100  66,400               3.69   44  industrial    farm       stb   repaid
 2   90,600  75,300               1.2    41  industrial    farm       stb   repaid
 3  195,600  52,100               3.75   37  industrial    farm       ftb   default
 4  157,800  67,600               2.33   44  industrial    apartment  ftb   repaid
 5  150,800  35,800               4.21   39  professional  apartment  stb   default
 6  133,000  45,300               2.94   29  industrial    farm       ftb   default
 7  193,100  73,200               2.64   38  professional  house      ftb   repaid
 8  215,000  77,600               2.77   17  professional  farm       ftb   repaid
 9   83,000  62,500               1.33   30  professional  house      ftb   repaid
10  186,100  49,200               3.78   30  industrial    house      ftb   default
11  161,500  53,300               3.03   28  professional  apartment  stb   repaid
12  157,400  63,900               2.46   30  professional  farm       stb   repaid
13  210,000  54,200               3.87   43  professional  apartment  ftb   repaid
14  209,700  53,000               3.96   39  industrial    farm       ftb   default
15  143,200  65,300               2.19   32  industrial    apartment  ftb   default
16  203,000  64,400               3.15   44  industrial    farm       ftb   repaid
17  247,800  63,800               3.88   46  industrial    house      stb   repaid
18  162,700  77,400               2.1    37  professional  house      ftb   repaid
19  213,300  61,100               3.49   21  industrial    apartment  ftb   default
20  284,100  32,300               8.8    51  industrial    farm       ftb   default
21  154,000  48,900               3.15   49  professional  house      stb   repaid
22  112,800  79,700               1.42   41  professional  house      ftb   repaid
23  252,000  59,700               4.22   27  professional  house      stb   default
24  175,200  39,900               4.39   37  professional  apartment  stb   default
25  149,700  58,600               2.55   35  industrial    farm       stb   default

if LOAN-SALARY RATIO < 1.5 then
    OUTCOME = 'repay'
else if LOAN-SALARY RATIO > 4 then
    OUTCOME = 'default'
else if AGE < 40 and OCCUPATION = 'industrial' then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'
end if

The real value of machine learning becomes apparent in situations like this, when we want to build prediction models from large datasets with multiple features.
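As a rough check, the multi-feature rule can be run against the 25-row dataset in Python. This is a sketch under stated assumptions: the names `LOANS`, `predict_outcome`, `simple_rule`, and `mismatches` are illustrative, only the columns the rule reads are included, and 'repaid' is used for the rule's 'repay' to match the dataset.

```python
# (ID, loan-salary ratio, age, occupation, outcome) for the 25 loans.
# Amount, Salary, House and Type are omitted because the rule ignores them.
LOANS = [
    (1, 3.69, 44, "industrial", "repaid"),
    (2, 1.20, 41, "industrial", "repaid"),
    (3, 3.75, 37, "industrial", "default"),
    (4, 2.33, 44, "industrial", "repaid"),
    (5, 4.21, 39, "professional", "default"),
    (6, 2.94, 29, "industrial", "default"),
    (7, 2.64, 38, "professional", "repaid"),
    (8, 2.77, 17, "professional", "repaid"),
    (9, 1.33, 30, "professional", "repaid"),
    (10, 3.78, 30, "industrial", "default"),
    (11, 3.03, 28, "professional", "repaid"),
    (12, 2.46, 30, "professional", "repaid"),
    (13, 3.87, 43, "professional", "repaid"),
    (14, 3.96, 39, "industrial", "default"),
    (15, 2.19, 32, "industrial", "default"),
    (16, 3.15, 44, "industrial", "repaid"),
    (17, 3.88, 46, "industrial", "repaid"),
    (18, 2.10, 37, "professional", "repaid"),
    (19, 3.49, 21, "industrial", "default"),
    (20, 8.80, 51, "industrial", "default"),
    (21, 3.15, 49, "professional", "repaid"),
    (22, 1.42, 41, "professional", "repaid"),
    (23, 4.22, 27, "professional", "default"),
    (24, 4.39, 37, "professional", "default"),
    (25, 2.55, 35, "industrial", "default"),
]


def predict_outcome(ratio: float, age: int, occupation: str) -> str:
    """Multi-feature rule from the slide ('repaid' stands in for 'repay')."""
    if ratio < 1.5:
        return "repaid"
    if ratio > 4:
        return "default"
    if age < 40 and occupation == "industrial":
        return "default"
    return "repaid"


def simple_rule(ratio: float, age: int, occupation: str) -> str:
    """The earlier one-feature rule, for comparison."""
    return "default" if ratio > 3 else "repaid"


def mismatches(model, rows):
    """Return the IDs of rows whose outcome the model gets wrong."""
    return [i for i, ratio, age, occ, outcome in rows
            if model(ratio, age, occ) != outcome]


if __name__ == "__main__":
    print("Multi-feature rule misses:", mismatches(predict_outcome, LOANS))
    print("One-feature rule misses:", mismatches(simple_rule, LOANS))
```

Comparing the two lists shows why the simple ratio threshold is no longer enough once the dataset grows.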

How Does Machine Learning Work?

Machine learning algorithms work by searching through a set of possible prediction models for the model that best captures the relationship between the descriptive features and the target feature.

An obvious search criterion to drive this search is to look for models that are consistent with the data.

However, because a training dataset is only a sample, ML is an ill-posed problem.

Table: A simple retail dataset

ID  BBY  ALC  ORG  GRP
 1  no   no   no   couple
 2  yes  no   yes  family
 3  yes  yes  no   family
 4  no   no   yes  couple
 5  no   yes  yes  single

Table: A full set of potential prediction models before any training data becomes available.

BBY  ALC  ORG  GRP  M1      M2      M3      M4      M5      ...  M6561
no   no   no   ?    couple  couple  single  couple  couple  ...  couple
no   no   yes  ?    single  couple  single  couple  couple  ...  single
no   yes  no   ?    family  family  single  single  single  ...  family
no   yes  yes  ?    single  single  single  single  single  ...  couple
yes  no   no   ?    couple  couple  family  family  family  ...  family
yes  no   yes  ?    couple  family  family  family  family  ...  couple
yes  yes  no   ?    single  family  family  family  family  ...  single
yes  yes  yes  ?    single  single  family  family  couple  ...  family

Table: A sample of the models that are consistent with the training data

BBY  ALC  ORG  GRP     M1      M2      M3      M4      M5      ...  M6561
no   no   no   couple  couple  couple  single  couple  couple  ...  couple
no   no   yes  couple  single  couple  single  couple  couple  ...  single
no   yes  no   ?       family  family  single  single  single  ...  family
no   yes  yes  single  single  single  single  single  single  ...  couple
yes  no   no   ?       couple  couple  family  family  family  ...  family
yes  no   yes  family  couple  family  family  family  family  ...  couple
yes  yes  no   family  single  family  family  family  family  ...  single
yes  yes  yes  ?       single  single  family  family  couple  ...  family

Of the models shown, only M2, M4, and M5 agree with every training instance; M1, M3, and M6561 each contradict at least one.

Notice that there is more than one candidate model left! It is because a single consistent model cannot be found based on a sample training dataset that ML is ill-posed.
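The size of the search space and the number of consistent models can be counted directly. The sketch below is illustrative only: it assumes GRP has exactly the three levels that appear in the table (couple, family, single), and the names `QUERIES`, `TRAINING`, `all_models`, and `consistent_models` are not from the slides.

```python
from itertools import product

FEATURE_VALUES = ("no", "yes")
TARGET_LEVELS = ("couple", "family", "single")  # assumed complete

# All 8 combinations of (BBY, ALC, ORG), in the same order as the table rows.
QUERIES = list(product(FEATURE_VALUES, repeat=3))

# Training data from the simple retail dataset: (BBY, ALC, ORG) -> GRP.
TRAINING = {
    ("no", "no", "no"): "couple",
    ("yes", "no", "yes"): "family",
    ("yes", "yes", "no"): "family",
    ("no", "no", "yes"): "couple",
    ("no", "yes", "yes"): "single",
}


def all_models():
    """Yield every model as a dict mapping each query to a predicted GRP."""
    for predictions in product(TARGET_LEVELS, repeat=len(QUERIES)):
        yield dict(zip(QUERIES, predictions))


def consistent_models():
    """Keep only the models that reproduce every training instance."""
    return [
        model for model in all_models()
        if all(model[query] == grp for query, grp in TRAINING.items())
    ]


if __name__ == "__main__":
    print("Models in the full set:", sum(1 for _ in all_models()))
    print("Models consistent with the data:", len(consistent_models()))
```

Three of the eight feature combinations never occur in the training data, and each can take any of three levels, so the data alone cannot narrow the set to a single model.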

Consistency ≈ memorizing the dataset.

Consistency with noise in the data isn’t desirable.

Goal: a model that generalises beyond the dataset and that isn’t influenced by the noise in the dataset.

So what criteria should we use for choosing between models?

Inductive bias: the set of assumptions that define the model selection criteria of an ML algorithm.

There are two types of bias that we can use:

1. restriction bias
2. preference bias

Inductive bias is necessary for learning (beyond the dataset).
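To make restriction bias concrete, the sketch below applies one hypothetical restriction to the retail example: only models whose prediction ignores at least one of the three descriptive features are allowed. This particular restriction is an assumption chosen for illustration, not one given in the slides, and the helper names are likewise invented.

```python
from itertools import product

LEVELS = ("couple", "family", "single")  # assumed complete
QUERIES = list(product(("no", "yes"), repeat=3))  # (BBY, ALC, ORG)

# Training data from the simple retail dataset: (BBY, ALC, ORG) -> GRP.
TRAINING = {
    ("no", "no", "no"): "couple",
    ("yes", "no", "yes"): "family",
    ("yes", "yes", "no"): "family",
    ("no", "no", "yes"): "couple",
    ("no", "yes", "yes"): "single",
}


def consistent_models():
    """All models (query -> GRP) that reproduce every training instance."""
    models = (dict(zip(QUERIES, p)) for p in product(LEVELS, repeat=len(QUERIES)))
    return [m for m in models if all(m[q] == g for q, g in TRAINING.items())]


def ignores_feature(model, index):
    """True if flipping feature `index` never changes the model's prediction."""
    flip = {"no": "yes", "yes": "no"}
    for query in QUERIES:
        flipped = list(query)
        flipped[index] = flip[query[index]]
        if model[query] != model[tuple(flipped)]:
            return False
    return True


def apply_restriction(models):
    """Hypothetical restriction bias: keep models that ignore some feature."""
    return [m for m in models if any(ignores_feature(m, i) for i in range(3))]


if __name__ == "__main__":
    candidates = consistent_models()
    survivors = apply_restriction(candidates)
    print(len(candidates), "consistent models;", len(survivors), "after restriction")
    for model in survivors:
        print([model[q] for q in QUERIES])
```

Under this assumed restriction the training data plus the bias select a single model, which illustrates how a bias lets the search settle on predictions for feature combinations the data never covered. A different bias would generally select a different model.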

How ML works (Summary)

ML algorithms work by searching through sets of potential models.

There are two sources of information that guide this search:

1. the training data,
2. the inductive bias of the algorithm.

What Can Go Wrong With ML?

No free lunch!

What happens if we choose the wrong inductive bias:

1. underfitting
2. overfitting

Table: The age-income dataset.

ID  AGE  INCOME
 1   21  24,000
 2   32  48,000
 3   62  83,000
 4   72  61,000
 5   84  52,000

Figure: Striking a balance between overfitting and underfitting when trying to predict income from age. Each panel plots Income against Age: (a) Dataset, (b) Underfitting, (c) Overfitting, (d) Just right.
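The two failure modes can be imitated on the age-income dataset with a rough Python sketch. The model choices are assumptions, since the slides do not say which models produced the panels: a least-squares straight line stands in for the underfitting model, and a degree-4 polynomial passing through all five points stands in for the overfitting one. The "just right" model is not reproduced.

```python
AGES = [21, 32, 62, 72, 84]
INCOMES = [24000, 48000, 83000, 61000, 52000]


def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (degree len(xs) - 1)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def sum_squared_error(model, xs, ys):
    """Training error: sum of squared differences on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    line = fit_line(AGES, INCOMES)
    curve = fit_interpolant(AGES, INCOMES)
    print("Line training error:", sum_squared_error(line, AGES, INCOMES))
    print("Interpolant training error:", sum_squared_error(curve, AGES, INCOMES))
    # Predictions away from the training points show how the two models differ.
    for age in (45, 95):
        print(age, round(line(age)), round(curve(age)))
```

The interpolating polynomial memorizes the five points, so its training error says nothing about how it behaves between or beyond them, while the straight line cannot follow the rise and fall of income with age at all.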

There are many different types of machine learning algorithms. In this course we will cover four families of machine learning algorithms:

1. Information-based learning
2. Similarity-based learning
3. Probability-based learning
4. Error-based learning

The Predictive Data Analytics Project Lifecycle: CRISP-DM

Figure: A diagram of the CRISP-DM process which shows the six key phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment, arranged around the Data) and indicates the important relationships between them. This figure is based on Figure 2 of [1].

Summary

Machine Learning techniques automatically learn the relationship between a set of descriptive features and a target feature from a set of historical examples.

Machine Learning is an ill-posed problem:

1. generalize,
2. inductive bias,
3. underfitting,
4. overfitting.

Striking the right balance between model complexity and simplicity (between underfitting and overfitting) is the hardest part of machine learning.

[1] R. Wirth and J. Hipp. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, pages 29–39. Citeseer, 2000.
