Rule-Based Classification
Lecture 21/10-09-09 (no class because of placements), Lecture 22/11-09-09 (no class), Lecture 23/12-09-09 (no class), Lecture 24/14-09-09
Building Classification Rules
• Direct method: extract rules directly from the data. Examples: RIPPER, CN2, Holte's 1R.
• Indirect method: extract rules from other classification models (e.g., decision trees, neural networks). Example: C4.5rules.
Direct Method: Sequential Covering Algorithm
• Extracts rules directly from the data.
• When there are multiple classes, rules are extracted one class at a time.
• The criterion for deciding which class to consider first depends on factors such as class prevalence.
Algorithm (sequential covering):
1: Let E be the set of training records and A the set of attribute-value pairs {(Aj, vj)}.
2: Let Yo be an ordered set of classes {y1, y2, y3, ..., yk}.
3: Let R = { } be the initial rule (decision) list.
4: for each class y ∈ Yo − {yk} do
5:   while stopping condition is not met do
6:     r ← Learn-One-Rule(E, A, y).
7:     Remove training records from E that are covered by r.
8:     Add r to the bottom of the rule list: R ← R ∨ r.
9:   end while
10: end for
11: Insert the default rule, { } → yk, at the bottom of the rule list R.
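A minimal sketch of this loop in Python. Assumptions beyond the slide: records are dicts mapping attribute names to values (the class stored under a "class" key), a rule is a tuple of (attribute, value) conjuncts, and learn_one_rule is passed in as the Learn-One-Rule function described next, returning None when no acceptable rule remains.

```python
def covers(rule, record):
    """A record is covered when it satisfies every conjunct of the rule."""
    return all(record.get(attr) == val for attr, val in rule)

def sequential_covering(records, attrs, classes, learn_one_rule):
    """Sequential covering: grow rules one class at a time (lines 4-10),
    ending with the default rule {} -> yk (line 11)."""
    rule_list = []
    for y in classes[:-1]:                       # every class except yk
        while True:                              # until no good rule is found
            rule = learn_one_rule(records, attrs, y)
            if rule is None:                     # stopping condition met
                break
            # Remove training records covered by the new rule (line 7).
            records = [r for r in records if not covers(rule, r)]
            rule_list.append((rule, y))          # add r to the list R (line 8)
    rule_list.append(((), classes[-1]))          # default rule {} -> yk
    return rule_list
```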
Sequential Covering Algorithm (in short, for quick reference)
1. Start from an empty rule set.
2. Extract a rule using the Learn-One-Rule function.
3. Remove training records covered by the rule.
4. Repeat steps (2) and (3) until the stopping criterion is met.
Learn-One-Rule Function
• Objective: extract a rule that covers as many positive examples as possible and none or very few of the negative examples in the training dataset.
• Computationally expensive because of the exponential size of the search space.
• Generates an initial rule and keeps growing (refining) it until a stopping criterion is met.
Example of Sequential Covering
(Figure: (i) original data; (ii) Step 1: rule R1 is grown; (iii) Step 2: examples covered by R1 are removed and R2 is grown; (iv) Step 3.)
Aspects of Sequential Covering
• Rule growing strategy
• Instance elimination
• Rule evaluation
• Stopping criterion
• Rule pruning
Rule Growing
• Two common strategies:
  1. General-to-specific
  2. Specific-to-general
General-to-specific
The initial rule is r: { } ⇒ y. This rule has poor quality because it covers all the examples in the training set. Conjuncts are subsequently added to improve the quality of the rule; a sketch of this greedy growing step follows.
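A minimal sketch of general-to-specific rule growing, reusing covers() from the sequential covering sketch above. Plain accuracy is used as the quality measure here purely for brevity (an assumption; the Laplace, m-estimate, or FOIL's gain measures discussed below would slot in the same way), and attrs is the set of candidate attribute-value pairs (Aj, vj).

```python
def quality(rule, records, y):
    """Accuracy of the rule on the records it covers (0 if it covers none)."""
    covered = [r for r in records if covers(rule, r)]
    return sum(r["class"] == y for r in covered) / len(covered) if covered else 0.0

def learn_one_rule(records, attrs, y):
    """Grow {} => y by greedily adding the best conjunct, one per pass."""
    rule, best = (), quality((), records, y)
    while True:
        cands = [rule + ((a, v),) for a, v in attrs if (a, v) not in rule]
        if not cands:
            return rule if rule else None
        top = max(cands, key=lambda c: quality(c, records, y))
        if quality(top, records, y) <= best:     # no conjunct improves quality
            return rule if rule else None        # None: no acceptable rule
        rule, best = top, quality(top, records, y)
```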
Specific-to-general (example)
Refund=No, Status=Single, Income=85K ⇒ (Class=Yes)
Refund=No, Status=Single, Income=90K ⇒ (Class=Yes)
generalize to: Refund=No, Status=Single ⇒ (Class=Yes)
Specific-to-general
One of the positive examples is chosen at random as the initial seed:
Body temp=warm-blooded, Skin cover=hair, Gives birth=yes, Aquatic creature=no, Aerial creature=no, Has legs=yes, Hibernates=no ⇒ Mammals
One of the conjuncts is then removed so that the rule covers more positive examples:
Body temp=warm-blooded, Skin cover=hair, Gives birth=yes, Aquatic creature=no, Aerial creature=no, Has legs=yes ⇒ Mammals
Skin cover=hair, Gives birth=yes, Aquatic creature=no, Aerial creature=no, Has legs=yes, Hibernates=no ⇒ Mammals
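A minimal sketch of the complementary specific-to-general step, under the same assumptions (dict records, with quality() and covers() as above): turn a randomly chosen positive seed into a maximally specific rule, then drop the conjunct whose removal best preserves quality, so the rule covers more positive examples.

```python
import random

def specific_to_general_seed(records, y, class_attr="class"):
    """Pick a random positive example and turn it into a maximally
    specific rule (one conjunct per attribute)."""
    seed = random.choice([r for r in records if r[class_attr] == y])
    return tuple((a, v) for a, v in seed.items() if a != class_attr)

def generalize_once(rule, records, y):
    """Drop the single conjunct whose removal best preserves quality."""
    cands = [rule[:i] + rule[i + 1:] for i in range(len(rule))]
    return max(cands, key=lambda c: quality(c, records, y))
```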
Rule Evaluation
• Suppose a training set contains 60 positive and 100 negative examples.
• Consider two rules:
  – R1: covers 50 positive examples and 5 negative examples
  – R2: covers 2 positive examples and 0 negative examples
• The accuracy of R1 is 90.9% and that of R2 is 100%. R1 is nevertheless the better rule because of its far greater coverage; the measures below make this explicit.
Rule Evaluation: Metrics
• Accuracy = nc / n
• Likelihood ratio statistic: R = 2 Σ_{i=1..k} fi log(fi / ei), where fi is the observed frequency of class i among the examples covered by the rule and ei is its expected frequency.
• Laplace = (nc + 1) / (n + k)
• m-estimate = (nc + kp) / (n + k), or in its general form (nc + pm) / (n + m)
where n is the number of instances covered by the rule, nc is the number of covered instances belonging to the rule's class, k is the number of classes, and p is the prior probability of the class.
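A quick numeric check of these measures on the R1/R2 example above (a sketch; k = 2 classes, and the prior p = 60/160 of the positive class comes from the 60-positive/100-negative training set):

```python
def accuracy(nc, n):
    return nc / n

def laplace(nc, n, k):
    return (nc + 1) / (n + k)

def m_estimate(nc, n, p, m):
    return (nc + p * m) / (n + m)

k, p = 2, 60 / 160                          # two classes; prior of class '+'
for name, nc, n in [("R1", 50, 55), ("R2", 2, 2)]:
    print(name,
          round(accuracy(nc, n), 3),        # R1: 0.909  R2: 1.0
          round(laplace(nc, n, k), 3),      # R1: 0.895  R2: 0.75
          round(m_estimate(nc, n, p, k), 3))  # R1: 0.89  R2: 0.688
```

Both the Laplace and m-estimate measures rank R1 above R2, reflecting its far greater coverage, even though R2 has higher raw accuracy.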
FOIL's Information Gain (rule evaluation, contd.)
• Suppose rule r: A ⇒ + covers p0 positive and n0 negative examples.
• Adding a new conjunct B gives the extended rule r': A ∧ B ⇒ +, which covers p1 positive and n1 negative examples.
• FOIL's information gain = p1 × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
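A direct transcription of the formula as a sketch (the call shown uses the numbers from Q2 below):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain from extending a rule covering p0/n0
    positive/negative examples to one covering p1/n1."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Q2: R1 covers 350 positive / 150 negative, R2 covers 300 / 50.
print(round(foil_gain(p0=350, n0=150, p1=300, n1=50), 2))   # approx. 87.65
```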
• Q2. Consider two rules:
  – R1: A ⇒ C
  – R2: A ∧ B ⇒ C
  Suppose R1 covers 350 positive examples and 150 negative examples, while R2 covers 300 positive examples and 50 negative examples. Compute FOIL's information gain for rule R2 with respect to R1.
• Q4. Page no. 317
Stopping Criterion and Rule Pruning
• Stopping criterion
  – Compute the gain.
  – If the gain is not significant, discard the new rule.
• Rule pruning (a sketch follows)
  – Remove one of the conjuncts in the rule.
  – Compare the error rate on a validation set before and after pruning.
  – If the error improves, prune the conjunct.
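A minimal sketch of this pruning step, assuming a hypothetical error_rate(rule, records, y) helper measured on the validation set (e.g. the fraction of covered validation records whose class is not y):

```python
def prune_rule(rule, val_records, y, error_rate):
    """Repeatedly drop the conjunct whose removal most reduces the
    validation error; stop when no removal improves it."""
    err = error_rate(rule, val_records, y)
    while len(rule) > 1:
        cands = [rule[:i] + rule[i + 1:] for i in range(len(rule))]
        best = min(cands, key=lambda c: error_rate(c, val_records, y))
        if error_rate(best, val_records, y) >= err:
            break                                # pruning no longer helps
        rule, err = best, error_rate(best, val_records, y)
    return rule
```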
Instance Elimination
• Why do we need to eliminate instances? Otherwise, the next rule extracted would be identical to the previous one.

(Figure: rules R1, R2 and R3 covering overlapping regions of positive (+) and negative (−) instances.)

• Why do we remove positive and negative instances?
  – Removing positive instances ensures that the next rule is different.
  – Removing negative instances prevents underestimating the accuracy of the rule.
Indirect Methods for Rule-Based Classifiers and Instance-Based Classifiers (Lecture 26/17-09-09)
Indirect Methods (Generating a Rule Set from a Decision Tree)

(Figure: a decision tree that splits on P; its No branch splits on Q, and its Yes branch splits on R, whose Yes branch splits on Q again. Leaves are labelled + or −.)

Rule set read off the tree:
r1: (P=No, Q=No) ⇒ −
r2: (P=No, Q=Yes) ⇒ +
r3: (P=Yes, R=No) ⇒ +
r4: (P=Yes, R=Yes, Q=No) ⇒ −
r5: (P=Yes, R=Yes, Q=Yes) ⇒ +

Consider r2, r3 and r5: the class label is always predicted as + when Q=Yes. So the rules can be simplified to
r2': (Q=Yes) ⇒ +
r3: (P=Yes) ∧ (R=No) ⇒ +
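A minimal sketch of this path-to-rule extraction, assuming a hypothetical nested-dict tree representation (internal nodes carry "attr" and per-value "children"; leaves carry "label"); every root-to-leaf path becomes one rule:

```python
def tree_to_rules(node, path=()):
    """Walk every root-to-leaf path; each path is one classification rule."""
    if "label" in node:                      # leaf: emit (conjuncts, class)
        return [(path, node["label"])]
    rules = []
    for value, child in node["children"].items():
        rules += tree_to_rules(child, path + ((node["attr"], value),))
    return rules

# The P/Q/R tree from the figure above.
tree = {"attr": "P", "children": {
    "No":  {"attr": "Q", "children": {"No": {"label": "-"},
                                      "Yes": {"label": "+"}}},
    "Yes": {"attr": "R", "children": {
        "No":  {"label": "+"},
        "Yes": {"attr": "Q", "children": {"No": {"label": "-"},
                                          "Yes": {"label": "+"}}}}}}}

for conjuncts, label in tree_to_rules(tree):      # prints r1 through r5
    print(" AND ".join(f"{a}={v}" for a, v in conjuncts), "=>", label)
```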
Classification Rules Extracted from a Decision Tree

(Figure: a decision tree splitting on Give Birth?; Yes leads to Mammals, No leads to Live in Water?, whose Yes branch is Fishes, Sometimes branch is Amphibians, and No branch splits on Can Fly?, with Yes giving Birds and No giving Reptiles.)

C4.5rules extracted from the tree:
(Give Birth=Yes) → Mammals
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=No, Live in Water=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=No, Can Fly=No) → Reptiles
( ) → Amphibians
Advantages of Rule-Based Classifiers
• As highly expressive as decision trees
• Easy to interpret
• Easy to generate
• Can classify new instances rapidly
• Performance comparable to decision trees
Instance-Based Classifiers
Eager Learners vs. Lazy Learners
• Eager learners
  – Decision-tree and rule-based classifiers are examples of eager learners.
  – They are designed to learn a model that maps the input attributes to the class label as soon as training data becomes available.
• Lazy learners
  – They delay modeling the training data until an unseen instance needs to be classified.
  – Instance-based classifiers belong to this class.
  – They memorize the entire training data and perform classification only when the attributes of a test instance match the stored data.
Instance-Based Classifiers

(Figure: a set of stored cases, each with attributes Atr1 ... AtrN and a class label (A, B, B, C, A, C, B), against which an unseen case with the same attributes is compared.)

• Store the training records.
• Use the training records to predict the class label of unseen cases.
Instance-Based Classifiers: Examples
• Rote learner (classifier)
  – Memorizes the entire training data and performs classification only if the attributes of a record exactly match one of the training examples.
  – Its drawback is that some test records may not be classified at all because they match none of the instances in the training data. Solution?
• Nearest neighbor
  – Uses the k "closest" points (nearest neighbors) to perform classification.
Nearest Neighbor Classifiers
• Basic idea, illustrated by the classic example: if it walks like a duck, quacks like a duck, and looks like a duck, then it's probably a duck.

(Figure: the distance from a test record to every training record is computed, and the k "nearest" records are chosen.)
• A nearest-neighbor classifier represents each instance as a point in a d-dimensional space, where d is the number of attributes.
Nearest-Neighbor Classifiers
• Require three things:
  – The set of stored records
  – A distance metric to compute the distance between records
  – The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
  – Compute its distance to the training records.
  – Identify the k nearest neighbors.
  – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote).
Definition of Nearest Neighbor

(Figure: panels (a) 1-nearest neighbor, (b) 2-nearest neighbor and (c) 3-nearest neighbor around a test point X.)

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Nearest Neighbor Classification
• Compute the distance between two points, e.g. the Euclidean distance:
  d(p, q) = sqrt( Σi (pi − qi)² )
• Determine the class from the nearest-neighbor list: take the majority vote of the class labels among the k nearest neighbors.
Nearest Neighbor Classification (contd.)
• Choosing the value of k:
  – If k is too small, the classifier is sensitive to noise points.
  – If k is too large, the neighborhood may include points from other classes.

(Figure: k-nearest-neighbor classification with a large k.)
Nearest Neighbor Classification (contd.)
• Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes. Example (a normalization sketch follows):
  – height of a person may vary from 1.5 m to 1.8 m
  – weight of a person may vary from 90 lb to 300 lb
  – income of a person may vary from $10K to $1M
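One common fix is min-max normalization; a minimal sketch with NumPy (the three attributes and their ranges come from the example above), rescaling each attribute to [0, 1] so no single one dominates the Euclidean distance:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each attribute (column) of X to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return (X - lo) / span

# height (m), weight (lb), income ($): wildly different raw scales.
X = np.array([[1.5,  90.0,    10_000.0],
              [1.6, 150.0,    50_000.0],
              [1.8, 300.0, 1_000_000.0]])
print(min_max_scale(X))                      # every column now spans [0, 1]
```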
Nearest Neighbor Classification (contd.)
• k-NN classifiers are lazy learners:
  – They do not build models explicitly, unlike eager learners such as decision-tree induction and rule-based systems.
  – Classifying unknown records is therefore relatively expensive.
Algorithm (k-nearest neighbor classification)
1: Let k be the number of nearest neighbors and D the set of training examples.
2: for each test example z = (x', y') do
3:   Compute d(x', x), the distance between z and every example (x, y) ∈ D.
4:   Select Dz ⊆ D, the set of the k training examples closest to z.
5:   y' = arg max_v Σ_{(xi, yi) ∈ Dz} I(v = yi)
6: end for
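A minimal sketch of this algorithm in Python, assuming numeric attribute tuples and the Euclidean distance defined earlier:

```python
import math
from collections import Counter

def euclidean(p, q):
    """d(p, q) = sqrt( sum_i (p_i - q_i)^2 )."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(D, z, k):
    """Lines 3-5: rank training examples (x, y) in D by distance to the
    test point z, keep the k closest, and majority-vote their labels."""
    neighbors = sorted(D, key=lambda xy: euclidean(xy[0], z))[:k]
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

# Tiny usage example with 2-D points labelled '+' or '-'.
D = [((1, 1), "+"), ((2, 1), "+"), ((8, 8), "-"), ((9, 7), "-")]
print(knn_classify(D, (1.5, 1.2), k=3))   # '+'
```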
• Once the nearest-neighbor list is obtained, the test example is classified according to the majority class of its nearest neighbors:
  Majority voting: y' = arg max_v Σ_{(xi, yi) ∈ Dz} I(v = yi)
  where v is a class label, yi is the class label of one of the nearest neighbors, and I(·) is an indicator function that returns 1 if its argument is true and 0 otherwise.
• In the majority-voting approach, every neighbor has the same impact on the classification (see the large-k figure above). This makes the classification algorithm sensitive to the choice of k.
• To reduce this sensitivity, each nearest neighbor xi can be weighted by its distance: wi = 1 / d(x', xi)².
• As a result of weighting, training examples located far away from z have a weaker impact on the classification. Using the distance-weighted voting scheme, the class label is determined by:
  Distance-weighted voting: y' = arg max_v Σ_{(xi, yi) ∈ Dz} wi × I(v = yi)
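The change to the k-NN sketch above is confined to the voting step (a sketch reusing euclidean(); the small eps guarding against a zero distance is an added assumption):

```python
def knn_classify_weighted(D, z, k, eps=1e-9):
    """Distance-weighted voting: each neighbor votes with w_i = 1/d(x', x_i)^2."""
    neighbors = sorted(D, key=lambda xy: euclidean(xy[0], z))[:k]
    votes = {}
    for x, y in neighbors:
        w = 1.0 / (euclidean(x, z) ** 2 + eps)   # w_i = 1 / d(x', x_i)^2
        votes[y] = votes.get(y, 0.0) + w
    return max(votes, key=votes.get)
```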
Characteristics
1. Nearest-neighbor classification is a form of instance-based learning.
2. Lazy learners such as nearest-neighbor classifiers do not require model building.
3. Nearest-neighbor classifiers make their predictions based on local information, whereas decision-tree and rule-based classifiers attempt to find a global model that fits the entire input space.
4. Appropriate proximity measures play a significant role in nearest-neighbor classifiers.