RULE-BASED CLASSIFIER Lecture 19/07-09-09 (Monday) (this lecture was suppose to happen on 03-09-09, DB2 WS)
L19-03-09-09
1
Rule-Based Classifier
• Classify records by using a collection of “if…then…” rules • Rule:
(Condition) → y
– where • Condition is a conjunctions of attributes • y is the class label
– LHS: rule antecedent or condition – RHS: rule consequent – Examples of classification rules: • (Blood Type=Warm) ∧(Lay Eggs=Yes) → Birds • (Taxable Income < 50K) ∧(Refund=Yes) → Evade=No
L19-03-09-09
2
Rule-based Classifier (Example) Name
human python salmon whale frog komodo bat pigeon cat leopard shark turtle penguin porcupine eel salamander gila monster platypus owl dolphin eagle
Blood Type
warm cold cold warm cold cold warm warm warm cold cold warm warm cold cold cold warm warm warm warm
Give Birth
yes no no yes no no yes no yes yes no no yes no no no no no yes no
Can Fly
no no no no no no yes yes no no no no no no no no no yes no yes
Live in Water
no no yes yes sometimes no no no no yes sometimes sometimes no yes sometimes no no no yes no
Class
mammals reptiles fishes mammals amphibians reptiles mammals birds mammals fishes reptiles birds mammals fishes amphibians reptiles mammals birds mammals birds
R1: (Give Birth = no) ∧(Can Fly = yes) → Birds R2: (Give Birth = no) ∧(Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧(Blood Type = warm) → Mammals R4: (Give Birth = no) ∧(Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians L19-03-09-09
3
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule R1: (Give Birth = no) ∧(Can Fly = yes) → Birds R2: (Give Birth = no) ∧(Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧(Blood Type = warm) → Mammals R4: (Give Birth = no) ∧(Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians Name
hawk grizzly bear
Blood Type
warm warm
Give Birth
Can Fly
Live in Water
Class
no yes
yes no
no no
? ?
The rule R1 covers a hawk => Bird The rule R3 covers the grizzly bear => Mammal L19-03-09-09
4
Rule Coverage and Accuracy
• Coverage of a rule:
– Fraction of records that satisfy the antecedent of a rule. – Consider a dataset D, and a rule as r:A y, then,
Tid Re
Coverage (r)=|A|/|D| • Accuracy of a rule: – Fraction of records that satisfy both the antecedent and consequent of a rule.
Accuracy(r)= |A y|/|A| L19-03-09-09
1
Ye
(Status=Single) → No Coverage = 40%, Accuracy = 50% 5
Name
human python salmon whale frog komodo bat pigeon cat leopard shark turtle penguin porcupine eel salamander gila monster platypus owl dolphin eagle
Blood Type
warm cold cold warm cold cold warm warm warm cold cold warm warm cold cold cold warm warm warm warm
Give Birth
yes no no yes no no yes no yes yes no no yes no no no no no yes no
Can Fly
no no no no no no yes yes no no no no no no no no no yes no yes
Live in Water
no no yes yes sometimes no no no no yes sometimes sometimes no yes sometimes no no no yes no
Class
mammals reptiles fishes mammals amphibians reptiles mammals birds mammals fishes reptiles birds mammals fishes amphibians reptiles mammals birds mammals birds
The rule: (gives birth=yes) and (body temp= warm_blooded) Mammals Coverage=(6/20)*100=30% and Accuracy = (6/6)*100= 100%
6
How does Rule-based Classifier Work? R1: (Give Birth = no) ∧(Can Fly = yes) → Birds R2: (Give Birth = no) ∧(Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧(Blood Type = warm) → Mammals R4: (Give Birth = no) ∧(Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians Name
lemur turtle dogfish shark
Blood Type
warm cold cold
Give Birth
Can Fly
Live in Water
Class
yes no yes
no no no
no sometimes yes
? ? ?
A lemur triggers rule R3, so it is classified as a mammal A turtle triggers both R4 and R5 A dogfish shark triggers none of the rules L19-03-09-09
7
Characteristics of Rule-Based Classifier • Mutually exclusive rules – Classifier contains mutually exclusive rules if the rules are independent of each other – The rules in a rule set R are ME if no two rules in R are triggered by the same record. – The above property ensures that every record is covered by at most one rule.
• Exhaustive rules – Classifier has exhaustive coverage if it accounts for every possible combination of attribute values – This property ensures that each record is covered by at least one rule
L19-03-09-09
8
R1: {body_temp=cold_blooded}
non-mammals
R2: {body_temp=warm-blooded} ∧ {gives birth= yes) Mammals R3: {body_temp=warm-blooded} ∧ {gives birth=No} non-mammlas
•This rule set is ME and Exhaustive •Together these 2 properties ensure that every record is covered by exactly one rule. •Many rule-based classifiers (also the one shown previously )do not have these properties. •What if a rule set is not exhaustive????? L19-03-09-09
9
• Suppose a rule set is not exhaustive, then a default rule is considered: rd: () yd this rule covers the cases that are not covered by the other rules and assigned a default label yd. • If a rule not ME then a record may be covered by more than one rule thus providing conflicting class labels. L19-03-09-09
10
Rules • Non mutually exclusive rules – A record may trigger more than one rule – Solution? • Ordered rule set
• Non exhaustive rules – A record may not trigger any rules – Solution? • Use a default class
L19-03-09-09
11
Two ways to overcome the problem when rules are not ME • Ordered rule set – Rules are ordered in decreasing order of their priority – Priority can be defined in many ways based on accuracy, coverage or the order in which the rules were generated.
• Unordered rule set – Allows a test record to trigger multiple classification rules and consider consequent of each rule as a vote for a particular class. – use voting schemes L19-03-09-09
12
Ordered Rule Set
• Rules are rank ordered according to their priority – An ordered rule set is known as a decision list. • When a test record is presented to the classifier – It is assigned to the class label of the highest ranked rule it has triggered – If none of the rules fired, it is assigned to the default class
R1: (Give Birth = no) ∧(Can Fly = yes) → Birds R2: (Give Birth = no) ∧(Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧(Blood Type = warm) → Mammals R4: (Give Birth = no) ∧(Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians Name
turtle
Blood Type
cold
Give Birth
Can Fly
Live in Water
Class
no
no
sometimes
?
L19-03-09-09
13
Unordered rules – Allows a test record to trigger multiple classification rules and consider consequent of each rule as a vote for a particular class. – The votes are then compared to determine the class label of the test record. – The record is usually assigned to the class that has highest no. of votes.
L19-03-09-09
14
Rule Ordering Schemes
Rule based ordering
Class based ordering
L19-03-09-09
15
• Rule-based ordering – Individual rules are ranked based on their quality – This scheme ensures that every test record is classified by the “best” rule covering it. – If the no. of rules is large, interpreting lower ranked rules become cumbersome.
L19-03-09-09
16
• Class-based ordering – Rules that belong to the same class appear together in the rule set R. – The relative ordering among the rules from the same class doesn’t matter. – Most of the well known rule based classifiers (like C4.5 rules and RIPPER) employ class based ordering.
L19-03-09-09
17
R u le -b a s e d O rd e rin g ( R e fu n d = Y e s ) = = > N o
C la s s -b a s ( R e fu n d = Y e s ) = = >
( R e fu n d = N o , M a r ita l S ta tu s = { S in g le ,D (ivRoer fu c endd} =, N o , M a r ita T a x a b le In c o m e < 8 0 K ) = = > N o T a x a b le In c o m e < 8
( R e fu n d = N o , M a r ita l S ta tu s = { S in g le ,D (ivRoer fu c endd} =, N o , M a r ita T a x a b le In c o m e > 8 0 K ) = = > Y e s ( R e fu n d = N o , M a r ita ( R e fu n d = N o , M a r ita l S ta tu s = { M a r r ie d } )T =a =x a> bNleo In c o m e > 8 L19-03-09-09
18
From Decision Trees To Rules Classification Rules (Refund=Yes) ==> No
Refund Yes
No
NO
Marita l Status
{Single, Divorced}
NO
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
{Married}
(Refund=No, Marital Status={Married}) ==> No
NO
Taxable Income < 80K
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
> 80K YES
Rules are mutually exclusive and exhaustive Rule set contains as much information as the tree L19-03-09-09
19
Rules Can Be Simplified Refund Yes
No
NO {Single, Divorced}
Marita l Status
NO
Taxable Income < 80K NO
{Married}
> 80K YES
Tid Refund Marital Status
Taxable Income Cheat
1
Yes
Single
125K
No
2
No
Married
100K
No
3
No
Single
70K
No
4
Yes
Married
120K
No
5
No
Divorced 95K
Yes
6
No
Married
No
7
Yes
Divorced 220K
No
8
No
Single
85K
Yes
9
No
Married
75K
No
10
No
Single
90K
Yes
60K
10
Initial Rule:
(Refund=No) ∧(Status=Married) → No
Simplified Rule: (Status=Married) → No L19-03-09-09
20