Chapter-5 Association Rule Generation
This chapter proposes algorithms for generating association rules from association patterns (frequent itemsets). The notion of Valid Rule Pattern is introduced in Section 5.2. Section 5.3 discusses the issue of pruning the misleading association rules. In Section 5.4 an improved algorithm for generating strong and useful association rules is proposed. Section 5.5 gives the algorithms for constraint-based mining. While Section 5.6 proposes algorithm for generating representative rules.
5.1 Introduction Generation of strong association rules from association patterns is easy and straightforward compared to the mining of association patterns. As stated in Chapter 1 (Section 1.4), strong association rules are those which satisfy user-defined thresholds of minimum support and minimum confidence. In this chapter, association rule algorithm given in [4] has been modified and a new algorithm for generating representative rules has been developed to incorporate the following: •
It is hypothesized (explained in Section 5.2) that their would be certain rules generated from the association patterns, particularly in the dense database which will be valid requiring no further testing, in the process reducing the time taken in rule generation.
112
•
Some of the association rules generated by algorithms described in [1,4] may be misleading, thus need to be filtered out. Correlation analysis enables identification of such rules.
•
It has been observed by various researchers that most of the generated rules are not useful as they can be derived from other rules. Algorithms for generating representative rules (RR) have been proposed in literature [79]. However, these algorithms seem to be inefficient in the case of dense data. To handle such situations, an algorithm for generating representative rules has been proposed in this work.
5.2 Valid Rule Patterns Existing algorithms of association rule generation generate and test all possible rules for each l ∈ L. In this section a property (Property 5.1) has been proposed and proved for identifying frequent itemsets for which association rules can be generated directly without further testing the validity of such rules. Such patterns have been christened as Valid Rules Patterns (VRP) in this work.
Property 5.1: If the support of a frequent itemset is equal to or greater than the minimum confidence required for the rule, then all association rules generated by such itemset or from its subset will have confidence at least equal to the minimum required confidence.
Proof:
113
Let ‘ABC’ be a frequent itemset with support of 70% (say minimum support and minimum confidence thresholds are 30% and 60% respectively). Let the generated rule be, (subset of ABC) ⇒ (ABC – subset of ABC).
………(R5.1)
Then confidence of above generated rule can be given by: Confidence (C ) = ABC ⁄ subset of ABC where Xrepresents the support of X (in %) in the given data. But, ABC≤subset of ABC≤ 100 Thus, value of subset of ABCwill always be equal to or greater than the value of ABC and equal to or less than 100. For two possible extremes value of subset of ABC (Case I and Case II) the confidence of the above rule will be as follows: •
Case I
Let the value of subset of ABCbe equal to the value of ABC. In this case confidence of Rule (R5.1) will be 100%, which is greater than the minimum required confidence that is 60%. •
Case II
114
Let the value of subset of ABCbe equal to 100%. In this case confidence of Rule (R5.1) will be equal to the value of ABCthat is, 60%, which is equal to the minimum required confidence. Now consider the case when the value of subset of ABCis between ABC and 100%. •
Case III
Let the value ofsubset of ABCbe between value of ABCand 100%. In this situation confidence of Rule (R5.1) will be between the two extremes discussed above in case I and case II and hence this value will be greater than the minimum required confidence of the rule. Further note that each subset of ‘ABC’ will also be VRP and will always generate strong association rules. The above discussion establishes the Property 5.1 which suggests that there is no need to test the validity of any rule generated from Valid Rule Pattern or its subset, as they are always strong association rules. Usefulness of Property 5.1 Consider the transaction data shown below in Table 5.1. Assume min_supp = 50% and min_conf = 50%.
Table 5.1: A transaction data Tid
Items
115
100
A, C, E
200
B, D
300
A, B, C, D
400
B, C, D
Table 5.2 shows all the frequent itemsets (along with their actual support) present in the given data at the specified value of minimum support. Table 5.2: Frequent itemsets, L Frequent Itemset (l ) {A}
Support (S) 50%
{B}
75%
{C}
75%
{D}
75%
{A, C}
50%
{B, C}
50%
{B, D}
75%
{C, D}
50%
{B, C, D}
50%
Note that in the above table, support of each frequent itemset (l ) is equal to or greater than the value of min_conf. Hence, all these frequent itemsets are Valid Rule Patterns. Thus, according to Property 5.1, association rules can be obtained directly (without testing the validity of the rules) from these frequent itemsets. Author has selected an itemset ‘BCD’ as an example to demonstrate how the proposed property is useful. Table 5.3 is showing all strong association rules from frequent itemset ‘BCD’ along with their actual support (S) and confidence (C). Note that all generated rules are strong association rules as suggested by Property 5.1.
116
Table 5.3: Generated association rules from ‘BCD’ Rule No Rule 1 Rule 2 Rule 3 Rule 4 Rule 5 Rule 6 Rule 7 Rule 8 Rule 9 Rule 10 Rule 11 Rule 12 Rule 13 Rule 14
Rule B, C ⇒ D B, D ⇒ C C, D ⇒ B B ⇒ C, D C ⇒ B, D D ⇒ B, C A⇒C C⇒A B⇒C C⇒B B ⇒ D D⇒ B C⇒ D D⇒ C
Support (S) 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 75% 75% 50% 50%
Confidence(C ) 100% 66.7% 100% 66.7% 66.7% 66.7% 100% 66.7% 66.7% 66.7% 100% 100% 66.7% 66.7%
5.3 Strong Association Rules are not Always Useful In some cases support-confidence framework, suggested by Agrawal et al. [1], is not sufficient to find interesting rules. It can be explained by the following example: Consider the following rule derived in previous example (Rule 10 of Table 5.3). C⇒B
S = 50%
C = 66.7%
…. (R5.2)
Based on support-confidence framework and definition of strong association rule, above rule is a strong association rule. However, this rule is misleading since the overall support of ‘C’ is 75% (Table 5.2), even greater than 66.7%. In other words, a customer who buys item ‘C’ is less likely to buy item ‘B’ than a customer about whom we have no information. In reality there is a negative dependence between buying ‘C’ and buying ‘B’. Hence, all such rules are not useful.
117
Thus, association rules should not be used directly for prediction without further analysis. They do not necessarily indicate causation. They are, however, a starting point for further exploration, making them a popular tool for understanding data.
5.3.1 Correlation Analysis To filter out all misleading strong association rules (like (R5.2)), objective interestingness criterion (based on the statistic behind the data) can be used as one step towards the goal of weeding out uninteresting rules from presentation to the user. Thus need to study how the two events x and y are correlated.
Definition 5.1: Interestingness of events x, y can be defined as,
I(x, y) = p(x, y)/ p(x).p(y) where p(x) is the possibility of event x. The fact that Interestingness of events x and y is less than one indicates negative correlation. In the above example the Interestingness of C ⇒ B is:
I(C, B) = p( B ∪ C )/( p( B ) * p( C )) = supp(B ∪ C )/ (supp( B ) * supp(C )) = (2/4)/(3/4*3/4) = 0.89 < 1 This indicates buying ‘C’ is negatively associated with buying ‘B’. Hence, this rule is not interesting enough to be reported to the user.
118
5.4 Improved Algorithm for Association Rule Generation In the light of above, algorithm proposed in [4] can be improved. Improved algorithm is given below.
Algorithm: Improved algorithm for association rule generation Input: Frequent itemsets, L Minimum confidence threshold, min_conf Output: All interesting and strong association rules, ℜI Process: begin 1. Divide all given frequent itemsets in two category: VRP and non-VRP. 2. Generate strong association rules from frequent itemsets of non-VRP category by using algorithm as proposed in [4] and generate strong association rules from frequent itemsets of VRP category directly as mentioned above. Let generated strong association be ℜ. 3. ℜI = φ; 4. forall r : (X ⇒ Y) ∈ ℜ do
I(r) = supp(X∪Y)/ (supp(X) * supp(Y)); if I(r) ≥ 1 then
119
ℜI = ℜI ∪ {r }; endif; endfor; end;
5.5 Algorithms for Constraint-based Rule Mining In this section, algorithms have been proposed for constraint-based rule mining to discover more interesting rules. As stated earlier, user can impose constraints either before mining or during post-mining, depending upon his/her domain knowledge. Accordingly, two separate algorithms are given below.
Algorithm: Algorithm for constraint-based association rule mining Input: Database, D Minimum support, min_supp Minimum confidence, min_conf A set of constraints on rules, Ĉ Output: The association rules, ℜĈ satisfying thresholds and constraints Process: begin 1. Generate frequent itemsets, L satisfying min_supp threshold by using any algorithm proposed in Chapter 4.
120
2. Generate association rules, ℜI satisfying the min_conf threshold by using algorithm proposed in Section 5.4. 3. Obtain association rules ℜĈ from ℜI satisfying constraints Ĉ. ℜĈ = prun_rules(ℜI , Ĉ); end; function prun_rules(ℜI , Ĉ) //This function returns those rules in ℜI satisfying constraints, Ĉ. forall r ∈ ℜI do begin if r not satisfying Ĉ then ℜI = ℜI \
r
;
endif; endfor; return ℜĈ; Above algorithm is simple, but inefficient. Note that constraint checking is not performed until all the rules satisfying the threshold are generated. This obviously involves a lot of unnecessary processing.
5.5.1 Anti-monotone Property of Constraints This property states that if a set S violates the constraints, any superset of S violates the constraint as well. The formal definition is as follows:
Definition 5.2: A constraint Ĉ is anti-monotone iff for any set S, 121
S does not satisfy Ĉ
⇒ ∀ S’ ⊇ S , S’ does not satisfy Ĉ Based on this property mining can be improved by integrating anti-monotone constraints more deeply during frequent itemsets generation. Algorithm incorporating the idea is given below.
Algorithm: Improving mining process with anti-monotone constraints. Input: Database, D Minimum support, min_supp Minimum confidence, min_conf A set of anti-monotone constraints, Ĉam
Output: Association rules, ℜĈam satisfying thresholds and constraints Process: begin 1. k = 1; L = φ; 2. Find the 1- candidate itemset, C1; 3. Prune C1 based on given constraints Ĉam, impose_constraint(C1, Ĉam); 4. L1 = identify_frequent(C1, min_supp); // This function returns 1-frequent itemset, L1 5. Repeat 122
k = k + 1; Generate k-candidate itemset Ck from Lk-1; impose_constraint(Ck, Ĉam); //Prune Ck based on constraints Ĉam Lk = identify_frequent(Ck, min_supp); L = L ∪ Lk; Until Lk is empty 6. Generate association rules, ℜĈam from frequent itemset, L by using algorithm proposed in Section 5.4. function impose_constraint(Ck, Ĉam) //This function filters out all those itemsets in Ck which do not satisfy constraints Ĉam. for each itemset c ∈ Ck do { if c does not satisfy Ĉam then Ck = Ck\ c; //Filter out c from Ck endif; endfor; function identify_frequent(Ck, min_supp) //This function generates k-frequent itemset, Lk from k-candidate itemset, Ck. Lk = φ; for each itemset c ∈ Ck do S(c) = supp(c); //computes the support for c by using Property 4.1 if S(c) ≥ min_supp then
123
Lk = Lk ∪ c; endif; endfor; return Lk; Note that above algorithm performs the constraints check on candidate itemsets before the support computation. The anti-monotone property guarantees that any kitemset satisfying the constraint must be derived from the filtered out (k-1) itemsets. Thus, it can prune away a significant number of candidates that require support counting.
5.6 Representative Association Rules 5.6.1 Generation of Representative Rules from Maxpatterns Function genallrepresentative_max() is being proposed for generating representative rules from maxpatterns. In contrast to approach [79], the proposed algorithm starts searching from minimum condition and maximum consequent rule and increases its conditions on not getting the representative rule for existing conditions. Following are the properties used in the algorithm:
Property 5.2: If an association rule r is longer than an association rule r ’ then r ’ can’t be derived from r.
124
Property 5.3: If an association rule r : (X ⇒ Y) is shorter than an association rule r ’: (X’ ⇒ Y’) then
r
can be derived from r ’ iff X ∪ Y⊂ X’ ∪ Y’ and X ⊇ X’.
Property 5.3: If r : (X ⇒ Y) and r ’: (X’ ⇒ Y’) are different association rules of same size then r can be derived from r ’ iff X ∪ Y= X’ ∪ Y’ and X ⊃ X’.
function genallrepresentative_max(L, Lmax) RR = φ; MaxK = the cardinality of a set in L with maximal number of items; /*Loop1*/ for k = MaxK to 2 do if Lmaxk ≠ φ then RR = RR ∪ genk_representativepattern(RR, Lmaxk , k); endif; endfor; return RR; The genallrepresentative_max() controls the generation of representative association rules, RR from the maxpatterns (see Loop 1). Firstly, longest (starting from size MaxK) representative association rules are searched and added in RR, by the genk_representativepattern(). Then representative association rules shorter by 1 are 125
found and added to RR etc. Finally, representative association 2-rules are generated and added in RR. The computed set (RR) is the set of representative association rules. function genk_representativepattern(RR, Lmaxk, k) Rk = φ; TIRk = φ; /*Loop 2 */ forall l ∈ Lmaxk do RR’ = {r ∈ RR| l ⊂ r.antecedent ∪ r.consequent}; RR” = RR’; A1 = {{ l [1]},{ l [2]},……{ l [k]}}; IRk = φ; /* Loop 3*/ forall a ∈ A1 do begin if (|l |/|a| ≥ min_conf AND (NOT(isderivative(RR’, a ⇒ (l \ a)))) ) then Rk = Rk ∪ a ⇒ (l \ a); RR” = RR” ∪ a ⇒ (l \ a); else IRk = IRk ∪ a ⇒ (l \ a); endif; endfor; Rk = Rk ∪ gen_fixedsizerule(IRk, RR”); TIRk = TIRk ∪ IRk; endfor; Lmaxk-1 = Lmaxk-1 ∪ genpatternk-1(TIRk); return Rk;
126
The input parameters in genk_representativepattern() are RR, Lmaxk, and k while it returns Rk. Firstly, Rk and TIRk are initialized to null. Let l be a considered itemset in Lmaxk. The representative association k-rules are generated from this itemset as follows (Loop 2): Set RR’, and A1 are created. IRk is initialized to null. RR’ is the set of representative association rules longer than k which contain all items occurring in l. A1 is assigned the family of single-sets created from items in l. First iteration of Loop 3 starts. Each itemset, a ∈ A1 is treated as an antecedent of a candidate k-rule a ⇒ l \a. The confidence of each candidate rule is computed in order to check whether it is a strong association rule. The support of the rule is equal to | l |, while the support of the antecedent a is found as the support of the respective itemset in L. If the rule’s confidence is equal to or greater than min_conf then representativeness of the rule is tested by using function isderivative() which is based on Property
5.2 to 5.4,
otherwise send invalid rule in IRk as well as TIRk. If tested rule is representative then this rule will be sent to Rk. Loop 3 is repeated for each element of A1. Next generate all representative rules from IRk using function gen_fixedsizerule(). Loop 2 is repeated for each itemset. Now by using function genpatternk-1() generate all subset of size k-1 from each member of TIRk. At last it returns Rk. function gen_fixedsizerule(IRk, RR”) n = |IRk|;
//number of invalid rules of size k
if no_item(IRk [1].consequent) > 1 AND n > 1 then R’k = φ; for i = 1 to n-1 do begin {
r
= IRk [i];
for j = i + 1 to n do begin
127
r
’ = IRk [j];
if (left(r.antecedent, no_item(r.antecedent)-1) = left(r ’.antecedent, no_item(r ’.antecedent)-1)) then c = (r.antecedent ∪
r
’.antecedent) ⇒
( r.consequent ∩ r ’.consequent); if (|c|/|c.antecedent| ≥ min_conf ) then if NOT(isderivative(RR”, c)) then R’k= R’k ∪ c; endif; else if no_item(c.consequent) > 1 then IR’k = IR’k ∪ c; endif; endif; else endfor; endif; endfor; endfor; else return φ; endif; R’k = R’k ∪ gen_fixedsizerule(IR’k, RR”); return R’k; The brief description of gen_fixedsizerule() is given below. The input parameters to this function are IRk, and RR” while it returns R’k (representative rules of fixed size having antecedent of size between 2 and (k-1)). 128
Firstly, it finds number of invalid rules in IRk and assigns the value to n. If value of n and number of items IRk[1].consequent greater than 1 then antecedent of invalid rules are combined to get candidate rules having antecedent of one more higher size (This can be obtained by taking intersection of consequents of corresponding invalid rules). Each generated candidate rule is tested for validity. If it is not valid then send it in IR’ k, otherwise it is tested for representativeness. If it passes representativeness test then send it to R’k. If the number of invalid rules is greater than one then function calls itself. At last it returns R’k. fuction genpatternk-1(TIRk) Pk -1 = φ; n = |TIRk|; /* Loop 4 */ for i = 1 to n do begin cons = TIRk [i].consequent; sizecons = no_item(cons); S = subset(cons, sizecons – 1); /* Loop 5 */ for all s ∈ S AND (TIRk[i].antecedent+s)∉ Pk-1 do Pk-1 = Pk-1 ∪ (TIRk[i].antecedent ∪ s); endfor; endfor; return Pk-1; The brief description of genpatternk-1() is given below. TIRk, is the input to the function while it returns Pk-1. Firstly, it initializes Pk-1 to null and finds the number of invalid rules in TIRk and assigns the value to n. Now patterns of size lower than 1 are generated i.e k-1 size patterns (Loop 4) from all elements of 129
TIRk. Now each generated subset is tested whether it is present in Pk-1 (Loop 5), if not present, then append this element in Pk-1. At last it returns Pk-1.
function isderivative(RR’, c) RR’= RR’ order by no_item(antecedent) + no_item(consequent) desc, no_item(consequent) asc; for all r ’∈ RR’ do length = |
r
’|;
if length ≥ |c| then if r ’.antecedent = (r ’.antecedent ∩ c.antecedent) AND c = (c ∩
r
’ ) then isderivative = true; exitfunction;
endif; endif; isderivative = false; return isderivative; The brief description of isderivative() is given below. The input parameters of this function are RR’ and c while it returns true or false. This function tests whether an association rule can be derived from existing representative rules by using Property 5.2 to 5.4. If, agrees then it returns true, otherwise false.
130
5.6.2 Measure of Redundancy Adopting bench mark of [55], the author conducted experiments on the data outlined in Chapter 4 to determine the level of redundancy in the generated rules with user specified levels of objective measures. Figure 5.1 and 5.2. illustrate such results. It is seen from these figures that, in most cases the number of redundant rules is significantly larger than the number of represented rules. It is also noticed (Figure ---) that the level of redundancy in generated rules is quite high and depends upon the minimum support value; the lower the value of minimum support, the higher the redundancy level. Redundancy level is also seen to be influenced by the characteristics of the data and for the case of dense data, its value is much higher.
131
132