Appl Intell (2006) 25:243–251 DOI 10.1007/s10489-006-0105-0

An instance-based learning approach based on grey relational structure

Chi-Chun Huang · Hahn-Ming Lee

© Springer Science + Business Media, LLC 2006

Abstract In instance-based learning, the ‘nearness’ between two instances—used for pattern classification—is generally determined by some similarity function, such as the Euclidean metric or the Value Difference Metric (VDM). However, Euclidean-like similarity functions are normally only suitable for domains with numeric attributes. The VDM metrics are mainly applicable to domains with symbolic attributes, and their complexity increases with the number of classes in a specific application domain. This paper proposes an instance-based learning approach to alleviate these shortcomings. Grey relational analysis is used to precisely describe the entire relational structure of all instances in a specific domain. By using the grey relational structure, new instances can be classified with high accuracy. Moreover, the total number of classes in a specific domain does not affect the complexity of the proposed approach. Forty classification problems are used for performance comparison. Experimental results show that the proposed approach yields higher performance than other methods that adopt one or both of the above similarity functions. Meanwhile, the proposed method can yield higher performance compared to some other classification algorithms.

Keywords Instance-based learning · Grey relational analysis · Grey relational structure · Pattern classification

C.-C. Huang · H.-M. Lee
Department of Information Management, National Kaohsiung Marine University, Kaohsiung, Taiwan 811, R.O.C.
e-mail: [email protected]

H.-M. Lee
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
e-mail: [email protected]

1 Introduction

In recent years, different learning algorithms, such as instance-based learning (IBL) [1, 2, 17, 23, 33, 34], rule induction [3, 14, 28, 36], decision trees [11, 13, 30], decision tables [16, 27] and neural networks [6, 24], have been investigated to solve classification problems. As a comparable alternative, IBL is quite straightforward to understand and can yield excellent performance [1, 2], i.e., high classification accuracy.

In IBL, a training set of labeled instances is first collected by the learning system. A new, unseen instance is then classified according to its ‘nearest’ training instance or instances [7, 12]. Based on the nearest-neighbor rule [7, 12], instance-based learning methods include nearest-neighbor classifiers [4, 7, 15], similarity-based induction (learning), lazy learners, memory-based learning and case-based reasoning systems [34, 38].

In general, the ‘nearness’ between two instances in IBL is determined by a similarity function. In the machine learning literature, two metrics, the Euclidean metric [1] and the Value Difference Metric (VDM) [34], are widely used for IBL. Euclidean-like similarity functions are normally suitable for domains with numeric attributes. The VDM is mainly applicable to domains with symbolic attributes; however, the number of training instances with different values for each attribute in a specific application domain must be determined prior to learning, so the complexity of using the VDM increases with the number of classes [39].

In this paper, an instance-based learning approach based on grey relational structure (GRS) is proposed to alleviate these shortcomings. Here, grey relational analysis (GRA) [8–10, 29] is used to precisely describe the entire relational structure for all instances in a specific application domain. Some properties of GRA, including wholeness, asymmetry and normality, are helpful for the learning tasks (as stated in Section 3).


By using the so-called grey relational structure, new instances in a specific domain can be classified with high accuracy. Moreover, the total number of classes in a specific domain does not affect the complexity of the proposed approach. Forty classification problems are used for performance comparison. Experimental results show that the proposed approach yields higher performance than other methods that adopt one or both of the above-mentioned similarity functions, i.e., the Euclidean metric and the Value Difference Metric (VDM). Moreover, the proposed method can yield higher performance compared to some other classification algorithms.

The rest of this paper is structured as follows. Some similarity functions used for IBL are introduced in Section 2. The concept of grey relational analysis is reviewed in Section 3. In Section 4, an instance-based learning approach based on grey relational structure is presented. In Section 5, experiments performed on forty datasets are reported. Finally, Section 6 gives our conclusions.

2 Similarity functions in IBL

This section reviews some similarity functions used for IBL. In most IBL systems, the Euclidean similarity function has been adopted. Let x and y be two instances with n attributes, denoted as x = (x(1), x(2), . . . , x(n)) and y = (y(1), y(2), . . . , y(n)). The Euclidean metric is defined as follows.

Eu(x, y) = \sqrt{\sum_{i=1}^{n} [x(i) - y(i)]^2}.    (1)

An alternative similarity function, the Manhattan distance function, is defined as follows.

Ma(x, y) = \sum_{i=1}^{n} |x(i) - y(i)|,    (2)

where x and y are two instances with n attributes, denoted as x = (x(1), x(2), . . . , x(n)) and y = (y(1), y(2), . . . , y(n)). Obviously, Euclidean-like distance functions are normally applicable to domains with numeric attributes. In general, normalization should be done for each numeric attribute prior to learning, i.e., each distance for numeric attribute i is divided by the maximal difference or by the standard deviation of attribute i. In this manner, an attribute with a relatively larger range of values than the other attributes is prevented from dominating the distance, and the distance of each attribute is bounded between zero and one. To deal with a symbolic attribute i, the distance between two values a and b of attribute i is set to zero if a and b are the same (i.e., the distance is minimal); otherwise the distance is set to one (i.e., the distance is maximal).
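To make the normalization concrete, the following is an illustrative Python sketch (not code from the paper) of the range-normalized Euclidean and Manhattan distances of Eqs. (1) and (2) for purely numeric instances; the attribute ranges are assumed to be estimated from the training set.

import math

def attribute_ranges(train):
    """Per-attribute range (max - min) over the training set; a range of zero
    is replaced by 1.0 so that constant attributes contribute zero distance."""
    n = len(train[0])
    return [max(x[i] for x in train) - min(x[i] for x in train) or 1.0
            for i in range(n)]

def euclidean(x, y, ranges):
    # Eq. (1), with each attribute difference divided by its range
    return math.sqrt(sum(((x[i] - y[i]) / ranges[i]) ** 2 for i in range(len(x))))

def manhattan(x, y, ranges):
    # Eq. (2), normalized the same way
    return sum(abs(x[i] - y[i]) / ranges[i] for i in range(len(x)))

train = [[1.0, 20.0], [2.0, 40.0], [3.0, 60.0]]
ranges = attribute_ranges(train)
print(euclidean(train[0], train[2], ranges), manhattan(train[0], train[2], ranges))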


Meanwhile, the distance is also set to one if a or b is unknown (i.e., missing). This biases the metric toward a strong dependence on symbolic attributes and unusually heavy penalties for missing attribute values. A heterogeneous similarity function combining these rules is thus defined for domains with both numeric and symbolic attributes. This distance function, also known as HOEM [39], was adopted in IB1, IB2 and IB3 [1].

In [34], another well-known distance function, namely the Value Difference Metric (VDM), was proposed to determine the ‘nearness’ between instances. For each symbolic attribute s of instances, the VDM between two values a and b is defined as follows.

vdm(a, b) = \sum_{i=1}^{C} \left| \frac{N_{ai}}{N_a} - \frac{N_{bi}}{N_b} \right|^k,    (3)

where Na is the number of training instances with value a for attribute s, and Nai is the number of training instances with value a for attribute s and output class i; C is the number of output classes in a specific domain and k is usually set to 1 or 2. In Eq. (3), the two ratios are estimates of the probabilities of the attribute values given the class (i.e., of the class conditional probabilities). Generally, the VDM is mainly suitable for domains with symbolic attributes. Some discretization methods were thus incorporated with the VDM for dealing with numeric attributes, i.e., numeric attributes were discretized into symbolic attributes for the learning tasks.

In [39], three variants of VDM, namely the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM) and the Windowed Value Difference Metric (WVDM), were proposed. The HVDM uses the VDM and the above-mentioned normalized distance (i.e., each distance for numeric attribute i is divided by 4 standard deviations of attribute i) to handle symbolic and numeric attributes, respectively. As for the IVDM and the WVDM, different discretization methods were incorporated with the original version of VDM. These similarity functions are useful for applications with both numeric and symbolic input attributes. Further details regarding these three similarity functions are given in [39]. Similar to the VDM, however, their complexity increases with the number of classes in the learning tasks. In addition, the PEBLS learning system [32] introduces a variant of VDM, called the modified VDM (MVDM), which incorporates the VDM with a scheme that adjusts the weight of each attribute for learning. A review of various similarity functions can be found in [39].
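As an illustration of Eq. (3) (a hedged sketch, not the authors' code), the counts Na and Nai can be gathered from a small symbolic training column and combined into the VDM; note that the work per value pair grows with the number of classes C, which is the complexity issue mentioned above.

from collections import defaultdict

def vdm_tables(values, labels):
    """Count N_a (instances with value a) and N_ai (value a and class i)
    for one symbolic attribute."""
    n_a, n_ai = defaultdict(int), defaultdict(int)
    for v, c in zip(values, labels):
        n_a[v] += 1
        n_ai[(v, c)] += 1
    return n_a, n_ai, sorted(set(labels))

def vdm(a, b, n_a, n_ai, classes, k=2):
    # Eq. (3): sum over classes of |N_ai / N_a - N_bi / N_b| ** k
    total = 0.0
    for c in classes:
        pa = n_ai[(a, c)] / n_a[a] if n_a[a] else 0.0
        pb = n_ai[(b, c)] / n_a[b] if n_a[b] else 0.0
        total += abs(pa - pb) ** k
    return total

values = ["red", "red", "blue", "blue", "green"]
labels = ["+", "-", "-", "-", "+"]
n_a, n_ai, classes = vdm_tables(values, labels)
print(vdm("red", "blue", n_a, n_ai, classes))   # 0.5 for this toy column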


3 Grey relational analysis

Since its inception in 1984 [8], grey relational analysis [8–10] has been applied to a wide variety of application domains. As a measurement method, grey relational analysis is used to determine the relationships between a referential observation and the compared observations by calculating the grey relational coefficient (GRC) and the grey relational grade (GRG). Consider a set of m + 1 observations {x0, x1, x2, . . . , xm}, where x0 is the referential observation and x1, x2, . . . , xm are the compared observations. Each observation xe includes n attributes and is represented as xe = (xe(1), xe(2), . . . , xe(n)). The grey relational coefficient can be calculated as

GRC(x_0(p), x_i(p)) = \frac{\Delta_{min} + \zeta \Delta_{max}}{|x_0(p) - x_i(p)| + \zeta \Delta_{max}},    (4)

where Δmin = min over all j and k of |x0(k) − xj(k)|, Δmax = max over all j and k of |x0(k) − xj(k)|, ζ ∈ [0, 1] (usually ζ = 0.5), i, j = 1, 2, . . . , m, and k, p = 1, 2, . . . , n.

Here, GRC(x0(p), xi(p)) is considered as the similarity between x0(p) and xi(p). If GRC(x0(p), x1(p)) exceeds GRC(x0(p), x2(p)), then the similarity between x0(p) and x1(p) is larger than that between x0(p) and x2(p); otherwise the former is smaller than the latter. Moreover, if x0 and xi have the same value for attribute p, GRC(x0(p), xi(p)) (i.e., the similarity between x0(p) and xi(p)) will be one. By contrast, if x0 and xi differ greatly on attribute p, GRC(x0(p), xi(p)) will be close to zero. Similar methods for dealing with symbolic attributes will be detailed in Section 4. Notably, changing the value of ζ does not affect the grey relational order (GRO, defined below) or the performance of the proposed learning approach. Restated, the original version of the GRC (as stated in [8]) is used to determine the similarity between two instances in the proposed learning approach.

Accordingly, the grey relational grade between instances x0 and xi is expressed as

GRG(x_0, x_i) = \frac{1}{n} \sum_{k=1}^{n} GRC(x_0(k), x_i(k)),    (5)

where i = 1, 2, . . . , m.
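A minimal Python sketch of Eqs. (4) and (5) for numeric observations is given below (illustrative only, not the authors' implementation); note that Δmin and Δmax are taken over all compared observations, which is where the whole relational space enters the computation.

def grey_relational_grades(x0, compared, zeta=0.5):
    """Return GRG(x0, xi) for every compared observation xi, per Eqs. (4)-(5)."""
    n = len(x0)
    diffs = [[abs(x0[k] - xj[k]) for k in range(n)] for xj in compared]
    d_min = min(min(row) for row in diffs)   # Δmin over all j and k
    d_max = max(max(row) for row in diffs)   # Δmax over all j and k
    # assumes d_max > 0, i.e. not all compared observations coincide with x0
    grades = []
    for row in diffs:
        grc = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]   # Eq. (4)
        grades.append(sum(grc) / n)                                        # Eq. (5)
    return grades

x0 = [0.2, 0.5, 0.9]
compared = [[0.1, 0.4, 0.8], [0.9, 0.1, 0.3], [0.2, 0.5, 1.0]]
print(grey_relational_grades(x0, compared))   # the last observation is closest to x0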

The primary characteristic of grey relational analysis is as follows. If GRG(x0, x1) is larger than GRG(x0, x2), then the difference between x0 and x1 is smaller than that between x0 and x2; otherwise the former is larger than the latter. According to the degree of GRG, the grey relational order (GRO) of observation x0 can be stated as follows.

GRO(x0) = (y01, y02, . . . , y0m),    (6)

where GRG(x0, y01) ≥ GRG(x0, y02) ≥ · · · ≥ GRG(x0, y0m), y0r ∈ {x1, x2, x3, . . . , xm}, r = 1, 2, . . . , m, and y0a ≠ y0b if a ≠ b.

The grey relational orders of observations x1, x2, x3, . . . , xm can be obtained similarly as follows.

GRO(xq) = (yq1, yq2, . . . , yqm),    (7)

where q = 1, 2, . . . , m, GRG(xq, yq1) ≥ GRG(xq, yq2) ≥ · · · ≥ GRG(xq, yqm), yqr ∈ {x0, x1, x2, . . . , xm}, yqr ≠ xq, r = 1, 2, . . . , m, and yqa ≠ yqb if a ≠ b. In Eq. (7), the GRG between observations xq and yq1 exceeds those between xq and the other observations (yq2, yq3, . . . , yqm). That is, the difference between xq and yq1 is the smallest.

Grey relational analysis satisfies four principal axioms: (a) normality, (b) dual symmetry, (c) wholeness and (d) approachability [8–10, 29].

(a) Normality—GRG(x0, xi) takes a value between zero and one.
(b) Dual symmetry—If only two observations (x0 and x1) are made in the relational space, then GRG(x0, x1) = GRG(x1, x0).
(c) Wholeness—If three or more observations are made in the relational space, then GRG(x0, xi) often does not equal GRG(xi, x0), ∀i.
(d) Approachability—GRG(x0, xi) decreases as the difference between x0(p) and xi(p) increases (with the other values in Eqs. (4) and (5) held constant).

Based on these axioms, grey relational analysis offers some advantages. For example, it gives a normalized measuring function (normality)—a method for measuring the similarities or differences among observations—to analyze the relational structure. Also, grey relational analysis yields whole relational orders (wholeness) over the entire relational space. As stated in the following section, these properties are useful for instance-based learning.
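Given the GRG values from the previous sketch, the grey relational order of Eqs. (6) and (7) is simply an index ordering by decreasing grade; the fragment below is illustrative only.

def grey_relational_order(grades):
    """Indices of the compared observations sorted so that
    GRG(x0, y01) >= GRG(x0, y02) >= ... , i.e. the GRO of Eq. (6)."""
    return sorted(range(len(grades)), key=grades.__getitem__, reverse=True)

grades = [0.78, 0.39, 0.93]           # e.g. GRG(x0, x1), GRG(x0, x2), GRG(x0, x3)
print(grey_relational_order(grades))  # -> [2, 0, 1]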

4 An instance-based learning algorithm based on grey relational structure

To build a successful instance-based learning model, the relationships among all instances in a specific application domain should be determined. Here, grey relational analysis is used to describe the relational structure of all instances, and new, unseen instances can then be identified. As mentioned above, if a set of m + 1 instances {x0, x1, x2, . . . , xm} is given, the grey relational orders of each instance xq (q = 0, 1, . . . , m) can be expressed as follows.

GRO(xq) = (yq1, yq2, . . . , yqm),    (8)


where GRG(xq, yq1) ≥ GRG(xq, yq2) ≥ · · · ≥ GRG(xq, yqm), yqr ∈ {x0, x1, x2, . . . , xm}, yqr ≠ xq, r = 1, 2, . . . , m, and yqa ≠ yqb if a ≠ b.

Here, a graphical structure, called the k-level grey relational structure (k ≤ m), is defined as follows to describe the relationships among a referential instance xq and all other instances, where the total number of ‘nearest’ instances of the referential instance xq (q = 0, 1, . . . , m) is restricted to k.

GRO*(xq, k) = (yq1, yq2, . . . , yqk),    (9)

where GRG(xq, yq1) ≥ GRG(xq, yq2) ≥ · · · ≥ GRG(xq, yqk), yqr ∈ {x0, x1, x2, . . . , xm}, yqr ≠ xq, r = 1, 2, . . . , k, and yqa ≠ yqb if a ≠ b. That is, a directed graph, shown in Fig. 1, can be used to express the relational space, where each instance xq (q = 0, 1, . . . , m) as well as its k nearest instances (i.e., yqr, r = 1, 2, . . . , k) are represented by vertices, and each expression GRO*(xq, k) is represented by k directed edges (i.e., xq to yq1, xq to yq2, . . . , xq to yqk).

Here, the characteristics of the proposed k-level grey relational structure are described in detail. First, for each instance xq, k instances (vertices) are connected by the inward edges from instance xq. That is, these instances are the nearest neighbors (with the smallest differences) of instance xq, implying that they evidence the class label of instance xq according to the nearest-neighbor rule [7, 12]. Also, in the one-level grey relational structure, instance yq1, with the largest similarity, is the nearest neighbor of instance xq. Thus, a new, unseen instance can be classified according to its nearest instance in the one-level grey relational structure or its nearest instances in the k-level grey relational structure.

[Fig. 1 k-level grey relational structure]
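The k-level structure of Eq. (9) and Fig. 1 can be represented as a directed graph in which every instance keeps edges to its k highest-GRG neighbours. The sketch below is an illustration only (a hypothetical helper, not the authors' code); it assumes a matrix grg with grg[q][r] = GRG(xq, xr), obtainable from the sketches in Section 3, and that matrix is in general asymmetric.

def k_level_structure(grg, k):
    """For each instance q, return the indices of its k nearest instances,
    i.e. the targets of the k directed edges leaving vertex xq in Fig. 1."""
    m = len(grg)
    structure = {}
    for q in range(m):
        others = sorted((r for r in range(m) if r != q),
                        key=lambda r: grg[q][r], reverse=True)
        structure[q] = others[:k]
    return structure

# Toy example with four instances and k = 2
grg = [[0.0, 0.9, 0.4, 0.7],
       [0.9, 0.0, 0.5, 0.6],
       [0.4, 0.5, 0.0, 0.8],
       [0.7, 0.6, 0.8, 0.0]]
print(k_level_structure(grg, 2))   # e.g. vertex 0 -> [1, 3]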


Obviously, for classifying a new, unseen instance i, only the k inward edges connected with instance i in the above k-level grey relational structure are needed. In other words, the k nearest neighbors of each unseen instance are considered for the learning tasks (i.e., pattern classification).

Next, an instance-based learning algorithm for pattern classification based on the k-level grey relational structure is detailed. Assume that we have a training set T of m labeled instances, denoted by T = {x1, x2, . . . , xm}, where each instance xe has n attributes and is denoted as xe = (xe(1), xe(2), . . . , xe(n)). For classifying a new, unseen instance x0, the proposed learning procedure is performed as follows.

Step 1. Calculate the grey relational coefficient (GRC) and the grey relational grade (GRG) between x0 and xi, for i = 1, 2, . . . , m. If attribute p of each instance xe is numeric, the value of GRC(x0(p), xi(p)) is calculated by Eq. (4). If attribute p of each instance xe is symbolic, the value of GRC(x0(p), xi(p)) is calculated as

GRC(x0(p), xi(p)) = 1, if x0(p) and xi(p) are the same;
GRC(x0(p), xi(p)) = 0, if x0(p) and xi(p) are different.

Accordingly, calculate the grey relational grade (GRG) between x0 and xi, for i = 1, 2, . . . , m, by Eq. (5).

Step 2. Calculate the grey relational order (GRO) of x0 based on the degree of GRG(x0, xi), where i = 1, 2, . . . , m.

Step 3. Construct the k-level grey relational structure according to the above grey relational order (GRO) of x0, where k ≤ m. Here, only the k inward edges connected with instance x0 are needed.

Step 4. Classify the new instance x0 by considering the class labels of instances y1, y2, . . . , yk with the majority voting method [7], where y1, y2, . . . , yk are the vertices connected by the k inward edges from x0 in the k-level grey relational structure (i.e., instances y1, y2, . . . , yk are the nearest neighbors of instance x0).

Notably, the best choice of k used for pattern classification can be determined by cross validation [35]. As stated in Section 3, GRC(x0(p), xi(p)) can be treated as the similarity between x0(p) and xi(p). If x0 and xi have the same value for symbolic attribute p, GRC(x0(p), xi(p)) (i.e., the similarity between x0(p) and xi(p)) will be set to one. By contrast, if x0 and xi have different values for symbolic attribute p, GRC(x0(p), xi(p)) will be set to zero. These settings are similar to those used in [1].
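Steps 1–4 can be summarized in the following sketch (illustrative Python, not the authors' implementation): numeric attributes are handled with Eq. (4), symbolic attributes with the 0/1 rule above, the per-attribute coefficients are averaged into the GRG of Eq. (5), the training instances are sorted to obtain the GRO, and the k nearest instances vote on the class label. Choosing k by cross validation is omitted here.

from collections import Counter

def classify(x0, train_x, train_y, k=3, zeta=0.5):
    """Classify a new instance x0 from labelled training instances via GRG ranking."""
    n = len(x0)
    numeric = [isinstance(v, (int, float)) for v in x0]
    # Step 1: absolute differences for numeric attributes (used in Eq. (4))
    diffs = [[abs(x0[p] - xi[p]) if numeric[p] else None for p in range(n)]
             for xi in train_x]
    flat = [d for row in diffs for d in row if d is not None]
    d_min, d_max = (min(flat), max(flat)) if flat else (0.0, 0.0)
    grades = []
    for xi, row in zip(train_x, diffs):
        grc = []
        for p in range(n):
            if numeric[p]:
                grc.append((d_min + zeta * d_max) / (row[p] + zeta * d_max)
                           if d_max > 0 else 1.0)
            else:
                grc.append(1.0 if x0[p] == xi[p] else 0.0)   # symbolic 0/1 rule
        grades.append(sum(grc) / n)                           # GRG, Eq. (5)
    # Steps 2-3: grey relational order; keep only the k nearest instances
    nearest = sorted(range(len(train_x)), key=grades.__getitem__, reverse=True)[:k]
    # Step 4: majority vote over their class labels
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

train_x = [[1.0, "a"], [1.1, "a"], [5.0, "b"], [5.2, "b"]]
train_y = ["+", "+", "-", "-"]
print(classify([1.05, "a"], train_x, train_y, k=3))   # expected "+"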


As mentioned earlier, the similarity function presented here offers some advantages, including normality and wholeness (i.e., asymmetry). That is, this similarity function is appropriate for measuring the similarities or differences among observations, and it yields whole relational orders (wholeness) over the entire relational space in which all instances (or patterns) in a specific domain are treated as vectors.

In some application domains, instances may contain missing attribute values (for example, some datasets in the experiments in Section 5 contain missing attribute values). In this paper, to handle domains that contain missing attribute values, a method presented in [20] for missing attribute value prediction is applied prior to learning (that is, domains with missing attribute values in the experiments in Section 5 are handled by this prediction method first). In this method, the nearest neighbors of an instance with missing attribute values are determined, and the valid attribute values derived from these nearest neighbors are used to predict the missing values. After predicting (estimating) the missing attribute values with high accuracy, an imperfect dataset can be handled as a complete dataset in classification tasks. Finally, the proposed learning approach is applied for classification. Notably, any method used for dealing with missing attribute values may bias the data.

Assume again that we have a training set T of m labeled instances, T = {x1, x2, . . . , xm}, where each instance xe has n attributes and is denoted as xe = (xe(1), xe(2), . . . , xe(n)). For classifying a new, unseen instance x0 in typical instance-based learning methods (in which the Euclidean distance is used as the similarity function), the Euclidean distance between x0 and xa (1 ≤ a ≤ m) is calculated without considering the other training instances (i.e., without considering all xi, i ≠ a, 1 ≤ i ≤ m). By contrast, in the proposed learning approach, all training instances with n attributes are considered in determining the similarity (i.e., GRC and GRG) between x0 and xa (1 ≤ a ≤ m); this is the wholeness axiom of GRA stated in Section 3 [8]. This consideration is the main difference between Euclidean-based similarity functions and GRA. In other words, in the proposed learning approach, a whole relational order (i.e., the GRO) obtained by considering all training instances is derived for classifying a new, unseen instance.

In addition, let m denote the number of compared instances and n the number of attributes. The time for classifying a new, unseen instance (including the time for calculating the grey relational order) is O(mn + m log m). The time for discovering the best value of k in the proposed learning approach should also be included; these two parts together give the overall time complexity of the proposed learning approach.
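For completeness, the fragment below is a much-simplified illustration of nearest-neighbour imputation; it is not the exact procedure of [20], only a hedged sketch of the general idea of filling a missing numeric value from the most similar complete instances.

def impute(instance, complete_instances, k=3):
    """Fill None entries of `instance` using its k nearest complete instances;
    similarity is measured only on the attributes that are present."""
    present = [p for p, v in enumerate(instance) if v is not None]
    def dist(other):
        return sum(abs(instance[p] - other[p]) for p in present)
    neighbours = sorted(complete_instances, key=dist)[:k]
    filled = list(instance)
    for p, v in enumerate(instance):
        if v is None:
            filled[p] = sum(nb[p] for nb in neighbours) / len(neighbours)
    return filled

complete = [[1.0, 2.0], [1.2, 2.2], [8.0, 9.0]]
print(impute([1.1, None], complete, k=2))   # -> [1.1, 2.1]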


As mentioned earlier, the complexity of using the VDM increases with the number of classes in a specific application domain, i.e., it is O(mnC), where C is the number of classes. This problem does not appear in the proposed learning approach.

In addition to classification tasks, the above k-level grey relational structure can be used for instance pruning or partial-memory learning [21, 26, 40]. For example, an instance may not be connected by any inward edges from other instances in the k-level grey relational structure. In other words, such an instance is rarely used in determining the class labels of other instances, implying that it is probably a good candidate for instance pruning in a learning system.
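Such pruning candidates are easy to locate once the k-level structure is available: they are the vertices that never appear in any other instance's neighbour list. The fragment below is illustrative only (it takes a structure mapping like the one produced by the hypothetical k_level_structure sketch in this section) and is not the pruning procedure of [21, 26, 40].

def pruning_candidates(structure):
    """structure[q] lists the k nearest instances of instance q. An instance
    that appears in no such list is rarely consulted when classifying others,
    so it is a reasonable candidate for removal from memory."""
    referenced = {r for neighbours in structure.values() for r in neighbours}
    return [q for q in structure if q not in referenced]

structure = {0: [1, 3], 1: [0, 3], 2: [3, 0], 3: [0, 1]}
print(pruning_candidates(structure))   # -> [2]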

Table 1 Average accuracy (%) of classification for the proposed approach and other methods with HOEM (with k-nn), HVDM (with k-nn) and IVDM (with k-nn) [39], respectively. (k) indicates the best value for k using cross-validation on each classification problem

Dataset             HOEM    HVDM    IVDM    Proposed approach (k)
Allbp               94.89   95.05   95.29   95.48 (13)
Allhyper            97.09   97.00   97.20   97.35 (11)
Allhypo             90.31   90.16   96.11   92.74 (7)
Allrep              96.14   96.31   98.25   97.18 (7)
Australian          81.30   81.72   80.52   81.87 (13)
Autos               74.90   79.79   80.19   76.45 (1)
Breast cancer       70.90   66.73   66.73   70.90 (7)
Breast-w            95.54   95.24   95.74   96.78 (5)
Cpu                 68.04   68.51   65.39   70.09 (3)
Crx                 80.12   81.06   80.27   80.82 (11)
Dis                 98.20   98.42   98.24   97.77 (3)
Echoi               81.09   80.32   79.36   80.94 (7)
Glass               69.45   72.63   70.69   74.13 (3)
Hepatitis           79.40   80.45   81.47   80.84 (7)
Hypothyroid         93.58   93.60   98.06   98.24 (1)
Ionosphere          87.22   86.40   91.14   91.37 (1)
Iris                94.67   94.67   94.67   95.33 (7)
Letter              96.25   96.01   96.10   95.30 (1)
Liver disorders     61.86   62.89   62.43   63.30 (19)
Mushroom            100.00  100.00  100.00  100.00 (1)
Pageblocks          96.17   96.24   96.34   96.43 (1)
Pimadiabetes        70.49   70.25   68.53   69.40 (17)
Satelliteimage      90.23   90.26   90.14   90.19 (5)
Satelliteimagetest  88.35   88.32   88.41   88.81 (1)
Segment             96.80   97.07   97.33   97.81 (1)
Shuttle             99.05   99.86   99.85   99.94 (1)
Shuttletest         98.88   98.76   98.87   98.99 (1)
Sick                87.01   86.86   96.84   92.75 (5)
Sickeuthyroid       68.30   68.41   95.08   88.60 (7)
Sonar               85.88   87.20   84.24   86.01 (1)
Soybean             90.51   90.98   92.03   89.31 (1)
Soybeansmall        100.00  100.00  100.00  100.00 (1)
Sponge              84.29   84.38   84.28   85.37 (1)
Tae                 63.71   60.99   60.33   63.51 (1)
Vehicle             70.01   70.90   69.53   70.86 (5)
Voting              93.57   95.17   95.17   93.57 (5)
Vowel               98.52   98.67   98.52   98.95 (1)
Wine                94.89   95.46   97.47   97.30 (13)
Yeast               53.37   53.72   53.21   53.44 (19)
Zoo                 95.45   95.34   96.43   96.04 (1)
Average             85.91   86.15   87.26   87.35


Table 2 The statistical analysis for the proposed approach and other learning methods with HOEM, HVDM and IVDM [39], respectively. A better or worse test result of X/Y under the column for method S indicates that the proposed learning approach performs better than method S in X cases

                                  HOEM    HVDM    IVDM    Proposed approach
Average accuracy                  85.91   86.15   87.26   87.35
Better or worse test (B/W test)   29/7    27/11   26/12   −
Wilcoxon test                     99.50   99.20   88.70   −

Table 3 Average accuracy (%) of classification for the proposed approach and other classification algorithms


Dataset             Baseline  Decision Stump  Hyper Pipes  VFI    1R     Naive Bayes  Decision Table  C4.5   Proposed approach
Allbp               95.25     94.82           35.04        44.04  95.93  93.89        96.89           97.29  95.48
Allhyper            97.25     97.21           97.93        87.14  97.57  95.68        98.61           98.64  97.35
Allhypo             92.14     95.75           98.21        91.68  96.68  95.00        99.11           99.43  92.74
Allrep              96.89     96.89           96.89        93.07  96.82  93.96        99.21           99.25  97.18
Australian          55.51     85.51           44.93        86.81  85.51  76.67        85.07           85.51  81.87
Autos               32.74     44.93           62.95        58.93  62.90  58.02        80.05           82.52  76.45
Breast cancer       70.30     70.69           69.94        67.13  65.39  74.42        74.14           75.18  70.90
Breast-w            65.52     91.70           88.42        96.00  91.84  96.00        94.42           95.28  96.78
Cpu                 57.90     65.10           61.24        55.57  69.86  68.98        67.93           66.05  70.09
Crx                 55.51     85.51           60.43        84.78  85.51  77.68        85.51           85.94  80.82
Dis                 98.39     98.39           48.61        55.21  98.25  95.14        98.75           99.14  97.77
Echoi               42.58     64.63           70.91        74.14  63.71  86.34        79.19           84.25  80.94
Glass               35.50     44.87           50.97        57.90  58.35  50.39        69.18           66.71  74.13
Hepatitis           79.38     81.29           65.16        83.88  80.00  82.58        83.87           83.23  80.84
Hypothyroid         95.23     97.38           95.51        48.61  97.91  97.85        99.08           99.24  98.24
Ionosphere          64.10     82.61           92.60        94.02  82.04  82.36        89.75           90.90  91.37
Iris                33.33     66.67           93.33        96.67  93.33  96.00        93.33           95.33  95.33
Letter              4.07      7.09            22.25        61.23  17.25  64.11        71.24           88.23  95.30
Liver disorders     57.98     61.17           44.62        59.76  57.40  55.96        56.23           66.38  63.30
Mushroom            51.80     88.68           99.77        99.88  98.52  95.75        100.00          100.00 100.00
Pageblocks          89.77     93.13           91.39        87.01  93.55  90.08        95.85           96.00  96.43
Pimadiabetes        65.11     72.01           35.03        64.84  72.28  75.78        74.48           74.09  69.40
Satelliteimage      24.17     44.74           48.00        71.52  59.82  79.55        82.57           86.11  90.19
Satelliteimagetest  23.50     41.70           58.30        72.70  58.05  79.05        80.80           83.25  88.81
Segment             14.29     28.53           75.50        77.45  63.98  80.30        91.69           97.10  97.81
Shuttle             78.41     86.94           84.15        78.27  94.69  91.52        99.75           99.96  99.94
Shuttletest         79.16     86.77           87.61        82.68  94.65  92.88        99.70           99.90  98.99
Sick                93.89     96.75           93.86        61.93  96.54  92.57        97.54           98.82  92.75
Sickeuthyroid       90.74     94.44           90.74        46.38  94.91  84.00        97.28           97.79  88.60
Sonar               53.38     71.60           59.57        55.76  63.93  65.93        72.55           74.07  86.01
Soybean             13.03     26.06           89.90        82.44  39.41  90.55        83.39           88.93  89.31
Soybeansmall        35.50     57.00           100.00       97.50  83.50  97.50        100.00          98.00  100.00
Sponge              16.07     41.25           80.36        75.89  44.82  84.46        73.04           66.07  85.37
Tae                 34.42     35.75           47.04        52.96  42.42  53.63        50.96           53.04  63.51
Vehicle             25.77     39.59           38.55        53.55  52.70  44.45        67.96           73.28  70.86
Voting              61.38     95.62           38.62        90.34  95.62  90.57        94.49           97.00  93.57
Vowel               9.09      17.58           36.67        60.20  34.34  67.78        67.07           80.81  98.95
Wine                39.93     59.51           91.08        95.46  76.93  97.22        91.08           94.41  97.30
Yeast               31.20     40.70           35.85        50.27  40.30  58.09        56.88           54.39  53.44
Zoo                 40.64     60.45           94.09        94.09  73.36  95.18        89.18           92.09  96.04
Average             55.02     67.78           69.40        73.69  74.26  81.20        84.70           86.59  87.35


Table 4 The statistical analysis for the proposed approach and various other classification algorithms. A better or worse test result of X/Y under the column for method S indicates that the proposed learning approach performs better than method S in X cases

                                  Baseline  DecisionStump  HyperPipes  VFI    1R     NaiveBayes  DecisionTable  C4.5   Proposed approach
Average accuracy                  55.02     67.78          69.40       73.69  74.26  81.20       84.70          86.59  87.35
Better or worse test (B/W test)   37/3      31/9           33/6        35/5   30/10  32/8        21/17          17/21  −
Wilcoxon test                     99.50     99.50          99.50       99.50  99.50  99.50       94.05          54.35  −


5 Experimental results

In this section, experiments performed on forty data sets (from [5]) are reported to demonstrate the performance of the proposed learning approach. In the experiments, ten-fold cross validation [35] was used and applied ten times for each application domain. That is, the entire data set of each application domain was equally divided into ten parts in each trial; each part was used once for testing and the remaining parts were used for training. Accordingly, the average accuracy of classification was obtained.

Table 1 gives the performances (average classification accuracy) of the proposed approach (i.e., k-nn with GRG) and of HOEM (with k-nn), HVDM (with k-nn) and IVDM (with k-nn) [39]. These distance functions used for comparison are described in Section 2. As shown in Table 2, a statistical analysis, including a better or worse test (B/W test; for example, a better or worse test result of 29/7 under the HOEM column means that the proposed learning approach performs better than HOEM in 29 cases) and the Wilcoxon Signed Ranks test [37] (i.e., the proposed approach is compared with the others), was done for the above methods with various distance functions. Here, the Wilcoxon Signed Ranks test was used to test the null hypothesis that the differences (in classification accuracy) between two methods are distributed symmetrically around zero (i.e., to determine whether one method is significantly better than another regarding classification accuracy). On these forty application domains, the proposed approach reveals its superiority over HOEM (with k-nn) and HVDM (with k-nn). Meanwhile, the classification accuracy of the proposed approach is comparable to that of IVDM (with k-nn).

Furthermore, various classification algorithms were also used for performance comparison, including DecisionStump [41], DecisionTable [27], HyperPipes [41], C4.5 [31], NaiveBayes [25], 1R [18] and VFI [41]. These methods are all available in [41]. Table 3 gives the performances (average accuracy of classification) of the above classification methods and the proposed approach.
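The evaluation protocol just described can be sketched as follows (illustrative Python, not the authors' code); classify stands for the hypothetical classifier sketch from Section 4, and the simple index-splitting bookkeeping here is an assumption, not a detail taken from the paper.

import random

def repeated_tenfold_accuracy(X, y, classify, k=3, repeats=10, folds=10, seed=0):
    """Average accuracy over `repeats` runs of `folds`-fold cross validation."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    correct = total = 0
    for _ in range(repeats):
        rng.shuffle(indices)
        for f in range(folds):
            test = set(indices[f::folds])            # every folds-th shuffled index
            train = [i for i in indices if i not in test]
            for i in test:
                pred = classify(X[i], [X[j] for j in train],
                                [y[j] for j in train], k=k)
                correct += int(pred == y[i])
                total += 1
    return 100.0 * correct / total

# usage: repeated_tenfold_accuracy(X, y, classify, k=7)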


In Table 3, “Baseline” means that the majority class is simply chosen for classification. Similarly, Table 4 gives the statistical analysis, including the better or worse test (B/W test; for example, a better or worse test result of 32/8 under the NaiveBayes column indicates that the proposed learning approach performs better than NaiveBayes in 32 cases) and the Wilcoxon Signed Ranks test [37] (i.e., the proposed approach is compared with the others), for comparing the above learning methods. As a result, the proposal presented here can yield higher performance compared to some other classification algorithms.
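For reference, the kind of paired comparison reported in Tables 2 and 4 can be reproduced with a standard signed-rank routine; the sketch below uses SciPy and hypothetical per-dataset accuracies (not the numbers from Table 1), and it reports a p-value, whereas the tables quote confidence levels.

from scipy.stats import wilcoxon

# Per-dataset accuracies of two methods (hypothetical numbers for illustration)
acc_proposed = [95.4, 81.9, 74.1, 96.8, 63.3, 90.2]
acc_other    = [94.9, 81.3, 69.5, 95.5, 61.9, 90.2]

# Paired two-sided signed-rank test on the per-dataset differences;
# zero differences are discarded (zero_method="wilcox").
stat, p = wilcoxon(acc_proposed, acc_other, zero_method="wilcox")
print(f"W = {stat}, p = {p:.3f}, confidence ~ {100 * (1 - p):.1f}%")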

6 Conclusions

In this paper, an instance-based learning approach based on grey relational structure is proposed. Grey relational analysis is used to precisely describe the entire relational structure of all instances in a specific application domain. By using the above-mentioned grey relational structure, new instances can be identified with high accuracy. Experiments performed on forty application domains are reported to demonstrate the performance of the proposed approach. The proposed approach yields higher performance than other methods that adopt one or both of the Euclidean metric and the Value Difference Metric (VDM). Moreover, the proposal presented here can yield higher performance compared to some other classification algorithms. For some domains with purely symbolic attributes in the experiments, instance-based learning approaches using the VDM perform better than the proposed learning approach, which is based on the grey relational structure. As pointed out earlier, the VDM is mainly applicable to domains with symbolic attributes. In further work, the VDM will be incorporated with the proposed metric to increase the performance of the corresponding instance-based learning approach.

Acknowledgments This work was supported in part by the National Digital Archive Program-Research & Development of Technology Division (NDAP-R&DTD), the National Science Council of Taiwan under grant NSC 94-2422-H-001-006, and by the Taiwan Information Security Center (TWISC), the National Science Council under grant NSC 94-3114-P-001-001-Y. In addition, the authors would like to thank the National Science Council of Taiwan for financially supporting this research under grant NSC 94-2213-E-022-006.


References

1. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66
2. Aha DW (1992) Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. Int J Man-Mach Stud 36(2):267–287
3. An A (2003) Learning classification rules from data. Comp Math Appl 45:737–748
4. Bay SD (1999) Nearest neighbor classification from multiple feature subsets. Intell Data Anal 3:191–209
5. Blake CL, Merz CJ (1998) UCI repository of machine learning databases [http://www.ics.uci.edu/∼mlearn/MLRepository.html]. Department of Information and Computer Science, University of California, Irvine, CA
6. Brouwer RK (1997) Automatic growing of a hopfield style network during training for classification. Neur Netw 10:529–537
7. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inform Theory 13(1):21–27
8. Deng J (1984) The theory and method of socioeconomic grey systems. Soc Sci China 6:47–60 (in Chinese)
9. Deng J (1989) Introduction to grey system theory. J Grey Syst 1:1–24
10. Deng J (1989) Grey information space. J Grey Syst 1:103–117
11. Elouedi Z, Mellouli K, Smets P (2001) Belief decision trees: theoretical foundations. Int J Approx Reason 28:91–124
12. Fix E, Hodges JL (1951) Discriminatory analysis: nonparametric discrimination: consistency properties. Technical Report Project 21-49-004, Report Number 4, USAF School of Aviation Medicine, Randolph Field, Texas
13. Freund Y, Mason L (1999) The alternating decision tree learning algorithm. In: Proc. of the 16th international conference on machine learning, Bled, Slovenia, pp 124–133
14. Friedman JH (1977) A recursive partitioning decision rule for nonparametric classification. IEEE Trans Comp, pp 404–408
15. Hattori K, Takahashi M (2000) A new edited k-nearest neighbor rule in the pattern classification problem. Pattern Recog 33:521–528
16. Hewett R, Leuchner J (2003) Restructuring decision tables for elucidation of knowledge. Data Knowl Eng 46:271–290
17. Hickey RJ, Martin RG (2001) An instance-based approach to pattern association learning with application to the English past tense verb domain. Knowl-Based Syst 14:131–136
18. Holte RC (1993) Very simple classification rules perform well on most commonly used datasets. Mach Learn 11:63–91
19. Hu YC, Chen RS, Hsu YT, Tzeng GW (2002) Grey self-organizing feature maps. Neurocomputing 48:863–877
20. Huang CC, Lee HM (2001) A grey-based nearest neighbor approach for predicting missing attribute values. In: Proc. of 2001 national computer symposium, Taiwan, pp B153–159
21. Huang CC, Lee HM (2003) A partial-memory learning system based on grey relational structure. In: Berthold MR et al (eds) Advances in intelligent data analysis V, Lecture Notes in Computer Science 2810, Springer-Verlag, pp 68–75
22. Huang YP, Huang CH (1997) Real-valued genetic algorithms for fuzzy grey prediction system. Fuzzy Sets Syst 87:265–276
23. Hullermeier E (2003) Possibilistic instance-based learning. Art Intell 148:335–383
24. Ignizio JP, Soltys JR (1996) Simultaneous design and training of ontogenic neural network classifiers. Comp Oper Res 23:535–546
25. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proc. of the eleventh conference on uncertainty in artificial intelligence, pp 338–345
26. Kibler D, Aha DW (1987) Learning representative exemplars of concepts: an initial case study. In: Proc. of the fourth international workshop on machine learning, Irvine, CA. Morgan Kaufmann, pp 24–30
27. Kohavi R (1995) The power of decision tables. In: European conference on machine learning, pp 174–189
28. Langley P, Simon HA (1995) Applications of machine learning and rule induction. Commun ACM 38(11):55–64
29. Lin CT, Yang SY (1999) Selection of home mortgage loans using grey relational analysis. J Grey Syst 4:359–368
30. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
31. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Mateo, CA
32. Rachlin J, Kasif S, Salzberg S, Aha DW (1994) Towards a better understanding of memory-based and Bayesian classifiers. In: Proc. of the eleventh international machine learning conference, New Brunswick, NJ. Morgan Kaufmann, pp 242–250
33. Salzberg S (1988) Exemplar-based learning: theory and implementation. Technical Report TR-10-88, Center for Research in Computing Technology, Harvard University
34. Stanfill C, Waltz D (1986) Towards memory-based reasoning. Commun ACM 29(12):1213–1228
35. Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J Royal Stat Soc B 36:111–147
36. Tsumoto S (2003) Automated extraction of hierarchical decision rules from clinical databases using rough set model. Expert Syst Appl 24:189–197
37. Watson CJ, Billingsley P, Croft DJ, Huntsberger DV (1993) Statistics for management and economics, 5th edn. Allyn and Bacon, Boston
38. Watson I (1999) Case-based reasoning is a methodology not a technology. Knowl-Based Syst 12:303–308
39. Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Art Intell Res 6:1–34
40. Wilson DR, Martinez TR (2000) Reduction techniques for exemplar-based learning algorithms. Mach Learn 38(3):257–268
41. Witten I, Frank E (2000) Data mining—practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco, CA


Chi-Chun Huang is currently Assistant Professor in the Department of Information Management at National Kaohsiung Marine University, Kaohsiung, Taiwan. He received the Ph.D. degree from the Department of Electronic Engineering at National Taiwan University of Science and Technology in 2003. His research includes intelligent Internet systems, grey theory, machine learning, neural networks and pattern recognition.


Hahn-Ming Lee is currently Professor in the Department of Computer Science and Information Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. He received the B.S. degree and Ph.D. degree from the Department of Computer Science and Information Engineering at National Taiwan University in 1984 and 1991, respectively. His research interests include intelligent Internet systems, fuzzy computing, neural networks and machine learning. He is a member of IEEE, TAAI, CFSA and IICM.

