Manuscript by Nai Ding
Application of a Chaotic Neural Network Mimicking the Olfactory System and SVM in Classifying Reconstituted Milk and Fresh Milk

Abstract -- This paper presents a new approach for pattern recognition in machine olfaction that combines a chaotic neural network, the KIII model, with a support vector machine (SVM). In this approach, feature vectors are first processed by the KIII model, which simulates the information processing of the olfactory bulb, and are then classified by an SVM. The approach is applied to distinguishing reconstituted milk from fresh milk; it attains high accuracy on complex data and is more robust than several traditional approaches when unexpected noise is added.
1 Introduction
Many pattern recognition approaches have been applied to the electronic nose [1][2]. Until now, however, robustness and accuracy remain the main weaknesses of pattern analysis for the electronic nose; problems such as drift compensation, mixture separation, and identification against complex odor backgrounds are still challenges. In contrast with the artificial electronic nose's limitations, the mammalian olfactory system can detect and interpret information from volatile molecules in the environment with a high degree of sensitivity, selectivity, and stability. Many researchers have therefore begun to pay more attention to biologically inspired odor processing models [3] to overcome the difficulties in machine olfaction. Milk classification using an electronic nose is an especially difficult task because of the heterogeneous nature of dairy products [4]; differences in heat treatment and in protein and fat concentration can all affect the aroma of milk [5]. Classification of milk therefore always deals with noisy data of complicated structure. This paper investigates the performance of a novel classification approach, cascading the bionic KIII model and an SVM, in classifying reconstituted milk and fresh milk.
2 Description of the KIII Model and SVM

2.1 KIII Model
The KIII network describes the olfactory neural system, including populations of neurons, local synaptic connections, and long forward and distributed time-delayed feedback loops. With its parameters optimized and additive noise introduced, the KIII model can reproduce the EEG waveforms observed in electrophysiological experiments. In the topology of the KIII network (Fig. 1), R represents the olfactory receptor, which is sensitive to odor molecules, and M represents the mitral cell, whose response is used as the activity measure. A full description of KIII can be found in [6]. In the olfactory system, each odorant activates a subset of the receptor cells and initiates a spatial pattern of action potentials. This pattern in turn initiates another spatial pattern of activity in the outer layer of the olfactory bulb. These spatial patterns have no specific topographic relation to the stimulus input pattern [7]; discrimination among odors is therefore a problem of spatial pattern recognition.
Mimicking the olfactory system, the KIII model receives a stimulus at the receptor level and transforms this stimulus pattern into amplitude modulation at the olfactory bulb level. If the standard deviations of the mitral cells' responses are viewed as the output, a KIII model with N channels can be regarded as a nonlinear multiple-input multiple-output system mapping an input vector in its N-dimensional input space onto its N-dimensional output space. The KIII model aims at simulating the information processing phase of the olfactory system; it does not model the decision-making function realized by higher-level neural systems.
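To make this readout convention concrete, the following Python sketch collapses simulated mitral time series into the N-dimensional output vector. It deliberately omits the KIII dynamics themselves (a system of coupled nonlinear equations, see [6]); the function name and array shapes are our own illustrative assumptions.

```python
import numpy as np

def mitral_readout(m_traces: np.ndarray) -> np.ndarray:
    """Collapse simulated mitral (M node) time series into the activity
    measure used above: the standard deviation of each channel.

    m_traces -- array of shape (N, T): N mitral channels, T time steps,
                produced by some KIII simulator (not shown here).
    Returns the N-dimensional output vector y.
    """
    return m_traces.std(axis=1)

# Hypothetical usage with an 8-channel run of 2000 time steps; random
# numbers stand in for the real chaotic KIII output.
rng = np.random.default_rng(0)
m_traces = rng.standard_normal((8, 2000))
y = mitral_readout(m_traces)   # 8-dimensional output pattern
```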
Learning Rule
Associative learning in the olfactory bulb proceeds by enhancement of the mutually excitatory synapses at the mitral level, creating a landscape of chaotic attractors; each attractor, formed in a self-organized way, represents a class. Habituation is also an essential part of the discrimination of sensory stimuli. It takes place at the synapses of the excitatory neurons onto both inhibitory and other excitatory neurons. A modified Hebbian learning rule with habituation is employed to train the KIII model [8].
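The exact rule is given in [8]; the sketch below is only a generic illustration of the two ingredients named above, under our own simplifying assumptions: co-active mitral channels (activity above the pattern mean) have their mutual weights reinforced, while habituation slowly decays all synapses. The constants are hypothetical.

```python
import numpy as np

def hebbian_with_habituation(W, y, gain=1.5, h=0.9995):
    """One training presentation (generic sketch; exact rule in [8]).

    W    -- (N, N) mutual excitatory weights among mitral channels
    y    -- N-dim activity pattern (std of each M node for this sample)
    gain -- hypothetical reinforcement factor for co-active pairs
    h    -- hypothetical multiplicative habituation (decay) factor
    """
    active = y > y.mean()                 # channels excited by the stimulus
    co_active = np.outer(active, active)  # pairs of co-active channels
    W = W * h                             # habituation: all synapses decay
    W[co_active] *= gain                  # Hebbian reinforcement
    np.fill_diagonal(W, 0.0)              # no self-connection
    return W
```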
2.2 SVM
Basically, the SVM [9][10] is a linear machine that applies a kernel method to map the data into a higher-dimensional space where a hyperplane can be used to perform the separation. It hinges on two mathematical operations [11]. The first maps the input vector into a higher-dimensional feature space in which nonlinearly separable patterns become highly likely to be linearly separable. The second constructs an optimal hyperplane in the new feature space. The nonlinear map of the first operation is achieved by a kernel method: each kernel function satisfying Mercer's theorem corresponds to a space in which the function is defined as an inner product. In the new space a hyperplane is constructed in such a way that the margin of separation between the different classes is maximized. Unlike the back-propagation algorithm, devised to train a multilayer perceptron with an artificially designed structure, the SVM automatically determines the required number of hidden units (the number of support vectors, SVs). The decision function of the SVM can be expressed by equation (1) [12]:
$I(x) = \operatorname{sign}\Big(\sum_{\text{SV in training set}} \alpha_i K(x_i \cdot x) + b_0\Big)$   (1)

where $K(x_i \cdot x)$ represents the kernel function.
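Equation (1) can be verified against any SVM implementation that exposes its dual coefficients; the sketch below uses scikit-learn (our choice, not the paper's) to reassemble the decision value as the kernel expansion over the support vectors plus the bias, on toy data standing in for sensor features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Toy two-class data standing in for 8-dimensional sensor feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 8)), rng.normal(2, 1, (40, 8))])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

# Equation (1) by hand: sum over support vectors of alpha_i K(x_i, x) + b0,
# where dual_coef_ already folds the class sign into alpha_i.
x = X[:5]
K = rbf_kernel(clf.support_vectors_, x, gamma=0.5)
decision = clf.dual_coef_ @ K + clf.intercept_
assert np.allclose(decision.ravel(), clf.decision_function(x))
label = np.sign(decision)   # the I(x) of equation (1)
```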
2.3 Combination of KIII and SVM
As mentioned in Section 2.1, the KIII model can process and learn data but does not output a class label; a traditional classifier usually needs to be cascaded after it to accomplish the pattern recognition task. Previous work [13][14] has shown that a minimum Euclidean distance classifier can usually yield satisfying results for data preprocessed by KIII. However, the minimum Euclidean distance classifier is optimal only for normally distributed classes sharing the same diagonal covariance matrix. When samples do not conform to this assumption, the minimum Euclidean distance classifier is not optimal and a more powerful classifier may perform better. In this paper, an SVM is adopted to classify the feature vectors
preprocessed by KIII. The KIII model, trained in an unsupervised manner, simulates the nonlinear map of the olfactory system. The SVM also involves mapping data into a higher-dimensional space using a kernel method. KIII's self-organized map may strengthen the SVM's ability to transform data, making it easier to find the intrinsic variances of the different classes in the SVM's supervised learning phase. Let the mapping realized by the KIII model be denoted $y = \varphi(x, S)$, where $x$ denotes the stimulus at the R nodes, $y$ the standard deviations at the M nodes, and $S$ the training set for the KIII network. The cascaded classifier, denoted KIII-SVM, can then be expressed as equation (2):

$I(x) = \operatorname{sign}\Big(\sum_{\text{SV in } \varphi(S_T, S_T)} \alpha_i K\big(\varphi(x_i, S_T) \cdot \varphi(x, S_T)\big) + b_0\Big)$   (2)

Some earlier experiments [15] have shown that the KIII network performs well when the training set is small, which inspired us to use training sets of different sizes for the cascaded KIII and SVM. The simplest way to realize this idea is to partition the training set $S_T$ into two non-overlapping subsets, one being $S_K$ and the other $S_T - S_K$. $S_K$ is used as the training set for KIII, yielding the trained network $\varphi(\cdot, S_K)$, while the whole training set $S_T$, after being processed by KIII, is used as the training set for the SVM. In other words, $\varphi(S_T, S_K)$ rather than $\varphi(S_T, S_T)$ is used to train the SVM. This model, denoted KIII-SVM-modified, can be expressed as equation (3):

$I(x) = \operatorname{sign}\Big(\sum_{\text{SV in } \varphi(S_T, S_K)} \alpha_i K\big(\varphi(x_i, S_K) \cdot \varphi(x, S_K)\big) + b_0\Big)$   (3)

A further extension of this model is inspired by the fact that one set can be partitioned into two subsets in many different ways. Using different subset pairs, we can train different KIII networks and therefore obtain differently preprocessed data; consequently, the structure of the cascaded SVM will differ too. This means we can construct different classifiers from one training set by manipulating the selection of the training set, as in the bagging algorithm [0]. Once a series of classifiers is obtained, a majority voting method [0] can combine them. As mentioned in [0], ensemble methods can partly overcome the statistical, computational, and representational problems from which a single classifier may suffer. This bagging-like method, denoted KIII-SVM-ensemble and sketched in code below, can be expressed as equation (4):

$I_j(x) = \operatorname{sign}\Big(\sum_{\text{SV in } \varphi(S_T, S_{K_j})} \alpha_i K\big(\varphi(x_i, S_{K_j}) \cdot \varphi(x, S_{K_j})\big) + b_0\Big), \qquad I(x) = \operatorname{sign}\Big(\sum_{j=0}^{p} I_j(x)\Big)$   (4)
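The pipeline of equations (2)-(4) can be summarized in code. In the sketch below the KIII transform is a stub (in the real system $\varphi$ is the trained KIII network); the partitioning, the per-subset transforms, the per-transform SVMs, and the majority vote follow the scheme above, with all names and constants hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def kiii_transform(X, S_train):
    """Stub for phi(x, S): in the real system this is the trained KIII
    network mapping an N-dim stimulus to the std values at the M nodes.
    Only the interface matters here; the nonlinearity is a placeholder."""
    return np.tanh(X - S_train.mean(axis=0))

def train_kiii_svm_ensemble(X_train, y_train, n_models=5, k=36, seed=0):
    """Equations (3)-(4): each member trains KIII on a random subset S_K
    and an SVM on the whole training set transformed by that KIII."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_models):
        S_K = X_train[rng.choice(len(X_train), size=k, replace=False)]
        svm = SVC(kernel="rbf").fit(kiii_transform(X_train, S_K), y_train)
        members.append((S_K, svm))
    return members

def predict_ensemble(members, X):
    """Majority vote over the member classifiers (equation (4))."""
    votes = np.stack([svm.predict(kiii_transform(X, S_K))
                      for S_K, svm in members])
    return (votes.mean(axis=0) > 0.5).astype(int)   # labels in {0, 1}
```

With an odd number of members (5 here, as in the experiments below), the binary vote can never tie.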
3 Experimental Results

3.1 Data Acquisition
Our experiments employ static head-space analysis. A tin oxide gas sensor array with 8 sensors (TGS880 (×2), TGS813 (×2), TGS822 (×2), TGS800, TGS823) from Figaro Engineering Inc. is mounted in a chamber. After equilibrated air above a milk sample is injected into the chamber, the sensors begin to be heated, and the step response of the gas sensor array is recorded. Diverse methods exist for extracting features from the dynamic response of a gas sensor array [16], but in our experiment the maximum response is adopted as the single feature for each sensor. As mentioned in [17], when the ratio of samples to variables is low, many erroneous classifications may be made; if too many features are extracted, a large number of measurements must be conducted to make the classification result convincing, which complicates the experiment. Moreover, in our pretest, the classification rate with the maximum response as the single parameter was higher than with the rise time, maximum slope, or stable response. Since the sensor array's response to milk is not strong, the impact of humidity and temperature is not negligible in our experiment. Equation (5) is used for drift compensation, with the sensor response in fresh air as the baseline. To compensate for differences in concentration between samples, each feature vector is normalized using equation (6).
$R = \dfrac{R_{\text{observed}} - R_{\text{baseline}}}{R_{\text{baseline}}}$   (5)

$R = \dfrac{R}{\sqrt{\sum_i \big(R(i)\big)^2}}$   (6)
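A direct implementation of this preprocessing is sketched below; the Euclidean-norm denominator in equation (6) is our reading of the normalization, and the numbers are hypothetical.

```python
import numpy as np

def preprocess(r_observed, r_baseline):
    """Equations (5) and (6): relative-change drift compensation against
    the fresh-air baseline, then normalization of the feature vector by
    its Euclidean norm to compensate for concentration differences."""
    r = (r_observed - r_baseline) / r_baseline   # equation (5)
    return r / np.sqrt(np.sum(r ** 2))           # equation (6)

# Hypothetical 8-sensor maximum responses and fresh-air baselines.
r_obs  = np.array([4.1, 3.8, 5.0, 2.9, 3.3, 4.4, 2.7, 3.6])
r_base = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.8, 1.1])
feature = preprocess(r_obs, r_base)   # unit-norm 8-dim feature vector
```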
3.2 Sample Description
Six brands of commercial UHT fresh milk and six brands of commercial whole milk powder were collected from the market in Hangzhou, China. Reconstituted milk samples were prepared from whole milk powder and water; the amount of water added to the milk powder was chosen so that the feature vectors of fresh milk and reconstituted milk have the minimum Euclidean distance (see the sketch below). Eight consecutive measurements were conducted for each brand of dairy, so the samples of one brand of milk can be assumed coherent. However, because of the heterogeneous nature of milk, samples of different brands of fresh or reconstituted milk may show observable differences [18]. Therefore, both fresh and reconstituted milk can be deemed to contain six subclasses.
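The selection of the hardest reconstitution ratio can be sketched as follows; the function and the dictionary of candidate dilutions are our own illustrative assumptions.

```python
import numpy as np

def hardest_reconstitution(fresh_features, candidates):
    """Pick the water/powder ratio whose reconstituted-milk features lie
    closest (Euclidean) to fresh milk, i.e. the hardest case to separate.

    fresh_features -- (n, d) feature vectors of fresh-milk measurements
    candidates     -- dict mapping ratio -> (m, d) feature array for the
                      reconstituted milk prepared at that ratio
    """
    fresh_center = fresh_features.mean(axis=0)
    return min(candidates,
               key=lambda r: np.linalg.norm(
                   candidates[r].mean(axis=0) - fresh_center))
```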
3.3 Experiments
Four experiments are conducted, each with a different purpose. Experiments I and II measure the classifiers' ability to distinguish fresh from reconstituted milk from different perspectives. Experiment III measures the classifiers' ability to detect reconstituted milk adulterated into fresh milk. Experiment IV evaluates the impact of the fresh milk's concentration on adulteration detection. For the first two experiments, all twelve brands of dairy are considered: the six brands of fresh milk are regarded as one class and the six brands of reconstituted milk as another. The difference between the two experiments lies in the method of selecting the training set for the classifiers. Experiment I uses 'non-blind'
selection and Experiment II uses 'blind' selection. 'Non-blind' selection uses the prior subclass information: it selects the same number of samples from each subclass to compose the training set. In contrast, 'blind' selection dismisses the subclass information and randomly selects training samples from the fresh milk set and the reconstituted milk set, so some subclasses may contribute more samples to the training set while others contribute fewer or none. In these two experiments, four classification approaches are employed: the minimum Euclidean distance classifier, the KIII-minimum Euclidean distance classifier, SVM, and KIII-SVM. The SVM adopts a radial-basis function kernel with its parameter optimized. KIII-SVM-modified and KIII-SVM-ensemble are also applied in Experiment II; 5 base classifiers are combined in the KIII-SVM-ensemble approach in our experiments. For the minimum Euclidean distance classifier employed in Experiment I, we calculate a cluster center for each subclass. If the cluster center nearest to an unknown sample belongs to a brand of fresh milk, the sample is categorized as fresh milk; otherwise it is labeled as reconstituted milk. This method can be viewed as a multi-class method because it calculates 12 cluster centers, but it does not output which brand the sample belongs to, which means that misclassification between different brands of fresh milk, or between different brands of reconstituted milk, is tolerated (see the sketch below). In Experiment II, with the subclass information ignored, only two cluster centers are calculated for the minimum distance classifier, one for all brands of fresh milk and the other for all brands of reconstituted milk. The classification problem then becomes a pure two-class problem in which the samples of each class may have a complicated distribution. In Experiment III, one randomly selected brand of fresh milk and one randomly selected brand of reconstituted milk are mixed together in different ratios. Samples with a reconstituted-milk ratio below a given value are deemed one class, and the other samples another class. In Experiment IV, the two brands of dairy used in Experiment III are used again: with the reconstituted milk's concentration fixed, different amounts of water are added to the fresh milk, and the classifiers trained in Experiment III are employed to classify the diluted fresh milk and the reconstituted milk. In Experiments I and II, the 8 sensors' response amplitudes form an 8-dimensional feature vector. In Experiments III and IV, however, only the first three components of the feature vector are used for classification, to keep the ratio of data points to variables greater than six, as suggested in [19].
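The subclass-aware minimum distance classifier of Experiment I can be sketched as follows (names hypothetical); the 'blind' variant of Experiment II is obtained by passing two class-mean centers instead of twelve brand centers.

```python
import numpy as np

def min_distance_subclass(X, centers, center_labels):
    """Experiment I's minimum Euclidean distance classifier: one cluster
    center per brand (12 in total); an unknown sample takes the fresh/
    reconstituted label of its nearest center, so confusions between
    brands within the same class are tolerated.

    X             -- (n, d) test samples
    centers       -- (12, d) per-brand cluster centers (subclass means)
    center_labels -- (12,) class of each center (0 fresh, 1 reconstituted)
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.asarray(center_labels)[d.argmin(axis=1)]

# Experiment II's 'blind' variant: centers = the two class means,
# center_labels = [0, 1].
```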
3.4 Results
The four classifiers' performances in Experiment I with different training set sizes are shown in Fig. 2. The results of Experiment II are shown in Fig. 3 and Fig. 4. Since KIII-SVM outperforms SVM with 30-50 training samples, the size of KIII's training set for KIII-SVM-modified and KIII-SVM-ensemble is chosen to be 36 or 48. In Fig. 4, the approaches with KIII trained on 36 samples are denoted KIII-SVM-modified(1) and KIII-SVM-ensemble(1), while the approaches with KIII trained on 48 samples are denoted KIII-SVM-modified(2) and
KIII-SVM-ensemble(2). From the results, the approaches adopting SVM significantly outperform those adopting the minimum Euclidean distance classifier, and KIII-SVM-ensemble achieves the highest accuracy. Experiment III's results are given in Table 1. When 80% or more reconstituted milk is adulterated into fresh milk, the classifiers achieve fairly satisfying classification rates. The classifiers trained on all samples under this condition are tested in Experiment IV. The different approaches' performance with different levels of water added to the fresh milk is shown in Fig. 5. The normalized Euclidean distances between the cluster center of reconstituted milk and the cluster centers of milk-water mixtures with 100%, 90%, 80%, 70%, and 60% milk are 0.09, 0.18, 0.44, and 1.00, respectively. The Euclidean distance grows as the proportion of water grows, but the SVM's classification rate does not fall gradually.
3.5 Accuracy Analysis
In Experiment I, where the structure of the data is clear and each subclass can be approximately assumed to be normally distributed, the minimum Euclidean distance classifier achieves accuracy similar to the SVM. But in Experiment II, where the data of each pattern contain a more complex, unknown structure and deviate further from normally distributed sets, the minimum Euclidean distance classifiers perform poorly. This result concurs with the discussion in Section 2.3: the minimum Euclidean distance classifier is designed with a Gaussian prior, and its performance cannot be guaranteed when the data are complex. KIII's preprocessing improves its accuracy only when the data are complex. The SVM performs well for data of both simple and complex structure, and KIII cannot enhance the SVM's accuracy directly. However, the involvement of KIII makes it easy to construct ensemble classifiers: the size of KIII's training set does not affect the classification result significantly, while differences in KIII's training set yield differently transformed data. Keeping the training set for the SVM large enough enables the SVM to make predictions as accurate as possible for data transformed in these different ways. KIII-SVM-ensemble is shown to be the most accurate approach for the complex data of Experiment II.
3.6 Robustness Analysis
Robustness is of great importance for pattern recognition: for reconstituted milk detection, the concentration of milk certainly cannot be treated as a constant. In Experiment IV, when the impact of concentration is considered, the SVM's performance becomes extremely unstable, while KIII's preprocessing makes the SVM generate more robust and meaningful classification results. This malfunction of the SVM does not conflict with the generalization ability it shows in most cases, since generalization ability is usually measured when the training set comprehensively represents the statistical attributes of the testing set. In Experiment IV, however, the training set only contains information about fresh and reconstituted milk at one specific concentration, so the hyperplane learned by the SVM is optimal only for classifying samples of fresh and reconstituted milk at that concentration. As the training set does not predict how the data change when unexpected noise is added (here, the impact of additional water), the hyperplane is not necessarily still the optimal one for classifying the noisy data. That is to say, finding the optimal hyperplane to separate the classes in the training
set is one problem, but finding the optimal hyperplane reflecting the intrinsic differences between classes is another when the training set does not carry comprehensive information.
4 Discussion
Pattern recognition involves a process of transforming a realistic scene into a feature vector that is easy to deal with: a process of reducing dimensionality, strengthening differences between classes, and making patterns more separable. This process may be accomplished explicitly in a signal processing phase or implicitly by the classifier, with different orientations. For example, discriminant function analysis (DFA) and the kernel method used in the SVM are oriented toward mapping the data into a space where the patterns become more separable. Isometric feature mapping (ISOMAP), locally linear embedding (LLE), and other manifold learning methods aim at reducing dimension by learning the underlying global geometry of a data set. Principal component analysis (PCA) and independent component analysis (ICA) are designed to exploit the statistical structure of the data. However, information processing in biological systems is highly complicated; its mechanism is still unknown, but it is definitely not optimized according to one specific mathematical objective. As mentioned in [20], we need to characterize how representations are transformed such that they are useful. The KIII network can simulate the EEG waveforms of electrophysiological experiments and perhaps can partly simulate the signal processing mechanism of the mammalian olfactory system. It may transform data so as to strengthen the useful information in them rather than simply make them more separable. The information processing mechanism of KIII is still unknown as well and is worth studying.
5 Conclusion
This article demonstrates that KIII-SVM-ensemble is the best of the examined approaches for classifying reconstituted milk against fresh milk, considering the robustness and accuracy it shows in Experiments II and IV. Preprocessing feature vectors with the KIII model is a sound choice when the data are complex and may suffer from unexpected noise.
References (To be added)
Table 1. Correction rates of different classifiers when classifying fresh milk and reconstituted milk

fresh : reconstituted    SVM    KIII-SVM    KIII-SVM-ensemble
1 : 0                    91%    86%         81%
0.8 : 0.2                87%    86%         78%
0.6 : 0.4                71%    67%         65%
0.2 : 0.8                64%    55%         61%
Fig. 1. Topological diagram for the KIII network
Fig. 2. Performance measures for different classifying approaches with 'non-blind' training set selection (panel title: equal training samples from each brand of dairy; correction rate % vs. size of training set; curves: KIII-SVM, SVM, minimum distance, KIII-minimum distance)

Fig. 3. Performance measures for different classifying approaches with 'blind' training set (panel title: randomly chosen training set; correction rate % vs. size of training set; curves: KIII-SVM, SVM, minimum distance, KIII-minimum distance)

Fig. 4. Performance measures for different approaches cascading KIII and SVM with 'blind' training set (panel title: randomly chosen training set; correction rate % vs. size of training set; curves: KIII-SVM(ensemble)1, KIII-SVM(ensemble)2, SVM, KIII-SVM(modified)1, KIII-SVM(modified)2, KIII-SVM)

Fig. 5. Performance measures for different classifying approaches with different proportions of water added (panel title: mix water and milk together; correction rate % vs. milk's proportion in the milk-water mixture; curves: SVM, KIII-SVM(ensemble), KIII-SVM)
The training phase of KIII-SVM-ensemble is illustrated in Fig. 1, and its testing phase in Fig. 2. The training and testing phases of KIII-SVM are illustrated in Fig. 3.
Fig. 1. The training phase for KIII-SVM-ensemble
Fig. 2. The testing phase for KIII-SVM-ensemble
Fig. 3. The training and testing phases for KIII-SVM