UNSUPERVISED LEARNING STRATEGIES FOR THE DETECTION AND CLASSIFICATION OF TRANSIENT PHENOMENA ON ELECTRIC POWER DISTRIBUTION SYSTEMS

David L. Lubkeman
Chris D. Fallon
Adly A. Girgis

Department of Electrical and Computer Engineering, Clemson University, Clemson, South Carolina 29634-0915
ABSTRACT

A number of utilities are currently installing high-speed data acquisition equipment in their distribution substations. This equipment will make it possible to record the transient waveforms due to events such as low- and high-impedance faults, capacitor switching, and load switching. This paper describes the potential of applying unsupervised learning strategies to the classification of the various events observed by a substation recorder. Several strategies are tested using simulation studies and the effectiveness of unsupervised learning is compared to current classification strategies as well as supervised learning.

Keywords: Fault Classification, Unsupervised Learning, Power Distribution Fault Analysis.
INTRODUCTION

The classification of transient disturbances on electric power distribution feeders is a challenging problem. Traditional protective relaying schemes are capable of correctly identifying major disturbances, such as low-impedance faults. However, there are presently no commercial on-line detection schemes that can be applied to differentiate between transient disturbances due to events such as high-impedance faults, transformer inrush and load switching. Modern data recording systems are capable of sampling three-phase voltages and currents at sampling rates up to 6 kHz. This makes it possible to calculate lower-order harmonics and the extent of higher-frequency noise. Although different types of disturbances appear to have certain unique signature characteristics, the classification of these events is not a trivial task. Each feeder has different load and line characteristics, which means that any detection scheme would have to be adaptable to a particular system. Also, the voltage and current waveforms in a distribution system are rich in high-frequency noise with a very wide frequency spectrum (white noise) and a number of harmonics. Any successful scheme for event detection and classification would have to be based on some type of adaptive pattern recognition technique.

Neural networks have successfully been applied to a number of power engineering-oriented classification problems [1]. They have been especially useful when the essential features of the patterns are unknown or difficult to characterize. Initial efforts to train a neural network to differentiate between high-impedance faults and other transient events have already been described by Ebron, et al. [2]. In that study, a number of training cases for a typical 12 kV distribution system were developed through the use of the Electromagnetic Transients Program (EMTP). The standard backpropagation learning algorithm was then applied to train the network. The results indicated that this type of supervised learning could be successfully applied. Unfortunately, the problem with a classification approach based on supervised learning is that a substantial amount of effort would be required to obtain the training cases. Also, because actual fault waveforms differ from simulation results, it would eventually be necessary to train the network on real network data. Since the types of events needed to train the network would not naturally occur over a short period of time, it would be necessary to stage the events in order to gather the data.

A more desirable approach to the classification of transient events would involve unsupervised learning. A network that learns without supervision is appropriate since there is no requirement for a priori knowledge of the relationship between input patterns and the events to be classified. Unfortunately, if an event is not consistently associated with a characteristic pattern of activity, then that event cannot be classified. One could also employ unsupervised learning as a tool to discover the kinds of events that could potentially be classified with a given feature set.

This paper discusses the feasibility of applying unsupervised learning techniques to the classification of transient events on distribution networks. The specific unsupervised learning schemes applied include the self-organizing mapping scheme introduced by Kohonen as well as a model based on adaptive resonance theory, which is attributed to Grossberg. Training and testing cases are based on EMTP simulations of typical transient events on a model power distribution system. The performance of the unsupervised learning schemes is also compared to that obtained by applying supervised learning based on backpropagation.
OVERVIEW OF LEARNING STRATEGIES

Unsupervised vs. Supervised Learning

What the distribution engineer would like to have as a disturbance classifier is a black box that can be hooked up to a transient recorder with minimal setup. The engineer would not normally have the resources necessary to develop the set of cases that would be needed to train a neural network-based classifier requiring supervised training, such as a backpropagation network. Instead, it would be desirable that this classifier could learn how to differentiate between the various events on its own, with minimal interaction on the part of the engineer.

Neural networks are trained by applying sets of input pattern vectors and adjusting the network weights according to a predetermined learning strategy. There are two basic types of learning strategies: supervised and unsupervised. Supervised learning requires a set of training pairs, consisting of input vectors with target vectors corresponding to the desired output. The goal of a supervised training strategy is to minimize the error between the output produced by each input vector and the desired output for a set of training vectors. A popular neural network based on supervised training is the backpropagation network.

The basic problem with strategies employing supervised training is that there are many situations in which the relationships between the classifier inputs and the appropriate output target patterns cannot be determined ahead of time. It is for these types of situations that unsupervised learning strategies have been developed that do not require an output target vector. The training set would only consist of input vectors corresponding to the features of the event to be classified. The goal of the training algorithm in this case would be to adjust network weights to produce output vectors that are consistent. In other words, the application of similar input vectors, corresponding to the same class of events, would produce the same output pattern. Although a vector from a certain event class would produce a specific output, there would be no way to determine, before training the network, which output pattern would be produced by a given input vector belonging to a certain class of events. Although a number of conventional algorithms for performing the unsupervised clustering described above have already been developed, such as the K-Means and ISODATA algorithms [3], this paper will only focus on two common neural network approaches to unsupervised learning.
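For readers unfamiliar with unsupervised clustering, the following is a minimal sketch of the kind of conventional algorithm referred to above (K-Means). The feature array, cluster count and random data are illustrative assumptions, not values from this study; the point is only that the algorithm groups unlabeled feature vectors without any target output.

```python
import numpy as np

def k_means(features, k, n_iter=100, seed=0):
    """Cluster unlabeled feature vectors into k groups (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct training vectors at random.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the vectors assigned to it.
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example: 75 hypothetical 4-element event feature vectors grouped into 3 clusters.
X = np.random.default_rng(1).normal(size=(75, 4))
labels, centroids = k_means(X, k=3)
```

Note that, as with the neural network approaches discussed below, nothing in the procedure tells the algorithm which cluster corresponds to which physical event; that association must be made after the fact.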
Self-Organized Mapping

One strategy for unsupervised learning is self-organized mapping, based on work done by T. Kohonen [4]. The self-organizing map is a neural network which maps an n-dimensional input space onto a two-dimensional grid. A Kohonen layer employs a competitive or "winner-take-all" strategy. For a given input vector, only one neuron will output a logical one, with all
the other outputs set to zero. The Kohonen layer is capable of grouping input vectors into clusters, where each cluster corresponds to a category that the input vectors belong to. This is accomplished by adjusting the Kohonen layer weights so that input vectors with similar features activate the same Kohonen neuron. Eventually, the weights of a neuron will be the average of the class of input vectors that activate it. Kohonen used this type of network for speech recognition, in which he created phoneme maps.

The structure of this network consists of three layers: an input layer, a Kohonen layer and a competition layer, as shown in Figure 1. The Kohonen layer is basically a two-dimensional array of neurons, where each input neuron is fully connected to those in the Kohonen layer. The neurons of the Kohonen layer are also connected to neighboring neurons; this interconnection allows the self-organized mapping strategy to maintain spatial relationships among nearby members of the grid. The Kohonen layer is also fully connected to the competition layer, which contains only a single neuron. Before training, the Kohonen layer is initialized such that the neuron weights are set to points on a grid in the unit square, defined by the first two coordinates of the input space, where the coordinates vary between zero and one in each dimension. When an input vector x is presented to the network, each Kohonen neuron k computes the distance between its weight vector w_k and the input vector, where this distance is given by

d_k = || x - w_k || = [ Σ_j (x_j - w_kj)^2 ]^(1/2)
The neuron in the competition layer then determines the "winner", which is the Kohonen neuron with the smallest distance. This is referred to as a competitive learning strategy since the neurons compete against each other for the ability to modify their weights. The winning Kohonen neuron's weights are then adjusted to move them closer to the input vector. In this particular implementation of the self-organizing map, the winning neuron's neighbors are also adjusted such that their weight vectors are moved closer to the input vector. The adjustment of the neighboring neurons is needed to maintain the spatial integrity of the grid. The weight adjustments are as follows:
w_k^new = w_k^old + α (x - w_k^old),  if k is the winner

w_k^new = w_k^old + β (x - w_k^old),  if k is a neighbor of the winner

where α and β are the learning rates applied to the winner and to its neighbors, respectively.
As the learning progresses, the neuron weight vectors are spread out such that each weight vector represents a region in the input space. That is, each neuron's weight vector becomes the prototype for inputs in that region. The learning rate for the neighbors is also subject to a cooling factor, which determines how quickly the neighborhood effect is reduced to zero. One potential problem with Kohonen networks is that if the input vectors are very similar and the initialization process spreads the initial weight vectors over a wide range, then only a few neurons will get involved in the learning process. To circumvent this problem, this implementation of the self-organizing map also includes a conscience mechanism developed by Desieno [5]. The conscience mechanism is used to monitor each Kohonen neuron's history of success in the competitive learning process. If a neuron wins too often, then the conscience mechanism takes that unit temporarily out of the competition. This allows neurons in undersampled areas to get involved in the competitive learning process.
A preprocessing strategy is needed to normalize the training set of input vectors before using them as inputs to the network. This is accomplished by dividing each component of an input vector by that vector's length. This has the effect of converting each input into a unit vector in n-dimensional space. Hence, each input vector terminates on the surface of a hypersphere.
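To make the training procedure concrete, here is a minimal sketch of a single self-organized-mapping update in Python. It is not the authors' implementation: the grid size, learning rates (α, β) and neighborhood radius are illustrative, and the conscience mechanism and the cooling of the neighborhood learning rate are omitted for brevity.

```python
import numpy as np

def som_update(weights, x, alpha=0.3, beta=0.1, radius=1):
    """One competitive-learning step for a Kohonen layer.

    weights : (rows, cols, n) array of neuron weight vectors (modified in place)
    x       : n-element input vector, already normalized to unit length
    """
    # Each Kohonen neuron computes the distance between its weights and the input.
    dists = np.linalg.norm(weights - x, axis=2)
    # The competition layer selects the winner: the neuron with the smallest distance.
    win_r, win_c = np.unravel_index(np.argmin(dists), dists.shape)
    rows, cols, _ = weights.shape
    for r in range(rows):
        for c in range(cols):
            grid_dist = max(abs(r - win_r), abs(c - win_c))
            if grid_dist == 0:
                # Winner: move its weight vector toward the input vector.
                weights[r, c] += alpha * (x - weights[r, c])
            elif grid_dist <= radius:
                # Neighbors: a smaller step preserves the spatial ordering of the grid.
                weights[r, c] += beta * (x - weights[r, c])
    return win_r, win_c

# Hypothetical usage: a seven-by-seven Kohonen layer over 12-element inputs.
rng = np.random.default_rng(0)
W = rng.random((7, 7, 12))
x = rng.random(12)
x = x / np.linalg.norm(x)   # preprocessing: divide each component by the vector length
som_update(W, x)
```

Repeated over the training set, each neuron's weight vector drifts toward the average of the input vectors for which it wins, which is the clustering behavior described above.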
Figure 1 Self-Organized Mapping Network
Adaptive Resonance Theory

The human brain has the ability to process new memories as they arrive and still keep from erasing or corrupting existing memories. One of the disadvantages of a backpropagation network is that the addition of a new input vector to the training set may require that the network be completely retrained. This makes the backpropagation network unsuited for incremental learning in certain environments. A neural network designed for incremental learning is the ART network, based on the application of adaptive resonance theory and developed by Carpenter and Grossberg [6]. There are several variations of the ART network: ART1 was developed for binary signals and ART2 was developed for analog signals.

The ART network is basically a pattern vector classifier which can also be used for unsupervised learning. That is, it accepts an input pattern vector and classifies it into a category depending on patterns already seen by the network. The desired output does not have to be known ahead of time to train the network. If an input vector pattern does not match up to anything stored by the ART network, then a new category is created. If the input vector pattern is matched with a pattern category, then the weights corresponding to that pattern category are modified to make it more like the input vector. Hence new input pattern vectors will modify the weights corresponding to stored patterns if the match is within a certain tolerance, referred to as the vigilance factor.

An ART2 two-layer network is illustrated in Figure 2. The classification strategy consists of three stages: recognition, comparison and search. New input pattern vectors are learned and classified by modifying the bottom-up weights from the F1 neurons to the F2 neurons. Neurons in the F2 layer then compete for the ability to match up with the input pattern, where each neuron in the F2 layer corresponds to a pattern category. The top-down weights from the F2 layer then provide an expectation to the F1 layer of what a typical pattern should look like for a given category. A vigilance factor specified by the user then determines what degree of recognition is required for a match to be declared. If the F2 match is close enough to the input vector pattern, then resonance is said to occur. Network weights are then adjusted to make the stored pattern look more like the input. In this manner, the weights for a given pattern reflect the average of the patterns in a given category. However, if a mismatch occurs, then the F2 neuron is inhibited and the process is repeated until a match occurs. If the network is unable to match the input pattern with any existing category, then the network creates a new pattern category, using the input vector as a prototype for the new category.
The network user needs to specify the size of the F1 layer, which corresponds to the size of the input vector, the number of neurons to be used in the F2 layer, a vigilance parameter, and some other miscellaneous learning parameters. Successful application of this type of network is highly dependent on the selection of the vigilance factor. If the vigilance is set too high, then input vector patterns will fail to match up to those stored in memory, resulting in a large number of pattern categories being created. In this case, the network fails to generalize correctly, since only a slight variation of a pattern will create a new class. However, if the vigilance is set too low, then different categories will become indistinguishable and get grouped together. This usually necessitates some type of supervision to adjust the vigilance factor.
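The role of the vigilance factor can be illustrated with a highly simplified sketch of the recognition, comparison and search cycle. This is not the ART2 formulation of [6]; the cosine match measure, the learning rate and the in-memory category list are assumptions made purely for illustration.

```python
import numpy as np

def art_classify(x, prototypes, vigilance=0.95, lr=0.2):
    """Simplified ART-style matching: recognition, comparison against the
    vigilance factor, and creation of a new category on failure.

    x          : input pattern vector
    prototypes : list of stored category prototype vectors (modified in place)
    vigilance  : required degree of match before resonance is declared
    """
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    if prototypes:
        # Recognition: score how well each stored category matches the input.
        scores = [float(np.dot(x, p) / np.linalg.norm(p)) for p in prototypes]
        best = int(np.argmax(scores))
        # Comparison: declare resonance only if the best match passes the vigilance test.
        if scores[best] >= vigilance:
            # Resonance: nudge the stored prototype toward the input pattern.
            prototypes[best] = prototypes[best] + lr * (x - prototypes[best])
            return best
    # Search failed: create a new category with this input as its prototype.
    prototypes.append(x)
    return len(prototypes) - 1

# Hypothetical usage: stream unlabeled feature vectors through the classifier.
rng = np.random.default_rng(1)
categories = []
assignments = [art_classify(v, categories, vigilance=0.9) for v in rng.random((20, 4))]
```

Raising the vigilance in this sketch produces more, narrower categories; lowering it merges dissimilar inputs into the same category, which mirrors the trade-off described above.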
Figure 2 ART2 Network

DESIRED FEATURES OF A NEURAL NETWORK-BASED EVENT CLASSIFICATION STRATEGY

The success of a strategy for classifying transient events will be highly dependent on the features presented to the neural network. Neural networks do not normally operate on raw data. Some form of preprocessing involving filtering, computing the discrete Fourier transform components or scaling is usually essential. A neural network approach to the classification of power system disturbances, as illustrated in Figure 3, would consist of three basic tasks: collecting a set of sampled feeder line currents and voltages corresponding to abnormal and normal conditions, using this set to train a neural network, and testing the network on a separate set of processed line currents and voltages.

The preprocessor is an integral part of this strategy since it conditions the raw data into a form suitable for input into the neural network, as illustrated in Figure 4. Such a detection strategy could be based on parameters such as changes in sequence components, variations in the non-60 Hz components in the current waveform, and abnormal high-frequency noise. These parameters would be calculated by a preprocessor for a number of windows, where each window represents a certain time period. This allows the network to make use of changes in parameters over time. The parameters which existed over the n-window range would then form one input vector for the neural network. The associated output vector would then be used to indicate the type of transient event.

Feature selection is more of an art than a science. The goal of feature selection is to eliminate as much unnecessary information as possible while still retaining the salient information in a compact form. Useful features are those which vary widely from class to class, are easy to measure and calculate, and which are not correlated with other features. This process is difficult to automate and must be based on an intuitive understanding of the classification problem. For identifying disturbances on distribution systems, the following items have typically been looked at as possible features:

- Magnitude and phase of currents
- Magnitude and phase of voltages
- Harmonic components
- Deviations in frequency
- Difference between pre- and post-event values
- Rate of change of the above quantities

The detection and classification of low-impedance faults is fairly straightforward, since this only involves discriminating between the high current magnitudes associated with faults and the normal load currents. However, the detection of events associated with switching or high-impedance faults requires a more detailed analysis of transients. Each disturbance due to capacitor switching, faults, etc. is accompanied by transients in the current and voltage waveforms. Certain aspects of transients are unique to the type of disturbance, while others are common to all of them. The frequency and rate of decay of these transients depend on the disturbance and the location of the event causing it. For example, capacitor switching creates both voltage and current transients. The voltage transients are based on the natural frequency of the system, which normally varies between 250 and 1000 Hz. These transients may decay within half a cycle. The capacitor switching can also magnify the harmonic distortion. High-impedance faults also produce transients. However, these transients may decay faster due to the high attenuation produced by the fault impedance. It is interesting to note that this type of fault is not easily detected by conventional relaying schemes since its characteristics are similar to those of other transient events. Switching a parallel transformer to satisfy loading conditions may result in transients or inrush current. In some instances, this inrush may be incorrectly interpreted as a fault condition. High-impedance faults can exhibit arcing of a highly random nature, resulting in fault currents with noticeable high-frequency components. Yet this same behavior can result from such normal operations as capacitor switching and transformer tap changing, so frequency monitors are also unreliable. These attributes of high-impedance faults make them very difficult to detect, and the identification of salient features is an ongoing area of research [7].

The detection of certain types of events based on the monitoring of waveforms at the substation is not a trivial task. The waveforms of the voltage or the current in a distribution system are rich in high-frequency noise with a very wide frequency spectrum and a number of harmonics. It is reasonable to expect that the level of these harmonics and high-frequency transients will increase in the future. Also, each feeder has different load and line characteristics, which would mean that any detection scheme would have to be adaptable to a particular system.
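As an illustration of the windowed preprocessing described above, the following sketch computes a few of the candidate features from one cycle of sampled phase currents. The sampling rate, the particular harmonics and the noise measure are assumptions for illustration; they are not the feature set used later in this paper.

```python
import numpy as np

FS = 6000          # assumed sampling rate (Hz), consistent with the recorders cited earlier
F0 = 60            # fundamental frequency (Hz)
N = FS // F0       # samples per 60 Hz cycle (one window)

def window_features(ia, ib, ic):
    """Features for one one-cycle window (N samples) of the three phase currents:
    fundamental magnitude/phase, selected harmonic magnitudes, and a crude
    high-frequency noise measure per phase."""
    feats = []
    for i in (ia, ib, ic):
        spec = np.fft.rfft(np.asarray(i)[:N]) / (N / 2)   # DFT of the window
        fund = spec[1]                                    # bin 1 = 60 Hz component
        feats += [abs(fund), np.angle(fund)]              # magnitude and phase of fundamental
        feats += [abs(spec[h]) for h in (2, 3, 5)]        # selected harmonic magnitudes
        feats.append(np.sqrt(np.mean(np.abs(spec[8:])**2)))  # content above 480 Hz as "noise"
    return np.array(feats)

def event_vector(pre_window, post_window):
    """Difference between pre- and post-event features forms one network input vector."""
    return window_features(*post_window) - window_features(*pre_window)

# Hypothetical usage: one balanced pre-event cycle and one distorted post-event cycle.
t = np.arange(N) / FS
pre = tuple(230 * np.sin(2 * np.pi * F0 * t - k * 2 * np.pi / 3) for k in range(3))
post = tuple(260 * np.sin(2 * np.pi * F0 * t - k * 2 * np.pi / 3)
             + 15 * np.sin(2 * np.pi * 300 * t) for k in range(3))
x_input = event_vector(pre, post)
```

In practice several consecutive windows would be concatenated so that the network can also exploit the rate of change of these quantities over time.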
Figure 3 Creation and Management of Fault Data
Figure 4 Neural Network-Based Classification Scheme

SIMULATION RESULTS

In order to evaluate the suitability of unsupervised strategies such as self-organized mapping and adaptive resonance theory, as opposed to supervised strategies such as backpropagation, it was necessary to derive suitable test and training sets. A simulation of a typical radial distribution feeder with multiphase laterals and loads was used to create case studies. For the purpose of the transient simulations, the system was modelled as mutually coupled resistive and inductive transmission lines with lumped loads to best simulate actual distribution feeders. Transient data was generated using EMTP by creating faults at different locations within the network. The fault type, loading conditions, fault resistance and the point on the voltage waveform when the fault occurred were varied throughout the simulations. The 60 Hz phasor quantities of the fault-induced transient data were estimated by means of an optimal estimation algorithm [8]. The data produced consisted of the pre-fault voltage and current phasors and the post-fault voltage and current phasors taken after the Kalman filter algorithm converged, typically one-half to three-quarters of a cycle after the detection of the transient. A training set with 75 events was constructed as well as a test set with 75 events.

In this initial study, only three types of events were considered: single-line-to-ground faults, ungrounded line-to-line faults, and grounded line-to-line faults. It was decided to fully explore this simpler set of events before moving on to more complex types of events involving additional frequency domain information. Hence the goal of the classification process was to select which of the three fault events occurred. Single-line-to-ground faults can be characterized by a large increase in current on a single phase only, while line-to-line faults are characterized by substantial increases in current on two phases. What distinguishes an ungrounded line-to-line fault from a grounded line-to-line fault is the fact that the latter results in an increase in zero sequence current [9].

The first set of studies involved a 12-element input pattern vector consisting of three-phase pre- and post-fault current magnitudes and angles, as measured at the substation. To test whether the inputs were sufficient for characterizing the fault types, a backpropagation network consisting of 1 hidden layer with 12 neurons was presented with the training data. The network was able to correctly classify 98% of the cases after about 100 iterations. A self-organized mapping network with a seven by seven Kohonen layer was then presented with the training set. The mapping results are shown in Figure 5a. The grounded line-to-line faults correspond to the triangles, the ungrounded line-to-line faults are represented by the squares, while the single-line-to-ground faults are represented by the circles. As shown in the map, the single-line-to-ground faults are well grouped, while there is good, but not ideal, differentiation between the two sets of line-to-line faults. Next, an ART2 network with 12 neurons in the F1 layer and 4 neurons in the F2 layer with a vigilance factor of 0.95 was tested. The network accurately classified the single-line-to-ground faults, but could only obtain a success rate of about 80% when trying to differentiate between the two types of line-to-line faults.
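A sketch of how the 12-element pattern vector described above might be assembled from the estimated phasors is given below. The function and variable names, and the example phasors, are illustrative; the paper does not give an implementation.

```python
import numpy as np

def pattern_vector_12(pre_phasors, post_phasors):
    """Build the 12-element input pattern: pre- and post-fault current
    magnitude and angle for each of the three phases.

    pre_phasors, post_phasors : length-3 arrays of complex phase-current
    phasors (A, B, C) estimated at the substation by the Kalman filter.
    """
    feats = []
    for phasor_set in (pre_phasors, post_phasors):
        for i_ph in phasor_set:
            feats += [abs(i_ph), np.angle(i_ph)]
    return np.array(feats)   # 2 sets x 3 phases x 2 quantities = 12 elements

# Hypothetical example: a single-line-to-ground fault on phase A.
pre = np.array([100 * np.exp(-1j * 0.3), 100 * np.exp(-1j * 2.4), 100 * np.exp(1j * 1.8)])
post = np.array([900 * np.exp(-1j * 1.0), 105 * np.exp(-1j * 2.4), 102 * np.exp(1j * 1.8)])
x = pattern_vector_12(pre, post)
x = x / np.linalg.norm(x)   # unit-length normalization used before the Kohonen network
```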
Figure 5a Results of Self-Organized Map for First Set of Cases
A second set of studies with a reduced pattern vector size was then created. The new pattern consisted of the difference between the pre- and post-fault current magnitudes for each phase, as well as the difference between the pre- and post-fault zero sequence current, to make a total of 4 inputs. A backpropagation network consisting of 1 hidden layer with 4 neurons was able to achieve the same 98% classification success as cited before. The same data set was also applied to the same self-organized mapping network as described above, with results as shown in Figure 5b. Again there is a noticeable differentiation made between the three types of events. An ART2 network with 4 neurons in the F1 layer and 5 neurons in the F2 layer with a vigilance factor of 0.95 was then tested. As before, the network was able to differentiate between single-line-to-ground and line-to-line faults. However, the network was not able to completely differentiate between grounded and ungrounded line-to-line faults. A number of different variations were made to the network without much success. The two types of faults differ in that one has a large zero sequence component while the other doesn't. Apparently the problem was that there was too great a difference within each class of line-to-line faults. When the vigilance factor was increased, instead of differentiating between the two types of faults, the network would only take each type of line-to-line fault and divide its cases into smaller subclasses.
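A sketch of the reduced 4-element pattern described above is shown below. It is illustrative only; the zero sequence current is computed here as the phasor sum of the phase currents, the definition used later in the conclusions (conventions differ by a factor of three).

```python
import numpy as np

def pattern_vector_4(pre_phasors, post_phasors):
    """Reduced 4-element pattern: the change in current magnitude for each
    phase plus the change in zero sequence current magnitude."""
    d_mag = np.abs(post_phasors) - np.abs(pre_phasors)            # per-phase magnitude change
    d_zero = abs(post_phasors.sum()) - abs(pre_phasors.sum())     # zero sequence change
    return np.append(d_mag, d_zero)

# Hypothetical grounded line-to-line fault: phases A and B rise and the
# zero sequence current rises as well.
pre = np.array([100 * np.exp(-1j * 0.3), 100 * np.exp(-1j * 2.4), 100 * np.exp(1j * 1.8)])
post = np.array([700 * np.exp(-1j * 0.8), 650 * np.exp(-1j * 2.9), 102 * np.exp(1j * 1.8)])
print(pattern_vector_4(pre, post))
```

For an ungrounded line-to-line fault the first two elements would change similarly while the fourth element would stay near zero, which is exactly the distinction the networks were asked to learn.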
Figure 5b Results of Self-Organized Map for Second Set of Cases
CONCLUSIONS

It is apparent that developing an event classification scheme based totally on unsupervised learning will be a difficult task. Obviously some type of initial supervised training would be required before such a device is used in the field. This could be accomplished by using a simulation of the network to which this type of device is to be attached. Although this would take quite a bit of extra work, an event classifier would then only require unsupervised learning to compensate for the difference between the simulation model and the waveforms which actually occur on the feeder.

The practical application of neural networks will also involve the integration of small, special-purpose networks with conventional fault detection algorithms. It will be difficult to construct a large neural network with a large number of inputs that will be able to classify a multitude of different events. There is no need to have a network learn that the zero sequence current, which is the phasor sum of the three-phase currents, corresponds to a fault involving ground. One could incorporate this type of knowledge into a procedural algorithm. However, a special-purpose neural network component could be embedded to help differentiate between two events which are very similar, given that only a limited comparison between the two is required.

Future work is also required on how to best incorporate additional salient features into an input pattern vector. There is a need to mix in additional frequency domain information related to non-60 Hz phenomena. This will be necessary to classify other types of events, such as capacitor switching and high-impedance faults. However, problems with dimensionality will be encountered if one were to attempt to add a complete set of three-phase values for each measurable harmonic.
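As a sketch of the kind of procedural rule alluded to above, a ground-involvement check based on the zero sequence current might look as follows; the threshold value is an illustrative assumption that would have to be tuned to the particular feeder.

```python
import numpy as np

def involves_ground(i_a, i_b, i_c, threshold=50.0):
    """Procedural check: a significant zero sequence current (phasor sum of
    the three phase currents) indicates an event involving ground.

    i_a, i_b, i_c : complex phase-current phasors; threshold in amperes.
    """
    i_zero = i_a + i_b + i_c
    return abs(i_zero) > threshold

# Hypothetical example: a large unbalance on phase A suggests a ground fault.
print(involves_ground(900 * np.exp(-1j * 1.0),
                      100 * np.exp(-1j * 2.4),
                      100 * np.exp(1j * 1.8)))
```

A neural network component would then only need to resolve the cases that such simple rules cannot separate, such as the grounded versus ungrounded line-to-line faults discussed above.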
REFERENCES

[1] Proceedings of the Workshop on Applications of Artificial Neural Network Methodology in Power Systems Engineering, April 8-10, 1990, Clemson University.

[2] Sonja Ebron, David Lubkeman and Mark White, "Neural Net Processing Approach to the Detection of High Impedance Faults", IEEE Transactions on Power Delivery, Vol. 5, No. 2, April 1990, pp. 905-914.

[3] Yoh-Han Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, Massachusetts, 1989.

[4] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984.

[5] Duane Desieno, "Adding a Conscience to Competitive Learning", Proceedings of the IEEE International Conference on Neural Networks, Volume 1, July 1988, IEEE Press, pp. 117-124.

[6] Gail A. Carpenter and Stephen Grossberg, "ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns", Applied Optics, Vol. 26, No. 23, 1987, pp. 4919-4930.

[7] Adly A. Girgis, Wenbin Chang and Elham B. Makram, "Analysis of High-Impedance Fault Generated Signals Using a Kalman Filtering Approach", IEEE Transactions on Power Delivery, Vol. 5, No. 4, November 1990, pp. 1714-1724.

[8] A.A. Girgis and R.G. Brown, "Application of Kalman Filtering in Computer Relaying", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-100, No. 7, July 1981, pp. 3387-3397.

[9] A.G. Phadke, M. Ibrahim and T. Hlibka, "Fundamental Basis for Distance Relaying with Symmetrical Components", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-96, No. 2, March/April 1977, pp. 635-646.