A Review of ANN-based Short-Term Load Forecasting Models
Y. Rui
A.A. El-Keib
Department of Electrical Engineering, University of Alabama, Tuscaloosa, AL 35487
Abstract - Artificial Neural Networks (ANN) have recently been receiving considerable attention, and a large number of publications concerning ANN-based short-term load forecasting (STLF) have appeared in the literature. An extensive survey of ANN-based load forecasting models is given in this paper. The six most important factors which affect the accuracy and efficiency of the load forecasters are presented and discussed. The paper also includes conclusions reached by the authors as a result of their research in this area.
Keywords: artificial neural networks, short-term load forecasting models
Introduction
Accurate and robust load forecasting is of great importance for power system operation. It is the basis of economic dispatch, hydro-thermal coordination, unit commitment, transaction evaluation, and system security analysis, among other functions. Because of its importance, load forecasting has been extensively researched, and a large number of models have been proposed during the past several decades, such as Box-Jenkins models, ARIMA models, Kalman filtering models, and models based on spectral expansion techniques. Generally, these models are based on statistical methods and work well under normal conditions; however, they show some deficiency in the presence of an abrupt change in environmental or sociological variables which are believed to affect load patterns. Also, the techniques employed by those models use a large number of complex relationships, require long computation times, and may result in numerical instabilities. Therefore, some new forecasting models have been introduced recently.
As a result of the development of Artificial Intelligence (AI), Expert Systems (ES) and Artificial Neural Networks (ANN) have been applied to solve the STLF problem. An ES forecasts the load according to rules extracted from experts' knowledge and operators' experience. This method is promising; however, it is important to note that expert opinion may not always be consistent, and the reliability of such opinion may be in question. Over the past two decades, ANNs have been receiving considerable attention, and a large number of papers on their application to power system problems have appeared in the literature.
This paper presents an extensive survey of ANN-based STLF models. Although many factors affect the accuracy and efficiency of the ANN-based load forecaster, the following six factors are believed to be the most important ones. In section 2, various kinds of Back-Propagation (BP) network structures are presented and discussed. The selection of input variables is reviewed in section 3. In section 4, different ways of selecting the training set are presented and evaluated. Because of the drawbacks of the BP algorithm, some efficient modifications are discussed in section 5. In sections 6 and 7, the determination of the number of hidden neurons and of the parameters of the BP algorithm are presented, respectively. Conclusions follow in section 8.
The BP network structures
Artificial Neural Networks have parallel and distributed processing structures. They can be thought of as a set of computing arrays consisting of series of repetitive uniform processors placed on a grid. Learning is achieved by changing the interconnections between the processors [1]. To date, there exist many types of ANNs, which are characterized by their topology and learning rules. For the STLF problem, the BP network is the most widely used one. With the ability to approximate any continuous nonlinear function, the BP network has extraordinary mapping (forecasting) abilities. The BP network is a kind of multilayer feedforward network, and the transfer function within the network is usually a nonlinear function such as the Sigmoid function. The typical BP network structure for STLF is a three-layer network with the nonlinear Sigmoid function as the transfer function [2-8]. An example of this network is shown in Figure 1. In addition to the typical Sigmoid function, a linear transfer function from the input layer directly to the output layer, as shown in Figure 2, was proposed in [9] to account for linear components of the load. The authors of [9] have reported that this approach improved their forecasting results by more than 1%.
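The structure described above can be sketched in a few lines. The following is only an illustrative forward pass, not the implementation of [9]: the function names, the single linear output neuron, and the way the direct input-to-output term is simply added to the output are all assumptions made for this sketch.

```python
import math


def sigmoid(z):
    """Logistic Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out, w_direct=None):
    """Forward pass of a three-layer network with one output neuron.

    x        : list of input values
    w_hidden : one weight list per hidden neuron (each of len(x))
    b_hidden : one bias per hidden neuron
    w_out    : one hidden-to-output weight per hidden neuron
    b_out    : output bias
    w_direct : optional input-to-output weights (assumed form of the
               linear connection that bypasses the hidden layer)
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Nonlinear part: weighted sum of the Sigmoid hidden outputs.
    y = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    # Linear part: direct input-to-output connections, if present.
    if w_direct is not None:
        y += sum(w * xi for w, xi in zip(w_direct, x))
    return y
```

Leaving `w_direct` unset gives the plain three-layer structure of Figure 1; supplying it adds a linear component in the spirit of Figure 2.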
Figure 1 A typical BP network structure
Figure 2 An ANN structure with linear transfer function
Because fully connected BP networks need more training time and are not adaptive enough to temperature changes, a non-fully connected BP model is proposed in [10,11]. The reported results show that although a fully connected ANN is able to capture the load characteristics, a non-fully connected ANN responds more adaptively to temperature changes. The results also show that the forecasting accuracy is significantly improved for days with abrupt temperature changes. Moreover, [11] presents a new approach which combines several sub-ANNs to give better forecasting results. Recently, a recurrent high order neural network (RHONN) was proposed in [12]. Due to its dynamic nature, the RHONN forecasting model is able to adapt quickly to changing conditions such as important load variations or changes of the daily load pattern. It is reported in [12] that the forecasting error over the period of a whole year has improved considerably.
It is proven that a 3-layer ANN with suitable dimension is sufficient to approximate any continuous nonlinear function. In [13], it is illustrated that the 4-layer structure is more easily trapped in a local minimum while possessing the other features of the 3-layer ANNs. However, attracted by the compact architecture and the efficiency of the learning process of the 4-layer ANN, a load forecaster using this structure was recommended in [1,14], and promising results were reported.
Based on the above discussion, the topology of a BP network can have 3 or 4 layers, and the transfer function can be linear, nonlinear, or a combination of both. Also, the network can be either fully connected or non-fully connected. From our experience, we have found that the BP network structure is problem dependent, and a structure that is suitable for a given power system is not necessarily suitable for another.
Input variables of BP network
As was pointed out earlier, the BP network is a kind of array which can realize a nonlinear mapping from the inputs to the outputs. Therefore, the selection of input variables of a load forecasting network is of great importance. In general, there are two selection methods. One is based on experience [1,3,9,14], and the other is based on statistical analysis such as ARIMA [11] and correlation analysis [6].
If we denote the load at hour k as l(k), a typical selection of inputs based on operating experience will be l(k-1), l(k-24), t(k-1), etc., where t(k) is the temperature corresponding to the load l(k). Unlike those methods which are based on experience, [6] applies auto-correlation analysis to the historical load data to determine the input variables. Auto-correlation analysis shows that correlation peaks occur at lags that are multiples of 24 hours. This indicates that the loads at the same hour of different days are very strongly correlated with each other. Therefore, they can be chosen as input variables. In [11], the authors apply ARIMA procedures and auto-correlation analysis to determine the necessary load related inputs. After the load related inputs are determined, the corresponding temperature related inputs are determined.
The authors of [10] discuss the method of using an ANN to forecast the load curve under extreme climatic conditions. In addition to conventional information such as historical loads and temperature, wind speed and sky cover are also chosen as input variables.
In all, the input variables can be classified into 8 classes:
1. historical loads [1-3,6,7,9-12,15]
2. historical and future temperatures [1-3,6,9-11,15]
3. hour of day index [1,3,4,6,11]
4. day of week index [1,4,6,11]
5. wind-speed [4,10]
6. sky-cover [4,10]
7. rainfall [4]
8. wet or dry day [4].
There are no general rules that can be followed to determine the input variables. The choice largely depends on engineering judgment and experience. Our investigations revealed that for a normal climate area, the first 4 classes of variables are sufficient to give acceptable forecasting results. However, for an area with extreme weather conditions the latter 4 classes are recommended, because of the highly nonlinear relationship between the loads and the weather conditions.
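As an illustration of the two selection methods discussed in this section, the sketch below computes a sample auto-correlation and assembles the experience-based input vector l(k-1), l(k-24), t(k-1). It is a minimal sketch under stated assumptions: the function names are hypothetical, the estimator is one common normalized form, and the load series is synthetic (a pure daily sine wave), not data from any of the surveyed papers.

```python
import math


def autocorrelation(series, lag):
    """Sample auto-correlation of a series at the given lag (in hours)."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


def lagged_inputs(load, temp, k):
    """Return [l(k-1), l(k-24), t(k-1)] for the target hour k."""
    if k < 24:
        raise ValueError("k must be at least 24 to form the l(k-24) input")
    return [load[k - 1], load[k - 24], temp[k - 1]]


# Synthetic two-week hourly load with a daily cycle (illustrative only).
synthetic_load = [
    100.0 + 20.0 * math.sin(2.0 * math.pi * k / 24.0) for k in range(24 * 14)
]

# The correlation at a 24-hour lag exceeds that of the neighbouring lags.
r23, r24, r25 = (autocorrelation(synthetic_load, lag) for lag in (23, 24, 25))
```

On real load data the peaks are less clean than on this synthetic series, so the lags retained as inputs remain a matter of judgment, as noted above.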
Selection of training set
ANNs can only perform what they were trained to do. In the case of STLF, the selection of the training set is therefore crucial. The criterion for selecting the training set is that the characteristics of all the training pairs in the training set must be similar to those of the day to be forecasted. Choosing as many training pairs as possible is not the correct approach, for the following reasons:
i) Load periodicity. The 7 days of a week have rather different patterns. Therefore, using Sundays' load data to train a network which is to be used to forecast Mondays' loads would yield wrong results.
ii) Because loads possess different trends in different periods, recent data is more useful than old data. Therefore, a very large training set which includes old data is less useful for tracking the most recent trends.
As discussed in i), to obtain good forecasting results, day type information must be taken into account. There are two ways to do this. One way is to construct a different ANN for each day type, and feed each ANN with the corresponding day type training set [6,15]. The other is to use only one ANN but to include the day type information in the input variables [1,7,11]. The two methods have their advantages and disadvantages. The former uses a number of relatively small networks, while the latter has only one network of a relatively large size. In [9], the authors realized that the selection of the training cases significantly affects the forecasting result, and developed a selection method based on the "least distance criteria". Using this approach, the forecasting results have shown significant improvement.
It is worth noting that the day type classification is system dependent. For instance, in some systems Monday's load may be similar to Tuesday's, but in others this will not be true. A typical classification given in [1] categorizes the historical loads into five classes. These are Monday, Tuesday-Thursday, Friday, Saturday, and Sunday/Public holiday. A different way, used in [2], collects the data with characteristics similar to the day being forecasted, and combines these data with the data from the previous 5 days to form a training set. In addition to the above conventional day type classification methods, some unsupervised ANN models are used to identify the day type patterns. The unsupervised learning concept, also called self-organization, can be effectively used to discover similarities among unlabeled patterns. An unsupervised ANN is employed in [5,14] to identify the different day types.
In all, because of the great importance of an appropriate selection of the training set, several day type classification methods have been proposed, which can be categorized into two types. One includes conventional methods which use observation and comparison [1,2,9]. The other is based on unsupervised ANN concepts and selects the training set automatically [10,14].
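The conventional day type approach can be sketched as follows, using the five classes quoted above from [1]. This is an illustrative sketch only: the function names, the `holidays` argument, and the rule of keeping the most recent days of the same type (reflecting reason ii) above) are assumptions, and the classification would have to be adapted to the system at hand.

```python
from datetime import date


def day_type(day, holidays=frozenset()):
    """Map a calendar date to one of the five day type classes of [1]."""
    if day in holidays:
        return "Sunday/Public holiday"
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 0:
        return "Monday"
    if weekday <= 3:
        return "Tuesday-Thursday"
    if weekday == 4:
        return "Friday"
    if weekday == 5:
        return "Saturday"
    return "Sunday/Public holiday"


def select_training_days(history, target, holidays=frozenset(), max_days=10):
    """Pick the most recent past days that share the target's day type."""
    wanted = day_type(target, holidays)
    same_type = [
        d for d in history if d < target and day_type(d, holidays) == wanted
    ]
    # Most recent first, truncated so that old data does not dominate.
    return sorted(same_type, reverse=True)[:max_days]
```

For example, calling `select_training_days` with a month of past dates and a Tuesday target returns only earlier Tuesdays, Wednesdays, and Thursdays, most recent first.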
Modification of the BP algorithm
The BP algorithm is widely used in STLF and has some good features, such as its ability to easily accommodate weather variables and its implicit expressions relating inputs and outputs. However, it also has some drawbacks. These are its time consuming training process and its convergence to local minima. The authors of [16] report their investigation of the problem and point out that one of the major reasons for these drawbacks is "premature saturation," a phenomenon in which the network error remains nearly constant at a significantly high value for some period of time during the learning process. A method to prevent this phenomenon by the appropriate selection of the initial weights is proposed in [16].
In [17], the authors discuss the effects of the momentum factor on the algorithm. The original BP algorithm does not have a momentum factor and converges with difficulty. The BP algorithm with momentum (BPM) converges much faster than the conventional BP algorithm. In [3,18], it is shown that the use of the BPM in STLF significantly improves the training process. The authors of [8] present extensive studies on the effects on BPM of various factors such as the learning step and the momentum factor. They propose a new learning algorithm for adaptive training of neural networks. This algorithm converges faster than the BPM, and makes the selection of the initial parameters much easier.
A new learning algorithm motivated by the principle of "forced dynamic" for the total error function is proposed in [19]. The rate of change of the network weights is chosen such that the error function to be minimized is forced to "decay" in a certain mode. Another modification of the conventional BP algorithm is proposed in [20]. The modification consists of a new total error function. This error function updates the weights in direct proportion to the total error. With this modification, the periods of stagnation are much shorter and the possibility of being trapped in a local minimum is greatly reduced.
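The effect of the momentum term can be illustrated on a toy problem. The sketch below is not a BP network: it applies the usual momentum update, in which each weight change is the negative gradient times the learning step plus the momentum factor times the previous weight change, to an assumed two-weight quadratic error surface with a narrow valley. The function names, the error surface, and the parameter values are chosen purely for illustration.

```python
import math


def minimize(grad, w0, step, momentum=0.0, tol=1e-3, max_iter=5000):
    """Gradient descent with an optional momentum term.

    Update rule: dw(n) = -step * grad(w) + momentum * dw(n-1).
    Returns the final weights and the number of iterations used.
    """
    w = list(w0)
    dw = [0.0] * len(w)
    for n in range(1, max_iter + 1):
        g = grad(w)
        dw = [-step * gi + momentum * di for gi, di in zip(g, dw)]
        w = [wi + di for wi, di in zip(w, dw)]
        # The toy minimum is at the origin, so the distance to it is known.
        if math.sqrt(sum(wi * wi for wi in w)) < tol:
            return w, n
    return w, max_iter


def toy_gradient(w):
    """Gradient of E(w) = 0.5 * (w[0]**2 + 100 * w[1]**2)."""
    return [w[0], 100.0 * w[1]]


_, plain_iterations = minimize(toy_gradient, [1.0, 1.0], step=0.01)
_, bpm_iterations = minimize(toy_gradient, [1.0, 1.0], step=0.01, momentum=0.9)
```

On this surface the run with momentum reaches the tolerance in markedly fewer iterations than the run without it, which is consistent with the behaviour reported for BPM; real network error surfaces are of course not quadratic.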
Number of hidden neurons
Determining the optimal number of hidden neurons is a crucial issue. If it is too small, the network cannot possess sufficient information, and thus yields inaccurate forecasting results. On the other hand, if it is too large, the training process will be very long [1]. The authors of [21] discuss the number of hidden neurons in binary value cases. In order to make the mapping between the output value and the input pattern arbitrary for I learning patterns, the necessary and sufficient number of hidden neurons is I-1. The authors of [22] also state that a multilayer perceptron with k-1 hidden neurons can realize arbitrary functions defined on a k-element set. To our knowledge, there is no absolute criterion to determine the exact number of hidden neurons that will lead to an optimal solution. Different numbers of hidden neurons are used in [1,10,11,14]. Based on our experience, the appropriate number of hidden neurons is system dependent, and is mainly determined by the size of the training set and the number of input variables.
Parameters of the BP algorithm
Three parameters need to be determined before a BP network can be trained and used to forecast. These are:
i) Weights: The initial weights should be small random numbers. It is proven that if the initial weights in the same layer are equal, the BP algorithm cannot converge [18].
ii) Learning step: The effectiveness and convergence of the BP algorithm depend significantly on the value of the learning step. However, the optimum value of the learning step is system dependent. For systems which possess broad minima that yield small gradient values, a large value of the learning step will result in more rapid convergence. However, for a system with steep and narrow minima, a small value of the learning step is more suitable [24]. There are no general rules to obtain an optimal learning step. The values used in [1,4,14] are 0.9, 0.25, and 0.05, respectively.
iii) Momentum factor: Like the learning step, the momentum factor is also system dependent. The values chosen by [1,4,14] are 0.6, 0.9, and 0.9, respectively. In contrast to the learning step, whose value can be larger than 1.0, the upper limit of the momentum factor is 1.0 [18]. This upper limit can be obtained from the physical meaning of the momentum factor: it is the forgetting factor of the previous weight changes. The algorithm diverges if a momentum factor greater than 1.0 is used.
The authors of [8] compare the efficiency and accuracy of the neural network using different learning steps and momentum factors, and show that with an adaptive algorithm, the parameters can be chosen from a much wider range. In our investigation, we have observed that initial weights with values between -0.5 and 0.5 yield good results. As for the learning step and the momentum factor, they should not be fixed but gradually decreased as the iteration index increases. Using an adaptive algorithm such as the one proposed in [8] would yield a more stable algorithm.
Conclusions
A summary of an extensive survey of existing ANN-based STLF models is presented. Six factors which are believed to have a considerable effect on the accuracy, reliability, and robustness of the models are emphasized. The surveyed publications and the authors' own experience lead to the conclusion that the ANN structure, input variables, number of hidden neurons, and BP algorithm parameters are mainly system dependent. The development of a more general ANN model to handle the STLF problem is a challenging problem and should be investigated in a timely manner.
References
[1] D. Srinivasan, A neural network short-term load forecaster, Electric Power Research, pp. 227-234, 28 (1994).
[2] O. Mohammed, Practical Experiences with an Adaptive Neural Network short-term load forecasting system, IEEE/PES 1994 Winter Meeting, Paper # 94 210-5 PWRS.
[3] D.C. Park, Electric load forecasting using an artificial neural network, IEEE Trans. on Power Systems, Vol. 6, No. 2, pp. 412-449, May 1991.
[4] T.S. Dillon, Short-term load forecasting using an adaptive neural network, Electrical Power & Energy Systems, pp. 186-191, 1991.
[5] M. Djukanovic, Unsupervised/supervised learning concept for 24-hour load forecasting, IEE Proc.-C, Vol. 140, No. 4, pp. 311-318, July 1993.
[6] K.Y. Lee, Short-Term Load Forecasting Using an Artificial Neural Network, IEEE Trans. on Power Systems, Vol. 7, No. 1, pp. 124-131, Feb. 1992.
[7] C.N. Lu, Neural Network Based Short Term Load Forecasting, IEEE Trans. on Power Systems, Vol. 8, No. 1, pp. 336-341, Feb. 1993.
[8] K.L. Ho, Short Term Load Forecasting Using a Multilayer Neural Network with an Adaptive Learning Algorithm, IEEE Trans. on Power Systems, Vol. 7, No. 1, pp. 141-149, Feb. 1992.
[9] T.M. Peng, Advancement in the application of neural networks for short-term load forecasting, IEEE/PES 1991 Summer Meeting, Paper # 451-5 PWRS.
[10] B.S. Kermanshahi, Load forecasting under extreme climatic conditions, Proceedings, IEEE Second International Forum on the Applications of Neural Networks to Power Systems, April 1993, Yokohama, Japan.
[11] S.T. Chen, Weather sensitive short-term load forecasting using nonfully connected artificial neural networks, IEEE/PES 1991 Summer Meeting, Paper # 449-9 PWRS.
[12] G.N. Kariniotakis, Load forecasting using dynamic high-order neural networks, pp. 801-805, Proceedings, IEEE Second International Forum on the Applications of Neural Networks to Power Systems, April 1993, Yokohama, Japan.
[13] J. Villiers, Back-propagation Neural Nets with One and Two Hidden Layers, IEEE Trans. on Neural Networks, Vol. 4, No. 1, pp. 136-146, Jan. 1992.
[14] Y.Y. Hsu, Design of artificial neural networks for short-term load forecasting, IEE Proc.-C, Vol. 138, No. 5, pp. 407-418, Sept. 1991.
[15] A.D. Papalexopoulos, Application of neural network technology to short-term system load forecasting, pp. 796-800, Proceedings, IEEE Second International Forum on the Applications of Neural Networks to Power Systems, April 1993, Yokohama, Japan.
[16] Y. Lee, An Analysis of Premature Saturation in Back Propagation Learning, Neural Networks, Vol. 6, pp. 719-728, 1993.
[17] V.V. Phansalkar, Analysis of the Back-Propagation Algorithm with Momentum, IEEE Trans. on Neural Networks, Vol. 5, No. 3, May 1994.
[18] Y. Rui, P. Jin, The modelling method for ANN-based forecaster, CDC' 94, China, 1994.
[19] G.P. Alexander, An Accelerated Learning Algorithm for Multilayer Perceptron Networks, IEEE Trans. on Neural Networks, Vol. 5, No. 3, pp. 493-497, May 1994.
[20] A.V. Ooyen, Improving the Convergence of the Back-Propagation Algorithm, Neural Networks, Vol. 5, pp. 465-471, 1992.
[21] M. Arai, Bounds on the Number of Hidden Units in Binary-Valued Three-Layer Neural Networks, Neural Networks, Vol. 6, pp. 855-860, 1993.
[22] S.C. Huang, Bounds on the Number of Hidden Neurons in Multilayer Perceptrons, IEEE Trans. on Neural Networks, Vol. 2, No. 1, pp. 47-55, Jan. 1991.
[23] Y. Rui, P. Jin, Power load forecasting using ANN, Journal of Hehai University, 1993.
[24] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992.