A VARIANT OF BACK-PROPAGATION ALGORITHM FOR MULTILAYER FEED-FORWARD NETWORK LEARNING
A. Khavare Ajinkya A, B. Bidkar Harshal S
DKTES's Textile & Engineering Institute, (Rajwada) Ichalkaranji, Maharashtra
[email protected] ,
[email protected]
Abstract
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued and vector-valued functions. Algorithms such as Back-propagation use gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training data and has been successfully applied to problems such as interpreting visual scenes, speech recognition and learning robot control strategies. In this paper, a variant of the Back-propagation algorithm is proposed for feed-forward neural network learning. The proposed algorithm improves Back-propagation training by adapting to the slope of the error curve, giving quicker convergence of the solution and increasing the speed of convergence of the system.
Keywords: Neural Networks, Adaptive navigation.
1. Introduction
Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge. This capacity to learn from experience, analytical observation, and other means results in a system that can continuously self-improve and thereby offer increased efficiency and effectiveness. Over the past 50 years, the study of machine learning has grown from the efforts of a handful of computer engineers exploring whether computers could learn to play games, and a field of statistics that largely ignored computational considerations, into a broad discipline that has produced fundamental statistical-computational theories of learning processes, has designed learning algorithms that are routinely used in commercial systems from speech recognition to computer vision, and has spun off an industry in data mining to discover hidden regularities in the growing volume of online data. Neural network learning methods provide a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions. For certain types of problems, such as learning to interpret complex real-world sensor data, artificial neural networks are among the most effective learning methods currently known.
The remainder of the paper is organized as follows: Section 2 covers the theoretical concepts of the perceptron and the multilayer feed-forward network. Section 3 presents the Back-propagation algorithm and its proposed variant. Finally, we report experimental results and conclude.
2. Perceptrons & FNN
Perceptrons are single-layer units of ANN systems, as illustrated in Fig. 1.
Fig. 1: A Perceptron
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is

o(x1, ..., xn) = 1 if w0 + w1 x1 + w2 x2 + ... + wn xn > 0, and -1 otherwise,

where each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output. Learning a perceptron involves choosing values for the weights w0, ..., wn. The precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1/-1 output for every training example. A perceptron can be trained using the perceptron rule: begin with random weights, then iteratively apply the perceptron to each training example, modifying the weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly. However, a multilayer perceptron network, i.e. a feed-forward neural network (Fig. 2), is needed whenever nonlinear decision surfaces are required.
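As a concrete illustration, the perceptron rule just described can be sketched in a few lines of Python. The function name, the learning rate of 0.1 and the AND example are our own illustrative choices, not part of the original description.

import numpy as np

def train_perceptron(X, targets, eta=0.1, max_epochs=100):
    # Prepend a constant 1 to every example so that w[0] plays the role of w0
    # in o(x) = 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.random.uniform(-0.05, 0.05, X.shape[1])   # small random initial weights
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(X, targets):
            o = 1 if np.dot(w, x) > 0 else -1         # perceptron output
            if o != t:
                w += eta * (t - o) * x                # perceptron rule: w_i <- w_i + eta*(t - o)*x_i
                errors += 1
        if errors == 0:                               # all training examples classified correctly
            break
    return w

# Example: the logical AND of two inputs, with targets in {-1, +1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([-1, -1, -1, 1])
w = train_perceptron(X, targets)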
Fig. 2: Feed-forward Neural Network
Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function. Inputs are presented to the input layer and the output is taken at the output layer; the hidden layers in between allow more complex computations. Feed-forward neural networks (FNNs) have been widely used for tasks such as pattern recognition, function approximation, dynamical modelling, data mining and time series forecasting. The training of an FNN is mainly undertaken using back-propagation (BP) based learning.
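For illustration, a single forward pass through such a network with one hidden layer and sigmoid activations might look as follows in Python; the array names and shapes are assumptions made for this sketch.

import numpy as np

def sigmoid(z):
    # Sigmoid activation applied by each unit
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_ih, b_h, W_ho, b_o):
    # x: input vector; W_ih, b_h: input-to-hidden weights and biases;
    # W_ho, b_o: hidden-to-output weights and biases.
    y_hidden = sigmoid(W_ih @ x + b_h)       # hidden-layer activations
    y_out = sigmoid(W_ho @ y_hidden + b_o)   # output-layer activations
    return y_hidden, y_out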
3. Back-Propagation Algorithm
Back-propagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task. It was first described by Paul Werbos in 1974. Its learning cycle has two phases: one propagates the input pattern through the network, and the other adapts the output by changing the weights in the network. The error signals are propagated backwards through the network to the hidden layer(s); the portion of the error signal that a hidden-layer neuron receives is an estimate of that neuron's contribution to the output error. By adjusting the connection weights on this basis, the squared error, or some other metric, is reduced in each cycle and, if possible, finally minimized.
Mathematical Analysis: Assume a network with N inputs and M outputs. Let xi be the input to the ith neuron in the input layer, Bj the output of the jth neuron before activation, yj the output after activation, bj the bias between the input and hidden layers, bk the bias between the hidden and output layers, wij the weight between the input and hidden layers, and wjk the weight between the hidden and output layers. Let η be the learning rate and δ the error, and let i, j and k index the input, hidden and output layers respectively. The response of each unit is computed as:

Bj = Σi wij xi + bj ,   yj = f(Bj) = 1 / (1 + e^(-Bj))    (1)

yk = f( Σj wjk yj + bk )    (2)

Weights and bias between the input and hidden layers are updated as follows:

wij(t+1) = wij(t) + η δj xi    (3)

bj(t+1) = bj(t) + η δj    (4)

where δj is the error between the input and hidden layers, calculated as:

δj = yj (1 - yj) Σk δk wjk    (5)

Weights and bias between the hidden and output layers are updated as follows:

wjk(t+1) = wjk(t) + η δk yj    (6)

bk(t+1) = bk(t) + η δk    (7)

and δk is the error between the hidden and output layers, calculated for a desired output tk as:

δk = yk (1 - yk) (tk - yk)    (8)
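As a compact, non-authoritative illustration of equations (1)-(8), the following Python sketch performs one back-propagation update for a single training pair; the NumPy vectorization, function name and default learning rate are illustrative assumptions rather than the authors' implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, t, W_ih, b_h, W_ho, b_o, eta=0.5):
    # Forward pass: equations (1) and (2)
    y_j = sigmoid(W_ih @ x + b_h)                    # hidden activations
    y_k = sigmoid(W_ho @ y_j + b_o)                  # output activations
    # Error terms: equation (8) for the output layer, (5) for the hidden layer
    delta_k = y_k * (1.0 - y_k) * (t - y_k)
    delta_j = y_j * (1.0 - y_j) * (W_ho.T @ delta_k)
    # Weight and bias updates: equations (6), (7) and then (3), (4)
    W_ho += eta * np.outer(delta_k, y_j)
    b_o += eta * delta_k
    W_ih += eta * np.outer(delta_j, x)
    b_h += eta * delta_j
    return W_ih, b_h, W_ho, b_o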
4. Variant of Back-propagation Algorithm
The Back-propagation algorithm described above has several shortcomings. Its time complexity is high and it frequently gets trapped in suboptimal solutions. It is also difficult to choose an optimal step size for the learning process: a large step size gives faster learning but may skip over an optimal solution altogether, while a small step size makes the learning process very slow. Hence, we discuss a variant of the above algorithm with the following changes.
A) Momentum: A simple change to the training law that sometimes results in much faster training is the addition of a momentum term. With this change, the weight change continues in the direction it was heading; in the absence of error, it would be a constant multiple of the previous weight change. The momentum term keeps the weight-change process moving, makes convergence faster and makes the training more stable.
B) Dynamic control of the learning rate and the momentum: Learning parameters such as the learning rate and momentum serve a better purpose if they can be changed dynamically during the course of training. The learning rate can be high when the system is far from the goal and can be decreased as the system gets nearer to the goal, so that the optimal solution is not missed.
C) Gradient following: Gradient following has been added to enable quick convergence of the solution. When the system is far from the solution, the learning rate is further increased by a constant parameter C1, and when the system is close to a solution, the learning rate is decreased by a constant parameter C2.
D) Speed factor: To increase the speed of convergence of the system, a speed factor S has been used.
Mathematical Analysis: The above algorithm is modified with the following steps.
A) Momentum: Let the momentum term be α. Then equation (3) and equation (4) would be modified as:

wij(t+1) = wij(t) + η δj xi + α ∆wij(t)    (9)

bj(t+1) = bj(t) + η δj + α ∆bj(t)    (10)

where ∆wij(t) and ∆bj(t) denote the previous weight and bias changes.
The term δj would be calculated as in equation (5). Equation (6) and equation (7) would be modified as:

wjk(t+1) = wjk(t) + η δk yj + α ∆wjk(t)    (11)

bk(t+1) = bk(t) + η δk + α ∆bk(t)    (12)

The term δk would be calculated as in equation (8).
B) Dynamic control of the learning rate and momentum: If changing the weight decreases the cost function (mean squared error), then the learning rate is increased for the next cycle; otherwise it is decreased.
C) Gradient following: Let C1 and C2 be two constants such that C1 > 1 and 0 < C2 < 1, and let ∆max and ∆min be the maximum and minimum permissible weight change. If (∂E/∂w) is the gradient-following term, then three cases need to be considered.
D) Speed factor: Let S be the speed factor. Equation (9) and equation (10) would be modified accordingly, and similarly equation (11) and equation (12).
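The following Python sketch combines the four modifications in one update step. Because the exact equations for the dynamic learning-rate control, the three gradient-following cases and the speed factor are not reproduced here, the way these parameters interact in the sketch (error-driven scaling of η by C1 and C2, bounding each weight change by ∆min and ∆max, and S as a plain multiplier) is an assumption made for illustration only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def variant_bp_step(x, t, net, state, alpha=0.9, S=1.5,
                    C1=1.05, C2=0.7, d_max=1.0, d_min=1e-6):
    # net   : dict with NumPy arrays "W_ih", "b_h", "W_ho", "b_o"
    # state : dict carrying the learning rate, the previous error and the
    #         previous weight changes between calls
    # Standard forward pass and error terms, as in equations (1)-(8)
    y_j = sigmoid(net["W_ih"] @ x + net["b_h"])
    y_k = sigmoid(net["W_ho"] @ y_j + net["b_o"])
    delta_k = y_k * (1.0 - y_k) * (t - y_k)
    delta_j = y_j * (1.0 - y_j) * (net["W_ho"].T @ delta_k)

    # B)/C) dynamic control and gradient following (assumed rule): raise the
    # learning rate by C1 while the mean squared error keeps falling and
    # shrink it by C2 once it rises again.
    mse = float(np.mean((t - y_k) ** 2))
    eta = state.get("eta", 0.5)
    eta = eta * C1 if mse < state.get("prev_mse", np.inf) else eta * C2
    state["eta"], state["prev_mse"] = eta, mse

    grads = {"W_ho": np.outer(delta_k, y_j), "b_o": delta_k,
             "W_ih": np.outer(delta_j, x), "b_h": delta_j}
    for name, grad in grads.items():
        # A) momentum and D) speed factor (assumed combination):
        # new change = S * (eta * gradient + alpha * previous change)
        change = S * (eta * grad + alpha * state.get("d_" + name, 0.0))
        # C) keep the magnitude of every individual change inside [d_min, d_max]
        change = np.sign(change) * np.clip(np.abs(change), d_min, d_max)
        net[name] += change
        state["d_" + name] = change
    return net, state

In practice the constants α, S, C1, C2, ∆max and ∆min would be tuned by trial, as described in the experimental study below.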
5. Experimental Study
The algorithm proposed in this paper was tested on the training of standard multilayer feed-forward networks (FNNs) for the 8-bit parity problem. The selection of initial weights is important in feed-forward neural network training. If the initial weights are very small, the back-propagated error is so small that practically no change takes place for some weights, and therefore more iterations are necessary to decrease the error. Large weight values speed up learning, but they can lead to saturation and to flat regions of the error surface where training is slow. Keeping these considerations in mind, the experiment was conducted using the same initial weight vectors, randomly chosen from a uniform distribution in (-1, 1). Fig. 3 shows the 8-bit parity problem neural network (8-8-1).
Fig. 3: 8-bit parity problem neural network.
Fig. 4: Comparison of training time between the Back-propagation algorithm and the proposed algorithm for different momentum and speed values on the 8-bit parity problem.
The initial learning rate was kept constant for both algorithms. It was chosen carefully so that the Back-propagation training algorithm converges rapidly, without oscillating, toward a global minimum. All other learning parameters were then tuned by trying different values and comparing the number of successes exhibited by five simulation runs started from the same initial weights. Fig. 4 shows the results of training the 8-8-1 network (eight inputs, one hidden layer with eight nodes and one output node) on the 8-bit parity problem. Training was successful on the given dataset for the chosen speed constant and momentum, and the training time is drastically reduced with the proposed algorithm. In Back-propagation the training time increases rapidly with the number of cycles, whereas for the proposed algorithm it increases only gradually in all cases. Varying the momentum and speed terms made little difference to the training time.
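For reference, the 8-bit parity training set and the 8-8-1 network initialization described above could be generated along the following lines; the random seed and variable names are arbitrary choices for this sketch.

import numpy as np
from itertools import product

# 8-bit parity: 256 input patterns, target 1 for odd parity and 0 for even parity.
X = np.array(list(product([0, 1], repeat=8)), dtype=float)
t = X.sum(axis=1) % 2                      # parity targets

# 8-8-1 network, weights drawn uniformly from (-1, 1) as in the experiment.
rng = np.random.default_rng(0)
W_ih = rng.uniform(-1, 1, size=(8, 8))     # input-to-hidden weights
b_h = rng.uniform(-1, 1, size=8)
W_ho = rng.uniform(-1, 1, size=(1, 8))     # hidden-to-output weights
b_o = rng.uniform(-1, 1, size=1)

Training then proceeds by repeatedly applying an update step, such as the bp_step or variant_bp_step sketches above, to each input-target pair.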
6. Conclusion
A variant of Back-propagation has been proposed for the training of feed-forward neural networks. The convergence properties of both algorithms were studied, and the conclusion was reached that the new algorithm is globally convergent. The proposed algorithm was tested on the available training tasks. The results indicate that the proposed method is a very promising tool for the efficient training of neural networks in terms of time. It also proved more accurate than the existing Back-propagation algorithm; in addition, the error is corrected faster and the training time is much shorter, as shown in the results.
7. Acknowledgements
We would like to thank our Head of Department, Prof. Mrs. L. S. Admuthe, for her valuable suggestions on this topic. We would also like to thank our friends who supported us in preparing this paper.
8. References
[1] J. M. Zurada, Introduction to Artificial Neural Systems. Mumbai: Jaico, 2002.
[2] P. Mehra and B. W. Wah, Artificial Neural Networks: Concepts and Theory. IEEE Computer Society Press, 1992.
[3] E. M. Johansson, F. U. Dowla, and D. M. Goodman, "Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method," Intl. J. Neural Systems, vol. 2, pp. 291-301, 1992.
[4] X. Yu, M. O. Efe, and O. Kaynak, "A general backpropagation algorithm for feed-forward neural networks learning," IEEE Trans. Neural Networks, vol. 13, no. 1, pp. 251-254, January 2002.
[5] T. M. Mitchell, Machine Learning. McGraw-Hill Science/Engineering/Math, March 1997.
[6] Bodgam M. and David Hunter, "Solving parity-N problems with feedforward neural networks," 1992.
[7] J. Hertz, A. Krogh, and R. Palmer, Introduction to the Theory of Neural Computation. MA: Addison-Wesley, 1991.