
Int. J. Remote Sensing, 1997, vol. 18, no. 4, 711-725

Strategies and best practice for neural network image classification


I. KANELLOPOULOS and G. G. WILKINSON
Space Applications Institute, Joint Research Centre, European Commission, 21020 Ispra, Varese, Italy

(Received 18 January 1996; in final form 20 June 1996)

Abstract. This paper examines a number of experimental investigations of neural networks used for the classification of remotely sensed satellite imagery at the Joint Research Centre over a period of five years, and attempts to draw some conclusions about 'best practice' techniques to optimize network training and overall classification performance. The paper examines best practice in such areas as: network architecture selection; use of optimization algorithms; scaling of input data; avoidance of chaos effects; use of enhanced feature sets; and use of hybrid classifier methods. It concludes that a vast body of accumulated experience is now available, and that neural networks can be used reliably and with much confidence for routine operational requirements in remote sensing.

1. Introduction

Artificial neural networks first began to be used for the classification of remotely sensed imagery around 1988, with the first journal papers appearing one to two years later (for example, Key et al. 1989, Benediktsson et al. 1990, Lee et al. 1990). Since that time, the number of reports of experimental tests of neural network classifiers in peer-reviewed journals has grown significantly, subjectively appearing to be at an exponential rate. Moreover, it is now rare for conferences devoted to remote sensing not to contain special sessions devoted to neural networks. Whilst the rapidly growing interest in the use of neural networks in remote sensing indicates a widespread and healthy interest in the exploration of new techniques, it is also evident that progress is hampered by lack of information on proven methodologies and implementation techniques. At the Joint Research Centre, Ispra, Italy, we have been actively investigating the use of neural networks for remotely sensed image classification for over five years. During that time we have explored many different types of networks, used many different data sets, and investigated a number of hybrid architectures and systems. Some of the results of this work have been reported in earlier journal papers, at conferences, and in our own technical reports. However, we have not attempted so far to extract the key findings from this considerable body of experimental work and to present it in one article with the aim of making the experience easily accessible to future researchers. This paper attempts to do precisely this, and can be considered to be the culmination of a number of experiments leading towards the goal of high accuracy image classification. The material presented herein should not be regarded as a comprehensive review of neural networks in remote sensing throughout the world: such a review has already been performed elsewhere (Paola and Schowengerdt 1995). Our aim here is to stress some of the more interesting findings of our own research, and to make some recommendations for strategies and 'best practice' in using neural networks in remote sensing from our point of view.


The development of `best practice’ in the software ® eld now has a considerable importance. Slowly there has emerged a growing recognition that in many ® elds of technology it is important as early as possible to develop standardized procedures which are known to work well. The use of arti® cial neural networks to classify satellite imagery is no exception to this, and we hope that this paper will make a ® rst contribution to the development of best practice in this area. We very much hope that by so doing, others will have the bene® t of starting from a higher level of experience which will help them to make faster developments in the years to come. We shall tackle various issues of best practice within separate sections below which re¯ ect the main areas in which recommendations can be made. 2.

2. Network architecture and training issues

The most commonly-used neural network model for image classification in remote sensing is the multi-layer perceptron trained by the back-propagation algorithm (Rumelhart et al. 1986). The input to a node in such a network is the weighted sum of the outputs from the layer below, that is,

$net_j = \sum_i w_{ji} o_i$   (1)

This 'weighted sum' is then transformed by the node 'activation function' (usually a sigmoid or hyperbolic tangent) to produce the node output:

$o_j = \frac{1}{1 + \exp(-(net_j + \theta_j))}$   [sigmoid]   (2)

$o_j = m \tanh(k\, net_j)$   [hyperbolic tangent]   (3)

where $\theta_j$, $m$, and $k$ are constants. Weights are updated during training with the generalized delta rule:

$\Delta w_{ji}(n+1) = \eta\, \delta_j o_i + \alpha\, \Delta w_{ji}(n)$   (4)

where $\Delta w_{ji}(n+1)$ is the change of the weight connecting nodes $i$ and $j$, in two successive layers, at the $(n+1)$th iteration, $\delta_j$ is the rate of change of error with respect to the output from node $j$, $\eta$ is the learning rate, and $\alpha$ a momentum term. Further details about these networks can be found in Atkinson and Tatnall (1997). Although many users of neural networks take the training parameters and activation functions as 'givens', it is important to realise that the values and form of these parameters and functions have important consequences both for the way in which input data should be pre-processed and for the stability and efficiency of the network training.


2.1. Input feature preprocessing/scaling

The node activation functions used in multi-layer perceptrons, as above, are essentially non-linear and have asymptotic behaviour. In practice they cause individual nodes in the network to behave like non-linear signal amplifiers. Ideally, to ensure that a network learns efficiently how to classify, it is important that input values are scaled so that the learning process (that is, iterative weight adjustment) stays within the numerical range in which a percentage change in the weighted sum input value $net_j$ is reflected in a similar percentage change in the node output value $o_j$. This should happen unless the inputs are a long way outside their normal range, in which case the output signal should saturate. This requirement for efficient learning means that the network input values, that is, the feature values from the satellite imagery (typically the digital radiances in each spectral channel), should be centred and scaled to the order of magnitude of the activation function's range (instead, for example, of being within the range 0-255 of most multispectral satellite images). This ensures that the values propagated to the network nodes do not cause early saturation effects. Failure to perform such a normalization causes learning to 'stall' at an error level which is too high (figure 1). More details on such an approach can be found in Fogelman Soulie (1991).

Figure 1. Stalling of network learning when inputs are not normalized (8-class and 16-class problems).
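As an illustration of the scaling step just described, the sketch below (our example, not from the paper) centres each spectral channel and scales it to roughly unit spread, so that the values presented to the network stay within the responsive range of the activation function.

```python
# Per-channel centring and scaling of image radiances before network
# training: a minimal sketch, assuming standardization to zero mean and
# unit spread as the normalization.
import numpy as np

def normalize_channels(image):
    """image: array of shape (rows, cols, channels), e.g., 0-255 radiances."""
    x = image.astype(np.float64)
    mean = x.mean(axis=(0, 1))         # per-channel mean
    std = x.std(axis=(0, 1)) + 1e-12   # per-channel spread (guard against /0)
    return (x - mean) / std            # centred, order of magnitude ~1

# Example: a fake 3-channel 8-bit image in the range 0-255.
img = np.random.default_rng(1).integers(0, 256, size=(64, 64, 3))
net_input = normalize_channels(img)
print(net_input.mean(axis=(0, 1)), net_input.std(axis=(0, 1)))  # ~0 and ~1
```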


2.2. Chaos effects

A further consequence of the non-linearity of neural network activation functions is that they are susceptible to falling into 'chaotic' regimes (Van der Maas et al. 1990). Although such behaviour is not fully understood, it can happen with networks used to classify satellite imagery. Chaotic systems can be recognized by the fact that small changes in inputs lead to very large changes in output. (This is the so-called 'butterfly effect': that a small butterfly flapping its wings could cause a modification in local atmospheric behaviour which could eventually cause a tornado.) In computer models chaos is seen, for example, when small changes such as rounding errors in calculations generate significantly different results. In some of our early experiments on the classification of both Landsat TM and SPOT HRV multispectral imagery we found chaotic behaviour during network training. This manifested itself as significant differences in training sequences run on different computers which had different ways of dealing with rounding of floating point numbers, even when the network architecture, starting weights, and input data were identical (figure 2).

Figure 2. Manifestation of chaos in a training sequence with pixels from Landsat imagery. The network error varies considerably due to differences in floating point rounding between the different computers and between single and double precision arithmetic.

Although we have found that chaos is not encountered frequently, it is important to be aware of its potential occurrence. If chaotic behaviour is recognized, for example, because network error remains at a high level during training and also large differences are seen on different kinds of computer or between training runs using

single and double precision arithmetic, it is necessary to shift the training process into a non-chaotic regime. This is most easily done by changing the learning rate and momentum parameters which appear in the delta rule. Although general guidelines cannot be given, a change of an order of magnitude appears to be a good starting point.

2.3. Optimization techniques

Learning in neural networks involves adjusting the connection weights so that the difference between the network output and the desired output is decreased. This in turn involves minimization of a cost function $f(x)$ in a multi-dimensional network error space. This is an optimization problem. If the cost function $f(x)$ is a non-linear function of $x$, the problem is one of non-linear optimization. Usually, the mean squared error cost function is used, which is expressed in terms of the network's output vector and the desired output vector for all input patterns. The network's


output vector is dependent on the weights, and so the minimization takes place over the entire weight space (that is, $x$ would represent the set of all weights). Algorithms that perform non-linear optimization include the gradient descent procedure, conjugate gradient methods, and second-order methods such as the quasi-Newton method (Watrous 1987, Press et al. 1988). The gradient descent method is an iterative optimization procedure which minimizes a function $f(x)$ by moving in the direction of the local downhill gradient, $-\nabla f(x)$. Conjugate gradient methods compute a new direction of search at each step in such a way that the new direction is conjugate to the previous gradient; these are more efficient than the gradient descent algorithm. Quasi-Newton methods make use of the second derivative of the cost function, which gives information about the curvature of the error surface and may result in more rapid convergence. The basic back-propagation algorithm performs a gradient descent with a fixed step size (the learning rate $\eta$), though the step size may be changed as the training process progresses. The main problem with gradient descent is its slow convergence: as it gets close to the solution, it progresses slowly.

To assess the performance of the different optimization techniques we conducted experiments using multitemporal SPOT HRV imagery with large land cover variability. Figures 3 and 4 show the evolution of the network error with the number of iterations for the three optimization methods, for 20 and eight land cover classes respectively. From figure 3 it can be seen that, in classifying the image into 20 land cover classes, the conjugate gradient (Polak-Ribiere algorithm) and quasi-Newton methods both fail to converge. On the other hand, all three techniques converged when the imagery was classified into eight land cover classes (figure 4). The failure of these optimization methods to converge may be attributed to the complexity of the 20 land cover classes and to the fact that both methods are quite 'greedy', that is, they go 'downhill' as fast as they can and may fall into local minima. A further point is that for both these methods the weights of the network are updated after the presentation of all the patterns (the 'off-line' back-propagation approach): they first accumulate the gradient information from all the patterns and then update the weights. Gradient descent, on the other hand, updates the weights after the presentation of each pattern to the network (the 'on-line' back-propagation approach). Modifications to the weights are therefore more frequent, and the network is also able to escape unfavourable local minima (Fogelman Soulie 1991).
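Comparisons of this kind are easy to reproduce in miniature. The sketch below (ours, not the authors' code) contrasts the three optimizers on a tiny batch ('off-line') mean squared error problem using SciPy, whose 'CG' method implements a Polak-Ribiere conjugate gradient and whose 'BFGS' method is a quasi-Newton scheme; the one-node "network", the step size, and the data are purely illustrative.

```python
# Contrasting plain gradient descent, conjugate gradient (Polak-Ribiere)
# and quasi-Newton (BFGS) on a batch mean squared error cost for a single
# sigmoid node. Illustrative sketch only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
t = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)  # synthetic targets

def cost(w):
    o = 1.0 / (1.0 + np.exp(-(X @ w)))   # single sigmoid node
    return 0.5 * np.mean((t - o) ** 2)   # mean squared error

def grad(w):
    o = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(((t - o) * o * (1 - o))[:, None] * X, axis=0)

w0 = rng.normal(scale=0.1, size=3)

# Plain gradient descent with a fixed step size (basic back-propagation).
w = w0.copy()
for _ in range(500):
    w -= 1.0 * grad(w)
print("gradient descent  :", cost(w))

# Conjugate gradient (Polak-Ribiere) and quasi-Newton (BFGS) via SciPy.
print("conjugate gradient:", minimize(cost, w0, jac=grad, method="CG").fun)
print("quasi-Newton BFGS :", minimize(cost, w0, jac=grad, method="BFGS").fun)
```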

Figure 3. Training sequence for the 20 land cover class problem using three different optimization techniques. Note that the conjugate gradient and quasi-Newton methods fail to converge.


Figure 4. Training sequence for the eight land cover class problem using three different optimization techniques.

For the eight-class problem, the conjugate gradient method required 300 iterations to reach a network error of 0.32 (94.9 per cent overall classification accuracy on the verification data set). The gradient descent algorithm needed 760 iterations to obtain a classification accuracy of 97 per cent on the same data set, with a network error of 0.1. Finally, the quasi-Newton method converged after 360 iterations to a network error of 0.1, with an accuracy of 97.4 per cent. From these results we can see clearly that the conjugate gradient method converges much faster than the other methods. Although it appears that the quasi-Newton method is faster than the gradient descent method (fewer iterations), in practice it is slower, since each iteration is computationally more intensive.

2.4. Network architectures

The number of hidden layers and the number of nodes in a hidden layer required for a particular classification problem are not easy to deduce. The neural network architecture which gives the best results for a particular problem can only be determined experimentally, and this can be a lengthy process, especially for large classification tasks. This is often seen as an objection to neural network methods. However, some geometrical arguments can be used to derive heuristics for setting approximate network sizes (Lippmann 1987). Although it is not strictly accurate, each node in a multi-layer perceptron can be viewed as a system which combines inputs in a 'quasi-linear' way and in so doing defines hyper-surfaces in feature space which, when combined with a decision rule or process, can be used to separate hyper-regions and, thus, classes. To define a network size which is appropriate for a given classification problem, it is necessary to examine the total number of input features and the number of output classes. Ideally, the first hidden layer of a network with two hidden layers should contain two to three times the number of inputs, such that a sufficient number


of hyper-planes can be 'formed' to define hyper-regions. For example, in a two-dimensional feature space it would be useful to have more than two hyper-planes, and perhaps as many as four, to be able to define a small hyper-region which corresponds to a particular class. Our experience has shown that we should choose the number of nodes to be at least double the number of inputs, and perhaps four times as many to be safe. Likewise, the final hidden layer effectively combines hyper-planes or hyper-regions from the previous layer to form sub-regions defining each class. To allow two or three regions per class, as often employed in statistical classification of remotely sensed data, we have found it useful to make the number of nodes roughly equal to two to three times the total number of classes. If only one hidden layer is used, we believe the number of nodes should be equal to the higher of the two figures derived by the heuristics stated above; a sketch of these heuristics in code is given below. Clearly, if this does not yield an accurate classification result, the network should be slowly expanded for successive training runs until better results are achieved. In general, we have found single hidden layer networks to be suitable for most classification problems, though once the number of classes gets near 20 it appears from our experience in remote sensing that the additional flexibility provided by a two hidden layer network is required (Kanellopoulos et al. 1992), but this is clearly dependent on the complexity of the data. Table 1 summarizes the neural network architectures used in some of our experiments which resulted in the best performance (that is, best overall classification accuracy on a test set). The size of the best network in each case is consistent with the heuristics. However, we would caution that it is not possible to rely on such heuristics alone, and that each classification problem needs to be carefully examined in its own right. It is important always to check classification performance during training, and to verify that the accuracies achieved with both test and training data are sufficient, that is, to ensure that the classifier generalizes well to new/unseen data. One possible approach for finding good architectures is simply to train a large number of networks with different architectures in parallel. This is only practical in a realistic time scale with special purpose hardware, such as the Siemens SYNAPSE-1 parallel neuro-computer, which we have recently begun evaluating in the remote sensing context.
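The sizing heuristics above can be captured in a few lines. The helper below is our own hypothetical encoding of them: the multipliers and the 20-class threshold are taken from the text as rough starting points to be refined experimentally, not as firm rules.

```python
# Hypothetical helper encoding the architecture heuristics of section 2.4.
# The returned sizes are starting points for experimentation, not rules.
def suggest_architecture(n_inputs, n_classes):
    """Return candidate hidden-layer sizes for a multi-layer perceptron."""
    first = 2 * n_inputs          # two (up to four) times the inputs: hyper-planes
    second = 2 * n_classes        # two to three regions per class
    if n_classes >= 20:           # complex problems: two hidden layers
        return (first, second)
    return (max(first, second),)  # single hidden layer: the higher figure

print(suggest_architecture(6, 7))   # e.g., 6 SPOT channels, 7 classes -> (14,)
print(suggest_architecture(6, 20))  # 20-class problem -> (12, 40)
```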

3. Use of feature enhancements to improve speed/performance

Although the training speed and overall performance of neural networks can be improved by using appropriate network architectures and good optimization procedures in the learning algorithm, it is also possible to achieve improvements by deliberately enhancing the features which are input to the network. There are two ways to do this: (a) use additional features which provide extra information, either extracted from the image itself or from ancillary data sets; and (b) provide 'higher order' terms derived directly from the original feature set. We have investigated both approaches in our experiments.

3.1. Use of additional features

The use of extra features often has a beneficial effect on classification performance, so long as the features provide additional useful information. It is not guaranteed that the use of extra features will always increase accuracy, however, since such features increase the dimensionality of the feature space and the complexity of the network, which can make training more difficult. In cases where there is considerable redundancy between the new features and the original ones, it is possible for the extra features to reduce overall classification performance, besides lengthening training time. In most cases, however, the addition of features is found to be beneficial.

Table 1. Some empirical results on best MLP network architectures.

Data description | Number of input features | Number of output land cover classes | Neural network architecture
France, Ardèche Département, agricultural area, two dates, SPOT HRV imagery | 6 (2 × 3 SPOT HRV channels) | 20 | 2 hidden layers, 17 and 53 nodes per hidden layer
France, Loir et Cher Département, agricultural area, single date, SPOT HRV imagery | 3 SPOT HRV channels | 7 | 1 hidden layer, 15 nodes
France, Loir et Cher Département, agricultural area, two dates, SPOT HRV imagery | 6 (2 × 3 SPOT HRV channels) | 7 | 1 hidden layer, 29 nodes
Portugal, Lisbon/River Tejo valley, very mixed land use, Landsat TM imagery | 6 Landsat TM channels | 16 | 1 hidden layer, 28 nodes
Portugal, Lisbon/River Tejo valley, very mixed land use, Landsat TM and ERS-1 SAR imagery | 6 Landsat TM channels plus ERS-1 SAR backscatter intensity channel | 9 | 1 hidden layer, 17 nodes
Portugal, Lisbon/River Tejo valley, Landsat TM data and textural features from ERS-1 SAR data | 6 Landsat TM channels and 3 SAR textural features | 16 | 1 hidden layer, 35 nodes
Portugal, Lisbon/River Tejo valley, Landsat TM data and textural features from ERS-1 SAR data | 6 Landsat TM channels and 4 SAR textural features | 16 | 1 hidden layer, 34 nodes

The use of additional features is, of course, appropriate for any kind of classifier, not just for neural networks. What is most interesting about neural networks, however, is that they do not require that the features follow any parametric model, unlike statistical classifiers. The neural network approach can therefore be seen as a more flexible classification method, which makes it easier to incorporate additional features from multiple sources. In our experimental work we have enhanced spectral feature sets derived from


optical/infra-red satellite imagery (from Landsat TM and SPOT HRV) using the following (a code sketch of assembling such a combined feature set appears after the list):


(i) texture features derived from the imagery;
(ii) SAR backscattering intensity from ERS-1 and 2;
(iii) texture features derived from SAR imagery;
(iv) features derived from ancillary GIS data sets.
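The sketch below illustrates how such a multi-source feature set might be stacked per pixel. Every array is a synthetic placeholder, and the local-variance texture measure is our own simple stand-in for the texture features the paper uses; real data would come from the respective sensors and GIS layers.

```python
# Hedged sketch: stacking optical channels, a SAR intensity band, a simple
# texture feature, and an ancillary terrain-height layer into one per-pixel
# feature matrix. All arrays are synthetic placeholders.
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(band, size=5):
    # E[x^2] - E[x]^2 over a sliding window: a crude texture feature.
    m = uniform_filter(band, size)
    m2 = uniform_filter(band * band, size)
    return m2 - m * m

rng = np.random.default_rng(3)
optical = rng.random((128, 128, 3))   # e.g., 3 SPOT HRV channels
sar = rng.random((128, 128))          # e.g., ERS-1 backscatter intensity
dem = rng.random((128, 128))          # terrain height from a digital terrain model

texture = local_variance(sar)         # texture derived from the SAR imagery
features = np.dstack([optical, sar, texture, dem])
X = features.reshape(-1, features.shape[-1])  # one row of features per pixel
print(X.shape)                                # (16384, 6)
```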

It does not serve our purpose to describe all of our experiments on these feature set enhancements here, though a few interesting observations can be made. Firstly, the use of SAR features as additional neural network inputs alongside optical and infra-red channels generally has a beneficial effect on the overall classification of landscapes, which is not unexpected. However, whilst the use of the radar signal enhances the accuracy of most classes, some can be less accurately classified than with optical/infra-red data alone. This has been observed both in classifying broad land cover classes (Wilkinson et al. 1994) and in classifying forested areas into biodiversity classes (Wilkinson et al. 1995a), with neural networks in both cases. This suggests that strategies which combine the results of neural network classifications made (a) with optical/infra-red data alone and (b) with multi-source imagery may be fruitful, though this has not so far been investigated. The use of texture features in neural networks has been shown to give enhanced classification results both in our own work (Kanellopoulos et al. 1994) and in that of others (for example, Augusteijn et al. 1995).

Interestingly, although the use of additional features is viewed primarily as a way of increasing accuracy by adding net information, in some cases there can be significant benefits in terms of training efficiency. In one particular experiment conducted with SPOT data over an agricultural test site in southern France, we found that the addition of a single ancillary feature (terrain height derived from a digital terrain model) reduced training time for a 20-class problem by a factor of 2, whilst at the same time yielding an accuracy improvement of 4 per cent. In this case the altitude association of certain classes helped the network to learn at a very early stage how to make class separations.

3.2. Use of higher order terms

Another tactic to enhance feature sets fed into neural network classifiers is to generate so-called higher order terms from the initial feature set. This is the basis of the functional link network (Klassen and Pao 1988, Pao 1989). The functional link net is an extension of the multi-layer perceptron concept in which an additional processing module is included (the functional link unit) which generates higher order terms from the initial input features (figure 5). These higher order terms are usually cross-products (for example x1x2, x2x3, ...: second order) derived from an initial feature set {x1, x2, x3, ...}. The functional link unit can be used to generate terms of various higher orders. The inclusion of terms of progressively higher order can be controlled according to the total network error: if training is not progressing as well as expected, it can be automatically re-initiated with terms of the next order up. We have tested the use of the functional link network as a means of improving both accuracy and training time. In a typical experiment we found that training time could be reduced by roughly a factor of 4, whilst total classification accuracy could be increased marginally (Wilkinson et al. 1993). The use of cross-product terms derived from channel radiances is clearly vindicated, though the physical and mathematical reasons for this need further exploration. Nevertheless, as a strategy to improve performance and efficiency, the use of higher order term networks seems to be justified.
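A second-order expansion of this kind is easy to sketch. The function below is our own illustration, not the Klassen and Pao implementation: it simply appends all pairwise cross-products to the original feature matrix before it is fed to an ordinary multi-layer perceptron.

```python
# Sketch of a second-order functional-link expansion: append all pairwise
# cross-products x_i * x_j to the original feature matrix. Higher orders
# would extend the same idea.
import numpy as np
from itertools import combinations

def add_second_order_terms(X):
    """X: (n_samples, n_features) -> X with pairwise cross-products appended."""
    cross = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.hstack([X] + [c[:, None] for c in cross])

X = np.arange(6.0).reshape(2, 3)     # two patterns, features x1..x3
print(add_second_order_terms(X))     # columns: x1, x2, x3, x1x2, x1x3, x2x3
```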


Figure 5. Functional link network which generates higher order terms from an initial feature set. Such networks were found to improve overall classification accuracy and reduce training time significantly.

4. Use of mixed/hybrid neural network classifiers

In a considerable number of experimental tests, neural network classification of satellite imagery has been compared with classification by more conventional methods (for example, Benediktsson et al. 1990, Bischof et al. 1992, Downey et al. 1992, Civco 1993, Foody 1995, Serpico and Roli 1995, Zhuang et al. 1995). In general, neural network methods have been found to perform well in such studies. However, a particularly interesting and frequently overlooked aspect of such comparisons is that there are usually significant differences between the performance of the classifiers for individual classes: with some classes the neural network approach provides a much higher accuracy than the conventional methods, and with others the opposite is found. This effect results from the very different mathematical models underlying the different types of classifier and from the way they divide feature space. One of the aims of our work in the last two to three years has been to try to understand such differences. Visualizations and explorations of feature space are relatively revealing on this issue (Paola and Schowengerdt 1994, Fierens et al. 1994). Statistical approaches, such as the maximum likelihood classifier, divide feature space into regions formed by intersecting ellipsoids, these being the multivariate equiprobability surfaces. Multi-layer perceptrons, however, divide feature space according to a completely different mathematical approach. The combination of the weighted sum inputs to nodes and the sigmoid or hyperbolic tangent activation function can result in a relatively complex division of feature space, though with some quasi-planar class separation surfaces (figure 6). Such behaviour can be understood by examining the computational geometry involved (Gibson and Cowan 1990). Apart from the


Figure 6. Separation of two-dimensional feature space into classes by a multi-layer perceptron network. The x-direction represents the radiance in Landsat TM channel 1, the y-direction the radiance in TM channel 4. Note the geometrical form and quasi-linear borders of some class regions.

differences in the geometrical form of the class separation surfaces between different types of classifier, it is also important to note that different neural networks (of the same type) also yield different class separation surfaces, depending on the architectures and starting weight sets of the networks concerned. Given that different classifiers use different geometrical forms to separate classes, and that the resulting classification accuracies can differ significantly between models for the same classes, it is appropriate to devise strategies which combine classifiers with the aim of improving overall classification performance. There are several ways of doing this.

4.1. Multiple neural network methods

One approach, involving only the neural approach, is to train (using the same data) multiple networks which have different random starting weights or different architectures. In performing classification, each network is then used in parallel and a majority voting strategy is applied, in which the class with the highest number of votes from all the networks is assigned to the sample. More complex versions of this strategy can be utilized, such as a combination process based on the Dempster-Shafer theory of evidence (Rogova 1994).
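A minimal sketch of this voting strategy is given below, using scikit-learn's MLPClassifier as a stand-in for the networks; the library post-dates the paper, so the API and all parameters here are our assumptions, not the authors' tooling.

```python
# Sketch of the multiple-network majority voting strategy: several MLPs
# trained on the same data with different random starting weights, combined
# by per-sample majority vote. Data and parameters are illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))                       # per-pixel feature vectors
y = (X[:, 0] + X[:, 2] > 0).astype(int)             # synthetic class labels

# Same data, different random starting weights (architectures could vary too).
nets = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                      random_state=seed).fit(X, y) for seed in range(5)]

def majority_vote(nets, X):
    votes = np.stack([n.predict(X) for n in nets])  # (n_nets, n_samples)
    # The class with the highest number of votes wins.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

print(majority_vote(nets, X[:10]))
```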


4.2. Combined neural/non-neural methods

A potentially more powerful technique is to combine more than one type of classifier, for example the maximum likelihood method and the multi-layer perceptron, gaining the advantage of integrating very different mathematical models. This can be done using a simple scheme as shown in figure 7. In this approach a multi-layer perceptron and a maximum-likelihood classifier are trained separately to classify samples from an image. These two classifiers are tested with a set of independent samples. A second training set is then built up from those samples about which the two initial classifiers do not agree. This new training set is then used to train a second multi-layer perceptron. The purpose of this approach is to apply both classifier models to the data in the classification stage and to highlight samples for which the two models disagree. These 'difficult' pixels are then passed to the second neural network, which has been specially trained to deal with such cases. A neural network is used for the 'difficult' cases since they are unlikely to fall into a distribution which can be modelled well by a statistical classifier. In tests performed within the last one to two years at the JRC, we have found that this approach may significantly increase overall classification accuracy. For example, in one experiment an increase of approximately 12 per cent was achieved, compared to using the neural network or maximum likelihood models alone, on a problem which involved 16 distinct land cover classes (Wilkinson et al. 1995b).

Figure 7. Strategy for combination of maximum-likelihood classifier and multi-layer perceptron neural networks.
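The figure 7 scheme can be sketched as follows. This is a hedged illustration rather than the authors' implementation: Gaussian quadratic discriminant analysis stands in for the maximum likelihood classifier, scikit-learn MLPs stand in for the networks, and all data are synthetic (the sketch assumes the disagreement set covers more than one class).

```python
# Hedged sketch of the hybrid scheme: train a statistical classifier and an
# MLP separately, then train a second MLP only on the samples where the two
# disagree, and use it to arbitrate such 'difficult' pixels.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 4))
y = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0).astype(int)  # 4 classes

ml = QuadraticDiscriminantAnalysis().fit(X, y)      # Gaussian ML stand-in
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=800,
                    random_state=0).fit(X, y)       # first neural network

# Second training set: the samples the two classifiers disagree on
# (assumed here to contain more than one class).
disagree = ml.predict(X) != mlp.predict(X)
arbiter = MLPClassifier(hidden_layer_sizes=(16,), max_iter=800,
                        random_state=1).fit(X[disagree], y[disagree])

def classify(Xnew):
    a, b = ml.predict(Xnew), mlp.predict(Xnew)
    out = a.copy()
    hard = a != b                            # the 'difficult' pixels
    if hard.any():
        out[hard] = arbiter.predict(Xnew[hard])  # arbitrated by second MLP
    return out

print((classify(X) == y).mean())             # resubstitution accuracy
```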

5. Discussion

In this paper we have attempted to draw together the findings of a wide range of experiments on neural network classification conducted over five years. Whilst our results have not covered all aspects of neural network image classification in remote sensing, they do point to a number of fruitful strategies and implementation techniques which could contribute to the development of a body of best practice recommendations. Table 2 lists the main recommendations for effective and efficient use of neural networks which have emerged from our work and which have been discussed in this paper.


Table 2. Recommendations and strategies for 'best practice' in neural network image classification.

1. Preprocess input data and scale according to the form of the activation function used.
2. Apply geometrical arguments and heuristics to set network architectures.
3. Recognize the effects of chaos in network training and take steps to avoid it.
4. Adopt a fast optimization technique in training, e.g., conjugate gradient when thematic classes are well separated or not very mixed.
5. Use derived higher order terms through functional link net feature set expansion (avoids computing extra features from imagery and often yields significant benefits in terms of training time and overall accuracy).
6. Use additional features from multiple sources alongside basic pixel radiance information (improves accuracy in most cases and can reduce net training time significantly in some cases).
7. Integrate neural networks with conventional classifiers using simple strategies to take advantage of the significantly different underlying mathematical models.
8. Use multiple networks and voting strategies whenever possible as an alternative to (7).

e cient use of neural networks which have emerged from our work and which have been discussed in this paper. We very much hope that others will be able to enhance the collective knowledge in the ® eld by improving on our recommendations and adding to them in due course. Overall, it can be stated with some con® dence, that the wide experience gained in the use of neural networks for image classi® cation now makes it possible to use them routinely in operational projects. The neural network technique is now undergoing trials at the JRC in the context of the operational Monitoring Agriculture by Remote Sensing (MARS) Project, and is also being evaluated in the context of mapping projects for the European Union’s statistical o ce `Eurostat’. It can be expected that the use of neural networks will expand rapidly in the coming years, and that they will form an important tool in operational remote sensing. Acknowledgm ents

The authors are grateful to past and present colleagues of the Joint Research Centre who have contributed both directly and indirectly to the experimental work and findings reported in this paper. In particular we should like to thank Drs Freddy Fierens, Paul Rosin, Ron Schoenmakers, Aristide Varfis, and Alessandra Chiuderi, and also Joachim Hill, Wolfgang Mehl, Jacques Megier, Walter Di Carlo, Alice Bernard, Stefania Goffredo, and Karen Fullerton. We should also like to thank Professor Zhengkai Liu and Suzanne Furby, scientific visitors to the JRC, who through many fruitful discussions have contributed to our understanding of neural networks and their relation to other classification techniques.

References

Atkinson, P. M., and Tatnall, A. R. L., 1997, Neural networks in remote sensing. International Journal of Remote Sensing, 18, 699-709 (this issue).
Augusteijn, M. F., Clemens, L. E., and Shaw, K. A., 1995, Performance evaluation of texture measures for ground cover identification in satellite images by means of a neural network classifier. IEEE Transactions on Geoscience and Remote Sensing, 33, 616-626.
Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., 1990, Neural network approaches versus statistical methods in classification of multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, 28, 540-552.
Bischof, H., Schneider, W., and Pinz, A. J., 1992, Multispectral classification of Landsat images using neural networks. IEEE Transactions on Geoscience and Remote Sensing, 30, 482-490.
Civco, D. L., 1993, Artificial neural networks for land-cover classification and mapping. International Journal of Geographical Information Systems, 7, 173-186.
Downey, I. D., Power, C. H., Kanellopoulos, I., and Wilkinson, G. G., 1992, A performance comparison of Landsat TM land cover classification based on neural network techniques and traditional maximum likelihood and minimum distance algorithms. Proceedings 1992 Annual Conference of the Remote Sensing Society: From Research to Operation, Dundee, Scotland, 15-17 September (Nottingham: Remote Sensing Society), pp. 518-528.
Fierens, F., Kanellopoulos, I., Wilkinson, G. G., and Megier, J., 1994, Comparison and visualization of feature space behaviour of statistical and neural classifiers of satellite imagery. Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS 94), Pasadena, California, 8-12 August, Vol. 4 (Piscataway, NJ: IEEE Press), pp. 1880-1882.
Fogelman Soulie, F., 1991, Neural network architectures and algorithms: a perspective. Proceedings of the 1991 International Conference on Artificial Neural Networks (ICANN-91), Espoo, Finland, 24-28 June, Vol. 1 (Amsterdam: North Holland), pp. 605-615.
Foody, G. M., 1995, Land cover classification by an artificial neural network with ancillary information. International Journal of Geographical Information Systems, 9, 527-542.
Gibson, G. J., and Cowan, C. F. N., 1990, On the decision regions of multilayer perceptrons. Proceedings of the IEEE, 78, 1590-1594.
Kanellopoulos, I., Varfis, A., Wilkinson, G. G., and Megier, J., 1992, Land-cover discrimination in SPOT HRV imagery using an artificial neural network: a 20 class experiment. International Journal of Remote Sensing, 13, 917-924.
Kanellopoulos, I., Wilkinson, G. G., and Chiuderi, A., 1994, Land cover mapping using combined Landsat TM imagery and textural features from ERS-1 Synthetic Aperture Radar imagery. Image and Signal Processing in Remote Sensing, Proceedings SPIE 2315 (Bellingham, Washington: SPIE), pp. 332-341.
Key, J., Maslanic, A., and Schweiger, A. J., 1989, Classification of merged AVHRR and SMMR arctic data with neural networks. Photogrammetric Engineering and Remote Sensing, 55, 1331-1338.
Klassen, M. S., and Pao, Y.-H., 1988, Characteristics of the functional-link net: a higher order delta rule net. Proceedings of the 2nd Annual International Conference on Neural Networks, June, San Diego, California, Vol. 1 (Piscataway, NJ: IEEE Press), pp. 507-513.
Lee, J., Weger, R. C., Sengupta, S. K., and Welch, R. M., 1990, A neural network approach to cloud classification. IEEE Transactions on Geoscience and Remote Sensing, 28, 846-855.
Lippmann, R. P., 1987, An introduction to computing with neural nets. IEEE ASSP Magazine, 2, 4-22.
Pao, Y.-H., 1989, Adaptive Pattern Recognition and Neural Networks (Reading, Massachusetts: Addison-Wesley).
Paola, J. D., and Schowengerdt, R. A., 1994, Comparisons of neural networks to standard techniques for image classification and correlation. Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS 94), Pasadena, California, 8-12 August, Vol. 3 (Piscataway, NJ: IEEE Press), pp. 1404-1406.
Paola, J. D., and Schowengerdt, R. A., 1995, A review and analysis of back-propagation neural networks for classification of remotely-sensed multi-spectral imagery. International Journal of Remote Sensing, 16, 3033-3058.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T., 1988, Numerical Recipes in C (Cambridge: Cambridge University Press).
Rogova, G., 1994, Combining the results of several neural network classifiers. Neural Networks, 7, 777-781.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J., 1986, Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, edited by D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Cambridge, Massachusetts: MIT Press), pp. 318-362.
Serpico, S. B., and Roli, F., 1995, Classification of multisensor remote-sensing images by structured neural networks. IEEE Transactions on Geoscience and Remote Sensing, 33, 562-578.
Van der Maas, H. L. J., Verschure, P. F. M. J., and Molenaar, P. C. M., 1990, A note on chaotic behaviour in simple neural networks. Neural Networks, 3, 119-122.
Watrous, R., 1987, Learning algorithms for connectionist networks: applied gradient methods of non-linear optimization. Technical Report MS-CIS-87-51, University of Pennsylvania, Philadelphia, U.S.A.
Wilkinson, G. G., Kanellopoulos, I., Liu, Z. K., and Folving, S., 1993, Integrated land cover mapping from satellite imagery using artificial neural networks. Ground Sensing, Proceedings SPIE 1941 (Bellingham, Washington: SPIE), pp. 68-75.
Wilkinson, G. G., Kanellopoulos, I., Mehl, W., and Hill, J., 1994, Land cover mapping using combined Landsat Thematic Mapper imagery and ERS-1 Synthetic Aperture Radar imagery. Proceedings of the Pecora 12 Symposium: Land Information from Space-Based Systems, Sioux Falls, USA, 24-26 August 1993 (Bethesda, Maryland: American Society for Photogrammetry and Remote Sensing), pp. 151-158.
Wilkinson, G. G., Folving, S., Kanellopoulos, I., McCormick, N., Fullerton, K., and Megier, J., 1995a, Forest mapping from multi-source satellite data using neural network classifiers: an experiment in Portugal. Remote Sensing Reviews, 12, 83-106.
Wilkinson, G. G., Fierens, F., and Kanellopoulos, I., 1995b, Integration of neural and statistical approaches in spatial data classification. Geographical Systems, 2, 1-20.
Zhuang, X., Engel, B. A., Xiong, X., and Johannsen, C. J., 1995, Analysis of classification results of remotely sensed data and evaluation of classification algorithms. Photogrammetric Engineering and Remote Sensing, 61, 427-433.
