Foetal Weight Estimation by Support Vector Regression

Fernando Sereno*, J. P. Marques de Sá*, Ana Matos†, João Bernardes††

* FEUP – Faculdade de Engenharia da Universidade do Porto, Portugal
  E-mail: [email protected]
† HSJ – Hospital de S. João, Dep. Ginecologia e Obstetrícia, Porto, Portugal
†† FMUP – Faculdade de Medicina da Universidade do Porto, Portugal

INEB – Instituto de Engenharia Biomédica, Porto, Portugal

Abstract

Foetal weight estimation based on echographic measurements has paramount importance. This paper reports some results using data taken from a dataset of four Portuguese hospitals that participated in the collection of clinical and echographic data (414 cases) during 1998-99. Firstly, it revises some theoretical concepts from Statistical Learning Theory. Then, it reports some results using Support Vector Regression to predict foetal weight in the lower and higher bands. Finally, it concludes that the errors given by the SVR method are smaller than those of traditional formulas and can be improved further.

1 Introduction

Foetal weight estimation based on echographic measurements has paramount importance in delivery risk assessment [1,2].

The research objective is to know to what extent SV machines can improve over the 15% error of FW estimation performed by prediction formulas in current-day clinical use [3].

Four Portuguese hospitals participated in the collection of clinical and echographic data (414 cases) during 1998-99, according to a protocol. Each case consists of the foetal weight (FW) at birth and five echographic measurements taken one week before birth: biparietal diameter, cephalic circumference, abdominal circumference, femur length and umbilical artery resistance index.

2 Statistical inference

Parametric statistics aims to create simple statistical methods of inference that can be used to solve real-life problems. It is based on the assumption that the investigator knows the problem, i.e. the function to be found up to a finite number of parameters. Using information about the statistical law and the maximum likelihood method applied to the data, one finds the target function and estimates its parameters; this is the essence of classical Fisherian inference. When one does not have reliable a priori information about the statistical law underlying the problem, or about the class of functions and the conditions under which one can get better approximations with an increasing number of examples, one is in the general inference approach, a development that was started by Glivenko, Cantelli and Kolmogorov.

In the last 40 years of research this approach culminated in inductive methods, a different type of inference which is more general and more powerful than parametric inference [4,5].

3 Minimizing the risk functional from empirical data

The basic problem is to formulate a constructive criterion for choosing, from parametric sets of functions, one function that minimizes the mathematical expectation

    R(α) = ∫ Q(z, α) dF(z),   α ∈ Λ,        (1)

where Q(z, α) is called the loss function and z is a variable that represents random independent observations z1, …, zl, obtained according to an unknown distribution F(z). The parameter α belongs to a set Λ, which is arbitrary: it can be a set of scalar quantities, a set of vectors or a set of abstract elements. The integral is a Lebesgue-Stieltjes integral for a bounded non-negative function [4].

4 The problem of regression estimation

Estimating the stochastic dependence based on empirical data pairs (y1, x1), …, (yl, xl), taken randomly and independently from a joint distribution function F(x, y), means estimating the conditional distribution function F(y|x). This is often an ill-posed problem [4,5,6]. The dependence can however be characterized by the mathematical expectation

    r(x) = ∫ y dF(y|x),        (2)

called the regression function. It can be shown that this function can be estimated, for sets of functions f(x, α), in the metric L2(P) by minimizing the functional

    R(α) = ∫ (y − f(x, α))² dF(x, y).        (3)
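The relation between the expected risk and its empirical estimate can be illustrated numerically. The sketch below is not taken from the paper: it assumes a known linear model f(x, α) = αx, a squared-error loss, and Gaussian noise, and shows that the empirical risk computed on a large sample approaches the expected risk (here, the noise variance 0.1² = 0.01).

```python
import random

random.seed(0)

def empirical_risk(alpha, data):
    # R_emp(alpha) = (1/l) * sum_i (y_i - f(x_i, alpha))^2, with f(x, alpha) = alpha * x
    return sum((y - alpha * x) ** 2 for x, y in data) / len(data)

def sample(l):
    # Draw l pairs from an assumed model: y = 2x + Gaussian noise, x uniform on [0, 1]
    return [(x, 2.0 * x + random.gauss(0.0, 0.1))
            for x in (random.uniform(0.0, 1.0) for _ in range(l))]

# At the true parameter alpha = 2 the expected risk equals the noise
# variance, 0.01; the empirical risk fluctuates around this value and
# stabilizes as the number of examples l grows.
small = empirical_risk(2.0, sample(10))
large = empirical_risk(2.0, sample(100000))
print(small, large)
```

With l = 10 the empirical risk can deviate noticeably from 0.01; with l = 100000 it lies close to it, in line with convergence statements of the kind discussed in the next section.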

5 Principle of Empirical Risk Minimization

One cannot minimize the functional (3) directly, since the probability distribution function F(x, y) that defines the risk is unknown. Instead one can use the classical induction principle, which consists in minimizing the empirical risk functional

    R_emp(α) = (1/l) Σ_{i=1}^{l} (y_i − f(x_i, α))²,   α ∈ Λ,        (4)

on the basis of the empirical data pairs (y1, x1), …, (yl, xl). It can be shown that, under particular conditions, an empirical measure estimator such as the functional (4) converges uniformly to the mathematical expectation (3):

    sup_{α ∈ Λ} |R(α) − R_emp(α)| →_P 0,  as l → ∞,        (5)

which means that the principle of minimizing the empirical risk provides a sequence of functions that converges in probability to the best solution.

6 SVR Method

In the Support Vector approach the basic problem is to formulate a constructive criterion for choosing, from parametric sets of functions, one function that minimizes the expected risk (1). One cannot minimize this functional directly, since the probability distribution function F(z) is unknown. Instead one can use the classical induction principle based on the empirical data pairs (y1, x1), …, (yl, xl), which consists in minimizing an empirical risk functional, for example (4). There are probabilistic bounds on the distance between empirical and expected risks involving the number of examples l and the capacity h of the function space, a quantity measuring the "complexity" of the space [4,5].

The solution f(x, α) of the learning problem is found by solving, for each constant A_m related to a hypothesis space, the optimization problem

    min_f  (1/l) Σ_{i=1}^{l} Q(y_i, f(x_i)) + λ ‖f‖²_{K'}        (6)

    subject to  ‖f‖_K ≤ A_m,        (7)

and choosing, among the solutions found for each A_m, the one with the best trade-off between empirical risk and capacity [4,5,7]. The regularization parameter λ penalizes functions with high capacity. In Support Vector Regression (SVR) we used the loss function

    Q(y_i, f(x_i)) = |y_i − f(x_i)|_ε,        (8)

where |·|_ε is called the ε-insensitive loss. The solution has the general form

    f(x) = Σ_{i=1}^{l} c_i K(x, x_i).        (9)

The data points x_i associated with non-zero c_i are called support vectors; they represent the most informative data points and compress the information contained in the training set.
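The kernel expansion (9) can be made concrete with a small sketch. This is not the paper's algorithm: for simplicity the ε-insensitive loss (8) is replaced here by the square loss, which turns the regularized problem into a linear system (kernel ridge regression) solvable without quadratic programming, and the data, kernel degree and regularization value are made up.

```python
# Fit f(x) = sum_i c_i K(x, x_i) with a polynomial kernel. With the square
# loss, the regularized minimization reduces to (K + lambda*l*I) c = y;
# the true SVR solution with eps-insensitive loss needs a quadratic program.

def poly_kernel(x, z, degree=2):
    # K(x, z) = (1 + x*z)^degree
    return (1.0 + x * z) ** degree

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.2, 3.9, 9.1]   # made-up targets, roughly y = x^2
lam = 1e-6                  # regularization parameter (lambda in (6))
l = len(xs)
K = [[poly_kernel(xi, xj) for xj in xs] for xi in xs]
c = solve([[K[i][j] + (lam * l if i == j else 0.0) for j in range(l)]
           for i in range(l)], ys)

def f(x):
    # The expansion (9): every training point contributes a coefficient c_i
    return sum(ci * poly_kernel(x, xi) for ci, xi in zip(c, xs))

print([round(f(x), 2) for x in xs])  # close to ys
```

Unlike genuine SVR, the square loss makes every c_i non-zero; the ε-insensitive loss is what drives most coefficients to zero and leaves only the support vectors.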


7 Experimental results

The SVR training algorithm [8] has been tested on two subsets of our foetal weight (FW) data set, corresponding to the inferior and superior tails of the FW distribution function, as shown in figures 1 and 2. The central and most frequent cases do not belong to these subsets. The experiment consists in determining the performance on a separate test set. The predicted FW (FWpred) was computed from two echographic features, abdominal circumference (AC) and femur length (FL), in two different portions of the distribution function with almost the same number of examples. The polynomial kernels used in this experiment were of order ≤ 7. The ε-insensitive loss function used values of ε > 0.05. The number of support vectors returned by our algorithm was SVinf = 90.5% and SVsup = 96.5%, respectively, in the inferior and superior tails of the FW distribution function.

Finally, the error rates we obtained were Einf = 11.2% and Esup = 10.0% in the inferior and superior tails of the FW distribution function, respectively.

8 Conclusions

SVR is equivalent to maximizing the margin between training examples and the regression function. It is an alternative to other neural networks whose training methods optimize cost functions such as the mean square error, and therefore it can be applied to FW estimation. SVR is motivated by statistical learning theory, which characterizes the performance of SVR learning using bounds on its ability to predict future data. The training consists in solving a constrained quadratic optimization problem [4,5,9]. Among other things, this implies that there is a unique optimal solution for each choice of the SVR parameters, unlike other learning machines such as standard neural networks trained with backpropagation.
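The percentage errors quoted for FW estimation are, presumably, mean absolute relative deviations between birth weight and predicted weight; the paper does not spell the metric out, so the sketch below assumes that definition and uses made-up illustrative weights, not values from the study's data set.

```python
# Hypothetical sketch of the error metric: mean absolute percentage error
# of predicted foetal weight (FWpred) against birth weight (FW).
def mean_percentage_error(actual, predicted):
    # average of |FW - FWpred| / FW, expressed in percent
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Illustrative numbers only.
fw      = [1850.0, 2100.0, 2300.0]   # birth weights (g), inferior tail
fw_pred = [2050.0, 1900.0, 2450.0]   # SVR estimates (g)
print(round(mean_percentage_error(fw, fw_pred), 1))  # 9.0
```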

Figure 1 – Support Vector Regression (SVR) prediction of low foetal weights (FW) (inferior tail of the distribution function). Graphical representation of a sample of 30 real and estimated FW, using an SVR with a polynomial kernel of the 7th degree, a 10% ε-insensitive loss function and regularization parameter λ = 1000, trained on a separate sub-set of 60 cases. The real foetal weights are ordered increasingly and represented by dots, and the corresponding estimated values are represented by circles (the lines connecting these circles are for visualization purposes only).

Figure 2 – Support Vector Regression (SVR) prediction of high foetal weights (FW) (superior tail of the distribution function). Graphical representation of a sample of 38 real and estimated FW, using an SVR with a polynomial kernel of the 7th degree, a 10% ε-insensitive loss function and regularization parameter λ = 1000, trained on a separate sub-set of 66 cases. The real foetal weights are ordered increasingly and represented by dots, and the corresponding estimated values are represented by circles (the lines connecting these circles are for visualization purposes only).


Acknowledgement

The authors would like to thank Steve Gunn and the Image Speech & Intelligent Systems Group, University of Southampton, United Kingdom, for allowing us to experiment with the Matlab software they developed for Support Vector Machines for classification and regression.

References

[1] Farmer R.M., Medearis A.L., Hirata G.I., Platt L.D., "The Use of a Neural Network for the Ultrasonographic Estimation of Foetal Weight in the Macrosomic Fetus", Am J Obstet Gynecol, May 1992.
[2] Chauhan S.P. et al., "Ultrasonographic estimate of birth weight at 24 to 34 weeks: A multicenter study", Am J Obstet Gynecol, October 1998.
[3] Sereno F., Marques de Sá J.P., Matos A., Bernardes J., "The Application of Radial Basis Functions and Support Vector Machines to the Foetal Weight Prediction", in Dagli C.H. et al. (eds.), Proceedings of ANNIE'2000, Smart Engineering System Design Conference, St. Louis, 2000.
[4] Vapnik V.N., Statistical Learning Theory, New York, Springer, 1998.
[5] Cristianini N., Shawe-Taylor J., An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods, Cambridge, Cambridge University Press, 2000.
[6] Haykin S., Neural Networks – A Comprehensive Foundation (2nd Edition), New York, Prentice Hall, 1999.
[7] Cherkassky V., Mulier F., Learning From Data – Concepts, Theory, and Methods, New York, John Wiley & Sons, Inc., 1998.
[8] Gunn S., Support Vector Machines for Classification and Regression, Image Speech & Intelligent Systems Group, University of Southampton, United Kingdom, 1998.
[9] Evgeniou T., Pontil M., Workshop on Support Vector Machines: Theory and Applications, Center for Biological and Computational Learning, and Artificial Intelligence Laboratory, MIT, Cambridge, 2000.

