Connexions Module: M11240

Optimization Theory
Version 1.4: Mar 22, 2008 4:24 pm GMT-5
Don Johnson

This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/1.0).



Optimization theory is the study of the extremal values of a function: its minima and maxima. Topics in this theory range from conditions for the existence of a unique extremal value to methods, both analytic and numeric, for finding the extremal values and the values of the independent variables at which the function attains its extremes. In this book, minimizing an error criterion is an essential step toward deriving optimal signal processing algorithms. An appendix summarizing the key results of optimization theory is essential to understanding optimal algorithms.

1 Unconstrained Optimization

The simplest optimization problem is to find the minimum of a scalar-valued function of a scalar variable $f(x)$, the so-called objective function, and where that minimum is located. (The maximum of a function is found by finding the minimum of its negative.) Assuming the function is differentiable, the well-known conditions for finding the minima, local and global, are:

$$ \frac{d}{dx} f(x) = 0, \qquad \frac{d^2}{dx^2} f(x) > 0 $$

All values of the independent variable $x$ satisfying these relations are locations of local minima. Without the second condition, solutions to the first could be maxima, minima, or inflection points. Solutions to the first equation are termed the stationary points of the objective function. To find the global minimum, that value (or values) where the function achieves its smallest value, each candidate extremum must be tested: the objective function must be evaluated at each stationary point and the smallest selected. If, however, the objective function can be shown to be strictly convex, then only one solution of $\frac{d}{dx} f(x) = 0$ exists, and that solution corresponds to the global minimum. The function $f(x)$ is strictly convex if, for any choice of $x_1$, $x_2$, and any scalar $a$ between 0 and 1,

$$ f(a x_1 + (1 - a) x_2) < a f(x_1) + (1 - a) f(x_2) $$

Convex objective functions occur often in practice and are more easily minimized because of this property.

When the objective function $f(\cdot)$ depends on a complex variable $z$, subtleties enter the picture. If the function $f(z)$ is differentiable, its extremes can be found in the obvious way: find the derivative, set it equal to zero, and solve for the locations of the extrema. However, there are many situations in which this function is not differentiable. In contrast to functions of a real variable, non-differentiable functions of a complex variable occur frequently. The simplest example is $f(z) = |z|^2$. The minimum value of this function obviously occurs at the origin. In calculating this obvious answer, a complication arises: $|z|^2 = z z^*$, and the conjugate $z^*$ is not an analytic function of $z$; hence $f$ is not differentiable. More generally, the derivative of a function with respect to a complex-valued variable cannot be evaluated directly when the function depends on the variable's conjugate. See Churchill [2] for more about the analysis of functions of a complex variable.

This complication can be resolved with either of two methods tailored for optimization problems. The first is to express the objective function in terms of the real and imaginary parts of $z$ and find the function's minimum with respect to these two variables (this multivariate minimization problem is discussed a few paragraphs below). This approach is unnecessarily tedious but will yield the solution. The second, more elegant, approach relies on two results from complex variable theory. First, the quantities $z$ and $z^*$ can be treated as independent variables, each considered a constant with respect to the other. A variable and its conjugate are thus viewed as the result of applying an invertible linear transformation to the variable's real and imaginary parts. Thus, if the real and imaginary parts can be considered independent variables, so can the variable and its conjugate, with the advantage that the mathematics is far simpler. In this way,

$$ \frac{\partial}{\partial z} |z|^2 = z^*, \qquad \frac{\partial}{\partial z^*} |z|^2 = z $$

Seemingly, the next step in minimizing the objective function would be to set the derivatives with respect to each quantity to zero and then solve the resulting pair of equations. As the following theorem suggests, that procedure is overly complicated.

Theorem 1:

If the function $f(z, z^*)$ is real-valued and analytic with respect to $z$ and $z^*$, all stationary points can be found by setting the derivative (in the sense just given) with respect to either $z$ or $z^*$ to zero [1].
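These derivative conventions are easy to sanity-check numerically. The sketch below is illustrative code, not part of the original module: it evaluates the Wirtinger derivatives of $f(z) = |z|^2$ through the standard identities $\frac{\partial}{\partial z} = \frac{1}{2}(\frac{\partial}{\partial x} - i \frac{\partial}{\partial y})$ and $\frac{\partial}{\partial z^*} = \frac{1}{2}(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y})$, where $z = x + iy$, using central finite differences. The function names are ours.

```python
# Sketch: numerically evaluate the Wirtinger derivatives of f(z) = |z|^2
# via d/dz = (d/dx - i d/dy)/2 and d/dz* = (d/dx + i d/dy)/2,
# approximating the real-coordinate partials with central differences.

def f(z):
    return abs(z) ** 2

def wirtinger(f, z, h=1e-6):
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # partial wrt real part
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial wrt imaginary part
    d_dz = 0.5 * (dfdx - 1j * dfdy)
    d_dzbar = 0.5 * (dfdx + 1j * dfdy)
    return d_dz, d_dzbar

z0 = 1.0 + 2.0j
d_dz, d_dzbar = wirtinger(f, z0)
print(d_dz, d_dzbar)   # ≈ conj(z0) and z0, matching the derivatives in the text
```

Because $|z|^2$ is quadratic in the real coordinates, the central differences are exact up to rounding error here.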

Thus, to find the minimum of $|z|^2$, compute the derivative with respect to either $z$ or $z^*$. In most cases, the derivative with respect to $z^*$ is the most convenient choice (why should this be? in the examples that follow, try both and see which you feel is easier). Thus, $\frac{\partial}{\partial z^*} |z|^2 = z$, and the stationary point is $z = 0$. As this objective function is strictly convex, its sole stationary point is the global minimum.

When the objective function depends on a vector-valued quantity $x$, evaluating the function's stationary points is a simple extension of the scalar-variable case. However, testing stationary points as possible locations for minima is more complicated [3]. The gradient of the scalar-valued function $f(x)$ of a vector $x$ (dimension $N$) equals an $N$-dimensional vector whose components are the partial derivatives of $f(\cdot)$ with respect to the components of $x$:

$$ \nabla_x f(x) = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_N} \end{pmatrix} $$

For example, the gradient of $x^T A x$ is $A x + A^T x$. This result is easily derived by expressing the quadratic form as the double sum $\sum_i \sum_j A_{ij} x_i x_j$ and evaluating the partials directly. When $A$ is symmetric, which is often the case, this gradient becomes $2 A x$. The gradient "points" in the direction of the maximum rate of increase of the function $f(\cdot)$, a fact often used in numerical optimization algorithms. The method of steepest descent is an iterative algorithm in which a candidate minimum is augmented by a quantity proportional to the negative of the objective function's gradient to yield the next candidate.
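The quadratic-form gradient $A x + A^T x$ can likewise be checked against finite differences. A minimal pure-Python sketch (the 2×2 matrix and test point are arbitrary illustrative choices, not from the text):

```python
# Sketch: verify grad(x^T A x) = A x + A^T x for a small 2x2 example
# by comparing the closed form against central finite differences.

A = [[2.0, 1.0],
     [0.0, 3.0]]          # deliberately non-symmetric

def quad(x):
    """x^T A x as the double sum over A[i][j] * x[i] * x[j]."""
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def analytic_grad(x):
    """(A x + A^T x)_i = sum_j (A_ij + A_ji) x_j."""
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]

def numeric_grad(x, h=1e-6):
    g = []
    for i in range(2):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((quad(xp) - quad(xm)) / (2 * h))
    return g

x = [1.0, -2.0]
print(analytic_grad(x))   # agrees with numeric_grad(x) to rounding error
```

Replacing `A` with a symmetric matrix makes `analytic_grad` reduce to $2Ax$, as the text notes.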

In symbols, each iteration computes

$$ x_k = x_{k-1} - \alpha \nabla_x f(x) \big|_{x = x_{k-1}}, \qquad \alpha > 0 $$

If the objective function is sufficiently "smooth" (there aren't too many minima and maxima), this approach will yield the global minimum. Strictly convex functions are certainly smooth enough for this method to work. The gradient of the gradient of $f(x)$, denoted by $\nabla_x^2 f(x)$, is a matrix whose $j$th column is the gradient of the $j$th component of $f$'s gradient. This quantity is known as the Hessian, defined to be the matrix of all the second partials of $f(\cdot)$:

$$ \left[ \nabla_x^2 f(x) \right]_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j} $$
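As an illustration of this iteration, the following sketch runs steepest descent on the strictly convex quadratic $f(x) = x^T A x$ with a symmetric positive definite $A$. The matrix, starting point, and fixed step size $\alpha = 0.1$ are our illustrative assumptions, with $\alpha$ chosen small enough for convergence:

```python
# Sketch: steepest descent x_k = x_{k-1} - alpha * grad f(x_{k-1})
# on f(x) = x^T A x with A symmetric positive definite, so grad f = 2 A x
# and the global minimum is at the origin.

A = [[3.0, 1.0],
     [1.0, 2.0]]           # symmetric, positive definite

def grad(x):               # 2 A x
    return [2.0 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def steepest_descent(x, alpha=0.1, iters=200):
    for _ in range(iters):
        g = grad(x)
        x = [x[i] - alpha * g[i] for i in range(2)]
    return x

x_min = steepest_descent([4.0, -3.0])
print(x_min)               # both components shrink toward the minimum at 0
```

Too large an $\alpha$ makes the iterates diverge (here, any $\alpha$ with $|1 - 2\alpha\lambda| \ge 1$ for an eigenvalue $\lambda$ of $A$), which is why practical implementations adapt the step size.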

The Hessian is always a symmetric matrix. The minima of the objective function $f(x)$ occur when

$$ \nabla_x f(x) = 0 \qquad \text{and} \qquad \nabla_x^2 f(x) > 0 \; \text{(positive definite)} $$

Thus, for a stationary point to be a minimum, the Hessian evaluated at that point must be a positive definite matrix. When the objective function is strictly convex, this test need not be performed. For example, the objective function $f(x) = x^T A x$ is convex whenever $A$ is positive definite and symmetric (note that the Hessian of $x^T A x$ is $2A$).

When the independent vector is complex-valued, the issues discussed in the scalar case also arise. Because of the complex-valued quantities involved, how to evaluate the gradient becomes an issue: is $\nabla_z$ or $\nabla_{z^*}$ more appropriate? In contrast to the case of complex scalars, the choice in the case of complex vectors is unique, as the following theorem shows.

Theorem 2:

Let $f(z, z^*)$ be a real-valued function of the vector-valued complex variable $z$ where the dependence on the variable and its conjugate is explicit. By treating $z$ and $z^*$ as independent variables, the quantity pointing in the direction of the maximum rate of change of $f(z, z^*)$ is $\nabla_{z^*} f(z)$ [1].

To show this result, consider the variation of $f$ given by

$$ \delta f = \sum_i \left( \frac{\partial f}{\partial z_i} \delta z_i + \frac{\partial f}{\partial z_i^*} \delta z_i^* \right) = (\nabla_z f)^T \delta z + (\nabla_{z^*} f)^T \delta z^* $$

  H This quantity is concisely expressed as δf = 2< (∇z (f )) δz . By the Schwarz inequality, the maximum value of this variation occurs when δz is in the same direction as (∇z (f )). Thus, the direction corresponding to the largest change in the quantity f (z, z) is in the direction of its gradient with respect to z. To implement the method of steepest descent, for example, the gradient with respect to the conjugate must be used. To nd the stationary points of a scalar-valued function of a complex-valued vector, we must solve

∇z (f (z)) = 0

(1)

For solutions of this equation to be minima, the Hessian dened to be the matrix of mixed partials given by ∇z (∇z (f (z))) must be positive denite. For example, the required gradient of the objective function zH Az is given by Az, implying for positive denite A that a stationary point is z = 0. The Hessian of the objective function is simply A, conrming that the minimum of a quadratic form is always the origin.
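To see the complex-vector machinery in action, the sketch below numerically evaluates $\nabla_{z^*} (z^H A z)$ component by component for a small Hermitian $A$, applying the Wirtinger identity $\frac{\partial}{\partial z_i^*} = \frac{1}{2}(\frac{\partial}{\partial x_i} + i \frac{\partial}{\partial y_i})$ with finite differences, and compares it with $A z$. The matrix and test point are our illustrative choices:

```python
# Sketch: check that the gradient with respect to the conjugate of
# f(z) = z^H A z equals A z for a small Hermitian A, using
# component-wise Wirtinger derivatives d/dz_i* = (d/dx_i + i d/dy_i)/2.

A = [[2.0 + 0.0j, 1.0 - 1.0j],
     [1.0 + 1.0j, 3.0 + 0.0j]]      # Hermitian (A = A^H)

def f(z):
    # z^H A z = sum_i conj(z_i) * (A z)_i ; real-valued for Hermitian A
    Az = [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
    return sum(z[i].conjugate() * Az[i] for i in range(2)).real

def grad_conj(z, h=1e-6):
    g = []
    for i in range(2):
        def bump(dz):
            w = list(z)
            w[i] = w[i] + dz
            return f(w)
        dfdx = (bump(h) - bump(-h)) / (2 * h)          # partial wrt Re(z_i)
        dfdy = (bump(1j * h) - bump(-1j * h)) / (2 * h)  # partial wrt Im(z_i)
        g.append(0.5 * (dfdx + 1j * dfdy))
    return g

z = [1.0 + 1.0j, 2.0 - 1.0j]
Az = [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
print(grad_conj(z))    # matches A z component by component
```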



References

[1] D. H. Brandwood. "A complex gradient operator and its application in adaptive array theory." IEE Proceedings, Parts F and H, 130:11-16, 1983.
[2] R. V. Churchill and J. W. Brown. Complex Variables and Applications. McGraw-Hill, 1989.
[3] D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

http://cnx.org/content/m11240/1.4/
