Sturm-Liouville boundary value problems: two different approaches

Jorge Arce Garro

December 11, 2018

Abstract

Sturm-Liouville boundary value problems are important in several branches of PDEs, physics and engineering, and they are theoretically interesting in their own right. We will carry out an exploration of these BVPs using two different frameworks. First, we shall see how the general methods of functional analysis can be applied to the operator $L$ and yield, with elegance, several results on the nature of the eigenvalues of the problem. Second, we shall use a direct, parametric approach to study $u$ and $p(x)u'$. Much can be said of these problems via elementary arguments, by using a particular set of polar coordinates called the Prüfer variables.
1 Introduction
We start with the definition of a Sturm-Liouville boundary value problem (S-L BVP). It is the second-order differential equation given by

$$Lu(x) = \lambda u(x), \qquad L = \frac{1}{r(x)}\left(-\frac{d}{dx}\, p(x)\frac{d}{dx} + q(x)\right) \tag{1}$$

subject to the following boundary conditions:

$$BC_a(u) := \cos(\alpha)u(a) - \sin(\alpha)p(a)u'(a) = 0$$
$$BC_b(u) := \cos(\beta)u(b) - \sin(\beta)p(b)u'(b) = 0 \tag{2}$$

for $a < b$, $\alpha, \beta \in \mathbb{R}$.

Remark: The angles $\alpha$ and $\beta$ are here to give us the versatility to describe many homogeneous boundary conditions in a single equation. For example:
• For an angle equal to 0, we have a boundary condition that only depends on $u$ (Dirichlet boundary condition).

• For an angle equal to $\pi/2$, we have a boundary condition that only depends on $u'$ (Neumann boundary condition).

• For angles not equivalent to either of these cases mod $\pi$, we have a mixed boundary condition (Robin boundary condition).
Note that (1) is written as an eigenvalue problem for the linear operator $L$. Indeed, as part of solving the problem, we will have to determine the values of $\lambda$ that allow solutions. We shall call these the eigenvalues of (1), and the associated solutions $u$ shall be called eigenfunctions of (1).

These problems are important in several areas of physics and engineering. Perhaps the most widespread place in which they appear is when the method of separation of variables is applied to solve partial differential equations subject to boundary conditions. When successful, this method splits a PDE into several ODEs, and for many of these PDEs (e.g. the heat equation, the wave equation, Laplace's equation) we end up with problems of the form of (1), at least when the geometry of the boundary is simple enough.

Definition 1.1. The Sturm-Liouville boundary value problem is said to be regular if $p(x), r(x) > 0$, and $p(x)$, $p'(x)$, $q(x)$, and $r(x)$ are continuous over the finite interval $[a, b]$. We shall only consider regular Sturm-Liouville BVPs here.

Example: One of the simplest examples of a S-L BVP is given by taking $[a, b] = [0, \pi]$, $p = r = 1$, $q = 0$, $\alpha = \beta = 0$. We are left with:

$$-\frac{d^2 u}{dx^2} = \lambda u$$

subject to $u(0) = u(\pi) = 0$. Let us restrict our attention to real solutions (therefore, to real eigenvalues as well). This is a second-order, homogeneous ODE with constant coefficients. Its solutions depend on the sign of $\lambda$:

• For $\lambda > 0$, solutions are sinusoidal.

• For $\lambda = 0$, solutions are linear.

• For $\lambda < 0$, solutions are exponential.

It is easy to check that the only way to satisfy the boundary conditions is with sinusoidal functions. Therefore, we have $\lambda > 0$, and setting $\omega = \sqrt{\lambda}$ we get:
$$u(x) = C\cos(\omega x) + D\sin(\omega x)$$

Plugging in the boundary conditions, we see:

$$0 = C\cos 0 + D\sin 0 \Rightarrow C = 0$$

and, assuming that $u$ is not identically zero,

$$0 = D\sin(\pi\omega) \Rightarrow \omega = n \in \mathbb{N}$$

Let us discard the trivial eigenfunction $\sin(0x)$. The eigenfunctions are of the form $u_n = D\sin((n+1)x)$ and the eigenvalues are $\{(n+1)^2 : n \in \mathbb{N}\}$. The example above will serve to illustrate several important properties of S-L problems throughout our exploration.
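The eigenvalues found above can also be recovered numerically, which makes for a useful sanity check later on. The following minimal sketch (the finite-difference scheme and grid size are illustrative choices of ours, not part of the original development) discretizes $-u'' = \lambda u$ on $[0, \pi]$ with Dirichlet conditions and computes the lowest eigenvalues:

```python
import numpy as np

# Discretize -u'' = lambda * u on [0, pi] with u(0) = u(pi) = 0, using
# central second differences on N interior grid points.
N = 500
h = np.pi / (N + 1)
A = (np.diag(2.0 * np.ones(N))
     + np.diag(-np.ones(N - 1), 1)
     + np.diag(-np.ones(N - 1), -1)) / h**2

# The matrix is symmetric, so eigvalsh applies; eigenvalues come out sorted.
eigenvalues = np.sort(np.linalg.eigvalsh(A))
print(eigenvalues[:4])   # close to 1, 4, 9, 16
```

The discretization error is $O(h^2)$, so the first few computed eigenvalues agree with $(n+1)^2$ to several digits.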
2 A functional analysis approach
2.1 Compact, symmetric operators
One of the first things we would like to gain insight into is the distribution of the eigenvalues of (1). For our particular example above, we have a countable sequence of eigenvalues which tends to infinity. Is this always the case?

Functional analysis provides us with a whole family of operators for which much can be said about the eigenvalues:

Definition 2.1. Let $X$ be a Banach space (in our context, this will be a space of functions). A linear operator $A$ on $X$ is said to be compact if for any bounded sequence $f_n$ there is a subsequence $f_{n_k}$ such that $A(f_{n_k})$ is convergent.

The following theorem is a standard result, found in any functional analysis textbook.

Theorem 2.1 (Spectral theorem for compact, symmetric operators). Let $H$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and let $A$ be a bounded, compact operator on $H$. Assume, furthermore, that $A$ is symmetric, that is:

$$\forall f, g \in H: \quad \langle g, Af \rangle = \langle Ag, f \rangle$$

Then there exists a sequence of real eigenvalues $\alpha_j$ converging to 0. The corresponding normalized eigenvectors $u_j$ form an orthonormal set, and every $f \in \mathrm{Range}(A)$ can be written as a (possibly infinite) linear combination of the $u_j$'s:
$$f = \sum_{j=0}^{N} \langle u_j, f \rangle\, u_j$$
(Note this implies that if $\mathrm{Range}(A)$ is dense, then the eigenvectors form an orthonormal basis.)

It wouldn't make much sense to try to apply this theorem to the operator $L$ right away, given that we saw at least one case (the example above) in which the sequence of eigenvalues tends to infinity, rather than zero! However, precisely for this reason, one might be inclined to look at the inverse of the operator $L$, whose eigenvalues are the reciprocals of those of $L$ and hence tend to zero. This shall be our strategy.
2.2 $L$ is symmetric in an appropriate subspace
The symmetry of an operator is a property preserved even when taking inverses, so writing down an appropriate Hilbert space for our solutions in which $L$ is symmetric is valuable work. Define $H$ as the space of continuous functions (which can be taken complex-valued) on $[a, b]$, equipped with the following inner product:

$$\langle f, g \rangle = \int_a^b \overline{f(x)}\, g(x)\, r(x)\, dx$$
That is, $r$ shall play the role of a density function in our inner product. Next, we would like to define $L$ as an operator on $H$. Unfortunately, there are functions in $H$ which are not even differentiable, and we need to differentiate twice. Furthermore, we are only going to consider as inputs functions which satisfy the boundary conditions (2). This means we will need to define the following subspace:

$$H_0 = \{ f \in C^2[a, b] : BC_a(f) = BC_b(f) = 0 \}$$

and with it, we are able to prove:

Lemma 2.1. The operator $L : H_0 \to H_0$ is symmetric.

Proof. We sketch the proof, omitting computational details. By using integration by parts twice, it is easy to show the Lagrange identity:

$$\langle g, Lf \rangle = W_a(g, f) - W_b(g, f) + \langle Lg, f \rangle \tag{3}$$

where $W_x(f, g)$ denotes the Wronskian of $f$ and $g$ at $x$, modified by multiplying by $p(x)$:

$$W_x(f, g) = p(x)\left(f(x)g'(x) - g(x)f'(x)\right)$$

It can be seen that under the conditions $BC_a = BC_b = 0$ for $f$ and $g$, the Wronskians above vanish. This leaves us with $\langle g, Lf \rangle = \langle Lg, f \rangle$, i.e., $L$ is symmetric!
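The symmetry asserted by Lemma 2.1 can be checked numerically on the model case $p = r = 1$, $q = 0$ on $[0, \pi]$, where $Lf = -f''$. The sketch below (the two test functions and the quadrature rule are illustrative choices of ours; both functions satisfy the Dirichlet conditions) compares $\langle g, Lf \rangle$ and $\langle Lg, f \rangle$:

```python
import math

def inner(f, g, a=0.0, b=math.pi, n=20000):
    # Composite midpoint rule for <f, g> = integral of f*g*r dx, weight r = 1.
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h)
               for k in range(n)) * h

# Model case p = r = 1, q = 0, so L f = -f''.  Both (real-valued) test
# functions vanish at 0 and pi, i.e. they satisfy BC_a = BC_b = 0.
f  = lambda x: math.sin(x)
Lf = lambda x: math.sin(x)                       # -(sin x)'' = sin x
g  = lambda x: math.sin(x) + math.sin(2 * x)
Lg = lambda x: math.sin(x) + 4 * math.sin(2 * x)

print(inner(g, Lf), inner(Lg, f))   # both close to pi/2
```

Both inner products evaluate to $\pi/2$ up to quadrature error, as the Lagrange identity predicts once the boundary Wronskians vanish.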
2.3 Showing compactness for $(L - zI)^{-1}$
Now, although we want to work with the inverse of $L$, we might run into the situation of $L$ not being invertible. However, we can always find a fixed $z \in \mathbb{C}$ such that $z$ is not an eigenvalue of $L$, and so $L - zI$ is an invertible operator. This transformation only shifts the eigenvalues of $L$ by $z$, and so we will still be able to study them. By doing this, we are able to prove:

Theorem 2.2. The operator $(L - zI)^{-1} : H_0 \to H_0$ is a bounded, compact operator.

Although the proof uses interesting techniques, it is not especially illuminating and is computationally very heavy, so we will just sketch the main ideas below:

i. Write (1) as a first-order system of equations by using a variant of the classic substitution used to this end: $w = p(x)u'$.

ii. To find the inverse of $L - zI$, we want to solve the non-homogeneous system $(L - zI)f = g$ for $f$, with $g \in H_0$. Use variation of parameters on the system above to do so, noting that the homogeneous differential equation associated to $(L - zI)f = g$ is precisely (1) with eigenvalue $z$.

iii. Let us denote $R_L(z) = (L - zI)^{-1}$. The solution we just obtained can be written in the form:

$$f(x) = R_L(z)g(x) = \int_a^b G(z, x, t)\, g(t)\, r(t)\, dt$$

where $G$ denotes the Green's function:

$$G(z, x, t) = \frac{1}{W(u_b(z), u_a(z))} \begin{cases} u_b(z, x)\, u_a(z, t), & x \geq t \\ u_b(z, t)\, u_a(z, x), & x \leq t \end{cases}$$

and $u_a(z, \cdot)$ and $u_b(z, \cdot)$ denote two solutions of the homogeneous equation $(L - zI)f = 0$ adapted to the boundary conditions at $a$ and $b$, respectively.
iv. We now have $R_L(z)$ in the form of an integral operator (over a bounded set) against the measure induced by the density function $r$, with a continuous kernel $G$. The methods of functional analysis guarantee that any operator of this form is bounded and compact. (To show compactness, one takes a bounded sequence $f_n$ in $H_0$ and shows that $R_L(z)f_n$ is an equicontinuous sequence of functions. Then the Arzelà–Ascoli theorem is applied to guarantee the existence of a convergent subsequence.)
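For the elementary example ($-u'' = \lambda u$ on $[0, \pi]$ with Dirichlet conditions), $z = 0$ is not an eigenvalue, so $R_L(0) = L^{-1}$ and the Green's function can be written down in closed form: with $u_a(x) = x$ and $u_b(x) = \pi - x$ adapted to the two boundary conditions, $W(u_b, u_a) = \pi$. The following sketch (the midpoint discretization and grid size are illustrative choices of ours) builds the integral operator and shows its eigenvalues are the reciprocals $1/n^2$, tending to zero as the spectral theorem requires:

```python
import numpy as np

# Green's function of -u'' = f on [0, pi] with u(0) = u(pi) = 0
# (model case p = r = 1, q = 0, z = 0):
def G(x, t):
    return np.where(x <= t, x * (np.pi - t), t * (np.pi - x)) / np.pi

N = 400
h = np.pi / N
x = (np.arange(N) + 0.5) * h           # midpoint grid on [0, pi]
K = G(x[:, None], x[None, :]) * h      # discretized integral operator R_L(0)

ev = np.sort(np.linalg.eigvalsh(K))[::-1]
print(ev[:3])   # close to 1, 1/4, 1/9: reciprocals of the eigenvalues 1, 4, 9
```

The computed spectrum accumulates only at zero, exactly the behavior that makes the spectral theorem for compact, symmetric operators applicable.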
2.4 Consequences for $L$
We conclude this section with the following result:

Theorem 2.3. The regular Sturm-Liouville problem has a countable number of discrete, simple eigenvalues $E_n$ which accumulate only at $\infty$. The corresponding normalized eigenfunctions $u_n$ can be chosen real-valued and form an orthonormal basis for $H_0$.

To prove this, we pick a $\lambda \in \mathbb{R}$ such that $R_L(\lambda)$ exists. It is elementary to check that $L$ is symmetric if and only if $(L - \lambda I)^{-1}$ is symmetric. Thus, we can apply the spectral theorem to $R_L(\lambda)$ to find it has a countable number of eigenvalues $\alpha_n \to 0$. The corresponding eigenvalues of $L$ are of the form $E_n = \lambda + \frac{1}{\alpha_n}$, which are discrete and diverge to $\infty$.

The fact that the eigenvalues are simple can be checked by noting that if $u_n$ and $v_n$ were two linearly independent eigenfunctions associated to the eigenvalue $E_n$, then the Wronskian $W(u_n, v_n)$ would vanish, implying these functions are linearly dependent, a contradiction.

With this we finish our use of functional analysis to study (1). We have shed considerable light on the nature of its eigenvalues. Now we turn our attention to the behavior of its eigenfunctions, by virtue of our next tool.
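The relation $E_n = \lambda + 1/\alpha_n$ can be illustrated numerically on a finite-difference model of the elementary example. Here the shift $z = -1$ (which is not an eigenvalue, since the spectrum is positive) and the grid size are illustrative choices of ours:

```python
import numpy as np

# Finite-difference model of L = -d^2/dx^2 on [0, pi] with Dirichlet
# conditions, as in the elementary example.
N = 300
h = np.pi / (N + 1)
A = (np.diag(2.0 * np.ones(N))
     + np.diag(-np.ones(N - 1), 1)
     + np.diag(-np.ones(N - 1), -1)) / h**2

z = -1.0                                        # not an eigenvalue of A
R = np.linalg.inv(A - z * np.eye(N))            # discretized resolvent R_L(z)
alpha = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues of R_L(z), decreasing to 0
E = z + 1.0 / alpha                             # recovered eigenvalues of L, increasing
print(E[:3])                                    # close to 1, 4, 9
```

The eigenvalues $\alpha_n$ of the resolvent decrease to zero, while the recovered $E_n = z + 1/\alpha_n$ reproduce the spectrum of the discretized $L$ and diverge to $\infty$.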
3 A direct approach: Prüfer variables

3.1 Definition and equivalent system
Definition 3.1. Let $u(x)$ be a solution of (1) subject to (2). The Prüfer variables $\rho_u, \theta_u$ associated to $u$ are defined by applying polar coordinates to $(p(x)u'(x), u(x))$. Specifically:

$$u(x) = \rho_u(x)\sin(\theta_u(x)), \qquad p(x)u'(x) = \rho_u(x)\cos(\theta_u(x)) \tag{4}$$
It is natural to consider $p(x)u'(x)$ instead of just $u'(x)$ if we notice the form of the operator $L$ in (1), which contains the derivative of $p(x)u'(x)$.

For these coordinates to be uniquely defined, we shall assume WLOG that $\rho_u$ never vanishes (note that if $\rho_u(x_0) = 0$ we would have $u(x_0) = u'(x_0) = 0$, and by uniqueness we would be talking about the identically zero function). More importantly, the range of $\theta_u$ is not necessarily $[0, 2\pi)$: it will be assumed large enough to guarantee that $\theta_u$ is a continuous function (and doesn't jump from $2\pi$ back to 0) as the curve $(p(x)u'(x), u(x))$ winds about the origin.

Let us apply the change of variables to (1). For clarity, we shall omit the variable $x$ in our functions. We get:

$$\frac{1}{r}\left(-\frac{d}{dx}(pu') + qu\right) = \lambda u \;\Rightarrow\; (pu')' = -(\lambda r - q)u \tag{5}$$

Recall that in general polar coordinates for $(X, Y)$ we have the formulas:

$$\rho' = \frac{XX' + YY'}{\rho}, \qquad \theta' = \frac{Y'X - X'Y}{\rho^2}$$

In our case $X = pu'$, $Y = u$, and with the help of (5) above we get:

$$X' = -(\lambda r - q)Y, \qquad Y' = X/p$$

and so:

$$\rho_u' = \frac{-X(\lambda r - q)Y + YX/p}{\rho_u}, \qquad \theta_u' = \frac{X^2/p + (\lambda r - q)Y^2}{\rho_u^2}$$

which after some simple algebra reduces to the system:

$$\rho_u' = \rho_u \left(\frac{1}{p} + q - \lambda r\right)\frac{\sin(2\theta_u)}{2} \tag{6}$$

$$\theta_u' = \frac{\cos^2(\theta_u)}{p} + (\lambda r - q)\sin^2(\theta_u) \tag{7}$$
As is usual with this kind of substitution, we have exchanged a second-order ODE for a 2×2 system of first-order ODEs. However, a remarkable feature of this new system is that the equation for $\theta_u'$ is independent of $\rho_u$, and so we have a first-order ODE which describes the Prüfer angle on its own! Also note that the equation for $\rho_u'$ is separable once $\theta_u$ is known.
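Since (7) stands alone as a first-order ODE, it can be integrated directly. A minimal sketch for the model case $p = r = 1$, $q = 0$ on $[0, \pi]$ (the classical Runge–Kutta scheme and step count are choices of ours) confirms that $\theta_u(\pi)$ reaches exactly $n\pi$ when $\lambda = n^2$, matching the eigenvalues found in the introduction:

```python
import math

def prufer_angle(lam, n_steps=4000):
    # Integrate (7) for p = r = 1, q = 0:
    #   theta' = cos^2(theta) + lam * sin^2(theta),  theta(0) = 0.
    def f(theta):
        return math.cos(theta) ** 2 + lam * math.sin(theta) ** 2

    h = math.pi / n_steps
    theta = 0.0                       # Dirichlet condition at x = 0 (alpha = 0)
    for _ in range(n_steps):          # classical fourth-order Runge-Kutta
        k1 = f(theta)
        k2 = f(theta + 0.5 * h * k1)
        k3 = f(theta + 0.5 * h * k2)
        k4 = f(theta + h * k3)
        theta += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return theta

# lam = n^2 is an eigenvalue exactly when theta(pi) is a multiple of pi:
for n in (1, 2, 3):
    print(n, prufer_angle(n ** 2) / math.pi)   # close to 1, 2, 3
```

For non-eigenvalue choices of $\lambda$, the same integration gives a terminal angle $\theta_u(\pi)$ strictly between multiples of $\pi$, which is the basis of shooting methods for these problems.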
3.2 Oscillatory nature of the eigenfunctions
The study of the Prüfer angles alone is enough for us to deduce interesting properties of the eigenfunctions of (1).

Let us start by studying the sign of $\theta_u'$ in (7). We know that $\frac{1}{p} > 0$. Also, since $r$ and $q$ are continuous on $[a, b]$ they are bounded, which implies that for a sufficiently large $\lambda$ (which we know we can find, thanks to the results in section 2) we have $\lambda r - q > 0$. In fact, due to continuity, both of these coefficients are bounded away from zero, and so from the shape of (7) we see there is a positive constant $K$ such that $\theta_u' > K$. As such, from (4) we see any $u$ with a large enough eigenvalue must have an oscillatory behavior, since the polar angle has a minimum, positive rate of increase. Furthermore, the larger $\lambda$ is, the larger $\theta_u'$ will be at some points. This indicates that these eigenfunctions will oscillate faster as $\lambda$ increases!

Now, if the eigenvalue $\lambda$ is not large enough, we cannot guarantee that $\theta_u$ is increasing. But we can note the following: at a zero of $u$, the Prüfer angle is always increasing, as we can see from (7) and from $u(x) = \rho_u(x)\sin(\theta_u(x))$:

$$u(x_0) = 0 \iff \theta_u(x_0) = 0 \mod \pi \;\Rightarrow\; \theta_u'(x_0) = \frac{1}{p(x_0)} > 0$$
This leads us to:

Lemma 3.1. Regardless of the magnitude of $\lambda$, a Prüfer angle can cross integer multiples of $\pi$ only from below. This means that, between consecutive zeros of $u$, the Prüfer angle must increase by exactly $\pi$ units.
3.3 Sturm's comparison theorem and layout of the zeros of the eigenfunctions

These seemingly innocent observations about the Prüfer angles can be used to prove the celebrated Sturm comparison theorem. For this, we first need the following:

Lemma 3.2 (Monotonicity of $\theta_u$ with respect to the coefficients of (7)). Consider two different Sturm-Liouville operators $L_j$ for $j = 0, 1$ with their respective $p_j$, $q_j$ and $r_j$. Let $u_j$ be solutions of $L_j u_j = \lambda_j u_j$, and call the respective Prüfer angles $\theta_j$. Assume that $1/p_1 \geq 1/p_0$ and $\lambda_1 r_1 - q_1 \geq \lambda_0 r_0 - q_0$. Then:

i. If $\theta_1(c) \geq \theta_0(c)$ for some $c \in (a, b)$, then $\theta_1(x) \geq \theta_0(x)$ for all $x \in [c, b)$. If the inequality becomes strict at some $x \in [c, b)$, it remains so.

ii. Moreover, if the angles $\theta_1(x)$ and $\theta_0(x)$ coincide at two points, say $x = c$ and $x = d$ (with $c < d$), then $p_1 = p_0$ and $\lambda_0 r_0 - q_0 = \lambda_1 r_1 - q_1$ on $[c, d]$.
Proof. i. It is known that if $\theta_1(c) \geq \theta_0(c)$ and $\theta_1'(x) \geq \theta_0'(x)$ for all $x \in [c, b)$, then $\theta_1(x) \geq \theta_0(x)$ is also true in $[c, b)$, with the inequality staying strict after any point at which it becomes so. One can prove, by elementary means, that this result still holds when the condition "$\theta_1'(x) \geq \theta_0'(x)$" is replaced by

$$\theta_1'(x) - f(x, \theta_1(x)) \geq \theta_0'(x) - f(x, \theta_0(x))$$

for any fixed function $f(x, y)$ which is locally Lipschitz continuous with respect to $y$, uniformly in $x$.

In this case, pick $f(x, y) = \frac{\cos^2(y)}{p_0(x)} + (\lambda_0 r_0(x) - q_0(x))\sin^2(y)$. It is easy to check the required uniform Lipschitz condition from the fact that $\frac{1}{p_0(x)}$ and $\lambda_0 r_0(x) - q_0(x)$ are bounded on $[a, b]$. With this $f$, $\theta_0'(x) - f(x, \theta_0(x)) = 0$, whereas $\theta_1'(x) - f(x, \theta_1(x))$ is given by:
$$\frac{\cos^2(\theta_1)}{p_1(x)} + (\lambda_1 r_1 - q_1)\sin^2(\theta_1) - \left(\frac{\cos^2(\theta_1)}{p_0(x)} + (\lambda_0 r_0 - q_0)\sin^2(\theta_1)\right)$$
$$= \cos^2(\theta_1)\left(\frac{1}{p_1(x)} - \frac{1}{p_0(x)}\right) + \sin^2(\theta_1)\big((\lambda_1 r_1 - q_1) - (\lambda_0 r_0 - q_0)\big) \geq 0$$

So, using the result we stated above, we have i.

ii. Using the first part, we note that the inequality $\theta_1(x) \geq \theta_0(x)$ cannot become strict at any point of the interval (or else we would have $\theta_1(d) > \theta_0(d)$). This means $\theta_1 = \theta_0$ on $[c, d]$, and subtracting the corresponding differential equations of the form (7), we conclude the coefficients must be equal.

Now we're ready to prove:

Theorem 3.1 (Sturm's comparison theorem). Consider two different Sturm-Liouville operators $L_j$ for $j = 0, 1$ with their respective $p_j$, $q_j$ and $r_j$. Let $u_j$ be solutions of $L_j u_j = \lambda_j u_j$, and call the respective Prüfer angles $\theta_j$. Assume that $1/p_1 \geq 1/p_0$ and $\lambda_1 r_1 - q_1 \geq \lambda_0 r_0 - q_0$. Let $(c, d) \subseteq (a, b)$, and suppose that at each end of $(c, d)$ we have:

$$W(u_0, u_1) = 0 \quad \text{or} \quad u_0 = 0 \tag{8}$$

(where $W$ denotes the Wronskian). If the functions $u_0$ and $u_1$ are linearly independent on $(c, d)$ (i.e. not a constant multiple of one another), then $u_1$ has at least one zero in $(c, d)$.
Proof. Let us assume, without loss of generality, that the Prüfer angles at $c$ lie in $[0, \pi)$. (They can be taken to lie in $[-\pi, \pi)$ like any polar angle, and if needed, we reverse the signs of $u_0$ or $u_1$ to bring the polar angle into the desired interval.) The Wronskian $W(u_0, u_1)$ is readily calculated to give:

$$W(u_0, u_1) = \rho_0 \rho_1 \sin(\theta_0 - \theta_1) \tag{9}$$

Therefore, since we know the Prüfer radius never vanishes, and keeping in mind the bounds on $\theta_j(c)$, the condition (8) at $x = c$ is equivalent to

$$\theta_0(c) = \theta_1(c) \quad \text{or} \quad \theta_0(c) = 0$$

Either way, we have $\theta_0(c) \leq \theta_1(c)$, and so the lemma above implies $\theta_0(d) < \theta_1(d)$ unless $\theta_0 = \theta_1$. But the latter would imply, going back to the ODE (6), that $\rho_0$ and $\rho_1$ are equal up to a multiplicative constant, and so the same could be said of $u_0$ and $u_1$. Since this doesn't happen by hypothesis, we have proved $\theta_0(d) < \theta_1(d)$.

But note this means $\theta_1(d) > \pi$. Indeed, applying the condition (8) at $x = d$ (and reasoning similarly to the above for $x = c$), we get either of two cases:

• $\theta_0(d) = \theta_1(d) \mod \pi$: by virtue of $\theta_0(d) < \theta_1(d)$, we must have $\theta_1(d) - \theta_0(d) \geq \pi$, therefore $\theta_1(d) \geq \pi$.

• $\theta_0(d) = 0 \mod \pi$: we can't have $\theta_0(d) \leq 0$ because $\theta_0(c) \in [0, \pi)$ and Prüfer angles increase through multiples of $\pi$, as noted in Lemma 3.1. Therefore $\theta_0(d) \geq \pi$, which implies $\theta_1(d) > \pi$.

Finally, by the intermediate value theorem on $[c, d]$, we find an $x^*$ such that $\theta_1(x^*) = \pi$, which is a zero of $u_1$, as desired.

By keeping the coefficients $p$, $q$ and $r$ constant, and just increasing the eigenvalue, we conclude:

Corollary 3.2.1. Let $u_0$ and $u_1$ be eigenfunctions of (1) associated to eigenvalues $\lambda_0$ and $\lambda_1$, with $\lambda_1 > \lambda_0$. Then between two zeros of $u_0$ one can find a zero of $u_1$.
3.4 Remark on the interlacing of zeros
More careful estimates with the Prüfer angles allow us to show that, once we have our sequence of eigenvalues ordered as $E_0 < E_1 < \ldots$, the corresponding eigenfunctions $u_n$ have exactly $n$ zeros in $(a, b)$. (Note this is in agreement with the elementary example we did at the beginning on $[0, \pi]$, in which the eigenfunctions are $\sin((n+1)x)$.)

Knowing the exact number of zeros, along with the corollary we obtained above, allows us to be much more specific about the relative position of the zeros of the eigenfunctions:

Corollary 3.2.2. Let $u_n$ be the eigenfunctions of (1), sorted according to the size of the eigenvalues. Then the zeros of $u_{n+1}$ interlace the zeros of $u_n$. That is, if $x_{n,j}$ are the zeros of $u_n$ inside $(a, b)$, then:

$$a < x_{n+1,1} < x_{n,1} < x_{n+1,2} < x_{n,2} < \ldots < x_{n,n} < x_{n+1,n+1} < b$$

A quick analysis shows this is the only possible behavior if between any two consecutive zeros of $u_n$ there must always be a zero of $u_{n+1}$.
Figure 1: Illustration of the interlacing of zeros for the eigenfunctions $u_3 = \sin(4x)$ and $u_4 = \sin(5x)$, for the elementary example $-u'' = \lambda u$.
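For these two eigenfunctions the interlacing can be checked explicitly from the closed-form zeros (a small sketch; the labeling scheme used to display the merged sequence is ours):

```python
import math

# Zeros of the eigenfunctions u_n(x) = sin((n+1)x) of -u'' = lambda * u
# on (0, pi): u_n vanishes at k*pi/(n+1) for k = 1, ..., n.
def zeros(n):
    return [k * math.pi / (n + 1) for k in range(1, n + 1)]

z3, z4 = zeros(3), zeros(4)   # u_3 = sin(4x), u_4 = sin(5x)

# Merge the two zero sets in increasing order; interlacing means the
# labels must alternate, starting and ending with a zero of u_4.
merged = sorted([(x, 4) for x in z4] + [(x, 3) for x in z3])
print([label for _, label in merged])   # [4, 3, 4, 3, 4, 3, 4]
```

The alternating pattern is exactly the chain $a < x_{4,1} < x_{3,1} < x_{4,2} < x_{3,2} < x_{4,3} < x_{3,3} < x_{4,4} < b$ asserted by Corollary 3.2.2.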