Chapter 8

Dynamic Optimization

8.1 Introduction to Dynamic Programming: Leading Example

Example 152 (One-Sector Growth Model in Macroeconomics) Consider the following intertemporal optimization problem:

  sup_{{c_t, k_t}_{t=0}^∞}  ∑_{t=0}^∞ β^t u(c_t)                                    (8.1)

  subject to:  c_t + k_{t+1} ≤ f(k_t)   (transition law),
               k_0 given,

where

  • c_t: consumption in period t,
  • k_t: the capital stock available for period t production,
  • f(·): a real-valued production function.

In (8.1), k_t is the state variable at time t, and (c_t, k_{t+1}) are the control variables. If the horizon is finite, say t = 0, 1, ..., T, the above problem can be treated as a static (finite-dimensional) optimization problem, so the Kuhn-Tucker Theorem applies. Write it as

  sup_{x ∈ Π(x_0)} U(x),

where x = (x_0, x_1, ..., x_{T+1})' ∈ R^{2(T+2)} with x_0 = (c_0, k_0)', x_1 = (c_1, k_1)', ..., x_{T+1} = (c_{T+1}, k_{T+1})', and

  U(x) = ∑_{t=0}^{T} β^t u(c_t).                                                    (8.2)


The constraint set Π(x_0) is

  Π(x_0) = { x :  k_0 ≥ 0 given,
                  c_0 + k_1 ≤ f(k_0),
                  c_1 + k_2 ≤ f(k_1),
                  ⋮
                  c_T + k_{T+1} ≤ f(k_T),
                  k_{T+1} ≥ 0 }.

These are T + 3 constraints! By the strict monotonicity of u, an optimal plan satisfies c_t + k_{t+1} = f(k_t) with equality for all t, so c_t = f(k_t) − k_{t+1}. Plugging this into (8.2), problem (8.1) becomes the unconstrained optimization problem

  max_{{k_t}_{t=1}^{T+1}}  ∑_{t=0}^{T} β^t u(f(k_t) − k_{t+1}).

Because of the additive structure, the relevant portion of the objective function is

  U(x) = u(f(k_0) − k_1) + ... + β^t u(f(k_t) − k_{t+1}) + β^{t+1} u(f(k_{t+1}) − k_{t+2}) + ... + β^T u(f(k_T) − k_{T+1}).

The FOC with respect to k_{t+1} gives

  β^t u'(f(k_t^*) − k_{t+1}^*)·(−1) + β^{t+1} u'(f(k_{t+1}^*) − k_{t+2}^*) f'(k_{t+1}^*) = 0,

where c_t^* = f(k_t^*) − k_{t+1}^* and c_{t+1}^* = f(k_{t+1}^*) − k_{t+2}^*. Hence, the solution is characterized by the well-known Euler equation

  u'(c_t^*) / (β u'(c_{t+1}^*))  =  f'(k_{t+1}^*),

where the left-hand side is the marginal rate of substitution between consumption in periods t and t+1 and the right-hand side is the marginal product of capital. Note that the Euler equation is a necessary but not a sufficient condition for an optimum.

Problems like the one considered in this example can also be solved in a different way that exploits the special structure of the intertemporal problem. This method is called dynamic programming.
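As an illustration, with the specification used later in Example 153 (u(c) = ln c and f(k) = A k^α with A > 0 and α ∈ (0, 1)), we have u'(c) = 1/c and f'(k) = αA k^{α−1}, so the Euler equation becomes

  c_{t+1}^* / (β c_t^*) = αA (k_{t+1}^*)^{α−1},   i.e.,   c_{t+1}^* = αβA (k_{t+1}^*)^{α−1} c_t^*:

consumption grows along the optimal path whenever the discounted marginal product of capital, αβA (k_{t+1}^*)^{α−1}, exceeds one.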

8.2 Dynamic Programming: Problem Formulation

In general, the class of dynamic optimization problems we are going to consider can be represented by

  V*(x_0) = sup_{{x_{t+1}}_{t=0}^∞}  ∑_{t=0}^∞ β^t F(x_t, x_{t+1})
  s.t.  x_0 ∈ X given,
        x_{t+1} ∈ Γ(x_t) for all t,

where

  • V*(x_0) is the "true" value function given the initial state x_0.
  • x ∈ X is a vector of state variables.
  • F is a time-invariant function of the current and future states, whose discounted sum describes the objective function.
  • Γ is a time-invariant correspondence describing feasibility; Γ(x_t) is the constraint set at time t.
  • β ∈ (0, 1) is the discount factor.

From these primitives, the problem can be rewritten in a more compact way:

  V*(x_0) = sup_{x ∈ Π(x_0)} U(x)                                                  (DP)

with

  • x = {x_{t+1}}_{t=0}^∞ the sequence of states/controls.
  • Π(x_0) the set of feasible sequences: for any sequence x with initial value x_0, define Π(x_0) = { {x_{t+1}}_{t=0}^∞ : x_{t+1} ∈ Γ(x_t) for all t }, where Γ(x_t) is the constraint set at t.
  • If x ∈ Π(x_0) ⊆ R^∞, then x is a feasible plan.
  • For any sequence x, the intertemporal payoff function is defined as

      U(x) ≡ ∑_{t=0}^∞ β^t F(x_t, x_{t+1}).

We now impose the following assumptions on the primitives Γ and U.

Assumption 1  Γ(x) is non-empty for all x ∈ X.

Note that if Γ(x) is non-empty for all x ∈ X, then Π(x_0) is non-empty for each x_0 ∈ X.


Assumption 2  For all x_0 ∈ X and x ∈ Π(x_0), the limit lim_{n→∞} U_n(x) exists (although it may be infinite), where

  U_n(x) ≡ ∑_{t=0}^{n} β^t F(x_t, x_{t+1}).

Assumption 2 says that we allow the problem to have an unbounded value, yet we will not consider infinite sums when the series has an indeterminate character.
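To map Example 152 into this notation, one natural choice (after substituting consumption out of the objective via c_t = f(k_t) − k_{t+1}) is to take the state to be x_t = k_t and set

  F(k_t, k_{t+1}) = u( f(k_t) − k_{t+1} ),   Γ(k_t) = [0, f(k_t)],

so the feasibility correspondence simply says that next period's capital cannot exceed current output. If f(k) ≥ 0 for all k ≥ 0, then Γ(k) is non-empty and Assumption 1 is satisfied.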

8.3 Bellman's Principle of Optimality

Richard Bellman (1957) stated his Principle of Optimality as follows:

  "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." (Bellman, 1957, p. 83)

The Bellman Principle of Optimality (BPO) is essentially based on the following two properties of any real-valued function.

Lemma 8  Let f be a real-valued function defined on X × Y.

  • Then

      sup_{(x,y) ∈ X×Y} f(x, y) = sup_{x ∈ X} { sup_{y ∈ Y} f(x, y) } = sup_{x ∈ X} f*(x),

    where f*(x) = sup_{y ∈ Y} f(x, y).

  • Let a, b ∈ R be independent of x. Then

      sup_{x ∈ X} [a + b f(x)] = a + b sup_{x ∈ X} f(x)   if b > 0,
      sup_{x ∈ X} [a + b f(x)] = a + b inf_{x ∈ X} f(x)   if b < 0.

Lemma 8 says that in multidimensional optimization problems, an alternative to solving for all choice variables simultaneously is to split the multidimensional problem into one- or lower-dimensional sub-problems; see Figure 8.1 for an illustration, and the numerical sketch after Theorem 64 below. For instance, let f(x, y) be a real-valued function of the two variables (x, y). The standard way is to optimize jointly over (x, y) (green line). Alternatively, one can first maximize f(x, y) in the y-dimension for each given x; denote the solution by y*(x). Second, one maximizes f(x, y*(x)) in the x-dimension (red line).

The result can be extended to the case where the choice variables (x, y) are restricted to a constraint set D ⊂ X × Y. In this case, one must be able to decompose the feasibility set in an appropriate way. In these notes, we will always analyze models where this decomposition can be done.

Lemma 8 can be used to prove Bellman's Principle of Optimality.

Theorem 64 (Bellman's Principle of Optimality)  If x = (x_0, x_1, x_2, ...) is optimal in (DP), then x' = (x_1, x_2, x_3, ...) is also optimal in (DP) with initial state x_1.
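The decomposition in Lemma 8 is easy to check numerically. The following Python sketch (the objective and the grids are illustrative assumptions, not from the notes) verifies on a finite grid that maximizing jointly over (x, y) gives the same value as maximizing first over y for each x and then over x:

import numpy as np

# Joint maximization over (x, y) versus nested maximization (Lemma 8, Part 1).
x_grid = np.linspace(-1.0, 1.0, 201)
y_grid = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(x_grid, y_grid, indexing="ij")
f = -(X - 0.3) ** 2 - (Y + 0.5) ** 2 + X * Y     # any bounded objective works here

joint_max = f.max()            # sup over (x, y) jointly
f_star = f.max(axis=1)         # f*(x) = sup_y f(x, y)
nested_max = f_star.max()      # sup_x f*(x)

assert np.isclose(joint_max, nested_max)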

218

CHAPTER 8. DYNAMIC OPTIMIZATION

Figure 8.1: Multidimensional Optimization.

8.4 The Bellman Equation: Dynamic Programming

The special structure of the dynamic programming problem

  V*(x_0) = sup_{x ∈ Π(x_0)} U(x)                                                  (DP)

allows the application of the BPO. Let x = (x_0, x_1, x_2, ...) and x' = (x_1, x_2, x_3, ...). First, convince yourself that

  U(x) = F(x_0, x_1) + β U(x').                                                    (8.3)

The value function V* in (DP) can now be written as

  V*(x_0) = sup_{x ∈ Π(x_0)} [ F(x_0, x_1) + β U(x') ]                         (by (8.3))
          = sup_{x_1 ∈ Γ(x_0)} sup_{x' ∈ Π(x_1)} [ F(x_0, x_1) + β U(x') ]     (by Lemma 8, Part 1)
          = sup_{x_1 ∈ Γ(x_0)} [ F(x_0, x_1) + β sup_{x' ∈ Π(x_1)} U(x') ],    (by Lemma 8, Part 2)

where sup_{x' ∈ Π(x_1)} U(x') = V*(x_1).


Hence, the value function V* in (DP) satisfies the functional equation

  V(x_0) = sup_{x_1 ∈ Γ(x_0)} [ F(x_0, x_1) + β V(x_1) ].                          (FE)

The functional equation (FE) is called the Bellman equation. The BPO is thus equivalent to the possibility of writing the value function V* of the infinite-dimensional problem (DP) in the form of a functional equation, which may be simpler to study and solve.

Theorem 65 (Necessary Conditions for Optimality in (DP): The Bellman Equation)  Under Assumptions 1 and 2, the following holds:

  1. The solution to (DP), V*(x_0) = sup_{x ∈ Π(x_0)} U(x), satisfies the Bellman equation (FE) with V(x_0) = V*(x_0).
  2. If x* ∈ Π(x_0) is feasible and solves (DP), then x* solves the Bellman equation (FE), that is,

       V(x_t*) = F(x_t*, x_{t+1}*) + β V(x_{t+1}*)   for all t = 0, 1, 2, ....

When is the Bellman equation (FE) also sufficient for finding the solution to the dynamic program (DP)?

  • If T is finite, then the Bellman equation (FE) is necessary and sufficient for (DP). The idea is that the BPO states that an optimal path is such that the agent has no incentive to deviate for one period from her maximizing strategy and then revert to the optimal policy. By induction on this principle, one can show that it is never optimal to deviate for finitely many periods. But the BPO does not say anything about infinite deviations, that is, deviations that never revert back to the optimizing behavior.
  • In the general case T ≤ ∞, we need to impose so-called transversality conditions. In order to apply the BPO to identify optimal plans using (FE), one must impose some additional structure on the problem so that optimality of infinite deviations is precluded. In terms of the objective function U(x) one typically requires so-called continuity at infinity.

Theorem 66 (Sufficient Conditions for Optimality in (DP): Transversality Conditions)

  1. Let x* ∈ Π(x_0) be a feasible plan that satisfies the Bellman equation (FE) and

       lim sup_{t→∞} β^t V*(x_t*) ≤ 0.

     Then x* attains V*(x_0).
  2. If V is a solution to the Bellman equation (FE) and V satisfies

       lim_{t→∞} β^t V(x_t) = 0   for all x ∈ Π(x_0) and all x_0 ∈ X,              (TC)

     then V is a solution to (DP), i.e., V(x_0) = V*(x_0).

The theorem says that if the functional equation (FE) has multiple solutions, the true value function V* is the solution to (FE) that is bounded in the sense of (TC).
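To see the theorem at work, anticipate the log-utility growth model solved in Example 153 below. There the conjectured value function has the form V*(k) = E + F ln k, and the capital path generated by the policy function converges to a strictly positive steady state, so ln k_t* stays bounded and

  lim_{t→∞} β^t V*(k_t*) = lim_{t→∞} β^t (E + F ln k_t*) = 0.

Part 1 of the theorem then confirms that the plan generated by the policy function indeed attains V*(k_0).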

8.5 Solution Methods

There are three main solution methods for dynamic programming problems.

1. Method of Undetermined Coefficients (Guess & Verify; for analytical solutions):

   (a) Verify that a solution V of the Bellman equation (FE) is a solution V* of (DP).
   (b) Guess the functional form of V up to some undetermined coefficients.
   (c) Determine the undetermined coefficients and verify the functional-form assumption.

2. Value Function Iteration (for numerical solutions): construct a sequence of value functions V_j, j = 0, 1, ..., and determine the associated policy functions g_j, j = 0, 1, ... (a numerical sketch follows this list).

   (a) Start with V_0 = 0.
   (b) Create the sequence V_{j+1}, g_j, j = 0, 1, ..., by iterating on

        V_{j+1}(x) = max_{x'} [ F(x, x') + β V_j(x') ]      for all x ∈ X,
        g_j(x)     = arg max_{x'} [ F(x, x') + β V_j(x') ]  for all x ∈ X.

   (c) Repeat until V_j converges.

3. Policy Function Iteration (for numerical solutions):

   (a) Pick a feasible policy g_0(x).
   (b) Compute the value V_j(x) associated with using g_j(x) forever:

        V_j(x) = ∑_{t=0}^∞ β^t F(x_t, g_j(x_t)).

   (c) Take V_j(x), for all x, and generate a new policy g_{j+1}(x), where

        g_{j+1}(x) = arg max_{x'} [ F(x, x') + β V_j(x') ].

   (d) Iterate until g_j converges.
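As a concrete illustration of value function iteration, the following Python sketch applies the algorithm to the one-sector growth model of Examples 152 and 153 (u(c) = ln c, f(k) = A k^α). The grid, tolerance and parameter values are illustrative assumptions, not part of the notes.

import numpy as np

# Value function iteration for V(k) = max_{k'} ln(A*k**alpha - k') + beta*V(k').
# Parameter values and the capital grid are illustrative assumptions.
A, alpha, beta = 1.0, 0.3, 0.95
k_grid = np.linspace(1e-3, 0.5, 500)              # grid for the state variable k

# One-period payoff F(k, k') = ln(A*k**alpha - k'); infeasible pairs get -inf.
c = A * k_grid[:, None] ** alpha - k_grid[None, :]
F = np.where(c > 0.0, np.log(np.where(c > 0.0, c, 1.0)), -np.inf)

V = np.zeros_like(k_grid)                         # start with V_0 = 0
for _ in range(2000):
    V_new = (F + beta * V[None, :]).max(axis=1)   # apply the Bellman operator
    if np.max(np.abs(V_new - V)) < 1e-8:          # stop once V_j has converged
        V = V_new
        break
    V = V_new

g = k_grid[(F + beta * V[None, :]).argmax(axis=1)]  # numerical policy function g(k)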

Example 153 (One-Sector Growth Model)  Consider the intertemporal optimization problem

  max_{{c_t, k_t}_{t=0}^∞}  ∑_{t=0}^∞ β^t u(c_t)
  s.t.  k_{t+1} + c_t = f(k_t),   k_0 ≥ 0 given.

Assume u(c) = ln c and f(k) = A k^α, with A > 0, α ∈ (0, 1), β ∈ (0, 1). To solve this problem, there are three steps to follow:


1. Set up the Bellman equation:

     V(k_t) = max_{k_{t+1}} [ u(f(k_t) − k_{t+1}) + β V(k_{t+1}) ]
            = max_{k_{t+1}} [ ln(A k_t^α − k_{t+1}) + β V(k_{t+1}) ].               (8.4)

2. Guess the functional form of the value function V(·):

     V(k) = E + F ln k,

   where E and F are undetermined constants.

3. Verify the conjecture. The FOC for the right-hand side of (8.4) is

     −1 / (A k_t^α − k_{t+1}) + β V'(k_{t+1}) = 0
     ⇒  −1 / (A k_t^α − k_{t+1}) + β F / k_{t+1} = 0   (using the conjecture).

   Solving for the choice variable k_{t+1} in terms of the state variable k_t gives the policy function

     k_{t+1} = g(k_t) = [βF / (1 + βF)] A k_t^α.

   Yet g(k) still depends on F. Plug g(k) into the Bellman equation to obtain

     V(k_t) = ln(A k_t^α − g(k_t)) + β V(g(k_t))
            = ln( A k_t^α − [βF/(1 + βF)] A k_t^α ) + β [ E + F ln( [βF/(1 + βF)] A k_t^α ) ]
            = ln( A/(1 + βF) ) + βE + βF ln( [βF/(1 + βF)] A ) + (1 + βF) α ln k_t.

   Matching coefficients with V(k_t) = E + F ln k_t requires E = ln(A/(1 + βF)) + βE + βF ln([βF/(1 + βF)] A) and F = (1 + βF)α. The conjecture is therefore verified for

     E = [ ln(A(1 − αβ)) + (αβ/(1 − αβ)) ln(Aβα) ] / (1 − β),
     F = α / (1 − αβ).

   As a result, the optimal policy function is g(k_t) = βαA k_t^α. The stationary level of capital (obtained by setting k_{t+1} = k_t) is

     k_∞ = (βαA)^{1/(1−α)}.
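A quick numerical check of the closed-form solution (the parameter values below are illustrative assumptions): iterating the policy k_{t+1} = g(k_t) = βαA k_t^α from an arbitrary positive initial capital stock converges to the stationary level k_∞ = (βαA)^{1/(1−α)}.

# Simulate k_{t+1} = beta*alpha*A*k_t**alpha and compare with the analytic steady state.
# Parameter values are illustrative assumptions.
A, alpha, beta = 1.0, 0.3, 0.95
g = lambda k: beta * alpha * A * k ** alpha

k = 0.05                                   # arbitrary positive initial capital stock
for t in range(200):
    k = g(k)

k_inf = (beta * alpha * A) ** (1.0 / (1.0 - alpha))
assert abs(k - k_inf) < 1e-10              # the simulated path has settled at k_inf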


8.6 Regularity of the Solution

We impose some additional assumptions.

Assumption 3  Γ(x) is a non-empty, compact-valued, continuous correspondence, and x ∈ R^L. F(·) is bounded and continuous, and β ∈ [0, 1).

Assumption 4  Γ(x) has a closed graph. F(·) is concave.

Proposition 74 (Properties of the Value Function V* and the Policy Function g)  If the previous assumptions are satisfied, then the following properties hold:

  • The value function V*(x) is strictly concave.
  • The policy function g(x) is a continuous function.
  • If F is differentiable, then the value function V*(x) is continuously differentiable and

      DV*(x) = ∂F(x, g(x))/∂x = F_1(x, g(x))

    for all x ∈ int X with g(x) ∈ int Γ(x) (Benveniste & Scheinkman).
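For instance, applied to the growth model, where F(k, k') = u(f(k) − k') so that F_1(k, k') = u'(f(k) − k') f'(k), the envelope formula reads

  DV*(k) = u'( f(k) − g(k) ) f'(k) = u'(c) f'(k).

Combining this with the first-order condition of the Bellman equation, u'(c_t) = β DV*(k_{t+1}), recovers the Euler equation u'(c_t) = β u'(c_{t+1}) f'(k_{t+1}) from Section 8.1.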

Last but not least, an important caveat: the Bellman Principle of Optimality is applicable to problem (DP) because of the special structure of the problem considered:

  • The objective function is time-additive and hence time-separable.
  • The felicity function F(·) and the constraint correspondence Γ(·) are time-invariant, so the problem is stationary.
  • There is a finite number of state variables x ∈ X ⊆ R^L on which the solution can be conditioned, so the problem has the Markov property.

As a result, the solution to the problem is time-consistent.
