Week 6: ‘Random Calculus’ (see also Wilmott, Chapter 4 and Neftci, Chapters 7,9)
Lecture VI.1 The “Size” Of Random Increments The goal of the next two lectures is to learn how to navigate in a ‘stochastic sea’. Stochastic calculus is very important in the mathematical modelling of financial processes. This is because of the underlying random nature of financial markets. ‘Random’ means nothing but risky. Risk is the life-blood of business. Naturally, in option pricing risk plays a central role. Indeed, it is the desire to eliminate, or take, risk that leads to the existence of financial derivative assets. In deterministic environments, where everything can be fully predicted, there will be no need for financial derivative products. But if randomness is essential, one has to know how to deal with it in a systematic way. Prelude Consider the following situation. Suppose S(t) is the price of a security and let V(S,t) denote the price of a derivative instrument on S(t). A stockbroker will be interested in knowing dS(t), the next instant’s incremental change in the security price. On the other hand, a derivative desk needs dV(S,t), the incremental change in the price of the derivative instrument written on S(t). The main question is: How one can calculate the change in the derivative, dV(S,t), departing from estimate of dS(t)? Note that the derivative desk is not interested in how the underlying security changes, but instead, how the financial derivative responds to change in the price of the underlying asset. In other words, a “chain rule” (discussed in the previous week) needs to be utilized. If the rules of standard calculus were applicable, a market participant could simply use the formula dV =
∂V dS , ∂S
or, in the partial derivative notation, dV = V_S dS. But the question is: are the rules of deterministic calculus really applicable in a stochastic environment? Can, in particular, this chain rule be used as well? These questions all rest on the definition of differentiation for stochastic variables. Now I’ll try to clarify this point. As we discussed a week earlier, standard differentiation is the limiting operation defined as

lim_{∆→0} [f(x + ∆) − f(x)] / ∆ = f_x,
where f(x+∆) − f(x) represents the change in the function as x changes by ∆. If x represents time, then the mathematical derivative, f_x, is the rate at which f(x) is changing during an infinitesimal interval ∆. In this case, it is fair to think of time as a deterministic variable and, hence, one can use “standard” calculus to find f_x.

In the case of the financial derivative, V(S,t) is a function of both time and a random process S. Now suppose we want to expand V(S,t) around a known value of S, say S_0. A Taylor expansion will yield

V(S) = V(S_0) + V_S(S_0)(S − S_0) + (1/2)V_SS(S_0)(S − S_0)² + (1/3!)V_SSS(S_0)(S − S_0)³ + …,

where the dots represent all the remaining terms of the Taylor expansion. When S = S_0 + δS, where δS is small, the Taylor series approximation contains only a few terms:

V(S) − V(S_0) ≈ V_S(S_0) δS + (1/2)V_SS(S_0) δS² + (1/3!)V_SSS(S_0) δS³.

Note that although δS is considered to be small, we do not want it to be so small that it becomes negligible. Hence, in a potential approximation of the right-hand side, we would like to keep the term V_S δS. Consider the second term, V_SS δS². If the variable S were deterministic, one could have said that the term δS², for sufficiently small δS, is negligible. However, in the present case S is a share price and, hence, a random variable. So, changes in S will also be random. Suppose these changes have zero mean. The important point is that a random variable is random because it has a positive variance: E[δS²] > 0. This inequality means that, “on average”, the size of δS² is nonzero. In other words, as soon as S is a random variable, treating δS² as if it were zero would be equivalent to equating its variance to zero. This amounts to approximating the random variable S by a non-random quantity and would defeat our purpose. After all, we are trying to find the effect of a random change in S on V(S,t). Hence, as long as S is random, the right-hand side of the Taylor series approximation must keep the second-order term. On the other hand, while keeping the first- and second-order terms in δS is required, one can still make a reasonable argument for dropping the terms which contain the third- and higher-order powers of δS. This causes no inconsistency as long as the higher-order terms are “negligible”.
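To get a feeling for these orders of magnitude, here is a small numerical sketch of my own (not part of the original notes): for an increment of the form δS = μS δt + σS δW, with illustrative parameter values that I have simply assumed, the average size of δS² stays of order δt (so it survives division by δt), while the third-order term shrinks faster and becomes negligible.

# A minimal sketch (assumed illustrative parameters, not from the lecture):
# E[dS^2]/dt stays near (sigma*S0)^2 = 0.04 as dt shrinks, while E[|dS|^3]/dt
# keeps shrinking like sqrt(dt) -- the second-order term survives, the third does not.
import numpy as np

rng = np.random.default_rng(0)
S0, mu, sigma = 1.0, 0.05, 0.2          # hypothetical share price parameters
for dt in (1e-2, 1e-3, 1e-4):
    dW = np.sqrt(dt) * rng.standard_normal(1_000_000)   # Wiener increments
    dS = mu * S0 * dt + sigma * S0 * dW
    print(f"dt={dt:.0e}  E[dS^2]/dt={np.mean(dS**2)/dt:.5f}  "
          f"E[|dS|^3]/dt={np.mean(np.abs(dS)**3)/dt:.6f}")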
The “Size” Of δS²

Above we argued that the term δS² should not be dropped from the Taylor series approximation of the financial derivative, because in a stochastic world it has a nonzero “size”. Now I would like to “measure” the size of δS². In the week 4 lectures we saw that the random part of the share price movement is described by Brownian motion, which we denoted W. Thus, the size of δS² is determined by the size of δW², and this is what we are going to estimate in this lecture.

But first we define some notation we are all quite familiar with. We will consider a time interval t ∈ [0,T] partitioned into n subintervals of equal length δt: 0 = t_0 < t_1 < … < t_n = T, with δt = T/n. Let δW_k = W(t_k) − W(t_{k-1}) denote the unpredictable change over the k-th subinterval and v_k = E[δW_k²] its variance. The variance of the cumulative error over [0,T] is then

v = E[(Σ_{k=1}^n δW_k)²] = Σ_{k=1}^n v_k,

where the property that the δW_k are uncorrelated across k is used and the expectations of the cross-product terms are set equal to zero. The latter statement simply means that E[δW_k · δW_l] = 0 for k ≠ l. The further discussion will be based on simple financial assumptions.

Assumption 1: Uncertainty Principle

0 < A_1 < v,

where A_1 is a constant independent of n, the number of partitions. This assumption imposes a lower bound on the volatility of security prices. It says that when the period [0,T] is divided into finer and finer subintervals, n → ∞, the variance of the cumulative errors v remains positive. That is, more and more frequent observation of security prices will not eliminate all the “risk”. Clearly, most financial market participants will accept such an assumption. Uncertainty of asset prices never vanishes, even when one observes the markets over finer and finer intervals. This empirical market observation is consistent with the fact that the randomness of Brownian motion does not smooth out as we zoom in.

Assumption 2: Stability Principle

v < A_2 < ∞,

where A_2 is independent of n. This assumption imposes an upper bound on the variance of the cumulative errors and makes the volatility bounded from above. As the time axis is chopped into smaller and smaller intervals, more frequent trading is allowed. Such trading does not bring unbounded instability to the system. A large majority of market participants will agree with this assumption as well. After all, allowing for more frequent trading and having access to on-line screens does not necessarily lead to infinite volatility, although some increase in the volatility of frequent trading has been observed.

Assumption 3: Risk Spread Principle

v_k / v_max > A_3,   0 < A_3 < 1,

where A_3 is independent of n and v_max = max[v_k, k = 1,…,n]. That is, v_max is the variance of the asset price change during the most volatile subinterval. According to this principle, the uncertainty of financial markets is not concentrated in some special periods but is more or less evenly spread through the whole operational time. Whenever markets are open, there exists at least some volatility.

Now I want to state (as mathematicians say) a very important proposition: under Assumptions 1, 2 and 3, the variance of δW_k is proportional to δt. That is,

E[δW_k²] = σ_k² · δt,

where the coefficient of proportionality, σ_k, is a finite constant that does not depend on δt. However, it may depend on the information at time t_{k-1}, so it may depend on time. According to this proposition, asset prices become less volatile as δt gets smaller. Since this is a central result, we provide a proof of the proposition. The proof, actually, is very simple!

Proof: We can use Assumption 3 to write
v_k > A_3 · v_max.

Now, sum both sides over all intervals:

Σ_{k=1}^n v_k > n · A_3 · v_max.

Now use Assumption 2, which says that the left-hand side of the above inequality is bounded from above. That is,

A_2 > Σ_{k=1}^n v_k > n · A_3 · v_max.

Divide both sides by n·A_3:

(1/n) · (A_2/A_3) > v_max.

Note that n = T/δt. Then, since v_max is at least as large as any v_k, we obtain

(1/n) · (A_2/A_3) > v_max ≥ v_k,

or

(δt/T) · (A_2/A_3) > v_k.
This gives an upper bound on v_k that depends only on δt. We now obtain a lower bound that also depends only on δt. We already know (Assumption 1) that

Σ_{k=1}^n v_k > A_1

is true. Then, we can also write

n · v_max > Σ_{k=1}^n v_k > A_1.

Divide all terms in the last relation by n. You get v_max > A_1/n, or, since n = T/δt, v_max > A_1 · δt/T.
Recall Assumption 3: v_k > A_3 · v_max. Combining this assumption with the previous relation, we arrive at the lower bound for v_k: v_k > A_1 A_3 · δt/T. Put together, the upper and lower bounds produce the following inequality:

(δt/T) · (A_2/A_3) > v_k > A_1 A_3 · (δt/T).

Now divide all terms by δt:

A_2/(T A_3) > v_k/δt > A_1 A_3/T.

This is a very important result, which clearly demonstrates that the ratio v_k/δt has upper and lower bounds that are constants independent of n. When we take the limit n → ∞, the small time interval δt goes to zero. In order for v_k/δt to remain bounded in this limit, the variance v_k must be proportional to δt; otherwise the ratio would blow up (or vanish). In other words, we should be able to find a coefficient σ_k, depending on k, such that

v_k = E[δW_k²] = σ_k² · δt.

So, we have proved the proposition! You may not believe this, but we have obtained one of the most important results of our course: the change of the random process over a time interval δt has variance proportional to δt.
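As a back-of-the-envelope illustration of why the boundedness assumptions force this scaling, the following short sketch (my own, with the candidate scalings of v_k set by hand rather than derived from data) compares three possibilities. Only v_k proportional to δt keeps the cumulative variance v = Σ v_k away from both zero and infinity as the partition is refined, which is exactly what Assumptions 1 and 2 demand.

# My own illustration (not from the text): behaviour of v = n * v_k under three
# hypothetical scalings of v_k as the partition of [0, T] is refined.
T = 1.0
for n in (10, 100, 1000, 10000):
    dt = T / n
    v_dt2   = n * dt**2    # v_k ~ dt^2   -> cumulative variance vanishes (violates Assumption 1)
    v_dt    = n * dt       # v_k ~ dt     -> cumulative variance stays at T (both assumptions hold)
    v_const = n * 1.0      # v_k constant -> cumulative variance explodes (violates Assumption 2)
    print(f"n={n:6d}  v(~dt^2)={v_dt2:.2e}  v(~dt)={v_dt:.2f}  v(~const)={v_const:.1e}")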
Lecture VI.2 The Itô Integral And Stochastic Differential Equations

We have seen in the previous lectures that observed changes of asset prices can be decomposed into two components: one that is predictable given the information at that time, and another that is unpredictable. In other words, in a small time interval of length δt, we can present the share price change as

S(t_k) − S(t_{k-1}) = a(S(t_{k-1}),t_k) · δt + σ(S(t_{k-1}),t_k) · δW_k,   k = 1,2,…,n.

In the above expression, the coefficients a(S(t_{k-1}),t_k) and σ(S(t_{k-1}),t_k) may depend on the share price information available at the moment t_{k-1}. Previously, we saw that for geometric Brownian motion, a(S(t_{k-1}),t_k) = μ · S(t_{k-1}) and σ(S(t_{k-1}),t_k) = σ · S(t_{k-1}), with constant μ and σ. Now we want to consider the most general case. As δt gets smaller, we obtain the continuous-time version valid for infinitesimal intervals, which we will write using the following notation:

dS(t) = a(S(t),t) · dt + σ(S(t),t) · dW(t).   (Equation 11)

This is called a stochastic differential equation, which for short we will call an SDE. We already know that in order to calculate the share price at a future time T, we have to take integrals on both sides of eq. (11). That is,

∫_0^T dS(t) = ∫_0^T a(S(t),t) dt + ∫_0^T σ(S(t),t) dW(t),
where the last term on the right-hand side is an integral with respect to increments of the Wiener process W(t). The first integral on the right-hand side is taken with respect to a deterministic variable, time. Therefore, this integral can be evaluated in the standard way, as a Riemann integral. We discussed Riemann integrals last week, so the time we spent on deterministic calculus was not completely wasted. The second integral is with respect to a random variable, a Wiener process. This integral represents a summation of very erratic random variables, since two price shocks which are just ε > 0 apart from each other, that is dW(t) and dW(t+ε), are still uncorrelated. In this case, the Riemann integral is useless. Instead, this summation has to be defined in a different way, as the Itô integral, which will be the main subject of this lecture.
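Before turning to the formal definition, here is a minimal simulation sketch of my own (with assumed, purely illustrative parameter values) of how eq. (11) is used in practice: the share price at T is built up by summing the discrete increments a·δt + σ·δW over the partition, with the geometric Brownian motion coefficients evaluated at the left endpoint t_{k-1}. This left-endpoint, non-anticipating evaluation is exactly the convention stressed below; the resulting recursion is often called the Euler–Maruyama scheme.

# A sketch with assumed parameters: simulate one path of
# dS = mu*S dt + sigma*S dW by summing the discrete increments of eq. (11).
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.05, 0.2                  # hypothetical GBM coefficients: a = mu*S, vol = sigma*S
T, n, S = 1.0, 252, 100.0
dt = T / n
for k in range(n):
    dW = np.sqrt(dt) * rng.standard_normal()        # Wiener increment ~ N(0, dt)
    S += mu * S * dt + sigma * S * dW               # S(t_k) = S(t_{k-1}) + a*dt + sigma*dW
print(f"simulated S(T) = {S:.2f}")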
Definition

I give the definition of the Itô integral using a share price as an example. I shall use the same logic as when we defined the Riemann integral. That is, we start with the finite-interval approximation of the share price behaviour and then take the continuous limit. In this case, the share price S is represented as the sum of the increments of S over n small time intervals:

S(T) = S(0) + Σ_{i=1}^n δS_i.

The increments are given by the familiar formula

δS_k = S(t_k) − S(t_{k-1}) = a(S(t_{k-1}),t_k) · δt + σ(S(t_{k-1}),t_k) · δW_k,   k = 1,2,…,n,

where δW_k = W(t_k) − W(t_{k-1}) is an increment of the standard Wiener process, with zero mean and variance δt.

1) We will assume that the functions a(S(t_{k-1}),t_k) and σ(S(t_{k-1}),t_k) are non-anticipative, meaning exactly that they do not know about the future. We know the values of these functions only at the moment t_{k-1}; at any future time, these functions are completely unpredictable.

2) We also have to make a “technical” assumption that the random variable σ(S(t),t) is “non-explosive”, meaning that

E[∫_0^T σ(S,t)² dt] < ∞.

This condition is nothing but a continuous-time generalisation of Assumption 2 (the Stability Principle) from the previous lecture.

Then, the Itô integral,

I = ∫_0^T σ(S(t),t) dW(t),

is defined as the mean square limit of the following sum

Σ_{k=1}^n σ(S(t_{k-1}),t_k) [W(t_k) − W(t_{k-1})],

as n goes to infinity (δt → 0). The mean square limit means literally this:

lim_{n→∞} E[ ( Σ_{k=1}^n σ(S(t_{k-1}),t_k) [W(t_k) − W(t_{k-1})] − I )² ] = 0.

If all variables in the above formula were deterministic, the Itô integral would coincide with the Riemann integral, because in this case the expectation value of a
function would just coincide with the function itself. We will see in a moment that, for stochastic variables, this definition gives rise to somewhat different results. According to the given definition of the Itô integral, as the number of intervals goes to infinity and the length of each time interval becomes infinitesimal, the finite sum will approach the random variable I represented by the Itô integral. Clearly, this definition makes sense only if such a limiting random variable exists. (Of course, it exists in all examples of financial derivatives we are considering in this course.) The assumption that σ(S(t_{k-1}),t_k) is non-anticipating turns out to be a fundamental condition for the existence of such a limit.

To summarize, we can spot three major differences between deterministic and stochastic integration. First, the notion of limit used in stochastic integration is different. Second, the Itô integral is defined for non-anticipating functions only. And third, while integrals in standard calculus are defined using the actual “path” followed by functions, stochastic integrals are defined within stochastic equivalence, that is, via averaging. It is essentially these differences that make some rules of “random” calculus different from standard calculus. It might be helpful to discuss an example of mean square convergence in defining the Itô integral.

An Expository Example

The Itô integral is a limit. It is the mean square limit of a certain finite sum. Thus, in order for the Itô integral to exist, some appropriate sums must converge. A mathematician would say: given proper conditions, one can show that Itô sums converge and the corresponding Itô integral exists. Yet it is, in general, not possible to calculate the mean square limit explicitly. This can be done only in some special cases. We say: well, that is fine with us; give us an example then. Then the mathematician will write the following formula:

∫_0^T W(t) dW(t),

where it is known that W(0) = 0 (remember the definition of the Wiener process from the week 4 lectures?). If W(t) were a deterministic variable, one could easily calculate this integral using finite Riemann sums, as we did in the previous week (we just used a different notation, S(t) instead of W(t)). The Riemann integral is given by

∫_0^T W(t) dW(t) = (1/2) W²(T).
Now, if W(t) is a Wiener process, the same approach cannot be used.
First of all, the Riemann sums must be defined with respect to the “lower rectangles” only:

I_n = Σ_{k=1}^n W(t_{k-1}) [W(t_k) − W(t_{k-1})].

In other words, the first W(t) has to be evaluated at time t_{k-1}, because otherwise these terms would fail to be non-anticipating: the term W(t_k) is known only as of time t_k and is correlated with the increment [W(t_k) − W(t_{k-1})], which is something we do not want. As we saw a week before, in the case of the Riemann integral one could use either type of sum and still get the same answer in the end. In the case of stochastic integration, the result will change depending on whether one uses W(t_k) or W(t_{k-1}). It is an absolutely crucial condition of the Itô integral that the integrands be non-anticipating. In the Appendix I consider an example showing that any other definition of the Itô integral leads to contradictions.

Second, I_n is now a random variable and simple limits cannot be taken, simply because each time we evaluate I_n we will get a different answer. In taking the limit of I_n we have to use a probabilistic approach. As mentioned earlier, the Itô integral uses the mean square limit. Thus, our task is to determine a limiting random variable I such that

lim_{n→∞} E[(I_n − I)²] = 0,

or, equivalently,

lim_{n→∞} E[ ( Σ_{k=1}^n W(t_{k-1}) δW_k − I )² ] = 0,

where for simplicity we let δW_k = W(t_k) − W(t_{k-1}). We will calculate the limiting random variable I step by step in order to clarify the meaning of the Itô integral as a mean square limit of a random sum.

The first step is pretty much what we did for the deterministic integral, that is, to manipulate the terms inside I_n. But this time, we will do a slightly different manipulation. We begin by noting that for any a and b we have (a + b)² = a² + 2ab + b², or

ab = (1/2)[(a + b)² − a² − b²].
If we call a = W(t_{k-1}) and b = δW_k, then we can rewrite I_n in the following form:

I_n = (1/2) Σ_{k=1}^n [ (W(t_{k-1}) + δW_k)² − W²(t_{k-1}) − δW_k² ].

The thing to notice is that W(t_{k-1}) + δW_k = W(t_k), which gives:

I_n = (1/2) [ Σ_{k=1}^n W²(t_k) − Σ_{k=1}^n W²(t_{k-1}) − Σ_{k=1}^n δW_k² ].

As in the deterministic example last week, the first and second summations are the same except for the very first and last elements. Cancelling similar terms and noting that W(0) = 0 by definition, we arrive at:

I_n = (1/2) [ W²(T) − Σ_{k=1}^n δW_k² ].

Note that W(T) is independent of n. In deterministic calculus we would have dropped the last term with δW_k² and obtained just the standard result for the integral. However, we proved in the previous lecture that the terms δW_k² are not negligible when W is a random process, and it is exactly these terms that make the Itô integral differ from the Riemann result. Now we come to the mean square limit

lim_{n→∞} E[(I_n − I)²] = 0.

Based on what we have already obtained, we can make a guess about what the random variable I looks like. Indeed, we can write I = (1/2)(W²(T) − A), where A is a still unknown random variable. The mean square limit is now
lim_{n→∞} E[ ( Σ_{k=1}^n δW_k² − A )² ] = 0,

where the common factor (1/2)² has been dropped, since it does not affect whether the limit is zero. In order to find the random variable A, we first calculate a few other things; the reason for doing this will become clear in a moment. Let us calculate the expectation

E[ Σ_{k=1}^n δW_k² ].

Taking expectations in a straightforward way, we get:
E[ Σ_{k=1}^n δW_k² ] = Σ_{k=1}^n E[δW_k²] = Σ_{k=1}^n δt_k = T.
Now let us evaluate the following expectation:

E[ ( Σ_{k=1}^n δW_k² − T )² ] = E[ Σ_{k=1}^n δW_k⁴ + 2 Σ_{k=1}^n Σ_{l<k} δW_k² δW_l² + T² − 2T Σ_{k=1}^n δW_k² ].
We consider the components of the right-hand side individually. Realising that Wiener process increments with different subscripts are independent, we have

E[δW_k² · δW_l²] = E[δW_k²] · E[δW_l²] = δt_k · δt_l,   for k ≠ l.

We also use the following result for the fourth moment of a Wiener increment:

E[δW_k⁴] = 3 δt_k².

Now, if we put all the ingredients together, we get
E[ ( Σ_{k=1}^n δW_k² − T )² ] = Σ_{k=1}^n 3δt_k² + 2 Σ_{k=1}^n Σ_{l<k} δt_k · δt_l + T² − 2T Σ_{k=1}^n δt_k.
Since all time intervals δt_k have the same size, δt = T/n, we have for the different sums:

Σ_{k=1}^n 3δt_k² = 3nδt² = 3T²/n,

2 Σ_{k=1}^n Σ_{l<k} δt_k · δt_l = n(n − 1)δt² = (n − 1)T²/n,

Σ_{k=1}^n δt_k = nδt = T.
Taking into account the above formulas, we obtain the following expression
E[ ( Σ_{k=1}^n δW_k² − T )² ] = 3T²/n + (n − 1)T²/n + T² − 2T² = 2T²/n.

This result implies that, as n goes to infinity, the right-hand side of the above formula vanishes, that is,

lim_{n→∞} E[ ( Σ_{k=1}^n δW_k² − T )² ] = lim_{n→∞} 2T²/n = 0.

Thus, we have shown that the mean square limit of Σ_{k=1}^n δW_k² is T.
Going back to the variable A, we see that A = T. This brings us to the concluding point of our calculation of the Itô integral:
∫_0^T W(t) dW(t) = (W²(T) − T)/2.

In the case of the Riemann integral, there was no additional term T! This is one example where the Itô integral can be calculated explicitly using the mean square limit.
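As a sanity check, here is a small simulation sketch of my own (not required by the lecture): simulating Brownian paths on a fine grid, the Itô sum Σ_k W(t_{k-1}) δW_k is close, path by path, to (W²(T) − T)/2, and the mean square error shrinks like 1/n as the partition is refined.

# My own numerical check of the result: Ito sums vs (W(T)^2 - T)/2.
import numpy as np

rng = np.random.default_rng(3)
T, n, paths = 1.0, 1_000, 2_000
dt = T / n
dW = np.sqrt(dt) * rng.standard_normal((paths, n))               # increments ~ N(0, dt)
W = np.hstack([np.zeros((paths, 1)), np.cumsum(dW, axis=1)])     # W(t_0), ..., W(t_n)
ito_sum = (W[:, :-1] * dW).sum(axis=1)                           # sum_k W(t_{k-1}) dW_k
exact = 0.5 * (W[:, -1] ** 2 - T)                                # (W(T)^2 - T)/2
print("mean square error:", np.mean((ito_sum - exact) ** 2))     # of order 1/n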
I know that by this time you have already been overwhelmed with math, but let me make one final point, which is of great importance in the derivation of the Black-Scholes equation.

A Crucial Point

It was shown that

lim_{n→∞} E[ ( Σ_{k=1}^n δW_k² − T )² ] = 0.

It is interesting to convert this into integral notation. Consider the following stochastic integral:

∫_0^T (dW(t))²,
which can be interpreted as the sum of squared increments in W(t). If this integral exists in the Itô sense, then by definition,
lim_{n→∞} E[ ( Σ_{k=1}^n δW_k² − ∫_0^T (dW(t))² )² ] = 0.

Thus, we can conclude that

∫_0^T (dW(t))² = T.

But we also know that

∫_0^T dt = T.

Combining the above equalities, we obtain a result that may seem a bit “unusual” to one who is used to working with standard calculus:

∫_0^T (dW(t))² = ∫_0^T dt,

where the equality holds in the mean square sense. It is in this sense that, if W(t) represents a Wiener process, for infinitesimal dt one can write:

(dW(t))² = dt.
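The following short simulation (my own sketch, using simulated Brownian increments) illustrates this rule: the sum of squared increments of a Brownian path concentrates around T, and its mean square error agrees well with the 2T²/n computed above.

# My own check of (dW)^2 = dt in the mean square sense.
import numpy as np

rng = np.random.default_rng(4)
T, paths = 1.0, 10_000
for n in (10, 100, 1000):
    dW = np.sqrt(T / n) * rng.standard_normal((paths, n))   # increments ~ N(0, T/n)
    qv = (dW ** 2).sum(axis=1)                               # "integral" of (dW)^2 over [0, T]
    print(f"n={n:4d}  mean={qv.mean():.4f}  MSE={((qv - T) ** 2).mean():.5f}  "
          f"2T^2/n={2 * T ** 2 / n:.5f}")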
In fact, in all practical calculations dealing with stochastic calculus, it is common practice to replace the terms involving (dW)² by dt. The preceding discussion traces the logic behind this procedure; the equality should be interpreted in the sense of the mean square limit. From this moment on you do not have to remember all the details of how we obtained this result, and you can simply use it as a rule.

Appendix

As if you have not had enough, I want to write even more formulas. This is the promised example of the contradictions associated with any other definition of the Itô integral; you can skip this bit if you want. Let us consider the following stochastic process:

dS(t) = W(t) dW(t),
which is nothing but the differential form of the integral we discussed in this lecture. The right-hand side is purely random and, hence, its (conditional) expected value must vanish, that is, E[W(t) dW(t)] = 0. On the other hand, we can present the increment dS as an integral:

dS = ∫_t^{t+dt} W(u) dW(u).

Now let us try to use a “wrong” definition of the Itô integral, namely a rectangular approximation at the “midpoints” of the subintervals. We get

∫_t^{t+dt} W(u) dW(u) ≈ [(W(t+dt) + W(t))/2] · (W(t+dt) − W(t)).

Apply the (conditional) expectation operator E_t[·] to the right-hand side:

E_t[ ((W(t+dt) + W(t))/2) · (W(t+dt) − W(t)) ] = E_t[ (1/2)(W²(t+dt) − W²(t)) ] = (1/2) dt.

Clearly, (1/2)dt ≠ 0. This means that the approximating sum has a (conditional) expectation that is not equal to zero: it is predictable. Obviously, this contradicts the claim that the integral on the left-hand side represents a purely random term. Thus, in contrast with the Riemann integral, the Itô integral has to be defined only for non-anticipating functions, that is, using the “lower rectangle” approximation.
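A small simulation of my own makes the contradiction visible over the whole interval [0,T]: the left-endpoint (Itô) sums average to about zero, while the “midpoint” sums average to about T/2, i.e. they have a predictable, nonzero expectation, exactly as computed above for a single subinterval.

# My own illustration: left-endpoint (Ito) sums vs "midpoint" sums of W dW.
import numpy as np

rng = np.random.default_rng(5)
T, n, paths = 1.0, 500, 20_000
dW = np.sqrt(T / n) * rng.standard_normal((paths, n))            # increments ~ N(0, T/n)
W = np.hstack([np.zeros((paths, 1)), np.cumsum(dW, axis=1)])     # W(t_0), ..., W(t_n)
ito = (W[:, :-1] * dW).sum(axis=1)                               # left-endpoint (Ito) sums
mid = (0.5 * (W[:, :-1] + W[:, 1:]) * dW).sum(axis=1)            # "midpoint" sums
print(f"average Ito sum      = {ito.mean():+.4f}   (should be near 0)")
print(f"average midpoint sum = {mid.mean():+.4f}   (near T/2 = {T / 2})")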
Try To Answer The Following Questions

1) Can you explain the financial assumptions which give rise to the formula Var[δW] = δt?

2) Can you name reasons why stochastic calculus is so important in financial calculations? Why can we not just use standard calculus?

3) Can you explain the logic behind the rule (dW)² = dt, which is routinely used in financial calculations?
Week 7: Reading Week