Stochastic Calculus Notes 1/5


Stochastic Calculus Laura Ballotta MSc Financial Mathematics October 2008

© Laura Ballotta - Do not reproduce without permission.

Table of Contents

1. Review of Measure Theory and Probability Theory
   (a) The basic framework: the probability space
   (b) Random variables
   (c) Conditional expectation
   (d) Change of measure
2. Stochastic processes
   (a) Some introductory definitions
   (b) Classes of processes
3. Brownian motions
   (a) The martingale property
   (b) Construction of a Brownian motion
   (c) The variation process of a Brownian motion
   (d) The reflection principle and functionals of a Brownian motion
   (e) Correlated Brownian motions
   (f) Simulating trajectories of the Brownian motion - part 1
4. Itô Integrals and Itô Calculus
   (a) Motivation
   (b) The construction of the Itô integral
   (c) Itô processes and stochastic calculus
   (d) Stochastic differential equations
   (e) Steady-state distribution
   (f) The Brownian bridge and stratified Monte Carlo
5. The Change of Measure for Brownian Motions
   (a) Change of probability measure: the martingale problem
   (b) PDE detour
   (c) Feynman-Kac representation
   (d) Martingale representation theorem


References

[1] Grimmett, G. and D. Stirzaker (2003). Probability and Random Processes. Oxford University Press.
[2] Mikosch, T. (2004). Elementary Stochastic Calculus, with Finance in View. World Scientific Publishing Co Pte Ltd.
[3] Shreve, S. (2004). Stochastic Calculus for Finance II - Continuous-time models. Springer Finance.


Introduction

This set of lecture notes takes you through the theory of Brownian motion and stochastic calculus that is required for a sound understanding of modern option pricing theory and of the modelling of the term structure of interest rates. As the theory of stochastic processes has its own special “language”, the first chapter introduces this notation and revises the basic concepts of probability theory required in the following chapters. Particular attention is given to the conditional expectation operator, which is the building block of modern mathematical finance. This will allow us to introduce the idea of a martingale, which underpins the theory of contingent claim pricing. Once these concepts are clear and well understood, we will devote the rest of the module to Brownian motion and the rules of calculus that go with it. These will be our main “tools” for financial applications, which are explored in great detail in the module “Mathematical Models for Financial Derivatives”.

As the Brownian motion by construction ties us to a prespecified distribution of the increments of the process, we will introduce very briefly a more general class of processes which can be used in the context of mathematical finance. However, the full investigation of these processes and their applications will be the focus of the module “Advanced Stochastic Modelling in Finance”, which runs in Term 2.

The material in this booklet covers the entire module; however, it is far from exhaustive, and students are strongly recommended to do some further reading. Some references are provided on the previous page. Each chapter contains a number of sample exam questions, some in the form of solved examples, others in the form of exercises for you to practise. Solutions to these exercises will be posted on CitySpace at some point before the end of term, together with the solutions to the exam papers that you will find in the very last chapter of this booklet. Needless to say, waiting for these solutions to become available before attempting the exercises on your own will not help you much in preparing for the exam itself. You need to test yourself first!

1 Review of Measure Theory and Probability Theory

1.1 The basic framework: the probability space

Imagine a random experiment such as the toss of a coin, or the prices of securities traded in the market over the next period of time. Imagine that we want to explore the features of this random experiment in order to make appropriate and informed decisions. These features could be the expected price of the security tomorrow, its volatility, or the characteristics of the tails of the price distribution (if, for example, you need to calculate some risk measure such as VaR or shortfall expectation). In order to be able to do all this, we need appropriate tools describing the random experiment in such a way that we can extract all this information, i.e. we need a mathematical model of the random experiment. This is represented by the so-called probability space.

Definition 1 (Probability space) We denote the probability space by the triplet $\Theta := (\Omega, \mathcal{F}, \mathbb{P})$. A probability space can be considered as a mathematical model of a random experiment.

This definition tells us that the probability space is made up of three building blocks, which we are going to explore one by one. The first piece of the probability space is $\Omega$, which represents our sample space, i.e. the set of all possible outcomes of the random experiment.

Example 1 Let the random experiment be defined as: choose a number from the unit interval $[0, 1]$. Then $\Omega = \{\omega : 0 \le \omega \le 1\} = [0, 1]$.

Example 2 Assume now that the random experiment you are interested in is the evolution of a stock price over an infinite time horizon, when only 2 states of nature can occur, i.e. up or down. Then $\Omega$ = the set of all infinite sequences of ups and downs = $\{\omega : \omega = \omega_1 \omega_2 \omega_3 \ldots\}$, where $\omega_n$ is the result at the $n$-th period.

The second piece you need in order to have a probability space is $\mathcal{F}$, which is called a σ-algebra. The σ-algebra of a random experiment can be interpreted as the collection of all possible histories of the random experiment itself. Formally, it is defined as follows.
Definition 2 (σ-algebra) Given a set $\Omega$, a collection $\mathcal{F}$ of subsets of $\Omega$ is a σ-algebra if:
1. $\emptyset \in \mathcal{F}$;
2. $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$;
3. $A_m \in \mathcal{F}$ for $m = 1, 2, \ldots$ implies $\bigcup_{m=1}^{\infty} A_m \in \mathcal{F}$ (infinite union).

Example 3

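1. $\mathcal{F} = \{\emptyset, \Omega\}$ is a σ-algebra.

2. Consider some event $A \subset \Omega$. Then the σ-algebra generated by $A$ is $\mathcal{F} = \{\emptyset, \Omega, A, A^c\}$.

3. Consider the sample space defined above for the evolution of the stock price in a 2-state economy, i.e. $\Omega$ = the set of infinite sequences of ups and downs, and define
\[ A_U = \{\omega : \omega_1 = U\}, \qquad A_D = \{\omega : \omega_1 = D\}. \]
The σ-algebra generated by these two sets is $\mathcal{F}^{(1)} = \{\emptyset, \Omega, A_U, A_D\}$. Now consider the sets
\[
\begin{aligned}
A_{UU} &= \{\omega : \omega_1 = U, \omega_2 = U\} \\
A_{UD} &= \{\omega : \omega_1 = U, \omega_2 = D\} \\
A_{DU} &= \{\omega : \omega_1 = D, \omega_2 = U\} \\
A_{DD} &= \{\omega : \omega_1 = D, \omega_2 = D\}.
\end{aligned}
\]
Then
\[
\begin{aligned}
\mathcal{F}^{(2)} = \{ & \emptyset, \Omega, A_{UU}, A_{UD}, A_{DU}, A_{DD}, A_{UU}^c, A_{UD}^c, A_{DU}^c, A_{DD}^c, A_U, A_D, \\
& A_{UU} \cup A_{DU},\; A_{UU} \cup A_{DD},\; A_{DU} \cup A_{UD},\; A_{UD} \cup A_{DD} \}
\end{aligned}
\]
is the corresponding σ-algebra.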
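The construction in Example 3 can be checked by brute force. The sketch below is only an illustration: it assumes that each outcome is truncated to its first two moves, so that the four atoms are represented by the hypothetical labels "UU", "UD", "DU" and "DD", and it builds the generated σ-algebra as the collection of all unions of atoms.

```python
from itertools import combinations


def sigma_algebra_from_atoms(atoms):
    """Return all unions of the given disjoint atoms (the empty union included)."""
    atoms = [frozenset(a) for a in atoms]
    events = set()
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            events.add(frozenset().union(*chosen))
    return events


def is_sigma_algebra(events, omega):
    """Check the three axioms of Definition 2 on a finite collection of events."""
    has_empty = frozenset() in events
    closed_complement = all(omega - a in events for a in events)
    closed_union = all(a | b in events for a in events for b in events)
    return has_empty and closed_complement and closed_union


# Assumption: outcomes are identified by their first two moves only.
omega = frozenset({"UU", "UD", "DU", "DD"})
f1 = sigma_algebra_from_atoms([{"UU", "UD"}, {"DU", "DD"}])   # generated by A_U, A_D
f2 = sigma_algebra_from_atoms([{"UU"}, {"UD"}, {"DU"}, {"DD"}])

print(len(f1), len(f2))
print(is_sigma_algebra(f1, omega), is_sigma_algebra(f2, omega), f1 <= f2)
```

On a finite sample space closure under pairwise unions is enough, which is why the check does not need infinite unions. The inclusion `f1 <= f2` reflects the fact that knowing two moves is at least as informative as knowing one.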

Example 4 The Borel σ-algebra $\mathcal{B}$ on $\mathbb{R}$ is the σ-algebra generated by the open subsets of $\mathbb{R}$.

Every σ-algebra has a set of properties that will be useful in the future.

Theorem 3 The σ-algebra has the following properties:
1. $\Omega \in \mathcal{F}$.
2. $A_m \in \mathcal{F}$ for $m = 1, 2, \ldots$ implies $\bigcap_{m=1}^{\infty} A_m \in \mathcal{F}$.

Proof. 1) $\emptyset \in \mathcal{F}$ by definition, hence $\emptyset^c = \Omega \in \mathcal{F}$ by definition as well (apply properties 1 and 2 from the previous definition).
2) By assumption, $A_m \in \mathcal{F}$ for all $m$; hence $A_m^c \in \mathcal{F}$, which implies that $\bigcup_{m=1}^{\infty} A_m^c \in \mathcal{F}$. By the law of De Morgan (b) (see footnote 1):
\[ \bigcup_{m=1}^{\infty} A_m^c = \left( \bigcap_{m=1}^{\infty} A_m \right)^c, \]
therefore
\[ \left( \bigcap_{m=1}^{\infty} A_m \right)^c \in \mathcal{F}. \]
From the definition of σ-algebra, it follows that $\left( \left( \bigcap_{m=1}^{\infty} A_m \right)^c \right)^c \in \mathcal{F}$ and consequently $\bigcap_{m=1}^{\infty} A_m \in \mathcal{F}$.

The last piece of our probability space is represented by the symbol $\mathbb{P}$. This is called the probability measure, and you can consider it as a sort of “metric” that measures the likelihood of a specific event or story of the random experiment.

Definition 4 A probability measure $\mathbb{P}$ is a set function $\mathbb{P} : \mathcal{F} \to [0, 1]$ such that:
1. $\mathbb{P}(\Omega) = 1$;
2. for any sequence of disjoint events $\{A_m\}$, $\mathbb{P}\left( \bigcup_{m=1}^{\infty} A_m \right) = \sum_{m=1}^{\infty} \mathbb{P}(A_m)$.

Based on this definition, you can show that

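\[ \mathbb{P}(\emptyset) = 0; \qquad \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) \ \text{for disjoint } A, B; \qquad \mathbb{P}(A^c) = 1 - \mathbb{P}(A). \]

Moreover, we can define independent events: two events $A$ and $B$ are independent if and only if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$.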
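The elementary properties listed before the definition of independence follow directly from the two axioms of Definition 4. A short derivation is sketched below; it uses nothing beyond normalisation and countable additivity, and the last line, the inclusion-exclusion formula for events that are not necessarily disjoint, is a standard consequence added here for completeness.

```latex
\begin{align*}
\text{(i)}\;\; & \Omega = \Omega \cup \emptyset \cup \emptyset \cup \cdots \ \text{(disjoint)}
   \;\Rightarrow\; 1 = \mathbb{P}(\Omega) = 1 + \sum_{m \ge 2} \mathbb{P}(\emptyset)
   \;\Rightarrow\; \mathbb{P}(\emptyset) = 0. \\
\text{(ii)}\;\; & A \cap B = \emptyset:\quad A \cup B = A \cup B \cup \emptyset \cup \emptyset \cup \cdots
   \;\Rightarrow\; \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) + 0. \\
\text{(iii)}\;\; & \Omega = A \cup A^c \ \text{(disjoint)}
   \;\Rightarrow\; 1 = \mathbb{P}(A) + \mathbb{P}(A^c)
   \;\Rightarrow\; \mathbb{P}(A^c) = 1 - \mathbb{P}(A). \\
\text{(iv)}\;\; & \text{general } A, B:\quad A \cup B = A \cup (B \setminus A),\quad
   B = (B \setminus A) \cup (A \cap B) \\
 & \Rightarrow\; \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B).
\end{align*}
```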

Footnote 1. Proposition (Law of De Morgan)
(a) $(A \cup B)^c = A^c \cap B^c$. More generally: $\left( \bigcup_m A_m \right)^c = \bigcap_m A_m^c$.
(b) $(A \cap B)^c = A^c \cup B^c$. Generalising: $\left( \bigcap_m A_m \right)^c = \bigcup_m A_m^c$.
Proof. (a) Assume $x \in \bigcap_{m=1}^{\infty} A_m^c$. Then $x \in A_m^c$ $\forall m$. Hence $x \notin A_m$ $\forall m$, which implies $x \notin \bigcup_{m=1}^{\infty} A_m$. Therefore $x \in \left( \bigcup_{m=1}^{\infty} A_m \right)^c$.
(b) Assume $x \in \bigcup_{m=1}^{\infty} A_m^c$; then $x \in A_m^c$ for some $m$. Hence $x \notin A_m$ for the same $m$. Therefore $x \notin \bigcap_{m=1}^{\infty} A_m$ and hence $x \in \left( \bigcap_{m=1}^{\infty} A_m \right)^c$.
The other direction of each statement can be proved in a similar fashion.

Example 5 Consider the previous example of the evolution of the stock price over an infinite time horizon, so that $\Omega = \{\omega : \omega = \omega_1 \omega_2 \omega_3 \ldots\}$, and $A_U = \{\omega : \omega_1 = U\}$, $A_D = \{\omega : \omega_1 = D\}$. Assume that the up/down movements at different time steps are independent, and let
\[ \mathbb{P}(A_U) = p; \qquad \mathbb{P}(A_D) = q = 1 - p. \]
Then
\[ \mathbb{P}(A_{UU}) = p^2; \qquad \mathbb{P}(A_{UD}) = \mathbb{P}(A_{DU}) = pq; \qquad \mathbb{P}(A_{DD}) = q^2. \]
Further, $\mathbb{P}(A_{UU}^c) = 1 - p^2$; similarly, you can calculate the probability of every other set in $\mathcal{F}^{(2)}$. Moreover, if $A_{UUU} = \{\omega : \omega_1 = U, \omega_2 = U, \omega_3 = U\}$, you can calculate that $\mathbb{P}(A_{UUU}) = p^3$. And so on. Hence, in the limit (for $p < 1$) you can conclude that the probability of the sequence $UUU\ldots$ is zero. The same applies, for example, to the sequence $UDUD\ldots$; in fact, the event consisting of this single sequence is the intersection of the events $A_U, A_{UD}, A_{UDU}, \ldots$. From this example, we can conclude that (for $0 < p < 1$) every single sequence in $\Omega$ has probability zero.

In the previous example, we have shown that $\mathbb{P}(\text{every movement is up}) = 0$; this implies that this event is sure not to happen. Similarly, since the above is true, we are sure to get at least one down movement in the sequence, although we do not know exactly when in the sequence. Because of this fact, and the fact that the infinite sequence $UUU\ldots$ is in the sample space (which means that it is still a possible outcome), mathematicians have come up with a somewhat strange way of saying this: we will get at least one down movement almost surely.

Definition 5 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. If $A \in \mathcal{F}$ is such that $\mathbb{P}(A) = 1$, we say that the event $A$ occurs almost surely (a.s.).

Now, in order to introduce the next definition, consider the following, maybe a little silly, example. Assume that you want to measure the length of a room, and assume you express this measure in metres and centimetres. It turns out that the room is 4.30 m long. Now assume that you want to change the reference system and express the length of the room in terms of feet and inches. Then the room is about 14 ft long. But in the process of switching from one reference system to the other, the room did not change: it did not shrink; it did not expand. The same applies to events and probability measures. The idea is given in the following.
Definition 6 (Absolutely continuous/equivalent probability measures) Given two probability measures $\mathbb{P}$ and $\mathbb{P}^*$ defined on the same σ-algebra $\mathcal{F}$:
i) $\mathbb{P}$ is absolutely continuous with respect to $\mathbb{P}^*$, written $\mathbb{P} \ll \mathbb{P}^*$, if $\mathbb{P}(A) = 0$ whenever $\mathbb{P}^*(A) = 0$, for all $A \in \mathcal{F}$.
ii) If $\mathbb{P} \ll \mathbb{P}^*$ and also $\mathbb{P}^* \ll \mathbb{P}$, then $\mathbb{P} \sim \mathbb{P}^*$, i.e. $\mathbb{P}$ and $\mathbb{P}^*$ are equivalent measures.

Thus, for $\mathbb{P} \sim \mathbb{P}^*$ the following hold:
• $\mathbb{P}(A) = 0 \Leftrightarrow \mathbb{P}^*(A) = 0$ (same null sets)
• $\mathbb{P}(A) = 1 \Leftrightarrow \mathbb{P}^*(A) = 1$ (same a.s. sets)
• $\mathbb{P}(A) > 0 \Leftrightarrow \mathbb{P}^*(A) > 0$ (same sets of positive measure)

Example 6 Consider a closed interval $[a, b]$, for $0 \le a \le b \le 1$, and consider the experiment of choosing a number from the unit interval. Define
\[ \mathbb{P}(\text{the number chosen is in } [a, b]) = \mathbb{P}[a, b] := b - a. \]
But you can also define a different measure $\mathbb{P}^*$, according to which
\[ \mathbb{P}^*(\text{the number chosen is in } [a, b]) = \mathbb{P}^*[a, b] := b^2 - a^2. \]

Just as there is a conversion factor that helps you to switch between metres and feet, so that 4.30 m is about 14 ft, there is also a conversion factor between probability measures. However, this conversion factor depends on a few objects that we have not met yet. Therefore, the discussion of this last feature is postponed to the end of this unit.

Exercise 1 Let $A$ and $B$ belong to some σ-algebra $\mathcal{F}$. Show that $\mathcal{F}$ contains the sets $A \cap B$, $A \setminus B$, and $A \Delta B$, where $\Delta$ denotes the symmetric difference operator, i.e.
\[ A \Delta B = \{x : x \in A, x \notin B \ \text{or} \ x \notin A, x \in B\}. \]

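Exercise 2 Show that for every function $f : \Omega \to \mathbb{R}$ the following hold:
1. $f^{-1}\left( \bigcup_n A_n \right) = \bigcup_n f^{-1}(A_n)$;
2. $f^{-1}\left( \bigcap_n A_n \right) = \bigcap_n f^{-1}(A_n)$;
3. $f^{-1}\left( A^C \right) = \left( f^{-1}(A) \right)^C$
for any subsets $A_n$, $A$ of $\mathbb{R}$.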

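Exercise 3 Let $\mathcal{F}$ be a σ-algebra of subsets of $\Omega$ and suppose that $B \in \mathcal{F}$. Show that $\mathcal{G} = \{A \cap B : A \in \mathcal{F}\}$ is a σ-algebra of subsets of $B$.

Exercise 4 Let $\mathbb{P}$ be a probability measure on $\mathcal{F}$. Show that $\mathbb{P}$ has the following properties:
1. for any $A, B \in \mathcal{F}$ such that $A \cap B = \emptyset$, $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B)$;
2. for any $A, B \in \mathcal{F}$ such that $A \subset B$, $\mathbb{P}(A) \le \mathbb{P}(B)$ [Hint: use the fact that for any two sets $A$ and $B$ such that $A \subset B$, $B = A \cup (B \setminus A)$, where we define $B \setminus A := \{x : x \in B, x \notin A\}$ (difference operator for sets)];
3. for any $A, B \in \mathcal{F}$ such that $A \subset B$, $\mathbb{P}(B \setminus A) = \mathbb{P}(B) - \mathbb{P}(A)$.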

1.2 Random variables

So far, we have considered random events, like an up or down movement in the stock price over the next period of time, and the likelihood of such events occurring, as described by the probability measure. The next step in which you might be interested is to “quantify” the outcome of the random event; for example, you might want to know how much the stock price is going to change if an up or down movement occurs in the next time period. In order to do this, you need the idea of a random variable.

Definition 7 (Random variable) Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A random variable $X$ is a function $X : \Omega \to \mathbb{R}$ such that $\{\omega \in \Omega \mid X(\omega) \le x\} \in \mathcal{F}$ $\forall x \in \mathbb{R}$.

Note that every set of the form $B = (-\infty, x]$, $x \in \mathbb{R}$, belongs to the Borel σ-algebra $\mathcal{B}$, and Definition 7 states that $X^{-1}(B) \in \mathcal{F}$ for every such $B$. In other words, any random variable is a measurable function (see footnote 2), i.e. a numerical quantity whose value is determined by the random experiment of choosing some $\omega \in \Omega$.

Example 7 Consider once again the random experiment of the evolution of the stock price over an infinite time horizon in a 2-state economy, described in Example 3. Let us define the stock prices by the formulae:
\[ S_0(\omega) = 4; \]
\[ S_1(\omega) = \begin{cases} 8 & \text{if } \omega_1 = \text{up} \\ 2 & \text{if } \omega_1 = \text{down} \end{cases} \]
\[ S_2(\omega) = \begin{cases} 16 & \text{if } \omega_1 = \omega_2 = \text{up} \\ 4 & \text{if } \omega_1 \ne \omega_2 \\ 1 & \text{if } \omega_1 = \omega_2 = \text{down.} \end{cases} \]

All of these are random variables, assigning a numerical value to each sequence of up and down movements in the stock price at each time period. Example 5 tells us how to calculate the probability that the random variable $S$ takes any of these values; for example
\[ \mathbb{P}(S_1(\omega) = 8) = \mathbb{P}(A_U) = p; \qquad \mathbb{P}(S_2(\omega) = 4) = \mathbb{P}(A_{DU} \cup A_{UD}) = 2pq. \]
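The probabilities in Example 7 can be reproduced by enumerating the paths. The following is a minimal sketch under two assumptions that are consistent with the values above but not stated as a general rule in the text: the price doubles on an up move and halves on a down move, and moves are independent with up-probability p (the value p = 1/3 is chosen only for illustration).

```python
from fractions import Fraction
from itertools import product


def stock_price(path, s0=4):
    """Price after following a path of 'U'/'D' moves (assumed: x2 on up, /2 on down)."""
    price = Fraction(s0)
    for move in path:
        price = price * 2 if move == "U" else price / 2
    return price


def law_of_price(n_steps, p):
    """Probability mass function of S_n, obtained by summing over all paths."""
    q = 1 - p
    law = {}
    for path in product("UD", repeat=n_steps):
        prob = p ** path.count("U") * q ** path.count("D")
        value = stock_price(path)
        law[value] = law.get(value, 0) + prob
    return law


p = Fraction(1, 3)          # illustrative value, not taken from the notes
law_s1 = law_of_price(1, p)
law_s2 = law_of_price(2, p)
print(law_s1)
print(law_s2)
```

Using exact fractions makes it easy to compare the output with the closed forms, for instance the mass at 4 with 2pq.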

The above example shows that we can associate with any random variable another function measuring the likelihood of its outcomes. This is what we call the law of $X$. Precisely, by the law of $X$ we mean a probability measure on $(\mathbb{R}, \mathcal{B})$, $L_X : \mathcal{B} \to [0, 1]$, such that
\[ L_X(B) = \mathbb{P}(X \in B) \quad \forall B \in \mathcal{B}. \]

Footnote 2. Definition (Measurable function) Let $\mathcal{F}$ be a σ-algebra on $\Omega$ and $f : \Omega \to \mathbb{R}$. For $A \subseteq \mathbb{R}$ let
\[ f^{-1}(A) = \{\omega \in \Omega \mid f(\omega) \in A\}; \]
then $f$ is called $\mathcal{F}$-measurable if $f^{-1}(E) \in \mathcal{F}$ $\forall E \in \mathcal{B}$, where $f^{-1}(E)$ is called the pre-image of $E$.

In general, we prefer to speak in terms of the distribution of a random variable; this is a function $F_X : \mathbb{R} \to [0, 1]$ defined as
\[ F_X(a) = \mathbb{P}(X \le a) = \mathbb{P}(\omega : X(\omega) \le a). \]
This is the law of $X$ for any set $B$ of the form $B = (-\infty, a]$, i.e. $F_X(a) = L_X(-\infty, a]$. In some special cases, we can describe the distribution function of a random variable $X$ in even more detail. The first case is that of a discrete random variable, like the one introduced in Example 7, which assigns lumps of mass to events. For this random variable, we can express the distribution function as
\[ F_X(a) = \mathbb{P}(X \le a) = \sum_{x \le a} p_X(x), \]
where $p_X(x)$ is the probability mass function of $X$. If instead the random variable $X$ spreads the mass continuously over the real line, then we have a continuous random variable and
\[ F_X(a) = \mathbb{P}(X \le a) = \int_{-\infty}^{a} f_X(x)\,dx, \qquad (1) \]
where $f_X(x)$ denotes the density function of $X$.

Exercise 5 Let $X$ be a random variable. Show that the distribution $F_X$ of $X$ defined by
\[ F_X(A) = \mathbb{P}(X \in A) = \mathbb{P}\left( X^{-1}(A) \right), \quad A \in \mathcal{B}(\mathbb{R}), \]
is a probability measure on the σ-algebra $\mathcal{B}(\mathbb{R})$.

Remark 1 (A matter of notation) From equation (1), we see that we could write the density function as
\[ f_X(x) = \frac{dF_X}{dx} = \frac{d\mathbb{P}(\omega)}{dx} \quad \forall x \in \mathbb{R}. \]
The expectation $\mathbb{E}$ of a random variable $X$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is then defined by:
\[ \mathbb{E}[X] = \int_{\Omega} X(\omega)\,d\mathbb{P}(\omega) = \int_{-\infty}^{\infty} x\,dF_X(x). \]

The expectation returns the mean of the distribution. You might also be interested in the dispersion around the mean; this feature is described by the variance of the random variable. Further features that characterize the distribution of a random variable are the skewness (degree of asymmetry) and the kurtosis (behaviour of the tails). These features are described by the moments (from the mean) of a random variable, which can be recovered via the moment generating function (MGF)
\[ M_X(k) = \mathbb{E}\left[ e^{kX} \right] = \int_{-\infty}^{\infty} e^{kx}\,dF_X(x). \]
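To see how moments are recovered from the MGF, recall that, when the MGF is finite around zero, its first two derivatives at k = 0 are the first two raw moments, so the variance is M''(0) - M'(0)^2. The sketch below illustrates this numerically for the discrete random variable S2 of Example 7. The value p = 0.5 and the finite-difference step h are assumptions chosen only for illustration.

```python
import math

# Law of S2 from Example 7, under the ASSUMPTION p = 0.5.
p = 0.5
q = 1 - p
law_s2 = {16.0: p * p, 4.0: 2 * p * q, 1.0: q * q}


def expectation(g, law):
    """E[g(X)] for a discrete law given as {value: probability}."""
    return sum(g(x) * prob for x, prob in law.items())


def mgf(k, law):
    """Moment generating function M_X(k) = E[exp(kX)]."""
    return expectation(lambda x: math.exp(k * x), law)


# Direct computation of mean and variance.
mean = expectation(lambda x: x, law_s2)
variance = expectation(lambda x: (x - mean) ** 2, law_s2)

# The same quantities from central finite differences of the MGF at k = 0.
h = 1e-4
mgf_first = (mgf(h, law_s2) - mgf(-h, law_s2)) / (2 * h)
mgf_second = (mgf(h, law_s2) - 2 * mgf(0.0, law_s2) + mgf(-h, law_s2)) / h ** 2

print(mean, mgf_first)
print(variance, mgf_second - mgf_first ** 2)
```

The finite differences only approximate the derivatives, so the two columns agree up to a small discretisation error rather than exactly.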

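Example 8 A few examples of random variables (very important, as we will use them throughout the entire year):

1. The Poisson random variable is an example of a discrete random variable. More precisely, a Poisson random variable $N \sim Poi(\lambda)$ with rate $\lambda$ has probability mass function
\[ p_N(n) = \frac{e^{-\lambda} \lambda^n}{n!}, \quad n = 0, 1, 2, \ldots \]
from which it follows that
\[ \mathbb{E}(N) = \lambda = \mathrm{Var}(N); \qquad M_N(k) = e^{\lambda (e^k - 1)}. \]

2. The normal (or Gaussian) random variable $X \sim N(\mu, \sigma^2)$ is a continuous random variable defined by the density function
\[ f_X(x) = \frac{e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sigma \sqrt{2\pi}}. \]
You can easily show that
\[ \mathbb{E}(X) = \mu; \qquad \mathrm{Var}(X) = \sigma^2; \qquad M_X(k) = e^{k\mu + \frac{k^2 \sigma^2}{2}}. \]

3. Assume $X \sim \Gamma(\alpha, \lambda)$, $\alpha > 0$. Then $X$ is a non-negative random variable which follows a Gamma distribution; its density function is given by
\[ f(x) = \frac{1}{\Gamma(\alpha)} \lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}, \quad x \ge 0, \]
where $\Gamma(\alpha)$ is the Gamma function, which is defined as
\[ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx, \]
and has the property that
\[ \Gamma(\alpha) = \Gamma(\alpha - 1)\,(\alpha - 1) \]
(why don't you try to prove this last property? Just integrate by parts). This means that $\Gamma(\alpha) = (\alpha - 1)!$ when $\alpha$ is a positive integer. The MGF of $X$ is
\[ M_X(k) = \frac{1}{\Gamma(\alpha)} \lambda^{\alpha} \int_0^{\infty} x^{\alpha - 1} e^{-x(\lambda - k)}\,dx. \]
Set $y = x(\lambda - k)$, with $k < \lambda$; then
\[ M_X(k) = \frac{1}{\Gamma(\alpha)} \lambda^{\alpha} \int_0^{\infty} \left( \frac{y}{\lambda - k} \right)^{\alpha - 1} e^{-y}\,\frac{dy}{\lambda - k} = \left( \frac{\lambda}{\lambda - k} \right)^{\alpha}. \]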
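The closed form for the Gamma MGF can be sanity-checked by numerical integration. The sketch below is only an illustration under stated assumptions: the parameter values (α = 3, λ = 2, k = 0.5) are arbitrary, the helper `simpson` is a hand-written composite Simpson rule, the infinite integral is truncated at a hypothetical upper limit of 80, and α ≥ 1 is assumed so that the density is bounded at zero.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def gamma_density(x, alpha, lam):
    """Density of the Gamma(alpha, lam) distribution, lam being the rate."""
    return lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)


def gamma_mgf_numeric(k, alpha, lam, upper=80.0):
    """E[exp(kX)] by quadrature; requires k < lam, tail beyond `upper` neglected."""
    return simpson(lambda x: math.exp(k * x) * gamma_density(x, alpha, lam), 0.0, upper)


def gamma_mgf_closed(k, alpha, lam):
    """Closed form (lam / (lam - k)) ** alpha, valid for k < lam."""
    return (lam / (lam - k)) ** alpha


alpha, lam, k = 3.0, 2.0, 0.5   # illustrative values only
numeric = gamma_mgf_numeric(k, alpha, lam)
closed = gamma_mgf_closed(k, alpha, lam)
print(numeric, closed)
```

For k ≥ λ the integrand no longer decays and the integral diverges, which is why the restriction k < λ appears in the change of variable.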

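Note that for the Gamma random variable of Example 8, if $\alpha = 1$ then $X$ follows an exponential distribution with rate $\lambda$. Using the MGF you can show that the Gamma random variable has mean $\mu = \alpha / \lambda$ and variance $\nu = \alpha / \lambda^2$. The parameter $\alpha$ is the shape parameter, whilst $\lambda$ is the rate (inverse scale) parameter.

Moment generating functions suffer from the disadvantage that the integrals which define them may not always be finite.

Example 9 A Cauchy random variable $X$ has density function
\[ f(x) = \frac{1}{\pi (1 + x^2)}, \quad x \in \mathbb{R}. \]
Hence the MGF of $X$ is given by
\[ M_X(k) = \int_{-\infty}^{\infty} \frac{e^{kx}}{\pi (1 + x^2)}\,dx. \]
This is an improper integral of the first kind which does not converge unless $k = 0$ (a trivial case). In fact, if you perform the limit comparison test against $(1/x)^{\alpha}$ with $\alpha = 2$, you obtain:
\[ \lim_{x \to \infty} \frac{e^{kx} / \left( \pi (1 + x^2) \right)}{(1/x)^{\alpha}} = \frac{1}{\pi} \lim_{x \to \infty} e^{kx} = \begin{cases} 0 & \text{if } k < 0 \\ \infty & \text{if } k > 0, \end{cases} \]
\[ \lim_{x \to -\infty} \frac{e^{kx} / \left( \pi (1 + x^2) \right)}{(1/x)^{\alpha}} = \frac{1}{\pi} \lim_{x \to -\infty} e^{kx} = \begin{cases} 0 & \text{if } k > 0 \\ \infty & \text{if } k < 0. \end{cases} \]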

Hence, the MGF of a Cauchy random variable does not exist. Characteristic functions are another class of functions that are equally useful and whose finiteness is guaranteed.

Definition 8 The characteristic function of $X$ is the function $\phi_X : \mathbb{R} \to \mathbb{C}$ defined by
\[ \phi_X(u) = \mathbb{E}\left[ e^{iuX} \right], \]
where $i = \sqrt{-1}$. This is a common transformation and, when $X$ has a density $f$, it is often called the Fourier transform of $f$. In this case
\[ \phi_X(u) = \int e^{iux}\,dF(x) = \int e^{iux} f(x)\,dx. \]

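The characteristic function of a random variable has several nice properties. Firstly, it always exists and is finite (i.e. $e^{iuX}$ is in $L^1$): note that
\[ \phi_X(u) = \mathbb{E}\left( e^{iuX} \right) = \mathbb{E}\left( \cos(uX) + i \sin(uX) \right), \]
and (see footnote 4)
\[ \left| \cos(uX) + i \sin(uX) \right| := \sqrt{\cos^2(uX) + \sin^2(uX)} = 1. \]
Then
\[ \left| \mathbb{E}\left( e^{iuX} \right) \right| \le \mathbb{E}\left| e^{iuX} \right| = 1. \]
Moreover:
1. if $X$ and $Y$ are independent random variables, $\phi_{X+Y}(u) = \phi_X(u)\,\phi_Y(u)$;
2. if $a, b \in \mathbb{R}$ and $Y = aX + b$, then $\phi_Y(u) = e^{iub} \phi_X(au)$.

1.2.1 Examples of characteristic functions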

Calculations of integrals involving complex numbers are not always pleasant; usually you should know about contour integration, but for our purposes you can get away with only knowing about analytic continuation. Analytic continuation provides a way of extending the domain over which a complex function is defined. Let us start from a complex function $f$ (like the characteristic function); this function is complex differentiable at $z_0$ with derivative $A$ if and only if
\[ f(z) = f(z_0) + A (z - z_0) + o(z - z_0) \quad \text{as } z \to z_0. \]
A complex function is said to be analytic on a region $D$ if it is complex differentiable at every point in $D$ (i.e. it has no singularities in $D$, that is, points at which the function “blows up” or becomes degenerate). Now, let $f_1$ and $f_2$ be analytic functions on domains $D_1$ and $D_2$ respectively, with $D_1 \subset D_2$, such that $f_1 = f_2$ on $D_1 \cap D_2$. Then $f_2$ is called the analytic continuation of $f_1$ to $D_2$. Moreover, if it exists, the analytic continuation of $f_1$ to $D_2$ is unique. Consider now the MGF $M_X$ of some random variable $X$; we can say that the function
\[ M_X(z) = \int_{-\infty}^{\infty} f(x)\,e^{zx}\,dx, \quad z \in \mathbb{C}, \]
is the analytic continuation of $M_X$ to the complex plane, if it satisfies the condition above. Then the characteristic function of $X$, $\phi_X$, is the restriction of $M_X$ to the imaginary axis, i.e.
\[ \phi_X(u) = M_X(iu). \]
And now, let's calculate some characteristic functions.

Footnote 4. Note that this is the modulus of the complex number $z = \cos(uX) + i \sin(uX)$, and you can interpret the notation as a norm.

1. Let $X \sim N(0, 1)$. The characteristic function is
\[ \phi_X(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iux - \frac{x^2}{2}}\,dx. \]
Now consider the real valued function
\[ M_X(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{kx - \frac{x^2}{2}}\,dx = e^{\frac{k^2}{2}}, \]
i.e. the MGF of $X$. Since $\mathbb{R} \cap \mathbb{C} \ne \emptyset$ (indeed $\mathbb{R} \subset \mathbb{C}$), $M_X$ has an analytic continuation to the complex plane given by
\[ M_X(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{zx - \frac{x^2}{2}}\,dx = e^{\frac{z^2}{2}}, \quad z \in \mathbb{C}. \]
Therefore, by analytic continuation,
\[ \phi_X(u) = M_X(iu) = e^{-\frac{u^2}{2}}. \]

2. Let $X$ be a Poisson random variable with rate $\lambda$. You can apply the same argument as above (i.e. analytic continuation) to show that
\[ \phi_X(u) = M_X(iu) = e^{\lambda (e^{iu} - 1)}. \]

3. Consider now the Gamma distribution. Analytic continuation implies that
\[ \phi_X(u) = \left( \frac{\lambda}{\lambda - iu} \right)^{\alpha}. \]

4. Assume $X$ is a Cauchy random variable, i.e.
\[ f(x) = \frac{1}{\pi (1 + x^2)}. \]
We cannot use the analytic continuation argument because the function is not analytic (can you spot why?). Here you need to use contour integration and the residue theorem. You should obtain that
\[ \phi_X(u) = e^{-|u|}. \]
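The characteristic functions of the standard normal and Cauchy distributions quoted above can be checked numerically without any contour integration. The sketch below is an illustration only: it approximates the defining integral with a hand-written Simpson rule on a truncated interval, and the truncation limits and grid sizes are assumptions picked so that the neglected tails are small.

```python
import cmath
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; works for complex-valued integrands."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def char_fn(density, u, limit, n):
    """Approximate E[exp(iuX)] by integrating over [-limit, limit]."""
    return simpson(lambda x: cmath.exp(1j * u * x) * density(x), -limit, limit, n)


def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cauchy_density(x):
    return 1.0 / (math.pi * (1.0 + x * x))


for u in (0.5, 1.0, 2.0):
    phi_normal = char_fn(normal_density, u, limit=12.0, n=4_000)
    phi_cauchy = char_fn(cauchy_density, u, limit=400.0, n=80_000)
    print(u, abs(phi_normal - math.exp(-0.5 * u * u)), abs(phi_cauchy - math.exp(-abs(u))))
```

The Cauchy integral converges slowly because the density decays only like 1/x², which is why a much wider interval is used there. Both integrals stay bounded by 1 in modulus, in line with the general bound on characteristic functions.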

1.3 Conditional expectation

At the beginning of this Unit, we talked about the problem of setting up a mathematical model of a random experiment, in order to support our decision process. Specifically, we talked about informed decisions, and we have seen that information in the probability space is captured by the σ-algebra. Then, in the previous section, we have seen how to quantify a random event by using random variables. Now, consider as always that some random experiment is performed, whose outcome is some $\omega \in \Omega$. Imagine that we are given some information, $\mathcal{G}$, about this possible outcome, not enough to know the precise value of $\omega$, but enough to narrow down the possibilities. Then we can use this information to estimate, although not precisely, the value of the random variable $X(\omega)$. Such an estimate is represented by the conditional expectation of $X$ given $\mathcal{G}$.

In order to understand the definition of conditional expectation, we first need to familiarize ourselves with the indicator function. Precisely, we use the notation $1_A$ for
\[ 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise.} \end{cases} \]
Hence $1_A$ is a random variable which follows a Bernoulli distribution, taking value 1 with probability $\mathbb{P}(A)$, and 0 with probability $\mathbb{P}(A^c)$. Hence $\mathbb{E}[1_A] = \mathbb{P}(A)$. Properties of the indicator function are listed below.
1. $1_A + 1_{A^c} = 1_{A \cup A^c} = 1_{\Omega} = 1$;
2. $1_{A \cap B} = 1_A 1_B$.
Now, we are ready for the following.

Definition 9 (Axiomatic definition - Kolmogorov) Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $X$ a random variable with $\mathbb{E}|X| < \infty$. Let $\mathcal{G}$ be a sub σ-algebra of $\mathcal{F}$. Then the random variable $Y = \mathbb{E}[X \mid \mathcal{G}]$ is the conditional expectation of $X$ with respect to $\mathcal{G}$ if:
1. $Y$ is $\mathcal{G}$-measurable ($Y \in \mathcal{G}$);
2. $\mathbb{E}|Y| < \infty$;
3. $\forall A \in \mathcal{G}$: $\mathbb{E}(Y 1_A) = \mathbb{E}(X 1_A)$, i.e.
\[ \int_A Y\,d\mathbb{P} = \int_A X\,d\mathbb{P}. \]

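The idea is that, if $X$ and $\mathcal{G}$ are somehow connected, we can expect the information contained in $\mathcal{G}$ to reduce our uncertainty about $X$. In other words, we can better predict $X$ with the help of $\mathcal{G}$. In fact, Definition 9 is telling us that, although the estimate of $X$ based on $\mathcal{G}$ is itself a random variable, the value of the estimate $\mathbb{E}[X \mid \mathcal{G}]$ can be determined from the information in $\mathcal{G}$ (property 1). Further, $Y$ is an unbiased estimator of $X$ (property 3 with $A = \Omega$).

Example 10 Consider once again the stock price evolution described in Example 7. Suppose you are told that the outcome of the first stock price movement is “up”. You can now use this information to estimate the value of $S_2$:
\[ \mathbb{E}[S_2(\omega) \mid \text{up}] = 16p + 4q = 12p + 4. \]
In this case, $\mathcal{G} = A_U$. Similarly,
\[ \mathbb{E}[S_2(\omega) \mid \text{down}] = 4p + q = 3p + 1, \]
and $\mathcal{G} = A_D$. Question: what is $\mathbb{E}[S_2(\omega) \mid \mathcal{G} = A_{UD}]$?

Theorem 10 The conditional expectation has the following properties:
1. $\mathbb{E}[\mathbb{E}(X \mid \mathcal{G})] = \mathbb{E}[X]$, i.e. $\mathbb{E}[Y] = \mathbb{E}[X]$.
2. If $\mathcal{G} = \{\emptyset, \Omega\}$ (smallest σ-algebra), $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$.
3. If $\mathcal{G} = \mathcal{F}$, $\mathbb{E}[X \mid \mathcal{G}] = X$.
4. If $X \in \mathcal{G}$, $\mathbb{E}[X \mid \mathcal{G}] = X$.
5. If $Z \in \mathcal{G}$, then $\mathbb{E}[ZX \mid \mathcal{G}] = Z\,\mathbb{E}[X \mid \mathcal{G}] = ZY$.
6. Let $\mathcal{G}_0 \subset \mathcal{G}$; then $\mathbb{E}[\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{G}_0] = \mathbb{E}[X \mid \mathcal{G}_0]$.
7. Let $\mathcal{G}_0 \subset \mathcal{G}$; then $\mathbb{E}[\mathbb{E}(X \mid \mathcal{G}_0) \mid \mathcal{G}] = \mathbb{E}[X \mid \mathcal{G}_0]$.
8. If $X$ is independent of $\mathcal{G}$, then $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$.

Proof. One by one: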

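1. Check point 3 in the previous definition for $A = \Omega$ (remember that $\Omega \in \mathcal{G}$): $\mathbb{E}[Y 1_{\Omega}] = \mathbb{E}[X 1_{\Omega}]$, but $1_{\Omega} = 1$.

2. Check point 3 in the axiomatic definition with $Y = \mathbb{E}[X]$. For $A = \emptyset$, we have
\[ \int_{\emptyset} Y\,d\mathbb{P} = \int_{\emptyset} X\,d\mathbb{P} = 0. \]
For $A = \Omega$, $\mathbb{E}[X 1_{\Omega}] = \mathbb{E}[X]$ and $\mathbb{E}[\mathbb{E}(X) 1_{\Omega}] = \mathbb{E}[X]$ in virtue of property 1. Hence both sides return $\mathbb{E}[X]$.

3. Verify the definition of conditional expectation with $Y = X$ for $\mathcal{G} = \mathcal{F}$:
• $X \in \mathcal{F}$ because it is $\mathcal{F}$-measurable by definition of random variable.
• $\mathbb{E}|X| < \infty$ by assumption (axiomatic definition).
• $\mathbb{E}(Y 1_A) = \mathbb{E}(X 1_A)$ $\forall A \in \mathcal{G}$ holds trivially, since $Y = X$.
In this case you have available the entire “history” of $X$. Hence you know everything and therefore there is no uncertainty left.

4. If $X \in \mathcal{G}$, then we go back to the same situation as depicted in (3).

5. We prove this property for the simple case of an indicator function; hence, assume $Z = 1_B$ for some $B \in \mathcal{G}$; then condition 3 in the definition of conditional expectation reads: $\forall A \in \mathcal{G}$
\[ \mathbb{E}(ZX 1_A) = \mathbb{E}(X 1_A 1_B) = \mathbb{E}(X 1_{A \cap B}). \]
But $A \cap B \in \mathcal{G}$, so condition 3 implies
\[ \mathbb{E}(X 1_{A \cap B}) = \mathbb{E}(Y 1_{A \cap B}) = \mathbb{E}(Y 1_A 1_B) = \mathbb{E}(ZY 1_A). \]
The extension to the case of a more general random variable relies on the construction of a random variable as the limit of sums of indicator functions. However, this is beyond the scope of this unit.

6. Let $Y = \mathbb{E}[X \mid \mathcal{G}]$ and $Z = \mathbb{E}[X \mid \mathcal{G}_0]$. If $A \in \mathcal{G}_0$, then $\mathbb{E}(Z 1_A) = \mathbb{E}(X 1_A)$; but since $\mathcal{G}_0 \subset \mathcal{G}$, $A \in \mathcal{G}$ as well, and by definition $\mathbb{E}(Y 1_A) = \mathbb{E}(X 1_A)$. Therefore $\mathbb{E}(Z 1_A) = \mathbb{E}(Y 1_A)$ $\forall A \in \mathcal{G}_0$ and, since $Z$ is $\mathcal{G}_0$-measurable, $Z = \mathbb{E}[Y \mid \mathcal{G}_0]$.

7. Let $Z = \mathbb{E}[X \mid \mathcal{G}_0]$; then $Z \in \mathcal{G}_0$. Since $\mathcal{G}_0 \subset \mathcal{G}$, it follows that $Z \in \mathcal{G}$. Therefore $\mathbb{E}[Z \mid \mathcal{G}] = Z$ by property 4.

8. By independence, $\forall A \in \mathcal{G}$: $\mathbb{E}(X 1_A) = \mathbb{E}(X)\,\mathbb{E}(1_A) = \mathbb{E}[\mathbb{E}(X) 1_A]$, so the constant $\mathbb{E}(X)$ satisfies condition 3.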
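On a finite sample space, the conditional expectation with respect to a σ-algebra generated by a partition is simply the probability-weighted average over each block of the partition. The sketch below applies this to Example 10 and checks properties 1 and 5 of Theorem 10. It assumes outcomes truncated to their first two moves, independent moves, and an illustrative value p = 1/4; the function and variable names are hypothetical.

```python
from fractions import Fraction

p = Fraction(1, 4)            # illustrative value, not taken from the notes
q = 1 - p
prob = {"UU": p * p, "UD": p * q, "DU": q * p, "DD": q * q}
s2 = {"UU": 16, "UD": 4, "DU": 4, "DD": 1}

# G is generated by the first move: its atoms are A_U and A_D.
first_move = [{"UU", "UD"}, {"DU", "DD"}]


def expectation(x, prob):
    """E[X] on a finite sample space."""
    return sum(x[w] * prob[w] for w in prob)


def conditional_expectation(x, partition, prob):
    """E[X | G], with G generated by a partition whose blocks have positive mass."""
    y = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        average = sum(x[w] * prob[w] for w in block) / mass
        for w in block:
            y[w] = average        # constant on each block, hence G-measurable
    return y


y = conditional_expectation(s2, first_move, prob)
print(y["UU"], 12 * p + 4)    # E[S2 | up]
print(y["DD"], 3 * p + 1)     # E[S2 | down]

# Property 1: E[E(S2 | G)] = E[S2].
tower_holds = expectation(y, prob) == expectation(s2, prob)

# Property 5: a G-measurable Z (here S1) can be taken out of the expectation.
s1 = {"UU": 8, "UD": 8, "DU": 2, "DD": 2}
zx = {w: s1[w] * s2[w] for w in prob}
lhs = conditional_expectation(zx, first_move, prob)
taking_out_holds = all(lhs[w] == s1[w] * y[w] for w in prob)
print(tower_holds, taking_out_holds)
```

The returned dictionary is itself a random variable on the sample space, constant on A_U and on A_D, which is exactly what condition 1 of Definition 9 requires.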

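Exercise 6 Let $X_1, X_2, \ldots$ be identically distributed random variables with mean $\mu$, and let $N$ be a random variable taking values in the non-negative integers and independent of the $X_i$. Let $S = X_1 + X_2 + \ldots + X_N$. Show that $\mathbb{E}(S \mid N) = \mu N$ and deduce that $\mathbb{E}(S) = \mu\,\mathbb{E}(N)$.

Exercise 7 We define the conditional variance of a random variable $X$ given a σ-algebra $\mathcal{F}$ by
\[ \mathrm{Var}(X \mid \mathcal{F}) = \mathbb{E}\left[ (X - \mathbb{E}(X \mid \mathcal{F}))^2 \mid \mathcal{F} \right]. \]
Show that
\[ \mathrm{Var}(X) = \mathbb{E}\left[ \mathrm{Var}(X \mid \mathcal{F}) \right] + \mathrm{Var}\left[ \mathbb{E}(X \mid \mathcal{F}) \right]. \]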

1.4 Change of measure

Let us go back to the example of measuring the length of a room and of wishing to do this using different reference systems. If you want to convert metres into feet, you need a “bridge” between the two (1 ft is approximately 0.30 metres). There is something equivalent to this for probability measures as well, and it is defined as follows.

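Theorem 11 (Radon-Nikodým) If $\mathbb{P}$ and $\mathbb{P}^*$ are two probability measures on $(\Omega, \mathcal{F})$ such that $\mathbb{P} \sim \mathbb{P}^*$, then there exists a random variable $Y \in \mathcal{F}$ such that
\[ \mathbb{P}^*(A) = \int_A Y\,d\mathbb{P} = \mathbb{E}[Y 1_A], \quad \forall A \in \mathcal{F}. \qquad (2) \]
$Y$ is called the Radon-Nikodým derivative of $\mathbb{P}^*$ with respect to $\mathbb{P}$ and is also written as
\[ Y = \frac{d\mathbb{P}^*}{d\mathbb{P}}. \]

Remark 2 From the discussion in Section 1.1, it should be obvious by now that $Y$ is not a proper derivative but rather something like a likelihood ratio.

Example 11 Consider Example 6. Here we defined two measures on the interval $[a, b]$, $0 \le a \le b \le 1$:
\[ \mathbb{P}(\text{the number chosen is in } [a, b]) = \mathbb{P}[a, b] := b - a, \]
\[ \mathbb{P}^*(\text{the number chosen is in } [a, b]) = \mathbb{P}^*[a, b] := b^2 - a^2. \]
We could be more specific and say that
\[ \mathbb{P}[a, b] = \int_a^b d\omega = \int_{[a, b]} d\mathbb{P}(\omega); \]
\[ \mathbb{P}^*[a, b] = \int_a^b 2\omega\,d\omega = \int_{[a, b]} 2\omega\,d\mathbb{P}(\omega). \]
The last equation is (2) with $Y(\omega) = 2\omega$.

Exercise 8 Consider the usual probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a standard normal random variable $X$, i.e. $X \sim N(0, 1)$. Define a new random variable $Y$ as $Y = X + \theta$, and let $\hat{\mathbb{P}}$ be another probability measure on $\Omega$, defined by
\[ \frac{d\hat{\mathbb{P}}}{d\mathbb{P}} = Z, \quad \text{where} \quad Z = e^{-\theta X - \frac{\theta^2}{2}}. \]
Show that $Y \sim N(0, 1)$ on $(\Omega, \mathcal{F}, \hat{\mathbb{P}})$.

Note that for any random variable $X$, with $Y = d\mathbb{P}^* / d\mathbb{P}$ as in Theorem 11,
\[ \mathbb{E}^*[X] = \int X\,d\mathbb{P}^* = \int XY\,d\mathbb{P} = \mathbb{E}[XY]. \]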
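Example 11 and the identity E*[X] = E[XY] can be verified numerically. The sketch below is only an illustration: expectations under P (the uniform measure on [0, 1]) are approximated with a midpoint rule, the grid size is an arbitrary assumption, and the function names are hypothetical.

```python
def expect_uniform(g, n=10_000):
    """Midpoint-rule approximation of E[g] under P, the uniform measure on [0, 1]."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h


def density(w):
    """Radon-Nikodym derivative Y = dP*/dP of Example 11."""
    return 2.0 * w


def p_star(a, b):
    """P*[a, b] computed as E[Y 1_A] with A = [a, b], i.e. equation (2)."""
    return expect_uniform(lambda w: density(w) * (a <= w <= b))


a, b = 0.2, 0.7
print(p_star(a, b), b ** 2 - a ** 2)

# E*[X] = E[XY] for X(w) = w: under P* the density is 2w, so the mean is 2/3.
mean_under_p_star = expect_uniform(lambda w: w * density(w))
print(mean_under_p_star)

# Y integrates to one under P, as required for P* to be a probability measure.
total_mass = expect_uniform(density)
print(total_mass)
```

The mean of the chosen number is 1/2 under P but 2/3 under P*: the outcomes are the same, only their weights have changed.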


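Theorem 12 (Bayes formula) Let $\mathbb{P}$ and $\mathbb{P}^*$ be two equivalent probability measures on the same measurable space $(\Omega, \mathcal{F})$ and let
\[ Y = \frac{d\mathbb{P}^*}{d\mathbb{P}} \]
be the Radon-Nikodým derivative of $\mathbb{P}^*$ with respect to $\mathbb{P}$. Furthermore, let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P}^*)$ such that $\mathbb{E}^*|X| < \infty$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub σ-algebra of $\mathcal{F}$. Then the following generalised version of the Bayes formula holds:
\[ \mathbb{E}^*[X \mid \mathcal{G}] = \frac{\mathbb{E}[XY \mid \mathcal{G}]}{\mathbb{E}[Y \mid \mathcal{G}]}. \]

Proof. Let $Z = \mathbb{E}^*[X \mid \mathcal{G}]$. By definition: $Z \in \mathcal{G}$, $\mathbb{E}^*|Z| < \infty$ and $\mathbb{E}^*(Z 1_A) = \mathbb{E}^*(X 1_A)$ $\forall A \in \mathcal{G}$. Hence
\[
\begin{aligned}
& \int_A Z\,d\mathbb{P}^* = \int_A X\,d\mathbb{P}^* \\
\Leftrightarrow\; & \int_A ZY\,d\mathbb{P} = \int_A XY\,d\mathbb{P} \\
\Leftrightarrow\; & \mathbb{E}(ZY 1_A) = \mathbb{E}(XY 1_A).
\end{aligned}
\]
Now
\[ \mathbb{E}(XY 1_A) = \mathbb{E}[\mathbb{E}(XY \mid \mathcal{G}) 1_A]; \qquad \mathbb{E}(ZY 1_A) = \mathbb{E}[\mathbb{E}(ZY \mid \mathcal{G}) 1_A], \quad \forall A \in \mathcal{G}. \]
Then $\mathbb{E}[(\mathbb{E}(ZY \mid \mathcal{G}) - \mathbb{E}(XY \mid \mathcal{G})) 1_A] = 0$ $\forall A \in \mathcal{G}$, which implies that $\mathbb{E}(ZY \mid \mathcal{G}) = \mathbb{E}(XY \mid \mathcal{G})$. Since $Z \in \mathcal{G}$, $\mathbb{E}(ZY \mid \mathcal{G}) = Z\,\mathbb{E}(Y \mid \mathcal{G})$; hence $Z\,\mathbb{E}(Y \mid \mathcal{G}) = \mathbb{E}(XY \mid \mathcal{G})$, and dividing by $\mathbb{E}(Y \mid \mathcal{G})$ (which is strictly positive a.s., since $\mathbb{P} \sim \mathbb{P}^*$) gives the result.

We will use this rule to link expectations calculated in a particular “universe” to the ones calculated in another universe.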
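The generalised Bayes formula can be verified exactly on a small finite example. The sketch below reuses the two-period stock price model: P and P* are assumed to be product measures with up-probabilities 1/2 and 1/3 (illustrative values), X is S2 from Example 7, and G is generated by the first move. All names are hypothetical.

```python
from fractions import Fraction


def two_step_measure(p):
    """Probabilities of the four two-move paths, for independent moves with P(up) = p."""
    q = 1 - p
    return {"UU": p * p, "UD": p * q, "DU": q * p, "DD": q * q}


def conditional_expectation(x, partition, prob):
    """E[X | G] on a finite space, G generated by a partition with positive-mass blocks."""
    y = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        average = sum(x[w] * prob[w] for w in block) / mass
        for w in block:
            y[w] = average
    return y


prob = two_step_measure(Fraction(1, 2))        # P
prob_star = two_step_measure(Fraction(1, 3))   # P*, equivalent to P
density = {w: prob_star[w] / prob[w] for w in prob}   # Y = dP*/dP

x = {"UU": 16, "UD": 4, "DU": 4, "DD": 1}      # X = S2
first_move = [{"UU", "UD"}, {"DU", "DD"}]      # atoms of G

# Left-hand side: conditional expectation computed directly under P*.
lhs = conditional_expectation(x, first_move, prob_star)

# Right-hand side: ratio of conditional expectations computed under P.
xy = {w: x[w] * density[w] for w in prob}
numerator = conditional_expectation(xy, first_move, prob)
denominator = conditional_expectation(density, first_move, prob)
rhs = {w: numerator[w] / denominator[w] for w in prob}

print(lhs)
print(rhs)
```

The denominator E[Y | G] is not identically one here, which shows why it cannot be dropped from the formula in general.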

1.5 Some more exercises

1. a) Formally define the components of any probability space $\Theta = (\Omega, \mathcal{F}, \mathbb{P})$.

b) Let $\Omega = \{1, 2, 3, 4, 5\}$ and let $\mathcal{U}$ be the collection
\[ \mathcal{U} = \{\{1, 2, 3\}, \{3, 4, 5\}\}. \]
Find the smallest σ-algebra $\mathcal{F}(\mathcal{U})$ generated by $\mathcal{U}$.

c) Define $X : \Omega \to \mathbb{R}$ by
\[ X(1) = X(2) = 0; \quad X(3) = 10; \quad X(4) = X(5) = 1. \]
Define the condition of $\mathcal{F}$-measurability for $X$. Check if $X$ is measurable with respect to $\mathcal{F}(\mathcal{U})$.

d) Define $Y : \Omega \to \mathbb{R}$ by
\[ Y(1) = 0; \quad Y(2) = Y(3) = Y(4) = Y(5) = 1. \]
Find the σ-algebra $\mathcal{F}(Y)$ generated by $Y$ and show that $Y$ is $\mathcal{F}(Y)$-measurable.

2. Let $X$ be a non-negative random variable defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with exponential distribution, which is
\[ \mathbb{P}(X \le x) = F_X(x) = 1 - e^{-\lambda x}, \quad x \ge 0, \]
where $\lambda$ is a positive constant. Let $\tilde{\lambda}$ be another positive constant, and define
\[ Z = \frac{\tilde{\lambda}}{\lambda}\,e^{-(\tilde{\lambda} - \lambda) X}. \]
Define $\tilde{\mathbb{P}}$ by
\[ \tilde{\mathbb{P}}(A) = \int_A Z\,d\mathbb{P} \quad \text{for all } A \in \mathcal{F}. \]
(a) Show that $\tilde{\mathbb{P}}(\Omega) = 1$.
(b) Compute the cumulative distribution function
\[ \tilde{\mathbb{P}}(X \le x) \quad \text{for } x \ge 0 \]
for the random variable $X$ under the probability measure $\tilde{\mathbb{P}}$.

A Set theory: quick reminder

For further references, you can look at Grimmett and Stirzaker, and Schaum (Chapter 2).

A.1 Sets, elements and subsets

• $a \in A$ stands for “$a$ is an element of the set $A$”;
• if $a \in A$ implies ($\Rightarrow$, in short) $a \in B$, then $A$ is a subset of $B$, or $A \subseteq B$, which is read “$A$ is contained in $B$”;
• $A = B \iff$ (read: “if and only if”) $A \subseteq B$ and $B \subseteq A$;
• Negations: $a \notin A$; $A \nsubseteq B$; $A \ne B$
• If $A \subseteq B$ and $A \ne B$, then $A \subset B$ (proper subset)
• An example: let $A = \{1, 3, 5, 7, 9\}$; $B = \{1, 2, 3, 4, 5\}$; $C = \{3, 5\}$. Then
  – $C \subset A$
  – $C \subset B$
  – $A \nsubseteq B$
  – $B \nsubseteq A$
• Sets can be specified in
  – tabular form (roster method): $A = \{1, 3, 5, 7, 9\}$
  – set-builder form (property method): $B = \{x : x \text{ is an even integer}, x > 0\}$
• Special sets:
  – Universal set $U$
  – Empty set $\emptyset$: $S = \{x : x \text{ is a positive integer}, x^2 = 3\} = \emptyset$

A.2 Union and intersection

• Union of $A$ and $B$: set of all elements which belong either to $A$, $B$, or both:
\[ A \cup B := \{x : x \in A \text{ or } x \in B\} \]
• Intersection of $A$ and $B$: set of all elements which belong to both $A$ and $B$:
\[ A \cap B := \{x : x \in A \text{ and } x \in B\} \]
• If $A \cap B = \emptyset$, then $A$ and $B$ are disjoint.
• If $A \subseteq B$, then
\[ A \cup B = B, \qquad A \cap B = A. \]

A.2.1 Properties

• $A \cup \emptyset = A$; $A \cap \emptyset = \emptyset$
• If $A \subseteq U$, $A \cup U = U$ and $A \cap U = A$
• Commutative Law
\[ A \cup B = B \cup A, \qquad A \cap B = B \cap A \]
• Associative Law
\[ (A \cup B) \cup C = A \cup (B \cup C), \qquad (A \cap B) \cap C = A \cap (B \cap C) \]
• Distributive Law
\[ A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \qquad A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \]
• Idempotent Law
\[ A \cup A = A, \qquad A \cap A = A \]

A.3 Complements and difference

• The (absolute) complement of $A$ is defined as
\[ A^C = \{x : x \in U, x \notin A\}, \]
i.e. the set of elements which do not belong to $A$;
• The relative complement of $B$ with respect to $A$ (or difference of $A$ and $B$) is defined as
\[ A \setminus B = \{x : x \in A, x \notin B\}. \]

Example 12 Let
\[ U = \{1, 2, 3, 4, 5, \ldots\}, \quad A = \{1, 2, 3\}, \quad B = \{3, 4, 5, 6, 7\}; \]
then
\[ A^c = \{4, 5, 6, \ldots\}, \quad A \setminus B = \{1, 2\}. \]

Note:
• $A \setminus B = A \cap B^C$
• $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$
• $A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$
• $A \cup A^c = U$
• $A \cap A^c = \emptyset$

A.3.1 Properties

• $(A^c)^c = A$
• if $A \subset B$, then $B^c \subset A^c$
• De Morgan Laws
  – $(A \cup B)^c = A^c \cap B^c$
  – $(A \cap B)^c = A^c \cup B^c$
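The identities of this appendix are easy to test with Python's built-in `set` type. The sketch below is an illustration under one assumption: since the universal set of Example 12 is infinite, it is replaced here by the finite stand-in {1, ..., 10}, and the set C is borrowed from the example in A.1.

```python
# Finite stand-in for the universal set of Example 12 (an assumption).
U = set(range(1, 11))
A = {1, 2, 3}
B = {3, 4, 5, 6, 7}
C = {3, 5}


def complement(s, universe=U):
    """Absolute complement relative to the chosen universe."""
    return universe - s


checks = {
    "difference as intersection": A - B == A & complement(B),
    "difference of a union": A - (B | C) == (A - B) & (A - C),
    "difference of an intersection": A - (B & C) == (A - B) | (A - C),
    "double complement": complement(complement(A)) == A,
    "De Morgan (union)": complement(A | B) == complement(A) & complement(B),
    "De Morgan (intersection)": complement(A & B) == complement(A) | complement(B),
    "distributive (union over intersection)": A | (B & C) == (A | B) & (A | C),
    "distributive (intersection over union)": A & (B | C) == (A & B) | (A & C),
}

for name, holds in checks.items():
    print(name, holds)
print(A - B)
```

A check on one example is of course not a proof, but it is a quick way to catch a mis-stated identity.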

A.4 Further definitions

• $A \times B := \{(x, y) : x \in A, y \in B\}$ is the Cartesian product of $A$ and $B$
• $A$ is finite if it is empty or if it consists of exactly $n$ elements, where $n$ is a positive integer;
• Otherwise $A$ is infinite;
• $A$ is countable if it is finite or if its elements can be listed in the form of a sequence (countably infinite)
• Otherwise $A$ is uncountable

Example 13
• $A$ = {letters of the English alphabet}
• $D$ = {days of the week}
• $R = \{x : x \text{ is a river on Earth}\}$
• $Y = \{x : x \text{ is a positive integer}, x \text{ is even}\} = \{2, 4, 6, 8, \ldots\}$
• $I = \{x : 0 \le x \le 1\}$

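B Modes of convergence of a random variable

Let $\{X_m\}_{m \in \mathbb{N}}$ be a sequence of random variables, and let $X$ be another random variable. Then:

• ALMOST SURE CONVERGENCE: $X_m \xrightarrow{a.s.} X$ if the event $\{\omega \in \Omega : X_m(\omega) \to X(\omega) \text{ as } m \to \infty\}$ has probability 1.
• CONVERGENCE IN PROBABILITY: $X_m \xrightarrow{P} X$ if, $\forall \varepsilon > 0$,
\[ \lim_{m \to \infty} \mathbb{P}(|X_m - X| > \varepsilon) = 0. \]
• CONVERGENCE IN $L^p$ (in $L^p$ mean): $X_m \xrightarrow{L^p} X$ if
\[ \lim_{m \to \infty} \mathbb{E}(|X_m - X|^p) = 0. \]
• CONVERGENCE IN DISTRIBUTION: $X_m \xrightarrow{D} X$ if
\[ \lim_{m \to \infty} \mathbb{P}(X_m \le x) = \mathbb{P}(X \le x) \]
for every $x \in \mathbb{R}$ at which $x \mapsto \mathbb{P}(X \le x)$ is continuous.

B.1 Further convergences

• MONOTONE CONVERGENCE: if $0 \le X_m \uparrow X$ a.s., then $\mathbb{E}(X_m) \uparrow \mathbb{E}(X) \le \infty$, or equivalently, $\lim_{m \to \infty} \mathbb{E}(X_m) = \mathbb{E}(\lim_{m \to \infty} X_m) = \mathbb{E}(X)$, as $X = \lim_{m \to \infty} X_m$.
• DOMINATED CONVERGENCE: for $X_m \to X$ a.s., if $|X_m| \le Y$ for all $m$, with $\mathbb{E}(Y) < \infty$, then $\mathbb{E}(|X_m - X|) \to 0$. In particular, $\lim_{m \to \infty} \mathbb{E}(X_m) = \mathbb{E}(X)$.
• BOUNDED CONVERGENCE THEOREM: for $X_m \to X$ a.s., if $|X_m| \le K$ for some constant $K$ and all $m$, then $\mathbb{E}(|X_m - X|) \to 0$, as implied by dominated convergence.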
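A standard example shows that these modes of convergence are genuinely different and that the domination hypothesis cannot be dropped. The worked example below is a sketch on the probability space of Examples 1 and 6, i.e. Ω = [0, 1] with P[a, b] = b - a; the sequence X_m is a choice made here for illustration and does not appear in the notes.

```latex
\begin{align*}
& X_m(\omega) := m \, 1_{[0,\,1/m]}(\omega), \qquad \omega \in \Omega = [0,1], \quad \mathbb{P}[a,b] = b - a. \\[4pt]
& \text{Almost sure: for every } \omega > 0,\; X_m(\omega) = 0 \text{ as soon as } m > 1/\omega,
  \text{ so } X_m \to 0 \text{ on } (0,1], \text{ a set of probability } 1. \\
& \text{In probability: for } 0 < \varepsilon < 1,\;
  \mathbb{P}(|X_m - 0| > \varepsilon) = \mathbb{P}[0, 1/m] = \tfrac{1}{m} \to 0. \\
& \text{In } L^1:\; \mathbb{E}(|X_m - 0|) = m \cdot \tfrac{1}{m} = 1 \;\not\to\; 0,
  \text{ so } \lim_{m \to \infty} \mathbb{E}(X_m) = 1 \neq 0 = \mathbb{E}\big(\lim_{m \to \infty} X_m\big). \\
& \text{No contradiction with dominated convergence: }
  \sup_m X_m(\omega) = \lfloor 1/\omega \rfloor \ge \tfrac{1}{\omega} - 1,
  \text{ and } \int_0^1 \tfrac{1}{\omega}\, d\omega = \infty, \\
& \text{so no dominating } Y \text{ with } \mathbb{E}(Y) < \infty \text{ exists.}
\end{align*}
```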
