EE 178
Lecture Notes 14
Stochastic Processes II: Markov Chains

Lecture Outline
• Discrete State Processes
• Markov Chains
• State Classification
• Steady State Probabilities
• Birth and Death Processes

Reading: Bertsekas & Tsitsiklis 6.1, 6.2, 6.3

Discrete-State Process

Markov chains are popular models in many application areas, including economic systems and networking. Their language is sometimes different from the common usage of stochastic processes, but essentially they are a special kind of stochastic process. For simplicity we consider only discrete-time models with discrete outputs.

Suppose that {Xn; n = 0, 1, 2, . . .} is a stochastic process, and suppose that Xn can only take on values in the set {1, 2, . . . , m}. Think of these values as "states" of a system. Examples include
• # of unread email messages at the beginning of each hour
• # of customers in a bank at each minute
• # of cars in a parking lot at the beginning of each hour
• # of packets in a queue at each clock period

Markov Chains

A Markov chain is a stochastic process where the probability of the next state given the current state and the entire past depends only on the current state:
P(Xn = xn | Xn−1 = xn−1, Xn−2 = xn−2, . . . , X0 = x0) = P(Xn = xn | Xn−1 = xn−1)
The Markov property is a constraint on the memory of the process: knowing the immediate past means the earlier outcomes are no longer relevant.
Alternatively: the future is conditionally independent of the past given the present.

Notation: In the Markov chain literature the abbreviation
pij = P(Xn = j | Xn−1 = i)
is often used. Arguably pj|i would be better, but we adopt the common convention. Be careful when dealing with Markov chains not to confuse this conditional probability with a joint probability. The transition probabilities can be conveniently expressed in matrix notation as

P =  [ p11  p12  · · ·  p1m
       p21  p22  · · ·  p2m
        ·    ·           ·
       pm1  pm2  · · ·  pmm ]

Note that here i is the conditioning, so that
Σ_{j=1}^{m} pij = 1,
i.e., the sum of any row of the matrix is 1.
Note that any iid process trivially satisfies the Markov condition, since both probabilities are simply P(Xn = xn).

As a simple nontrivial example, consider a sequence of dependent coin flips where each coin has probability 1 − p of having the same outcome as the previous coin flip, regardless of all previous flips. Labeling the states 1 = H and 2 = T, the transition probabilities are

pij:        j = 1 (H)   j = 2 (T)
i = 1 (H)     1 − p        p
i = 2 (T)       p        1 − p

[Transition diagram: states H and T, each with a self-loop of probability 1 − p and an arc of probability p to the other state.]

The graph shows the memory structure; this chain is called the "binary symmetric Markov process."

Graphs called transition diagrams are often used to depict Markov chains. Nodes are states, arcs are state transitions (i, j) from state i to state j (only transitions with pij > 0 are drawn).

Can construct this process from an iid process: Suppose that {Zn} is a Bernoulli process with parameter p. Define a new process Xn recursively as follows: Let X0 be a Bernoulli random variable with some bias q, and define
Xn = Xn−1 ⊕ Zn;  n = 1, 2, . . .
where ⊕ is exclusive-or, addition modulo 2 (0 ⊕ 0 = 1 ⊕ 1 = 0, 0 ⊕ 1 = 1 ⊕ 0 = 1).

You should be able to convince yourself that
P(Xn = xn | Xn−1 = xn−1, Xn−2 = xn−2, . . . , X0 = x0) =
  p       if xn ≠ xn−1
  1 − p   if xn = xn−1
Hence {Xn} is a Markov chain.

If you are not convinced, consider: Note that from binary arithmetic a ⊕ b = c if and only if a = b ⊕ c, and hence
P(Xn = xn | Xn−1 = xn−1, Xn−2 = xn−2, . . . , X0 = x0)
  = P(Xn−1 ⊕ Zn = xn | Xn−1 = xn−1, Xn−2 = xn−2, . . . , X0 = x0)
  = P(Zn = Xn−1 ⊕ xn | Xn−1 = xn−1, Xn−2 = xn−2, . . . , X0 = x0)
  = P(Zn = xn−1 ⊕ xn | Xn−1 = xn−1, Xn−2 = xn−2, . . . , X0 = x0)
  = P(Zn = xn−1 ⊕ xn)
  = pZn(xn−1 ⊕ xn)
The conditioning disappears because the conditioning event involves only Xk for k ≤ n − 1 and hence Zk for k ≤ n − 1, but the Zn are by assumption iid and hence mutually independent.
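As a quick sanity check, here is a minimal simulation sketch of this construction (not part of the original notes; the numerical values of p and q are illustrative): generate {Zn} as iid Bernoulli(p), form Xn = Xn−1 ⊕ Zn, and estimate the empirical transition frequencies.

import random

p, q, N = 0.3, 0.5, 200_000            # illustrative values, not from the notes
random.seed(0)

x = 1 if random.random() < q else 0    # X0 ~ Bernoulli(q)
counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
for _ in range(N):
    z = 1 if random.random() < p else 0    # Zn ~ Bernoulli(p), iid
    x_next = x ^ z                         # Xn = X(n-1) XOR Zn
    counts[(x, x_next)] += 1
    x = x_next

for a in (0, 1):
    row_total = counts[(a, 0)] + counts[(a, 1)]
    print(a, [round(counts[(a, b)] / row_total, 3) for b in (0, 1)])
# The "same outcome" frequencies should be near 1 - p and the "different
# outcome" frequencies near p, matching the binary symmetric transition matrix.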
Can also have an asymmetric binary Markov process:
State 1 = You are up-to-date in EE178
State 2 = You are behind in EE178

pij:      j = 1   j = 2
i = 1      0.8     0.2
i = 2      0.6     0.4

[Transition diagram: self-loops of 0.8 on state 1 and 0.4 on state 2; an arc of 0.2 from state 1 to state 2 and an arc of 0.6 from state 2 to state 1.]

Another example: Two spiders and a fly

A fly's possible positions are represented by four states.
States 2, 3: safely flying in the left or right half of a room
State 1: A spider's web on the left wall
State 4: A spider's web on the right wall

pij:      j = 1   j = 2   j = 3   j = 4
i = 1      1.0     0       0       0
i = 2      0.3     0.4     0.3     0
i = 3      0       0.3     0.4     0.3
i = 4      0       0       0       1.0
[Transition diagram for the spider-and-fly chain: self-loops of probability 1 on the absorbing states 1 and 4, self-loops of 0.4 on states 2 and 3, and arcs of 0.3 between neighboring states.]

Some Questions of Interest
• Steady-state occupancy probabilities of each state (long-term fraction of time spent in each state)?
• Probability of hitting certain absorbing states (e.g., a spider's web)?

For example, in the binary symmetric Markov chain we could use the known pX0 and pX1|X0 to compute pX1:
pX1(1) = Σ_{x0=0}^{1} pX1|X0(1 | x0) pX0(x0) = q(1 − p) + (1 − q)p
This can then be combined with pX2|X1 = pX1|X0 to compute pX2, and so on.
But is there a limit of pXn as n → ∞, a steady state probability of the states?

In particular, if we repeat the computation of pX1 from pX0 to find pX2 from pX1, then repeat again to find pX3 from pX2, and so on to find pXn from pXn−1, then each time we are solving an equation of the form
pXn(1) = Σ_{x=0}^{1} pXn|Xn−1(1 | x) pXn−1(x) = pXn−1(1)(1 − p) + (1 − pXn−1(1))p     (*)

Suppose that indeed pXn(1) has a limit as n → ∞, say pXn(1) → α, so that
pX(x) =
  α       x = 1
  1 − α   x = 0
Substituting this into (*):
α = α(1 − p) + (1 − α)p
or
α(1 − 1 + p + p) = p
so that
α = 1/2
is the steady state probability that Xn = 1.

Shortcut and interpretation: If we assume the steady state probabilities exist, we can argue that on average the number of transitions from state 1 to state 2 must equal the average number of transitions in the reverse direction. If we further assume that the average frequency of events is given by their probability, then
P(in state 1 and go to state 2) = P(in state 2 and go to state 1)
or
αp = (1 − α)p, so α = 1/2.

A similar strategy yields the steady state probability for the asymmetric Markov chain. Suppose that asymptotically state 1 has probability α and hence state 2 has probability 1 − α. In the long run the fraction of transitions from right to left in the graph must equal that from left to right (one cannot enter a state more often than one leaves it), which implies that
0.6(1 − α) = 0.2α, or α = 3/4.

In both cases we write equations for steady state behavior (if it holds) and solve them. In general this is a question of writing matrix equations and solving them (but solutions do not always exist). (Later we will do this carefully.)
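The fixed-point argument above can also be checked numerically. A small sketch (not part of the notes): iterate the one-step update for the asymmetric chain with p12 = 0.2 and p21 = 0.6 and watch pXn(1) approach 3/4 from an arbitrary starting distribution.

alpha = 0.0                      # pX0(1); any starting value in [0, 1] works
for n in range(1, 26):
    # pXn(1) = pXn-1(1) * p11 + (1 - pXn-1(1)) * p21, with p11 = 0.8, p21 = 0.6
    alpha = alpha * 0.8 + (1 - alpha) * 0.6
    if n % 5 == 0:
        print(n, round(alpha, 6))
# The printed values settle at 0.75, the solution of 0.6(1 - a) = 0.2a.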
n-step transition probabilities

• rij(n) = P(state = j after n transitions, starting from state i)
• Recursive formula, starting with rij(1) = pij:
rij(n + 1) = Σ_{k=1}^{m} rik(n) pkj;   1 ≤ i, j ≤ m
(Special case of a "Chapman-Kolmogorov" equation.)

Note: this is just total probability:
rij(n + 1) = P(Xn+1 = j | X0 = i)
  = Σ_{k=1}^{m} P(Xn+1 = j and Xn = k | X0 = i)
  = Σ_{k=1}^{m} P(Xn+1 = j | Xn = k, X0 = i) P(Xn = k | X0 = i)
  = Σ_{k=1}^{m} P(Xn+1 = j | Xn = k) P(Xn = k | X0 = i)
  = Σ_{k=1}^{m} pkj rik(n)
where the third equality uses the Markov property.
• rij(n) can be viewed as the "nth power" of the matrix of the pij, i.e.,
Define the matrix P = {pij; i = 1, . . . , m, j = 1, . . . , m} and define the matrices R(n) = {rij(n); i = 1, . . . , m, j = 1, . . . , m}; n = 1, 2, . . . Then the recursion and initial value can be expressed as
R(1) = P
R(2) = R(1)P = P²
R(3) = R(2)P = P³
  ...
R(n) = R(n − 1)P = Pⁿ

For the asymmetric binary chain (rows are i = 1, 2; columns are j = 1, 2):

rij(1):   0.8     0.2
          0.6     0.4

rij(2):   .76     .24
          .72     .28

rij(3):   .752    .248
          .744    .256

rij(4):   .7504   .2496
          .7488   .2512

rij(5):   .7501   .2499
          .7498   .2502

As n → ∞, rij(n) converges to a limit which does not depend on the initial state i.
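A short numerical sketch (not part of the notes) reproduces the table above by iterating the recursion R(n + 1) = R(n)P:

import numpy as np

P = np.array([[0.8, 0.2],
              [0.6, 0.4]])
R = P.copy()                      # R(1) = P
for n in range(1, 6):
    print(f"r_ij({n}):\n{R.round(4)}")
    R = R @ P                     # R(n+1) = R(n) P
# Both rows of R(n) approach [0.75, 0.25], independent of the starting state.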
Part of the theory of Markov chains is the demonstration of general conditions under which this behavior occurs, i.e., under which rij(n) converges to a limit that does not depend on the initial state.

Spider-and-Fly example (rows are the starting state i = 1, . . . , 4; columns are j = 1, . . . , 4):

pij:       1.0   0     0     0
           0.3   0.4   0.3   0
           0     0.3   0.4   0.3
           0     0     0     1.0

rij(2):    1.0   0     0     0
           .42   .25   .24   .09
           .09   .24   .25   .42
           0     0     0     1.0

rij(3):    1.0   0     0     0
           .50   .17   .17   .16
           .16   .17   .17   .50
           0     0     0     1.0

rij(4):    1.0   0     0     0
           .55   .12   .12   .21
           .21   .12   .12   .55
           0     0     0     1.0

rij(∞):    1.0   0     0     0
           2/3   0     0     1/3
           1/3   0     0     2/3
           0     0     0     1.0

Note that
• States 2 and 3 have steady-state probability 0
• The steady-state probabilities of states 1 and 4 depend on the initial state.
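The same computation for the spider-and-fly chain, as a sketch (not part of the notes), using matrix powers directly:

import numpy as np

P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.3, 0.4, 0.3, 0.0],
              [0.0, 0.3, 0.4, 0.3],
              [0.0, 0.0, 0.0, 1.0]])
for n in (2, 3, 4, 200):
    print(f"r_ij({n}):\n{np.linalg.matrix_power(P, n).round(3)}")
# For large n the columns for states 2 and 3 go to 0, while the absorption
# probabilities into states 1 and 4 differ between rows 2 and 3, i.e. the
# limit depends on the starting state.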
Recurrent and Transient States

• Let A(i) be the set of states that are accessible from state i.
• State i is recurrent if, starting from i, any accessible state j can in turn access i, i.e., j ∈ A(i) implies i ∈ A(j).
• Note: if i is recurrent, then necessarily i ∈ A(i).
• A state is transient if it is not recurrent.
• Recurrence/transience is determined by the arcs (transitions with nonzero probability), not by the actual values of the probabilities.
• Example: [transition diagram with states 1, 2, 3, 4]
• A recurrent class = the set of accessible states A(i) of some recurrent state i.

Note: Suppose that i is recurrent and hence A(i) is a recurrent class. Then if k ∈ A(i), k is also recurrent and A(k) = A(i).
Proof: Suppose i is recurrent and k ∈ A(i). Then k is accessible from i and hence, since i is recurrent, k can access i and hence, through i, any state in A(i). Thus A(i) ⊂ A(k). Since i can access k and hence any state in A(k), A(k) ⊂ A(i). Thus A(i) = A(k). Moreover, k must be recurrent since it can access any state in A(k) and hence any state in A(i), which can access i (since i is recurrent), which in turn can access k.

Note: Recurrent classes contain no transient states.
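These definitions translate directly into a reachability computation on the transition diagram. A minimal sketch (not part of the notes; the state numbering 0, . . . , 3 is shifted from the 1, . . . , 4 used above):

def accessible(P, i):
    """A(i): the set of states reachable from i via transitions with p_kj > 0."""
    seen, stack = {i}, [i]
    while stack:
        k = stack.pop()
        for j, pkj in enumerate(P[k]):
            if pkj > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

# Spider-and-fly chain, states numbered 0..3 instead of 1..4.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.3, 0.4, 0.3, 0.0],
     [0.0, 0.3, 0.4, 0.3],
     [0.0, 0.0, 0.0, 1.0]]
A = [accessible(P, i) for i in range(len(P))]
for i, Ai in enumerate(A):
    recurrent = all(i in A[j] for j in Ai)    # every accessible j can access i back
    print("state", i, "A(i) =", sorted(Ai), "recurrent" if recurrent else "transient")
# The two web states are recurrent (each its own recurrent class);
# the two "flying" states are transient.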
Note: Two recurrent classes must be identical or disjoint.
Proof: Suppose we have A(i) and A(j) where i and j are distinct and each is recurrent.
If j ∈ A(i), then we have seen that j is recurrent and A(j) = A(i).
If j ∉ A(i), then no state in A(j) can also be in A(i), so that A(i) ∩ A(j) = ∅. To see this, assume the contrary: suppose that l ∈ A(j) is also in A(i). Then j can access l, and l can access i (since i is recurrent and l ∈ A(i)), so i ∈ A(j). Since j is recurrent, this implies j ∈ A(i), a contradiction.

Note: A Markov chain can be decomposed into one or more recurrent classes plus possibly some transient states. A recurrent state is accessible from all states in its class, but it is not accessible from states in other recurrent classes.

Periodic Classes

A recurrent class is called periodic if the states can be grouped into d > 1 subsets such that all transitions from one subset lead to the next subset.

Example (rows are i = 1, 2, 3; columns are j = 1, 2, 3):

pij:    0     1.0   0
        1.0   0     0
        0.5   0.5   0

[Transition diagram: states 1 and 2 alternate with probability 1; state 3 moves to state 1 or state 2 with probability 0.5 each.]

For a state i in the recurrent class {1, 2},
rii(n) =
  1   if n is even
  0   if n is odd
[Diagram: a periodic recurrent class whose states 1, . . . , 6 are grouped into subsets S1, S2, S3; all transitions from one subset lead to the next subset.]

If the class is periodic, rii(n) never converges to a steady-state value as n increases.

Steady-State Probabilities

Steady-State Convergence Theorem: If there is only one recurrent class and it is not periodic, then rij(n) tends to a steady-state value πj that is independent of i:
lim_{n→∞} rij(n) = πj,  for all i

Steady-state equations: Take the limit as n → ∞ of
rij(n + 1) = Σ_{k=1}^{m} rik(n) pkj;   1 ≤ i, j ≤ m
to get
πj = Σ_{k=1}^{m} πk pkj,   j = 1, 2, . . . , m
Σ_{j=1}^{m} πj = 1

Computation of Steady State Probabilities

Solve the steady state equations
πj = Σ_{k=1}^{m} πk pkj,   j = 1, 2, . . . , m
Σ_{j=1}^{m} πj = 1
by elimination or by computer.
Example: Model of class mobility

[Transition diagram: states 1 = Upper, 2 = Middle, 3 = Lower, with transition probabilities p11 = 0.45, p12 = 0.45, p13 = 0.1, p21 = 0.05, p22 = 0.7, p23 = 0.25, p31 = 0.05, p32 = 0.5, p33 = 0.45.]

The steady-state equations are
π1 = 0.45π1 + 0.05π2 + 0.05π3
π2 = 0.45π1 + 0.7π2 + 0.5π3
π3 = 0.1π1 + 0.25π2 + 0.45π3
1 = π1 + π2 + π3

Expected Relative Frequency Interpretation

Interpretation: Under suitable conditions we should have, for the long-term expected frequency of visits to state j,
πj = lim_{n→∞} E[# of visits to j in n transitions] / n
Reminder: "suitable conditions" above means that the steady state convergence theorem holds.
This is an interpretation of the steady state "balance" equations
πj = Σ_{k=1}^{m} πk pkj,   j = 1, 2, . . . , m
which followed from combining the steady-state convergence theorem with the Chapman-Kolmogorov equation.
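Solving the class-mobility equations "by computer" is a small linear-algebra exercise. A sketch (not part of the notes): replace one redundant balance equation by the normalization constraint and solve.

import numpy as np

P = np.array([[0.45, 0.45, 0.10],   # Upper
              [0.05, 0.70, 0.25],   # Middle
              [0.05, 0.50, 0.45]])  # Lower
m = P.shape[0]
# pi_j = sum_k pi_k p_kj  <=>  (P^T - I) pi = 0; drop one row, add sum(pi) = 1.
A = np.vstack([(P.T - np.eye(m))[:-1], np.ones(m)])
b = np.zeros(m)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi.round(4))                  # approximately [0.0833, 0.6198, 0.2969]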
We proved the Chapman-Kolmogorov equation; we did not prove the steady-state convergence theorem. (We just stated it with some handwaving; B&T sketch a proof. The usual careful proofs involve the limiting properties of powers of stochastic matrices.)

This "interpretation" is actually a theorem, a variation on the law of large numbers relating probabilities to relative frequencies (to be proved in the next lecture notes).

Another "stability" result follows for transitions into and out of a fixed state: Rewrite the steady state equations as
Σ_{k=1, k≠j}^{m} πj pjk = Σ_{k=1, k≠j}^{m} πk pkj,   j = 1, 2, . . . , m
i.e., this is equivalent to the original form πj = Σ_{k=1}^{m} πk pkj, j = 1, 2, . . . , m. To see this, note that the left hand side of the original form can be written as
πj = πj Σ_{k=1}^{m} pjk = Σ_{k=1}^{m} pjk πj = pjj πj + Σ_{k=1, k≠j}^{m} pjk πj
and the right hand side can be written as
Σ_{k=1}^{m} πk pkj = pjj πj + Σ_{k=1, k≠j}^{m} πk pkj
Subtracting pjj πj from both sides yields the desired form.
[Diagram: transitions into state j from states 1, 2, . . . , m, with expected frequencies π1 p1j, π2 p2j, . . . , πm pmj, and transitions out of state j to states 1, 2, . . . , m, with expected frequencies πj pj1, πj pj2, . . . , πj pjm.]

E[frequency of transitions into j] = E[frequency of transitions out of j]

As when looking at the frequency of visits to states, the frequency of transitions between states is an interpretation of the steady state equations in terms of relative frequencies. Here the interpretation is that πi pij is the expected frequency of transitions from state i into state j. As before, this is actually a theorem, another variation on the law of large numbers.
Although we have not proved the connection between probabilities and relative frequencies for Markov chains with one recurrent class that is not periodic, the result is very useful for quick derivations and will be used. Here by "frequency interpretation" we mean specifically that the expected frequency (under "suitable conditions") of making a transition from state i into state j is πi pij, i.e., one small piece of the transition frequency interpretation.

Birth-death Markov Chains

• State space = {1, 2, . . . , m}
• Only three transitions possible: go left, go right, stay where you are

[Transition diagram: a chain of states 1 ↔ 2 ↔ · · · ↔ i ↔ i+1 ↔ · · · ↔ m with transitions only between neighboring states or back to the same state.]

• Cut the chain between states i and i + 1
• Apply the frequency interpretation:
πi pi,i+1 = πi+1 pi+1,i

The "balance equations" or limiting probabilities simplify greatly for birth-death Markov chains because each state has only limited transitions possible. If we interpret the probabilities of transitions, we must have as many transitions going left through a cut as we have going right: if we go left through a cut, we will not be able to go left again through it until we have gone right through it; there is no other route possible. Hence at any time we can have at most a difference of 1 between the number of left passages and right passages (and 1/n → 0). This implies that
πi pi,i+1 = πi+1 pi+1,i
In fact, you do not need the expected frequency interpretation to come to this conclusion; it is just the easiest way to do so.
To do this, recall the steady state (or balance) equation
πj = Σ_{k=1}^{m} πk pkj,   j = 1, 2, . . . , m

For state i = 1 (the leftmost), this reduces to
π1 = π1 p11 + π2 p21
and hence, since p11 = 1 − p12, this is
π1 p12 = π2 p21

Next use mathematical induction: Assume the result is true for i − 1, i.e., that πi−1 pi−1,i = πi pi,i−1, and prove that it holds for i. (We know it holds for i = 1, so this will complete the proof.) For a birth-death chain the balance equation for state i involves only its neighbors:
πi = Σ_{k=1}^{m} πk pki
   = πi−1 pi−1,i + πi pii + πi+1 pi+1,i
   = πi−1 pi−1,i + πi (1 − pi,i−1 − pi,i+1) + πi+1 pi+1,i
   = πi pi,i−1 + πi (1 − pi,i−1 − pi,i+1) + πi+1 pi+1,i     (using the induction hypothesis πi−1 pi−1,i = πi pi,i−1)
   = πi (1 − pi,i+1) + πi+1 pi+1,i
so that
πi pi,i+1 = πi+1 pi+1,i
as claimed.
Queueing Example

Packets arrive at a communication node with storage capacity m. Time is discretized into very small periods. At each period:
• Prob. of 1 arrival = p
• Prob. of 1 transmission = q
• Prob. of 0 arrivals and 0 transmissions = 1 − p − q

[Transition diagram: a birth-death chain on states 0, 1, . . . , m; from each interior state move right with probability p, left with probability q, and stay put with probability 1 − p − q; state 0 has self-loop probability 1 − p and state m has self-loop probability 1 − q.]

The birth-death balance equations give
π0 p = π1 q  ⇒  π1 = (p/q) π0
π1 p = π2 q  ⇒  π2 = (p/q)² π0
π2 p = π3 q  ⇒  π3 = (p/q)³ π0
  ...
πm−1 p = πm q  ⇒  πm = (p/q)^m π0

or, defining ρ = p/q,
πi = ρ^i π0

Since π0 + π1 + · · · + πm = 1,
π0 (1 + ρ + ρ² + · · · + ρ^m) = 1

hence for i = 1, . . . , m
πi = ρ^i / (1 + ρ + ρ² + · · · + ρ^m)

Using the geometric progression formula,
πi = ρ^i (1 − ρ)/(1 − ρ^{m+1}) → ρ^i (1 − ρ)   as m → ∞,   if ρ < 1.
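Finally, a small numerical sketch of the queueing result (not part of the notes; the values of p, q, and m are illustrative):

p, q, m = 0.2, 0.3, 10            # arrival prob., transmission prob., capacity
rho = p / q
pi0 = 1.0 / sum(rho**i for i in range(m + 1))
pi = [pi0 * rho**i for i in range(m + 1)]
print([round(x, 4) for x in pi])
print("total:", round(sum(pi), 6))                  # should be 1.0
print("geometric limit:", [round(rho**i * (1 - rho), 4) for i in range(m + 1)])
# For rho < 1 and large m, pi_i approaches the geometric form rho^i (1 - rho).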