Hidden Markov Model (HMM)
Pengyu Hong, COSI 123a Statistical Machine Learning
Markov Model

[Chain diagram: q_{t-k} → ••• → q_{t-2} → q_{t-1} → q_t]

1st-order Markov model: q_t represents the state at time t.
P(q_t | q_{t-1}, q_{t-2}, …) = P(q_t | q_{t-1})
P(q_t, q_{t-1}, q_{t-2}, …) = P(q_t | q_{t-1}) P(q_{t-1} | q_{t-2}) ⋯
Markov Model

[Chain diagram: q_{t-k} → ••• → q_{t-2} → q_{t-1} → q_t, with q_t depending on its two predecessors]

2nd-order Markov model:
P(q_t | q_{t-1}, q_{t-2}, …) = P(q_t | q_{t-1}, q_{t-2})
Markov Model

A 1st-order Markov model for a weather predictor.

[State diagram: three states (rainy, cloudy, sunny) whose arrows carry the transition probabilities below]

State-transition probability P(q_t | q_{t-1}):

  q_{t-1} \ q_t   rainy   cloudy   sunny
  rainy            0.4     0.3      0.3
  cloudy           0.2     0.6      0.2
  sunny            0.1     0.1      0.8
Markov Model

[Same three-state weather diagram as the previous slide]

P(rainy → rainy → sunny → cloudy → sunny)?  (the first rainy state is today)
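A worked answer, assuming "today" means we condition on already being in the rainy state (so no initial-state probability is needed):

P = P(rainy | rainy) · P(sunny | rainy) · P(cloudy | sunny) · P(sunny | cloudy)
  = 0.4 × 0.3 × 0.1 × 0.2 = 0.0024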
What is an HMM?

[Diagram: hidden chain q_{t-k} → ••• → q_{t-1} → q_t (hidden states); each q_t emits an observation, giving o_{t-k}, …, o_{t-1}, o_t (observed states)]

• Graphical model
• Circles indicate states
• Arrows indicate probabilistic dependencies between states
An Example

[Figure: a "grammar" over the vowels a, e, i, o, u as hidden states; the observed signal is each vowel plus noise variation]
An Example

[Figure: the vowel grammar generating several observed vowel sequences]

Variations: temporal, volume, noise, …
Applications of HMM

• Speech recognition – hidden: phonemes/words; observed: acoustic signals
• Handwriting recognition – hidden: words; observed: images
• Part-of-speech tagging – hidden: part-of-speech tags; observed: words
• Machine translation – hidden: words in the target language; observed: words in the source language
• ……

[Slide illustrations: an acoustic "i", a handwritten "3", the word "change" tagged as a noun, and the translation pair "learn" / "学习" (Chinese for "learn")]
Formulate Discrete HMM

[Diagram: hidden chain q_{t-k} → ••• → q_{t-1} → q_t emitting observations o_{t-k}, …, o_{t-1}, o_t]

λ = {A, B, π}  (time invariant)

q_t – the hidden state at time t; S = {1, 2, …, N} – all possible values of hidden states.
o_t – the observation at time t; V = {1, 2, …, M} – all possible values of observed states.
A = [a_ij] – state transition probabilities: a_ij = P(q_{t+1} = j | q_t = i), a_ij ≥ 0, 1 ≤ i, j ≤ N, Σ_j a_ij = 1.
B = [b_ik] – emission probabilities: b_ik = P(o_t = k | q_t = i), b_ik ≥ 0, 1 ≤ i ≤ N, 1 ≤ k ≤ M, Σ_k b_ik = 1.
π = {π_i} – initial state probabilities: π_i = P(q_1 = i), Σ_i π_i = 1.
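A minimal sketch of this parameterization in Python/NumPy, reusing the weather states from the earlier slide. The emission matrix B, the two observation symbols, and the uniform π are made-up assumptions for illustration; only A comes from the weather slide.

import numpy as np

# Hidden states S = {rainy, cloudy, sunny}; observed symbols V = {umbrella, no umbrella}.
A = np.array([[0.4, 0.3, 0.3],   # a_ij = P(q_{t+1} = j | q_t = i)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
B = np.array([[0.9, 0.1],        # b_ik = P(o_t = k | q_t = i); invented for illustration
              [0.4, 0.6],
              [0.1, 0.9]])
pi = np.array([1/3, 1/3, 1/3])   # pi_i = P(q_1 = i); assumed uniform here

# Every row of A and B, and pi itself, must sum to 1.
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1)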
Three Basic Problems for HMMs

[Diagram: q_1 → ••• → q_{T-1} → q_T emitting o_1, …, o_{T-1}, o_T; λ = {A, B, π}]

1. Given the observation sequence O = o_1 o_2 … o_T and the model λ, how to efficiently compute P(O | λ)?
Three Basic Problems for HMMs

[Same diagram as above]

2. Given the observation sequence O = o_1 o_2 … o_T and the model λ, how to compute the hidden state sequence Q = q_1 q_2 … q_T that best "explains" the observations (i.e., the most likely hidden state sequence)?
Three Basic Problems for HMMs

[Same diagram as above]

3. How to adjust the model parameters λ = {A, B, π} to maximize P(O | λ), i.e., to fit the model to the data as well as possible?
Problem 1: Probability of an Observation Sequence

[Diagram: q_1 → ••• → q_{T-1} → q_T emitting o_1, …, o_{T-1}, o_T; λ = {A, B, π}]

Given O = o_1 o_2 … o_T and λ, compute P(O | λ).

P(O | λ) = Σ_Q P(O, Q | λ) = Σ_Q P(O | Q, λ) P(Q | λ)
P(O | Q, λ) = ∏_{t=1}^{T} P(o_t | q_t, λ)
P(Q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ⋯ a_{q_{T-1} q_T}

However, there are N^T possible hidden state sequences! There is an efficient way – dynamic programming.
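For intuition, here is a brute-force sketch of this sum (assuming the A, B, pi arrays defined earlier, with observations given as 0-based column indices into B). It is exponential in T and only meant to make the N^T blow-up concrete:

from itertools import product
import numpy as np

def prob_obs_brute_force(obs, A, B, pi):
    """P(O | lambda) by summing P(O, Q | lambda) over all N^T hidden sequences."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):           # every hidden state sequence
        p = pi[Q[0]] * B[Q[0], obs[0]]              # pi_{q1} * b_{q1, o1}
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]  # a_{q_{t-1} q_t} * b_{q_t, o_t}
        total += p
    return total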
The Forward Procedure

[Diagram: q_1 → ••• → q_{T-1} → q_T emitting o_1, …, o_{T-1}, o_T; λ = {A, B, π}]

P(o_1 … o_T | λ) = Σ_{q_T = S_1}^{S_N} P(o_1 … o_T, q_T | λ)

P(o_1 … o_T, q_T | λ)
  = P(o_1 … o_T | q_T, λ) P(q_T | λ)
  = P(o_1 … o_{T-1} | q_T, λ) P(o_T | q_T, λ) P(q_T | λ)
  = P(o_1 … o_{T-1}, q_T | λ) P(o_T | q_T, λ)
  = Σ_{q_{T-1} = S_1}^{S_N} P(o_1 … o_{T-1}, q_{T-1}, q_T | λ) P(o_T | q_T, λ)
  = Σ_{q_{T-1} = S_1}^{S_N} P(o_1 … o_{T-1}, q_T | q_{T-1}, λ) P(q_{T-1} | λ) P(o_T | q_T, λ)
  = Σ_{q_{T-1} = S_1}^{S_N} P(o_1 … o_{T-1} | q_{T-1}, q_T, λ) P(q_T | q_{T-1}, λ) P(q_{T-1} | λ) P(o_T | q_T, λ)
  = Σ_{q_{T-1} = S_1}^{S_N} P(o_1 … o_{T-1} | q_{T-1}, λ) P(q_{T-1} | λ) P(q_T | q_{T-1}, λ) P(o_T | q_T, λ)
  = Σ_{q_{T-1} = S_1}^{S_N} P(o_1 … o_{T-1}, q_{T-1} | λ) P(q_T | q_{T-1}, λ) P(o_T | q_T, λ)
The Forward Procedure

[Same diagram; λ = {A, B, π}]

P(o_1 … o_T | λ) = Σ_{q_T = S_1}^{S_N} P(o_1 … o_T, q_T | λ)

P(o_1 … o_T, q_T | λ) = Σ_{q_{T-1} = S_1}^{S_N} P(o_1 … o_{T-1}, q_{T-1} | λ) P(q_T | q_{T-1}, λ) P(o_T | q_T, λ)

Define the forward variable
  α_t(i) = P(o_1 … o_t, q_t = i | λ) = [Σ_{j=1}^{N} α_{t-1}(j) a_ji] b_{i,o_t}

Recall: A = [a_ij] – state transition probabilities, a_ij = P(q_{t+1} = j | q_t = i); B = [b_ik] – emission probabilities, b_ik = P(o_t = k | q_t = i).
The Forward Procedure

[Same diagram; λ = {A, B, π}]

P(o_1 … o_T | λ) = Σ_{q_T = S_1}^{S_N} P(o_1 … o_T, q_T | λ)
α_t(i) = P(o_1 … o_t, q_t = i | λ) = [Σ_{j=1}^{N} α_{t-1}(j) a_ji] b_{i,o_t}

(1) Initialization:  α_1(i) = π_i b_{i,o_1},  1 ≤ i ≤ N
(2) Induction:  α_t(i) = [Σ_{j=1}^{N} α_{t-1}(j) a_ji] b_{i,o_t},  2 ≤ t ≤ T,  1 ≤ i ≤ N
(3) Termination:  P(O | λ) = Σ_{i=1}^{N} α_T(i)
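A minimal implementation sketch of the forward procedure (assuming the A, B, pi arrays and 0-based observation encoding from the earlier snippets):

import numpy as np

def forward(obs, A, B, pi):
    """Forward procedure: returns alpha (T x N) and P(O | lambda) in O(N^2 T) time."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # (1) alpha_1(i) = pi_i * b_{i,o_1}
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]  # (2) [sum_j alpha_{t-1}(j) a_ji] * b_{i,o_t}
    return alpha, alpha[-1].sum()                   # (3) P(O | lambda) = sum_i alpha_T(i)

On tiny inputs this should agree with prob_obs_brute_force above, at O(N^2 T) cost instead of exponential time.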
The Forward Procedure

P(O | λ) = Σ_{i=1}^{N} α_T(i),  α_t(i) = [Σ_{j=1}^{N} α_{t-1}(j) a_ji] b_{i,o_t},  λ = {A, B, π}

[Trellis figure: columns α_1(i), α_2(i), …, α_T(i) over states S_1, S_2, …, S_N; π feeds the first column, A = [a_ij] connects adjacent columns, B = [b_ik] ties each column to its observation o_1, o_2, …, o_T; summing (Σ) the final column gives P(O | λ)]
The Backward Procedure

Analogous to the forward variable, define a backward variable
  β_t(i) = P(o_{t+1} … o_T | q_t = i, λ)
– the probability of the partial observation sequence o_{t+1} … o_T given state i at time t.

[Diagram: q_1 → ••• → q_t → q_{t+1} → ••• → q_T emitting o_1, …, o_t, o_{t+1}, …, o_T]

(1) Initialization:  β_T(i) = 1,  1 ≤ i ≤ N
(2) Induction:  β_t(i) = Σ_{j=1}^{N} a_ij b_{j,o_{t+1}} β_{t+1}(j),  1 ≤ t ≤ T−1,  1 ≤ i ≤ N
(3) Termination:  P(O | λ) = Σ_{i=1}^{N} π_i b_{i,o_1} β_1(i)
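A matching sketch of the backward procedure (same assumed arrays and observation encoding):

import numpy as np

def backward(obs, A, B, pi):
    """Backward procedure: returns beta (T x N) and P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                   # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):                   # (2) beta_t(i) = sum_j a_ij b_{j,o_{t+1}} beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return beta, (pi * B[:, obs[0]] * beta[0]).sum() # (3) sum_i pi_i b_{i,o_1} beta_1(i)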
The Forward-Backward Procedure

P(O | λ) = Σ_{i=1}^{N} α_T(i)        P(O | λ) = Σ_{i=1}^{N} π_i b_{i,o_1} β_1(i)

[Diagram: q_1 → ••• → q_{t-1} → q_t → q_{t+1} → ••• → q_T emitting o_1, …, o_{t-1}, o_t, o_{t+1}, …, o_T]

α_t(i) = P(o_1 … o_t, q_t = i | λ)
β_t(i) = P(o_{t+1} … o_T | q_t = i, λ)

P(O | λ) = Σ_{i=1}^{N} α_t(i) β_t(i)   for any 1 ≤ t ≤ T
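A quick sanity check of this identity (a sketch assuming the forward/backward functions above and some observation sequence, e.g. obs = [0, 1, 1]):

alpha, p_fwd = forward(obs, A, B, pi)
beta, p_bwd = backward(obs, A, B, pi)
# alpha_t(i) * beta_t(i) = P(o_1 ... o_T, q_t = i | lambda), so each row sums to P(O | lambda).
for t in range(len(obs)):
    assert np.isclose((alpha[t] * beta[t]).sum(), p_fwd)
assert np.isclose(p_fwd, p_bwd)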
Problem 2: Optimal State Sequence

[Diagram: unknown hidden states ? → ••• → ? emitting o_1, …, o_{t-1}, o_t, o_{t+1}, …, o_T]

Q* = argmax_Q P(Q | O, λ) = argmax_Q P(Q, O | λ)

O = o_1 o_2 … o_T,  Q = q_1 q_2 … q_T
The Viterbi Algorithm

Q* = argmax_Q P(Q | O, λ) = argmax_Q P(Q, O | λ)

[Trellis figure: states S_1, …, S_j, …, S_N at each time step; π feeds the first column, A = [a_ij] connects adjacent columns, B = [b_ik] ties each column to its observation o_1, …, o_{T-1}, o_T. The goal is the best path: max over q_1, …, q_{T-1} of P(o_1, …, o_{T-1}, q_1, …, q_{T-1}, q_T = j, o_T | λ)]
The Viterbi Algorithm

Q* = argmax_Q P(Q | O, λ) = argmax_Q P(Q, O | λ)

[Same trellis figure]

The best-path probability decomposes recursively:
max_{q_1,…,q_{T-1}} P(o_1, …, o_{T-1}, q_1, …, q_{T-1}, q_T = j, o_T | λ)
  = [max_i ( max_{q_1,…,q_{T-2}} P(o_1, …, o_{T-1}, q_1, …, q_{T-2}, q_{T-1} = i | λ) ) a_ij] b_{j,o_T}
The Viterbi Algorithm

Q* = argmax_Q P(Q | O, λ) = argmax_Q P(Q, O | λ)

[Same trellis figure]

δ_{t+1}(j) – the highest probability of landing in state j at time t+1 after seeing the observations up to time t+1:

Define  δ_{t+1}(j) = max_{q_1,…,q_t} P(o_1, …, o_{t+1}, q_1, …, q_t, q_{t+1} = j | λ)

Then  δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_{j,o_{t+1}}
The Viterbi Algorithm

Q* = argmax_Q P(Q | O, λ) = argmax_Q P(Q, O | λ)

[Same trellis figure]

δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_{j,o_{t+1}}
ψ_{t+1}(j) = argmax_i δ_t(i) a_ij  – the predecessor of j on the best path (b_{j,o_{t+1}} does not depend on i, so it drops out of the argmax)
The Viterbi Algorithm

1. Initialization:
   δ_1(i) = π_i b_{i,o_1} and ψ_1(i) = 0,  1 ≤ i ≤ N
2. Recursion: for t = 1 to T−1
   δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_{j,o_{t+1}}
   ψ_{t+1}(j) = argmax_i δ_t(i) a_ij
   endfor
3. Termination:
   P* = max_{1 ≤ i ≤ N} δ_T(i)
   q*_T = argmax_{1 ≤ i ≤ N} δ_T(i)
4. State sequence backtracking: for t = T−1 down to 1
   q*_t = ψ_{t+1}(q*_{t+1})
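A minimal implementation sketch of the full algorithm (same assumed arrays and observation encoding as the earlier snippets):

import numpy as np

def viterbi(obs, A, B, pi):
    """Viterbi algorithm: the most likely hidden state sequence and its joint probability."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                  # 1. delta_1(i) = pi_i b_{i,o_1}
    for t in range(1, T):                         # 2. recursion
        scores = delta[t-1][:, None] * A          #    scores[i, j] = delta_t(i) a_ij
        psi[t] = scores.argmax(axis=0)            #    psi_{t+1}(j) = argmax_i delta_t(i) a_ij
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]              # 3. q*_T = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):                 # 4. backtracking: q*_t = psi_{t+1}(q*_{t+1})
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

In practice log probabilities are used to avoid underflow on long sequences; this sketch keeps the slide's raw-probability form.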
Problem 3: Learning Parameters of HMMs

[Diagram: hidden chain ••• emitting o_1, …, o_{t-1}, o_t, o_{t+1}, …, o_T]

λ* = argmax_λ P(O | λ)

Easy if the hidden states are known. Since they are not, this is a hidden-variable problem, solved with the EM algorithm.
The EM Algorithm – the forward-backward (or Baum-Welch) algorithm

Expectation step:  compute P(Q | O, λ^(n))
Maximization step:  λ^(n+1) = argmax_λ E[log p(O, Q | λ) | O, λ^(n)]

O – the observations {O_1, O_2, …, O_D}
Q – the corresponding hidden states {Q_1, Q_2, …, Q_D}
O_d = o_{d1} o_{d2} … o_{dT_d},  Q_d = q_{d1} q_{d2} … q_{dT_d}
If d ≠ e, T_d need not equal T_e (sequences may have different lengths).
Transition Probabilities

Given the observations and the current parameters λ^(n):
(a) The probability of traversing an arc from state i at time t to state j at time t+1:

  ξ_t^(n)(i, j) = P(q_t = i, q_{t+1} = j | O, λ^(n))
    = α_t^(n)(i) a_ij^(n) b_{j,o_{t+1}}^(n) β_{t+1}^(n)(j) / P(O | λ^(n))
    = α_t^(n)(i) a_ij^(n) b_{j,o_{t+1}}^(n) β_{t+1}^(n)(j) / Σ_{x=1}^{N} Σ_{y=1}^{N} α_t^(n)(x) a_xy^(n) b_{y,o_{t+1}}^(n) β_{t+1}^(n)(y)

[Trellis figure: α_t^(n)(i) flowing into state S_i at time t, the arc a_ij to S_j at time t+1, and β_{t+1}^(n)(j) flowing out, over observations o_{t-1}, o_t, o_{t+1}, o_{t+2}]
Transition Probabilities

Given the observations and the current parameters λ^(n):
(a) The probability of traversing an arc from state i at time t to state j at time t+1:
  ξ_t^(n)(i, j) = P(q_t = i, q_{t+1} = j | O, λ^(n))
(b) The probability of being in state i at time t:
  γ_t^(n)(i) = Σ_{j=1}^{N} ξ_t^(n)(i, j)

[Trellis figure: state S_i at time t with arcs ξ_t^(n)(i, j) to all states at time t+1]
Re-estimate Parameters

Given the observations and the current parameters λ^(n):
(a) the probability of traversing an arc: ξ_t^(n)(i, j)
(b) the probability of being in state i at time t: γ_t^(n)(i)

• The probability that i is a start state:
    π_i^(n+1) = γ_1^(n)(i)
• Transition probabilities:
    a_ij^(n+1) = Σ_{t=1}^{T-1} ξ_t^(n)(i, j) / Σ_{t=1}^{T-1} γ_t^(n)(i)
• Emission probabilities:
    b_ik^(n+1) = Σ_{t : o_t = k} γ_t^(n)(i) / Σ_{t'=1}^{T} γ_{t'}^(n)(i)
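A minimal sketch of one Baum-Welch re-estimation step for a single observation sequence, assuming the forward and backward functions sketched earlier (multiple sequences would sum the statistics over d = 1, …, D):

import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM (Baum-Welch) step: returns re-estimated (A, B, pi)."""
    N, T = A.shape[0], len(obs)
    alpha, p_obs = forward(obs, A, B, pi)
    beta, _ = backward(obs, A, B, pi)
    # (a) xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda^(n))
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t+1]] * beta[t+1] / p_obs
    # (b) gamma_t(i) = sum_j xi_t(i, j) for t < T; the last row comes from alpha_T * beta_T
    gamma = np.vstack([xi.sum(axis=2), alpha[-1:] * beta[-1:] / p_obs])
    pi_new = gamma[0]                                          # pi_i = gamma_1(i)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # sum_t xi / sum_t gamma
    obs = np.asarray(obs)
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):                                # numerator: sum over {t : o_t = k}
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]                        # normalize by sum_t gamma_t(i)
    return A_new, B_new, pi_new

Iterating this step until P(O | λ) stops improving converges to a local maximum of the likelihood.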