Applications of Hidden Markov Models in Bioinformatics
Outline
❚ Gradient Descent method for learning
❚ Profile HMMs for protein families
❚ Multiple alignments
❚ Protein sequence analysis and classification
❚ Structural pattern discovery
Gradient Descent
We want to solve the maximization problem

\[ \log P(O \mid w) \to \max \]

in the variables of the problem, \( w = (\ldots, e_{iX}, \ldots, t_{ij}, \ldots) \), subject to

\[ \sum_{X} e_{iX} = 1 \quad \text{for all } i \in E, \qquad \sum_{j \in S} t_{ij} = 1 \quad \text{for all } i \in S. \]
We reparameterize the variables:

\[ e_{iX} = \frac{\exp(v_{iX})}{\sum_{Y} \exp(v_{iY})}, \qquad t_{ij} = \frac{\exp(v_{ij})}{\sum_{k} \exp(v_{ik})}. \]

With this parameterization the normalization constraints hold automatically, and the emission and transition probabilities are always between 0 and 1.
We will need the following formulae:

\[ \frac{\partial e_{iX}}{\partial v_{iX}} = e_{iX}(1 - e_{iX}) \]

and, for \( Y \neq X \),

\[ \frac{\partial e_{iY}}{\partial v_{iX}} = -\, e_{iX} e_{iY}. \]

These formulae can be obtained by simple calculations from

\[ e_{iX} = \frac{\exp(v_{iX})}{\sum_{Y} \exp(v_{iY})}, \qquad t_{ij} = \frac{\exp(v_{ij})}{\sum_{k} \exp(v_{ik})}. \]
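The reparameterization and the two derivative formulae can be checked numerically. The sketch below is illustrative, not part of the lecture: `softmax`, `softmax_jacobian`, and `numeric_jacobian` are hypothetical helper names, and the finite-difference check is just one way to validate the algebra.

```python
import math

def softmax(v):
    """Map unconstrained parameters v to probabilities: e_X = exp(v_X) / sum_Y exp(v_Y)."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [x / total for x in exps]

def softmax_jacobian(v):
    """Analytic partials from the slide:
    d e_Y / d v_X = e_X(1 - e_X) if Y == X, and -e_X e_Y otherwise."""
    e = softmax(v)
    n = len(v)
    return [[e[X] * (1 - e[X]) if Y == X else -e[X] * e[Y]
             for Y in range(n)] for X in range(n)]

def numeric_jacobian(v, h=1e-6):
    """Central-difference approximation of d e_Y / d v_X, for checking the formulae."""
    n = len(v)
    J = []
    for X in range(n):
        vp = v[:]; vp[X] += h
        vm = v[:]; vm[X] -= h
        ep, em = softmax(vp), softmax(vm)
        J.append([(ep[Y] - em[Y]) / (2 * h) for Y in range(n)])
    return J
```

For any parameter vector, the softmax output sums to 1 with every entry strictly between 0 and 1, and the analytic Jacobian agrees with the numerical one.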
We need to solve the maximization problem

\[ \log P(O \mid w) \to \max \]

in the new variables \( (\ldots, v_{iX}, \ldots, v_{ij}, \ldots) \). To do that, we need to calculate the coordinates of the gradient:

\[ \frac{\partial \log P(O \mid w)}{\partial v_{iX}} \quad \text{for all } i \in E \text{ and } X \in A, \]

\[ \frac{\partial \log P(O \mid w)}{\partial v_{ij}} \quad \text{for all } i \in S \text{ and } j \in N(i). \]
By the chain rule,

\[
\frac{\partial \log P(O \mid w)}{\partial v_{iX}}
= \sum_{Y} \frac{\partial \log P(O \mid w)}{\partial e_{iY}} \frac{\partial e_{iY}}{\partial v_{iX}}
= \frac{\partial \log P(O \mid w)}{\partial e_{iX}} \frac{\partial e_{iX}}{\partial v_{iX}}
+ \sum_{Y \neq X} \frac{\partial \log P(O \mid w)}{\partial e_{iY}} \frac{\partial e_{iY}}{\partial v_{iX}}
\]

\[
= \frac{n_{iX}}{e_{iX}} \, e_{iX}(1 - e_{iX}) + \sum_{Y \neq X} \frac{n_{iY}}{e_{iY}} (-\, e_{iX} e_{iY})
= n_{iX} - n_{iX} e_{iX} + \sum_{Y \neq X} (-\, n_{iY} e_{iX})
= n_{iX} - e_{iX} \sum_{Y} n_{iY} = n_{iX} - e_{iX} n_{i},
\]

where \( n_{iY} \) is the expected number of times letter \( Y \) is emitted in state \( i \) (so \( \partial \log P(O \mid w)/\partial e_{iY} = n_{iY}/e_{iY} \)) and \( n_i = \sum_Y n_{iY} \).
The Gradient Descent method
We obtained that

\[ \frac{\partial \log P(O \mid w)}{\partial v_{iX}} = n_{iX} - n_i e_{iX}. \]

We can obtain similarly that

\[ \frac{\partial \log P(O \mid w)}{\partial v_{ij}} = n_{ij} - n_i t_{ij}. \]
The general step of the Gradient Descent method, i.e. how we move from the point \( (\ldots, v_{iX}^{k}, \ldots, v_{ij}^{k}, \ldots) \) to \( (\ldots, v_{iX}^{k+1}, \ldots, v_{ij}^{k+1}, \ldots) \):

\[ v_{iX}^{k+1} = v_{iX}^{k} + \alpha \, (n_{iX} - n_i e_{iX}), \qquad v_{ij}^{k+1} = v_{ij}^{k} + \alpha \, (n_{ij} - n_i t_{ij}), \]

where \( \alpha \) is the step length. (Since we are maximizing, we step in the direction of the gradient.) We calculate \( n_{iX} \), \( n_i \), \( e_{iX} \), \( n_{ij} \) and \( t_{ij} \) from \( (\ldots, v_{iX}^{k}, \ldots, v_{ij}^{k}, \ldots) \) using the previous formulae.
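The general step for one state's emission parameters can be sketched as below. This is an illustrative fragment, not the lecture's code: the function name is hypothetical, and the expected counts are passed in as fixed numbers rather than recomputed by a forward-backward pass at each step.

```python
import math

def softmax(v):
    """e_X = exp(v_X) / sum_Y exp(v_Y), computed stably."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [x / s for x in exps]

def update_emission_params(v_i, n_i_counts, alpha):
    """One gradient step for the emission parameters of state i:
    v_iX <- v_iX + alpha * (n_iX - n_i * e_iX),
    where n_iX are the expected emission counts and n_i = sum_X n_iX."""
    e_i = softmax(v_i)
    n_i = sum(n_i_counts)
    return [v + alpha * (n - n_i * e) for v, n, e in zip(v_i, n_i_counts, e_i)]
```

With the counts held fixed, the stationary point of this update is \( e_{iX} = n_{iX}/n_i \), i.e. repeated steps drive the emission probabilities toward the normalized expected counts, as the gradient formula predicts.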
Gradient Descent method: Summary
Initialization: We take a starting point \( (\ldots, v_{iX}^{0}, \ldots, v_{ij}^{0}, \ldots) \). For example, we can calculate the initial \( e_{iX} \) and \( t_{ij} \) as in the Baum-Welch method, and then set \( v_{iX} = \log e_{iX} \) and \( v_{ij} = \log t_{ij} \).

General step: We calculate the points \( (\ldots, v_{iX}^{k}, \ldots, v_{ij}^{k}, \ldots) \) for k = 1, 2, 3, … until there is no significant change in the log-likelihood function. The step length \( \alpha \) is determined by a non-linear optimization method. The value of the log-likelihood function increases step by step.
Batch gradient-descent equations can easily be derived by summing over all training sequences. Similarly to the Baum-Welch algorithm, the gradient descent method is not guaranteed to find the global optimum; what we obtain is a local maximum. Also similarly to the Baum-Welch algorithm, evaluating the gradient-descent equations requires one forward and one backward algorithm. Thus, the running time of each iteration is O(KN²).
Three basic questions
Likelihood question: How likely is a given sequence for a particular HMM? Given O and w: \( P(O \mid w) = ? \)

Decoding question: What is the most probable sequence of transitions and emissions through the HMM underlying the production of this particular sequence? Given O and w: \( \max_{\pi} P(O, \pi \mid w) = ? \)

Learning question: Given a sequence, what is the model that produces this sequence with the highest probability? Given O: \( \max_{w} P(O \mid w) = ? \)
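The likelihood question is answered by the forward algorithm. A minimal sketch, assuming a simple dictionary representation of the model (the function name and parameter layout are illustrative, not from the lecture):

```python
def forward_likelihood(obs, states, start, trans, emit):
    """P(O | w) by the forward algorithm.
    obs: observed sequence; start[s]: initial probability of state s;
    trans[s][t]: transition probability s -> t; emit[s][x]: emission probability."""
    # Initialization: probability of starting in s and emitting the first symbol.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for x in obs[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][x]
                 for t in states}
    # Termination: sum over the final states.
    return sum(alpha.values())
```

The decoding question is answered by the same recursion with the sum replaced by a max (the Viterbi algorithm), and the learning question by Baum-Welch or the gradient descent method above.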
Profile HMM
Given a protein family, our aim is to set up an HMM that characterizes this family: a profile HMM. We train the HMM on the members of the family, i.e. on the sequences O1, …, OK. The members of the family are similar to each other.
Multiple sequence alignment: "aligning sequences to the model"
Given the sequences O1, …, OK and a trained model, i.e. w is known. (The HMM can be trained on a large number of sequences; see the learning question.)
We calculate the Viterbi path for each of the sequences O1, …, OK; together these paths provide a multiple sequence alignment.

Example: Viterbi paths of NFL, NKYL and QW:

start → m1 → m2 → m3 → end
start → m1 → i2 → m2 → m3 → end
start → m1 → m2 → d3 → end

Resulting alignment (columns m1, i2, m2, m3):

N · F L
N K Y L
Q · W -
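Reading the alignment rows off the Viterbi paths can be sketched as follows. The representation is an assumption made for illustration: each path is a list of (column, residue) pairs, a delete state is encoded as its match column with residue `None`, and `.` marks an insert column a sequence does not use.

```python
def align_to_columns(paths, columns):
    """paths: per-sequence lists of (column, residue) pairs; a deletion is
    (match_column, None). columns: ordered alignment columns, e.g. the
    match and insert states visited by any path.
    Emitted residues fill their column; deletions give '-'; unused columns give '.'."""
    rows = []
    for path in paths:
        emitted = {col: (res if res is not None else '-') for col, res in path}
        rows.append(''.join(emitted.get(col, '.') for col in columns))
    return rows
```

On the slide's example (NFL, NKYL, QW with columns m1, i2, m2, m3) this reproduces the alignment rows N·FL, NKYL, Q·W- (with '.' standing in for '·').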
Once we have the trained model, the running time of the multiple sequence alignment is O(KN²): the computation cost of K Viterbi paths, K · O(N²), where N is the length of the model. This is significantly less than the computation cost of the multidimensional dynamic programming method, which is O(N^K).
The multiple alignments derived by HMMs are in some sense richer than conventional alignments: a gap can be the result of an insert state as well as a delete state.
Protein classification
Given a sequence O and protein families characterized by their profile HMMs. We calculate a score for the alignment of sequence O to a protein family by the log-odds ratio

\[ \log \frac{P(O \mid w)}{P(O \mid r)}, \qquad \text{where } P(O \mid r) = \prod_{t=1}^{T} q_{X_t}. \]

Model r is referred to as the standard random model; it is the simple dice model. It can also be interpreted as an HMM in which the transition probabilities between the main states are 1 and all the others are 0, and the probability of letter X being emitted is the same, q_X, in every emitting state.
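A minimal sketch of the log-odds score, assuming log P(O | w) has already been computed (e.g. by the forward algorithm); the function names and the dictionary of background frequencies q are illustrative, not from the lecture.

```python
import math

def null_log_prob(seq, q):
    """log P(O | r) under the standard random model:
    each residue is drawn independently with background frequency q[x]."""
    return sum(math.log(q[x]) for x in seq)

def log_odds_score(log_p_model, seq, q):
    """log [ P(O|w) / P(O|r) ]: positive scores favor family membership.
    log_p_model is log P(O|w), e.g. from the forward algorithm."""
    return log_p_model - null_log_prob(seq, q)
```

Working in log space avoids underflow: for realistic sequence lengths both P(O | w) and P(O | r) are far too small to represent as plain floats.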
Alternatively, one can use a null model whose parameters reflect the general amino acid distribution in the training sequences. Barrett, Hughey, and Karplus have shown that a good choice of null model is the geometric mean of the amino acid distributions in the match states. [C. Barrett, R. Hughey, and K. Karplus. Scoring hidden Markov models. CABIOS, 13(2):191–199, 1997.]
We use a cut-off value to predict whether sequence O is a member of the family or not. In practice, this cut-off (threshold) value can be determined easily, because the scores of family members are clearly separated from those of proteins not in the family. Alternatively, a score based on the Viterbi path can be calculated by

\[ \log \frac{P(O, \pi^{*} \mid w)}{P(O \mid r)}. \]
Structural pattern analysis and discovery
Given an HMM profile of a protein family, the parameters of the HMM can be studied. High emission or transition probabilities are usually associated with conserved regions that may have structural/functional significance. This can be done by plotting the entropy of the emission distributions along the backbone of the model:

\[ -\sum_{i=1}^{n} p_i \log p_i, \]

where \( p_i \) is the probability that letter \( X_i \) is emitted.
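The entropy profile along the backbone can be sketched as below; the function names are illustrative. Low-entropy positions are the strongly conserved (nearly deterministic) columns of the model.

```python
import math

def emission_entropy(p):
    """Shannon entropy -sum p_i log p_i of one emission distribution,
    with the convention 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_profile(emissions):
    """One entropy value per match state along the model backbone.
    emissions: list of emission distributions, one per match state."""
    return [emission_entropy(p) for p in emissions]
```

A uniform distribution over n letters gives the maximum entropy log n, a near-point-mass distribution gives entropy near 0, so dips in the plotted profile flag candidate conserved regions.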