Presentation Pune

  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Presentation Pune as PDF for free.

More details

  • Words: 881
  • Pages: 18
CARL

Probabilistic Cellular Automata Model for Identification of CpG island in DNA string

carl

Soumyabrata Ghosh

Cellular Automata Research Lab Group Shiladitya Munshi | Nirmalya S. Maiti | Soumyabrata Ghosh

Salt Lake City | Kolkata | www.carltig.res.in

carl

Agendas •

Background



CpG Island Identification by Hidden Markov Model & its drawbacks



Probabilistic Cellular Automata (PrCA) & its Rule Vector Graph



PrCA Model for CpG island identification



Filter Design to reduce false positive predictions



Experimental results

Background •

Identification of CpG island in DNA string • Higher concentration of C-G dinucleotides • Probable “start” regions of genes



De facto standard: Hidden Markov Model • Based on Markov chain with transition & emission probabilities • Maximum probable path prediction by Viterbi Algorithm

carl



Disadvantages of HMM: • • • •

States are assumed to be independent of each other Local Maxima Over training False positive prediction

Hidden Markov Model (HMM) • • •



carl



Based on Markov Chain Probability of a state depends only on previous state Transition Probability Matrix • Transition between underlying states (A+, T+ , G+, C+ for CpG island and A-, T-, G-, C- for non-CpG) Emission Probability Matrix • Emission of expressed symbol (A, T, G, C) with respect to underlying state Viterbi Algorithm – Initializes the probability calculations by taking the product of the initial Transition probabilities with the associated emission probabilities. – Determines the most probable path through the underlying states of the HMM

carl

Our Approach •

Design of Probabilistic CA based model



Encoding of nucleotides by 3 bits binary string



Derivation of Probabilistic Rules from Transition Probability Matrix



Model Traversal by Viterbi algorithm



Feature Extraction from encoded DNA string



Filter design based on Feature Extraction to reduce false positive predictions

Probabilistic Cellular Automata (PrCA) •

Cellular Automata: discrete dynamical system. Space, time, and the states of the system are discrete.



The next state function of ith cell depends on current state of (i-1)th, ith, (i+1)th cell. The next state function is represented by CA Rule.



Three Neighborhood Periodic Boundary Probabilistic CA

carl

An n cell CA with Rule Vector •

Rule Mean Term (RMT): Minterm of a 3 variable Boolean function for a 3-neighborhood CA cell denoted as T= {T(m)}, (m=0 to 7)



In Probabilistic CA, for a given RMT, the next state value is 0 or 1 with probability less than 1

Probabilistic CA Rules

carl

Deterministic & Probabilistic Rule 232

carl

Rule Vector Graph (RVG) of PrCA •

Efficient data-structure for characterization of CA evolution



RVG of n cell CA consists of n subgraphs



The output of ith subgraph serves as input node of (i+1)th subgraph



Nodes contain possible RMTs derived from RMTs of previous level



Root Node & Sink Node contain all possible RMTs for periodic boundary CA



An edge explicitly denotes probability of 1 whereas the probability of 0 is implicit

RVG Structure of Probabilistic Periodic Boundary 3 Neighborhood CA

carl

Probabilistic Rule Table

RVG Traversal • 2n possible number of paths for n cell PrCA • Forward RVG Traversal • Most probable next state identification

• Backward RVG Traversal • Most probable transition identification

• Forward/Backward Traversal

carl

• Most probable path determination through the RVG

carl

Pr CA Model for CpG island Identification •

3 cell CA can generate 8 states



Each of 8 states represents one of underlying states (A+,G+, T+, C+,A-, T- G-, C-)



One RVG represent single underlying state transition



Sequence of N such transitions can be modeled by a network of N RVGS



Most probable path determination by N sequential traversal of the RVG using Viterbi Algorithm

carl

RVG Network of CA Model & Model Traversal

Synthesis of Probabilistic Rules • •

Determination from Transition Probability Matrix by permuting binary encoding Lowest conflict encoding is taken

carl

Probabilistic Rules applied on three bits of binary encoding of underlying states

Three bit encoding for 8 under lying states

Filter Design •

Two Binary slices derived from DNA sequence by using common two bit encoding of nucleotides: C(00), A(01), G(10), T(11)



Feature Extraction from binary string of traing DNA sequences based on experimentally known CpG/non-CpG differentiation

carl

– – –



Class separation using Bhattacharyya distance Window size 200 bits (lowest difference between 2 classes) K-means Clustering to produce representative Datapoints

Analysis of feature vectors of testing sequence to assign 4th bit encoding –

– –

Using Mahanalobis Distance Measure

1 for window belongs to CpG 0 for window belongs to non-CpG

carl

Experimental Results

Prediction of CpG Islands on AF111167 gene sequence. The model and filter are trained by CPGISLE dataset

Inferences of experimental results • HMM prediction: » 4 False Positive Predictions » 2 False Negative Predictions

• Pr CA Model (with Filter):

carl

» 1 False Positive Prediction » 2 False Negative Predictions

carl

Concluding Remarks •

Probabilistic CA based Model for DNA sequence analysis



Improvement of CpG island prediction by reducing False Positive predictions



Analysis of binary encoded DNA string may provide better insight of local interaction of nucleotides

carl

Thanks !

Cellular Automata Research Lab Group Shiladitya Munshi | Nirmalya S. Maiti | Soumyabrata Ghosh

Salt Lake City | Kolkata | www.carltig.res.in

Related Documents

Presentation Pune
November 2019 18
Pune
November 2019 41
Energy Pune
May 2020 9
Pune Pmt
November 2019 17
M.com From Pune Univ
July 2019 20
Pune Trackwork - 6th[1]
October 2019 27