Presentation Pune

November 2019
PDF

Download

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA

Overview

Download & View Presentation Pune as PDF for free.

More details

Words: 881
Pages: 18

Preview
Full text

CARL

Probabilistic Cellular Automata Model for Identification of CpG island in DNA string

carl

Soumyabrata Ghosh

Cellular Automata Research Lab Group Shiladitya Munshi | Nirmalya S. Maiti | Soumyabrata Ghosh

Salt Lake City | Kolkata | www.carltig.res.in

carl

Agendas •

Background

•

CpG Island Identification by Hidden Markov Model & its drawbacks

•

Probabilistic Cellular Automata (PrCA) & its Rule Vector Graph

•

PrCA Model for CpG island identification

•

Filter Design to reduce false positive predictions

•

Experimental results

Background •

Identification of CpG island in DNA string • Higher concentration of C-G dinucleotides • Probable “start” regions of genes

•

De facto standard: Hidden Markov Model • Based on Markov chain with transition & emission probabilities • Maximum probable path prediction by Viterbi Algorithm

carl

•

Disadvantages of HMM: • • • •

States are assumed to be independent of each other Local Maxima Over training False positive prediction

Hidden Markov Model (HMM) • • •

•

carl

•

Based on Markov Chain Probability of a state depends only on previous state Transition Probability Matrix • Transition between underlying states (A+, T+ , G+, C+ for CpG island and A-, T-, G-, C- for non-CpG) Emission Probability Matrix • Emission of expressed symbol (A, T, G, C) with respect to underlying state Viterbi Algorithm – Initializes the probability calculations by taking the product of the initial Transition probabilities with the associated emission probabilities. – Determines the most probable path through the underlying states of the HMM

carl

Our Approach •

Design of Probabilistic CA based model

•

Encoding of nucleotides by 3 bits binary string

•

Derivation of Probabilistic Rules from Transition Probability Matrix

•

Model Traversal by Viterbi algorithm

•

Feature Extraction from encoded DNA string

•

Filter design based on Feature Extraction to reduce false positive predictions

Probabilistic Cellular Automata (PrCA) •

Cellular Automata: discrete dynamical system. Space, time, and the states of the system are discrete.

•

The next state function of ith cell depends on current state of (i-1)th, ith, (i+1)th cell. The next state function is represented by CA Rule.

•

Three Neighborhood Periodic Boundary Probabilistic CA

carl

An n cell CA with Rule Vector •

Rule Mean Term (RMT): Minterm of a 3 variable Boolean function for a 3-neighborhood CA cell denoted as T= {T(m)}, (m=0 to 7)

•

In Probabilistic CA, for a given RMT, the next state value is 0 or 1 with probability less than 1

Probabilistic CA Rules

carl

Deterministic & Probabilistic Rule 232

carl

Rule Vector Graph (RVG) of PrCA •

Efficient data-structure for characterization of CA evolution

•

RVG of n cell CA consists of n subgraphs

•

The output of ith subgraph serves as input node of (i+1)th subgraph

•

Nodes contain possible RMTs derived from RMTs of previous level

•

Root Node & Sink Node contain all possible RMTs for periodic boundary CA

•

An edge explicitly denotes probability of 1 whereas the probability of 0 is implicit

RVG Structure of Probabilistic Periodic Boundary 3 Neighborhood CA

carl

Probabilistic Rule Table

RVG Traversal • 2n possible number of paths for n cell PrCA • Forward RVG Traversal • Most probable next state identification

• Backward RVG Traversal • Most probable transition identification

• Forward/Backward Traversal

carl

• Most probable path determination through the RVG

carl

Pr CA Model for CpG island Identification •

3 cell CA can generate 8 states

•

Each of 8 states represents one of underlying states (A+,G+, T+, C+,A-, T- G-, C-)

•

One RVG represent single underlying state transition

•

Sequence of N such transitions can be modeled by a network of N RVGS

•

Most probable path determination by N sequential traversal of the RVG using Viterbi Algorithm

carl

RVG Network of CA Model & Model Traversal

Synthesis of Probabilistic Rules • •

Determination from Transition Probability Matrix by permuting binary encoding Lowest conflict encoding is taken

carl

Probabilistic Rules applied on three bits of binary encoding of underlying states

Three bit encoding for 8 under lying states

Filter Design •

Two Binary slices derived from DNA sequence by using common two bit encoding of nucleotides: C(00), A(01), G(10), T(11)

•

Feature Extraction from binary string of traing DNA sequences based on experimentally known CpG/non-CpG differentiation

carl

– – –

•

Class separation using Bhattacharyya distance Window size 200 bits (lowest difference between 2 classes) K-means Clustering to produce representative Datapoints

Analysis of feature vectors of testing sequence to assign 4th bit encoding –

– –

Using Mahanalobis Distance Measure

1 for window belongs to CpG 0 for window belongs to non-CpG

carl

Experimental Results

Prediction of CpG Islands on AF111167 gene sequence. The model and filter are trained by CPGISLE dataset

Inferences of experimental results • HMM prediction: » 4 False Positive Predictions » 2 False Negative Predictions

• Pr CA Model (with Filter):

carl

» 1 False Positive Prediction » 2 False Negative Predictions

carl

Concluding Remarks •

Probabilistic CA based Model for DNA sequence analysis

•

Improvement of CpG island prediction by reducing False Positive predictions

•

Analysis of binary encoded DNA string may provide better insight of local interaction of nucleotides

carl

Thanks !

Cellular Automata Research Lab Group Shiladitya Munshi | Nirmalya S. Maiti | Soumyabrata Ghosh

Salt Lake City | Kolkata | www.carltig.res.in

Presentation Pune

Overview

More details

Related Documents

Presentation Pune

Pune

Energy Pune

Pune Pmt

M.com From Pune Univ

Pune Trackwork - 6th[1]