Bibm0.4

Background & Motivation Problem & Feature Construction Experiments Design & Results Conclusions and Future Work

Exploring Alternative Splicing Features using Support Vector Machines

Jing Xia (1), Doina Caragea (1), Susan J. Brown (2)

(1) Computing and Information Sciences, Kansas State University, USA
(2) Bioinformatics Center, Kansas State University, USA

Nov 4, 2008

Outline

1. Background & Motivation
2. Problem & Feature Construction
     Problem Definition
     Data Set
     Feature Construction
3. Experiments Design & Results
     Experimental Design
     Experimental Results
4. Conclusions and Future Work
     Conclusion

Alternative Splicing

  • Splicing: an important step during gene expression
  • Variable splicing process (alternative splicing): one gene -> many proteins

[Figure: gene expression, from genes to proteins. DNA (exons separated by introns with GT...AG boundaries, flanked by the 5’UTR and 3’UTR, with TSS and ATG marked) is transcribed into pre-mRNA (cap, 5’UTR, GU...AG introns, 3’UTR), spliced into mRNA (AUG start), and translated into protein.]

[Figure: one gene to many proteins. Gene -> pre-mRNA -> alternative splicing -> transcript isoforms -> proteins.]

Patterns of Alternative Splicing

  • Exon skipping (most frequent)
  • Alternative 5’ splice sites
  • Alternative 3’ splice sites
  • Intron retention
  • Mutually exclusive exons

[Figure: exon skipping. Exons 1, 2 and 4 are constitutively spliced exons (CSE); exon 3 is an alternatively spliced exon (ASE).]

Here, we focus on predicting alternatively spliced exons (ASE) versus constitutively spliced exons (CSE) using an SVM.

Identifying Alternative Splicing in the Genome

  • Wet-lab experiments for finding alternative splicing (AS) are time consuming
  • Traditionally, ESTs are aligned to the genome (limited by the number of ESTs available for that genome)
  • Use machine learning algorithms to predict AS at the genome level

[Figure: transcripts aligned to genomic DNA, showing an alternative 3’ exon and exon skipping]

Problem Definition

Given an exon, can we predict whether it is an alternatively spliced exon (ASE) or a constitutively spliced exon (CSE)?

[Figure: exons 1, 2 and 4 are constitutively spliced exons (CSE); exon 3 is an alternatively spliced exon (ASE).]

Problem Addressed and Our Approach

Problem definition: predict alternatively spliced exons (ASE) vs. constitutively spliced exons (CSE)

Use a Support Vector Machine (SVM)
  • Task: two-class (ASE and CSE) classification problem
  • Need: training data set containing labeled examples (ASE & CSE)
  • Learning: train classifier with training data
  • Application: predict unknown ASE

Need features to represent ASEs & CSEs
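The two-class setup above can be sketched in a few lines. This is only an illustration: a linear SVM trained by stochastic sub-gradient descent on the hinge loss in plain Python, not the SVM package used in the talk. The toy feature vectors, the label encoding (+1 for ASE, -1 for CSE), the learning rate and the epoch count are all assumptions.

```python
import random

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01, seed=0):
    """Approximately minimise 0.5*||w||^2 + C * sum(hinge loss)
    by stochastic sub-gradient descent (illustrative, not tuned)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            # Regularisation term: shrink w a little on every step.
            w = [wj - lr * wj / len(X) for wj in w]
            if margin < 1:
                # Hinge loss is active: push the example towards its side.
                w = [wj + lr * C * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * C * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 (ASE) or -1 (CSE) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, made-up feature vectors: +1 = ASE, -1 = CSE.
X = [[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
     [-2.0, -1.5], [-2.5, -2.0], [-1.5, -2.5]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice a mature SVM library with kernel support would replace this loop; the sketch only shows the train-then-predict flow listed on the slide.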


Data Set

  • Published data set from the model organism C. elegans (worm)
  • Includes alternatively spliced exons (ASE) and constitutively spliced exons (CSE)
  • Contains 487 ASEs and 2531 CSEs
  • 100-base local sequences around splice sites

Example of data set:

  ASE  GTACTATAGCGTGCTG....ACCGTTCGTACTCGCT
  ASE  ATACTATAGCGTCTTG....ACCGATCGTACACGCT
  CSE  GTACTATAGCGTCTTG....ACCGATCGTACTCGCT

[Figure: sequence windows from −100 to +100 around each of the two splice sites flanking the exon (AG upstream, GT downstream)]
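As a rough illustration of what "100-base local sequences around splice sites" means, the sketch below cuts a window of 100 bases on each side of a splice-site coordinate. The function name, the 0-based coordinate convention and the toy sequence are assumptions made for this example; the published data set already ships the extracted windows.

```python
def splice_site_window(sequence: str, site: int, flank: int = 100) -> str:
    """Return the bases from site - flank up to (not including) site + flank.

    `site` is a 0-based index of the first base after the splice junction
    (an assumed convention). The window is clipped at the sequence ends.
    """
    start = max(0, site - flank)
    end = min(len(sequence), site + flank)
    return sequence[start:end]

# Toy sequence: intron ending in AG, a 50-base exon, intron starting with GT.
toy = "A" * 150 + "AG" + "C" * 50 + "GT" + "T" * 150
acceptor = 152  # first exon base, right after the AG dinucleotide

window = splice_site_window(toy, acceptor)
```

Here `window[98:100]` holds the AG dinucleotide that sits directly upstream of the junction, and positions 100 onward belong to the exon.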


Previous work:
  • Motifs captured and identified by kernels (G. Rätsch et al.)
  • Length of exons and flanking introns (Sorek et al.)

Our work:
  • Exploit more biologically significant features
  • Use several additional approaches to derive features

Feature List

Several features known to be biologically important:
  • Strength of splice sites (SSS)
  • Motif features
      - Intronic splicing regulator (ISR)
      - Motifs derived from local sequences (MAST)
      - Exonic splicing enhancer (ESE)
  • Reduced set of motif features based on locations of motifs on secondary structure (MAST-R)
  • Optimal folding energy (OPE)
  • Basic sequence features (BSF)

Strength of Splice Sites

SSS: Strength of Splice Site

We consider all splice sites.

[Figure: aligned example splice-site sequences flanking an exon (e.g. CGAG | exon | AGGTAAGT, GGAG | exon | AGGTAGGT, CGAG | exon | AGGTTAGT), with the scoring windows −3 to +7 and −26 to +2 marked at the 3’ ss and 5’ ss]

$$\text{score} = \sum_i \log \frac{F(X_i)}{F(X)},$$

where X ∈ {A, U, G, C}, i ∈ {−3, ..., +7} for 3’ splice sites (3’ss) and i ∈ {−26, ..., +2} for 5’ splice sites (5’ss).
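The score above can be sketched as a log-odds sum over window positions. The slides do not define F, so this example assumes that F(X_i) is the frequency of nucleotide X at position i among aligned known splice sites and that F(X) is a background frequency, taken here as uniform (0.25). The pseudocount, the natural logarithm and the four toy sites are also assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGU"

def position_frequencies(aligned_sites, pseudocount=1.0):
    """Per-position nucleotide frequencies F(X_i) from equal-length sites.
    A pseudocount avoids log(0) for nucleotides never seen at a position."""
    total = len(aligned_sites) + pseudocount * len(ALPHABET)
    freqs = []
    for pos in range(len(aligned_sites[0])):
        counts = Counter(site[pos] for site in aligned_sites)
        freqs.append({x: (counts[x] + pseudocount) / total for x in ALPHABET})
    return freqs

def splice_site_score(site, freqs, background):
    """Sum over window positions of log(F(X_i) / F(X))."""
    return sum(math.log(freqs[i][x] / background[x]) for i, x in enumerate(site))

# Toy aligned sites (made up, written with U as in the formula).
known_sites = ["AGGUAAGU", "AGGUAGGU", "AGGUUAGU", "AGGUAAGU"]
freqs = position_frequencies(known_sites)
background = {x: 0.25 for x in ALPHABET}  # assumed uniform background

strong = splice_site_score("AGGUAAGU", freqs, background)
weak = splice_site_score("CCCUCCCC", freqs, background)
```

A site that matches the aligned examples receives a positive score, while a site that deviates at most positions receives a negative one.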

Motif Features

Motif: a sequence pattern that occurs repeatedly in a group of sequences
  • Intronic splicing regulators (ISR): identified in Kabat et al.
  • MAST: derived by MEME using the [−100, +100] sequence
  • Exonic splicing enhancers (ESE): based on two assumptions
      - more frequent in exons than in introns
      - more frequent in exons with weak splice sites than in exons with strong splice sites

[Figure: ISR motifs dispersed among the sequences flanking an exon]
[Figure: example of a 20-base motif derived from sequences around splice sites]
[Figure: ISR, MAST and ESE motifs dispersed among exons and introns]
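One simple way to turn motifs into SVM input is to count how often each motif occurs in a sequence. The sketch below uses exact string matching, which is a simplification: MEME motifs are position-specific score matrices and MAST scores matches against them. The motif strings and the sequence are hypothetical.

```python
def count_occurrences(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count

def motif_feature_vector(sequence, motifs):
    """One count per motif, in the order the motifs are given."""
    return [count_occurrences(sequence, m) for m in motifs]

# Hypothetical motifs and sequence, for illustration only.
motifs = ["GAAGAA", "UGCAUG", "CUCUCU"]
sequence = "AUGAAGAAGAACCUGCAUGGA"

features = motif_feature_vector(sequence, motifs)  # [2, 1, 0]
```

The first motif is counted twice because its two occurrences overlap by three bases.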

pre-mRNA Secondary Structure

Pre-mRNA secondary structures influence exon recognition.
  • Secondary structure: derived from Mfold; filter motifs using secondary structure
  • Optimal folding energy: stability of the RNA secondary structure

[Figure: a motif within the sequence AUCCAUGGGCCGGAUGUGACGGUAGUAGGGUAUACGUCACAUAGGCUUCCUCUCAUGA, located at different structures (loop vs. stem)]
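The idea of filtering motifs by where they fall on the secondary structure can be sketched as below. Several things are assumed: the structure is supplied as a dot-bracket string ("." for unpaired bases, brackets for paired ones) rather than native Mfold output, motif hits in unpaired regions are the ones kept, and the 0.5 threshold is hypothetical. The slides do not say which structural context is retained.

```python
def motif_hits(sequence: str, motif: str):
    """Start positions of all (possibly overlapping) motif occurrences."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        start = sequence.find(motif, start + 1)
    return hits

def unpaired_fraction(structure: str, start: int, length: int) -> float:
    """Fraction of positions in the hit that are unpaired ('.')."""
    region = structure[start:start + length]
    return region.count(".") / length

def hits_in_unpaired_regions(sequence, structure, motif, min_unpaired=0.5):
    """Keep only hits whose bases are mostly unpaired."""
    return [s for s in motif_hits(sequence, motif)
            if unpaired_fraction(structure, s, len(motif)) >= min_unpaired]

# Toy hairpin: 7-base stem, 4-base loop, 3 unpaired bases at the 3' end.
sequence = "GGGAUACUUCGGUAUCCCUAC"
structure = "(((((((....)))))))..."

all_hits = motif_hits(sequence, "UAC")
kept = hits_in_unpaired_regions(sequence, structure, "UAC")
```

The motif occurs once inside the stem and once in the unpaired tail; only the second hit survives the filter.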

Sequence Features

  • GC content (G & C ratio), a characteristic of the sequence: $$\mathrm{GC} = \frac{G + C}{A + U + G + C}$$
  • Sequence length: length of exons and length of the exons’ flanking introns
  • Frames of stop codons

Summary of features
  • Motif features
  • Secondary structure
  • Strength of splice sites
  • Sequence features
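The basic sequence features can be computed directly from the sequence. In this sketch, "frames of stop codons" is read as "which of the three reading frames contain a stop codon"; that reading is an assumption, as is the toy exon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def gc_content(seq: str) -> float:
    """(G + C) / (A + U + G + C); returns 0.0 for a sequence with no such bases."""
    counts = {x: seq.count(x) for x in "AUGC"}
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0

def frames_with_stop(seq: str):
    """Reading frames (0, 1, 2) that contain at least one stop codon."""
    frames = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames

exon = "AUGGCCUAAGGC"  # toy exon, made up for illustration

features = {
    "gc": gc_content(exon),
    "length": len(exon),
    "stop_frames": frames_with_stop(exon),
}
```

For the toy exon, only frame 0 (AUG GCC UAA GGC) contains a stop codon.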

Experimental Design

  • List of previously defined features as SVM input
  • Combination of different features to represent ASEs & CSEs
  • Tune SVM parameters for training (kernel: linear, RBF, ..; cost C)
  • Choose parameters with the best cross-validation (CV) accuracy
  • Test the trained SVM on testing ASEs & CSEs

[Figure: 5-fold cross validation over split1 to split5, each with an 80% / 20% partition]
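The tuning loop described above can be sketched as follows: generate five train/validation partitions, score every candidate cost C on each, and keep the C with the best mean score. The `evaluate` callback stands in for "train an SVM with this C and return its validation accuracy"; the toy scorer, the candidate values and the function names are placeholders, not the talk's actual pipeline.

```python
import random

def k_fold_indices(n_examples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

def select_cost(n_examples, costs, evaluate, k=5):
    """Return the cost C with the highest mean validation score."""
    best_c, best_score = None, float("-inf")
    for c in costs:
        scores = [evaluate(c, train, val)
                  for train, val in k_fold_indices(n_examples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_c, best_score = c, mean_score
    return best_c

def toy_evaluate(c, train_idx, val_idx):
    """Placeholder scorer that simply prefers C close to 0.1."""
    return -abs(c - 0.1)

splits = list(k_fold_indices(100, k=5))
best_c = select_cost(100, [0.01, 0.05, 0.1, 1.0], toy_evaluate)
```

With 100 examples each fold holds 20 validation examples and 80 training examples, matching the 80% / 20% partition in the figure.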


Experimental Results

Results of alternatively spliced exon classification. All features, including ISR motifs, are used.

                   Cross-validation score    Test score
          C        fp 1%      AUC %          fp 1%      AUC %
  Split1  0.05     32.45      86.55          56.48      90.05
  Split2  0.05     39.33      88.32          52.04      89.04
  Split3  0.1      37.56      87.76          38.71      87.97
  Split4  0.01     40.86      89.02          37.63      84.42
  Split5  0.1      36.48      87.50          35.79      85.69
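Both reported metrics can be computed from classifier scores. This sketch assumes that "fp 1%" denotes the true positive rate at a false positive rate of at most 1%, which the slides do not spell out. The scores and labels (1 = ASE, 0 = CSE) are made up.

```python
def roc_points(scores, labels):
    """ROC points (fpr, tpr) from sweeping the threshold from high to low.
    Examples with tied scores are added together."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def tpr_at_fpr(points, max_fpr=0.01):
    """Highest true positive rate reachable with fpr <= max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

points = roc_points(scores, labels)
area = auc(points)               # 0.75
tpr_1pct = tpr_at_fpr(points)    # 0.5
```

On this toy ranking, 12 of the 16 positive/negative pairs are ordered correctly, which gives the AUC of 0.75.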

Experimental Results

[Figure: ROC curves (true positive rate vs. false positive rate) for Mixed-Feas (85.55%) and Base-Feas (78.78%)]

Comparison of ROC curves obtained using basic features only and basic features plus other mixed features (except conserved ISR motifs). Models trained using 5-fold CV with C = 1.


Experimental results

AUC score comparison between data sets with secondary structural features and data sets without secondary structural features

Motif Evaluation

[Figure: intersection between motifs derived from sequences and intronic splicing regulators]


Motif Evaluation

Conserved ESE in metazoans (animals), Human and Mouse


Motif Evaluation

Comparison with A. thaliana

Conclusions

  • Alternative splicing (AS) events can be found using transcripts
  • Machine learning can be used effectively to predict AS events
  • Identified features that are informative in predicting AS
  • Explored comparatively comprehensive feature sets from a biological point of view


Future Work

  • Apply this approach to a specific organism
  • Identify motifs more accurately
  • Refine relationships between features (secondary structure and motifs)
  • Learn other types of AS events (not only skipped exons)

adapted from "Detection of Alternative Splicing Events Using Machine Learning"


Thank you for your attention!

Questions?

Related work
  • RASE: http://www.fml.tuebingen.mpg.de/raetsch/projects/RASE

Acknowledgements
  • Data set from Dr. Rätsch’s FML group: http://www.fml.tuebingen.mpg.de/raetsch/projects/RASE/altsplicedexonsplits.tar.gz
  • Dr. Caragea’s MLB group: http://people.cis.ksu.edu/~dcaragea/mlb
  • Dr. Brown’s Bioinformatics Center at KSU: http://bioinformatics.ksu.edu
