The Poor Beginners’ Guide to Bioinformatics

What we have – and don’t have...

We have:
• a computer connected to the Internet (incl. Web browser)
• a text editor (Notepad or better)
• public databases of genomic sequences
• public databases of cDNA + EST
• public databases of protein sequences, structures and motifs
• public servers capable of (almost) anything we wish to do

We don’t have:
• money for specialised software packages

Dealing with a sequence: model tasks
• basic (DNA) sequence manipulation: restriction analysis, translation…
• sequence similarity and pattern/motif searches
• gene building: modelling exon-intron structures
• protein domain searches, structure analysis
• construction and interpretation of sequence alignments
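As a rough illustration of the first task (restriction analysis), the sketch below scans a DNA string for two well-known recognition sites, EcoRI (GAATTC) and BamHI (GGATCC). The function name `find_sites` and the example sequence are hypothetical; a public server would normally do this for you. Both sites are palindromic, so searching one strand is enough here.

```python
# Recognition sequences of two common restriction enzymes.
# The function name and the example sequence are made up for illustration.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping sites are not skipped.
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits

example = "ttgaattcaaggatccaagaattc"
print(find_sites(example))
```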

Notes on basic sequence handling
• Make sure you have the correct format.
• FASTA format is (almost) always correct:

>sequencename
thisisasequenceinfastaformat

• If not, you can always use raw data.
• If things don’t work, check for gaps in sequence, empty lines, and file extension.
• BEWARE OF MICROSOFT!
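The checks listed above (header line, gaps, empty lines) can be automated before a sequence is pasted into a server. The following is a minimal sketch, assuming "gaps" means stray spaces or tabs inside sequence lines; the function name `check_fasta` is hypothetical, and the rules are deliberately stricter than what some servers accept.

```python
def check_fasta(text):
    """Return a list of human-readable problems found in FASTA-formatted text."""
    problems = []
    lines = text.splitlines()  # also copes with Windows line endings
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue  # header lines may legitimately contain spaces
        if not line.strip():
            problems.append(f"line {number}: empty line")
        elif any(ch.isspace() for ch in line):
            problems.append(f"line {number}: space or tab inside sequence")
        elif not line.isalpha():
            problems.append(f"line {number}: non-letter character")
    return problems

print(check_fasta(">sequencename\nthisisasequence\ninfasta format\n"))
```

An empty list means none of these particular problems was found; it does not guarantee that a given server will accept the file.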

Defining a gene family…
• By overall domain structure (figure: domains labelled FH3?, FH1, FH2)
• By domain sequence
• Based on a peptide motif: L-X-X-G-N-X-[ML]-N
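A peptide motif written this way maps directly onto a regular expression: X becomes "any residue" and a bracketed group stays a character class. The sketch below assumes only these two constructs (no repeat counts or exclusions); the function name `motif_to_regex` and the test protein are hypothetical.

```python
import re

def motif_to_regex(motif):
    """Convert a dash-separated motif such as L-X-X-G-N-X-[ML]-N to a regex."""
    parts = []
    for element in motif.split("-"):
        if element.upper() == "X":
            parts.append(".")      # X matches any residue
        else:
            parts.append(element)  # literal residue or [..] character class
    return re.compile("".join(parts))

pattern = motif_to_regex("L-X-X-G-N-X-[ML]-N")
protein = "MKLAAGNQMNTTLSSGNALNP"  # made-up sequence for illustration
for match in pattern.finditer(protein):
    print(match.start(), match.group())
```

Note that `finditer` reports non-overlapping matches only, and that a plain pattern match like this gives no statistical measure of significance.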

Sequence comparison-based searches
• Entrez “related sequences”
    ◦ easy identification of “false starts”
    ◦ no organism selection
• BLAST/FASTA
    ◦ all DNA/protein combinations
    ◦ taxonomy selection possible
    ◦ statistical data provided
    ◦ domain structure comparison available
    ◦ divergent motifs may be missed

Two methods are better than one.

Notes on all sequence comparisons, searches, alignments…
• Start with defaults (the authors know what they are doing)…
• … BUT don’t be afraid to vary the parameters.
• Choose a reasonable scoring matrix:
    ◦ Distant sequences: low BLOSUM, high PAM
    ◦ Closely related sequences: low PAM, high BLOSUM

Motif-based searches
• sensitive
• no statistics
• only protein databases can be searched

• TAIR PatMatch
    ◦ Arabidopsis-specific
    ◦ problematic user interface
• ISREC - INSECTS
    ◦ admirable technology
    ◦ access to SwissProt and TrEMBL
    ◦ no organism selection

Some genes are more alike than others…
• A number of splicing prediction servers are available.
• Agreement of different methods is a good sign but no absolute measure.
• Always align ESTs if possible.
• Beware of non-conventional intron boundaries (GC-AG instead of GT-AG).
• Plant data for prediction of transcription start and factor binding sites are limited.
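Once a gene model has been proposed, the intron boundaries mentioned above are easy to inspect by hand or with a few lines of code. This sketch assumes intron coordinates are given as a 0-based start and an exclusive end on the genomic strand carrying the gene; the function name `classify_intron` and the toy sequences are hypothetical.

```python
def classify_intron(genomic, start, end):
    """Classify an intron by its first two and last two bases."""
    intron = genomic[start:end].upper()
    donor, acceptor = intron[:2], intron[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT-AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-conventional GC-AG"
    return f"other ({donor}-{acceptor})"

# Toy gene models: exons in upper case, intron in lower case.
gene_a = "ATGGCC" + "gtaagtttttcag" + "GGTTAA"
gene_b = "ATGGCC" + "gcaagtttttcag" + "GGTTAA"
print(classify_intron(gene_a, 6, 19))
print(classify_intron(gene_b, 6, 19))
```

An "other" result does not prove the model wrong, but it is a good reason to go back and check the EST alignment.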

Searching for known domains/motifs

• Searching for PROSITE patterns – allowing ambiguities
• PROSITE and Pfam profile searches
• SMART, CDsearch (domains and more)

Predicting protein localisation

• predicting signal peptides/anchors
• 2 methods available
• possibility to predict organelle localisation

• transmembrane segments prediction

Alignment: “manual” or automated?

“Manual”:
• locally installed, free, for Mac and PC
• interactive domain definition
• statistical data provided
• may produce false-positive blocks (read the on-line manual!)

Automated:
• “objective” results
• a number of servers available
• recommended for well-conserved proteins
• empirical parameters (e.g. gap penalties)
• bad for divergent sequences

Phylogenetic analyses
• Two methods are better than one.
• Your phylogeny cannot be better than your alignment.
• Gaps are not data.
• Always do bootstrapping (100-500 cycles).
• Certain questions cannot be answered from an unrooted tree.
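To make the bootstrapping advice concrete: each cycle builds a pseudo-alignment by drawing columns of the original alignment at random, with replacement, and a tree is then computed from every pseudo-alignment. The sketch below shows only the resampling step; tree building is left to whichever phylogeny program you use. The function name `bootstrap_alignment`, the toy alignment and the fixed random seed are assumptions for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    # The same column indices are used for every sequence, so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Toy alignment; all sequences must have the same length.
alignment = {
    "seqA": "MKLAAGNQMN",
    "seqB": "MKLSSGNALN",
    "seqC": "MRLTAGNKMN",
}
rng = random.Random(0)  # fixed seed only to make the example repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

The support for a branch is then the fraction of replicate trees in which that branch appears.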

Points to take off...
• go to the Bioinformatics page http://www2.rhul.ac.uk/~ujba110/Bioinfo.htm
• select your exercise (A,B,C,D,E)
• … and enjoy it!

If you mean it seriously:
• create your own bookmarks (seed provided on the course web page)
