Gene Sequencing And Analysis

BIO4320 Lecture Materials, Prepared by Dr. Hon-Ming Lam

DNA Sequencing and Sequence Analysis

Further Readings:
• “Genome II” by T.A. Brown, Ch. 6
• “Gene Cloning and DNA Analysis” by T.A. Brown, Ch. 10
• “Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins” by A.D. Baxevanis and B.F. Francis Ouellette (1998), Ch. 7, 11
• www.ncbi.nlm.nih.gov

Two Major Methods
• Chemical degradation (Maxam and Gilbert, 1977)
  – double-stranded DNA
  – no primer is needed (i.e. prior sequence information is not necessary)
  – involves toxic chemicals
  – hard to automate
• Chain termination method (Sanger et al., 1977)
  – single-stranded DNA as template
  – based on enzymatic synthesis
  – random chain termination by dideoxynucleotides
  – relatively easy to automate

Chemical Degradation
• Double-stranded DNA
• End labeling
• Denature into single-stranded DNA; chemical cleavage
[Figure: four separate cleavage reactions on the end-labeled strands: cleave at C; cleave at G; cleave at G and A; cleave at C and T.]

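Chemical Degradation
[Figure: sequencing gel with four lanes (C, C&T, G, G&A). Bases labeled alongside the gel, between the 3’ and 5’ ends: C T C G G C G T A G T C T G A. Assume 5’ end-labeling.]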
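The lane logic of the chemical degradation gel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `call_base` and `read_gel` are hypothetical, and each gel position is represented simply as the set of lanes that show a band there. A band in both G and G&A is read as G, a band in G&A only as A, a band in both C and C&T as C, and a band in C&T only as T.

```python
# Hypothetical helper names; each gel position is given as the set of lanes with a band.
LANE_PATTERNS = {
    frozenset({"G", "G&A"}): "G",   # cleaved in both purine reactions
    frozenset({"G&A"}): "A",        # cleaved only in the G and A reaction
    frozenset({"C", "C&T"}): "C",   # cleaved in both pyrimidine reactions
    frozenset({"C&T"}): "T",        # cleaved only in the C and T reaction
}


def call_base(lanes):
    """Infer the base at one gel position from the lanes that show a band."""
    key = frozenset(lanes)
    if key not in LANE_PATTERNS:
        raise ValueError(f"uninterpretable band pattern: {sorted(key)}")
    return LANE_PATTERNS[key]


def read_gel(bands_short_to_long):
    """Read bases in order of increasing fragment length (bottom of gel upward)."""
    return "".join(call_base(lanes) for lanes in bands_short_to_long)


if __name__ == "__main__":
    bands = [{"G", "G&A"}, {"G&A"}, {"C&T"}, {"C", "C&T"}]
    print(read_gel(bands))
```

With 5’ end-labeling, the shortest labeled fragments come from cleavage nearest the labeled end, so reading from the shortest fragment upward gives the sequence in the 5’ to 3’ direction.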

Chain Termination
• Double-stranded DNA is denatured, or replicated in a filamentous phage, into single-stranded DNA
• Add primer, dNTPs and dideoxynucleotides
• New chain randomly terminated at A, C, G, or T

Chain Termination
[Figure: newly synthesized strands of different lengths, each running 5’ to 3’ and ending in a dideoxynucleotide (ddA or ddC in the strands shown).]

Chain Termination
[Figure: sequencing gel with four lanes (A, C, G, T).]
Template sequence:         3’ A C A G G A T C T T C A C G T 5’
New strand (read from gel): 5’ T G T C C T A G A A G T G C A 3’
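How the four lanes yield the sequence can be illustrated with a small simulation. This is a simplified sketch, not a model of the real chemistry: it assumes one terminated fragment per position, ignores the primer length, and uses hypothetical function names. The template is the one shown on the slide, written 3’ to 5’.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesize(template_3to5):
    """Return the new strand (5' to 3') complementary to a template written 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_fragments(new_strand):
    """Group fragment lengths by the dideoxynucleotide that ended each fragment.

    A fragment of length n ends in dd(new_strand[n-1]), so it runs in that lane.
    """
    lanes = {base: [] for base in "ACGT"}
    for length in range(1, len(new_strand) + 1):
        lanes[new_strand[length - 1]].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest fragment to the longest."""
    lane_of_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(lane_of_length[n] for n in sorted(lane_of_length))


if __name__ == "__main__":
    template = "ACAGGATCTTCACGT"          # 3' to 5', from the slide
    new_strand = synthesize(template)
    lanes = terminated_fragments(new_strand)
    print(lanes)
    print(read_gel(lanes))
```

Sorting the fragments by length and noting the lane of each one recovers the new strand, which is the complement of the template.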

Thermal Cycle Sequencing
[Figure: strands terminated with ddA or ddC accumulating over successive cycles.]
• Starting with ds DNA templates
• PCR with just one primer in the presence of ddNTPs
• The number of chain-terminated strands increases as more cycles are carried out

Automated DNA Sequencing
[Figure: strands terminated with fluorescently labeled ddC, ddT, ddG and ddA; sequence read-out: AGTGCCACGT]
• Use fluorescently-labeled ddNTPs in chain termination reactions
• Detect by an imaging system

Pyrosequencing
• Rapid, no separation of products, no ddNTPs
• Addition of a dNTP releases a pyrophosphate molecule; reaction with sulfurylase forms a flash of chemiluminescence
• dNTPs added one by one
• Unused dNTPs degraded by nucleotidase
[Figure: dNTPs (+A, +C, +G, +T) are added in turn to the growing 5’ to 3’ strand. A dNTP that is not incorporated is degraded; each incorporation gives a flash of chemiluminescence. The strand shown is extended by GT.]
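The one-by-one addition of dNTPs can be sketched as a small simulation. The assumptions here are mine, not the slide's: dNTPs are dispensed in a fixed A, C, G, T order, the signal is taken to be the number of bases incorporated in that step (0 means the dNTP was unused and degraded), and the function names are hypothetical.

```python
def pyrogram(new_strand, order="ACGT"):
    """Simulate dispensing dNTPs one at a time while the new strand is synthesized.

    Returns a list of (dNTP, signal) pairs, where signal is the number of
    bases incorporated in that step.
    """
    if any(base not in order for base in new_strand):
        raise ValueError("new_strand must contain only the dispensed bases")
    signals = []
    pos = 0
    while pos < len(new_strand):
        for nt in order:
            run = 0
            # A run of identical bases is incorporated in a single step.
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
            if pos == len(new_strand):
                break
    return signals


def decode(signals):
    """Rebuild the sequence from the recorded flashes."""
    return "".join(nt * run for nt, run in signals)


if __name__ == "__main__":
    flashes = pyrogram("GGTA")
    print(flashes)
    print(decode(flashes))
```

Because no products need to be separated, the sequence is read directly from the order and size of the flashes.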

DNA Sequencing by Gene Chips
Overlapping 10-mers that give positive signals:
ACTACCGATC
 CTACCGATCC
  TACCGATCCG
   ACCGATCCGA
Aligned sequence:
ACTACCGATCCGA
• A gene chip carries every possible 10-mer oligonucleotide
• Hybridize with target DNA
• Align sequence of oligos that give positive signals
• For 10-mer: 1,048,576 spots can sequence 1 kb
• For 8-mer: 65,536 spots can sequence 256 bp
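The spot counts and the alignment step can be checked with a short sketch. The number of spots is 4^k for oligonucleotides of length k. The assembly function below is a simplified illustration with hypothetical names; it assumes every positive oligo occurs once and that the overlaps form a single unambiguous chain, which real targets with repeats do not guarantee.

```python
def chip_size(k):
    """Number of distinct oligonucleotides of length k (4 bases per position)."""
    return 4 ** k


def assemble(kmers):
    """Chain k-mers that overlap by k-1 bases into one sequence."""
    kmers = list(kmers)
    by_prefix = {kmer[:-1]: kmer for kmer in kmers}
    if len(by_prefix) != len(kmers):
        raise ValueError("ambiguous overlaps: two oligos share a prefix")
    suffixes = {kmer[1:] for kmer in kmers}
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting oligo")
    current = starts[0]
    sequence = current
    for _ in range(len(kmers) - 1):
        nxt = by_prefix.get(current[1:])
        if nxt is None:
            break
        sequence += nxt[-1]   # each further oligo adds one new base
        current = nxt
    return sequence


if __name__ == "__main__":
    print(chip_size(10), chip_size(8))
    positives = ["TACCGATCCG", "ACTACCGATC", "ACCGATCCGA", "CTACCGATCC"]
    print(assemble(positives))
```

The order in which the positive oligos are listed does not matter, since the chain is rebuilt from the overlaps alone.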

Analyzing the Sequenced Genes
• Structure prediction
  – Secondary structure of DNA and RNA
  – Possible 3-D structure of proteins
• Identity of the encoded gene/gene product
  – Prediction of general physical properties (e.g. M.W., pI; may be important for proteomic analysis)
  – Database (e.g. GenBank) search based on sequence homology
• Possible function of the encoded gene product
  – Search for signature domains or functional motifs using consensus patterns (based on statistics)
• Possible location of the encoded gene product
  – Prediction of subcellular localization by consensus patterns
• Prediction of evolutionary relationship
  – Multiple alignment, clustering, etc.
• Gene prediction from genomic sequences
  – Prediction for coding regions and location of introns
  – Prediction for promoter regions
• Prediction of regulatory sites
  – Prediction of consensus cis-acting regulatory elements

Homologues
• Paralogues: related genes arising from gene duplication in the same genome; may diverge to play different roles.
• Orthologues: homologues in different species.

Align and Compare Sequences
IAMABIOSTUDENTTAKINGMARINEANIMALPLANT
IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT
IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER
IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER

Align and Compare Sequences
Gp1:
IAMABIOSTUDENTTAKINGMARINEANIMALPLANT
IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT
Gp2:
IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER
IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER
Consensus: [shown in the original figure]
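One simple way to derive a consensus from already-aligned sequences is a column-by-column majority vote. The sketch below is an assumption about how such a consensus could be computed, not necessarily the rule used in the slide; the function name and the strict-majority threshold are hypothetical choices. Positions without a strict majority are written as "-".

```python
from collections import Counter


def consensus(aligned, threshold=0.5):
    """Column-wise consensus of equal-length aligned sequences.

    A residue is reported only if its frequency in the column is strictly
    greater than `threshold`; otherwise the column is written as '-'.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(column) > threshold else "-")
    return "".join(result)


if __name__ == "__main__":
    sequences = [
        "IAMABIOSTUDENTTAKINGMARINEANIMALPLANT",
        "IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT",
        "IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER",
        "IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER",
    ]
    print(consensus(sequences))
```

Regions shared by all four sequences survive in the consensus, while positions where the two groups disagree evenly drop out, which is what makes the grouping visible.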

BLAST Search
• www.ncbi.nlm.nih.gov/
• Basic Local Alignment Search Tool
• Uses a heuristic algorithm that seeks local (instead of global) alignments; able to detect relationships among sequences that share similarity only in isolated regions

Sample DNA Sequence
   1 aagcgaacgt tacagagcta tacaagaaat tatatggaga ttctgttttc tatttccatt
  61 cttgctcttc tcttttccgg agtagtagct gctccaagcg acgatgatgt tttcgaagag
 121 gttagggttg gattggtggt tgacttgagt tctattcaag gcaagattct ggaaacttct
 181 tttaacttag cgctttcaga tttctatggc atcaacaatg gataccgaac cagagtctct
 241 gttttggtca gagactccca aggagacccg atcattgctc ttgccgccgc tactgatctt
 301 ctcaaaaatg caaaagcgga agccattgtt ggtgcacaat cattacaaga ggcaaagctt
 361 ttggcgacga ttagcgaaaa agctaaagtt ccggtcatat ctactttctt gccaaacacg
 421 ttatctttga agaaatacga taactttatt caatggacgc atgatactac atcagaggct
 481 aagggaatta caagtctcat acaagatttc agttgtaaat cggttgtggt tatatacgag
 541 gatgctgatg attggagtga gagtttgcaa atattggttg agaattttca agataaagga
 601 atctatatcg ctcgttctgc ttcttttgca gtctcatcat caggagaaaa tcatatgatg
 661 aatcagctaa ggaagcttaa ggtctcaaga gcatcggttt ttgtggtgca tatgtccgag
 721 attcttgttt ctcgtctctt ccaatgtgta gagaagttag gtttgatgga agaagcgttc
 781 gcttggatcc tcactgcaag aaccatgaac tacttggaac attttgcaat aactaggtcg
 841 atgcaagggg tcattggttt caaatcttac atccctgtat ctgaagaagt taagaatttt
 901 acttcaagat tgaggaaacg tatgggagat gatacagaaa cagagcattc tagtgtaatc
 961 atcggtttac gcgcacacga tatcgcttgt attctagcaa atgcagtaga gaagttcagt
1021 gtaagtggta aagttgaagc atcttcgaat gtatcagctg atcttctgga tacaattaga
1081 catagtagat tcaagggttt gagtggtgac atccaaatct ctgacaacaa atttatctca
1141 gagacatttg aaatcgtgaa tattggaaga gaaaaacaga gaaggatagg attatggagt
1201 ggtggtagtt ttagccaaag aagacagatt gtttggcctg gcaggtctcg taagatccca
1261 agacaccgtg ttttggcaga gaaaggtgaa aagaaggtgc ttagggtctt agttaccgca
1321 ggaaacaagg tcccgcatct agtgtcggtg cgtcctgatc ctgaaacagg tgttaatact
1381 gtctctggat tctgcgtaga ggttttcaag acttgcattg ctccttttaa ctacgagctt
1441 gaattcatac cttaccgtgg aaacaatgac aatcttgctt atctactttc tactcagaga
1501 gacaagtatg atgcagcagt tggtgatatc accatcactt ccaacagatc tttgtatgtt
1561 gattttactt tgccgtacac tgacattggt attggaatcc tgacagtaaa aaagaaaagc
1621 caagggatgt ggactttctt tgatcctttt gaaaaatcct tgtggctagc gagtggagct
1681 ttcttcgtct tgaccgggat tgttgtttgg ttggttgaac ggcccgttaa tccggagttt
1741 caaggctctt ggggacaaca acttagtatg atgctctggt ttggattctc taccattgtg
1801 tttgctcaca gggagaagct acagaaaatg tcatcaagat tcttagtcat agtttgggtt
1861 tttgtggtgt taatattgac ttcaagttac agcgcaaact tgacatcaac caagaccatt
1921 tctcgcatgc aattaaatca tcagatggtt ttcgggggat ctacgacgtc aatgactgcg
1981 aagctcggat ccattaatgc agttgaggcc tatgcacaac ttttgcgaga tggaactctt
2041 aatcatgtca tcaatgaaat accttatctc agtatcctta tcggaaatta tccgaatgat
2101 ttcgtaatga cagatagagt gactaatacc aatggctttg gctttatgtt ccagaaaggt
2161 tcggatttgg ttcctaaagt atcgcgagaa atcgcgaagc taagatcatt gggaatgttg
2221 aaagacatgg agaaaaaatg gtttcaaaag ctggattcac taaatgtaca ttccaacact
2281 gaggaagttg cctctaccaa cgacgatgat gaggcatcta agcgattcac cttccgtgag
2341 ttgcgcggtt tgttcatcat tgcgggagct gctcatgttc tcgtactagc cctacatctc
2401 tttcatacgc gtcaagaggt atcacgacta tgcaccaaac ttcaaagctt ctataagtaa
2461 aaagtgatcc atcattcata agctctacta tagcaattga tggaggactc ataagtaaca
2521 acaaagtaca cttcgaaaca aatgtcacat gtaatacttg gttttttttc ccgtttaaat
2581 tcacatgtaa taatttaact cacgtaaata ctaaagtgat tcacccaaaa aaaaaaaaaa
2641 a

Sample Protein Sequence
MEILFSISILALLFSGVVAAPSDDDVFEEVRVGLVVDLSSIQGK
ILETSFNLALSDFYGINNGYRTRVSVLVRDSQGDPIIALAAATDLLKNAKAEAIVGAQ
SLQEAKLLATISEKAKVPVISTFLPNTLSLKKYDNFIQWTHDTTSEAKGITSLIQDFS
CKSVVVIYEDADDWSESLQILVENFQDKGIYIARSASFAVSSSGENHMMNQLRKLKVS
RASVFVVHMSEILVSRLFQCVEKLGLMEEAFAWILTARTMNYLEHFAITRSMQGVIGF
KSYIPVSEEVKNFTSRLRKRMGDDTETEHSSVIIGLRAHDIACILANAVEKFSVSGKV
EASSNVSADLLDTIRHSRFKGLSGDIQISDNKFISETFEIVNIGREKQRRIGLWSGGS
FSQRRQIVWPGRSRKIPRHRVLAEKGEKKVLRVLVTAGNKVPHLVSVRPDPETGVNTV
SGFCVEVFKTCIAPFNYELEFIPYRGNNDNLAYLLSTQRDKYDAAVGDITITSNRSLY
VDFTLPYTDIGIGILTVKKKSQGMWTFFDPFEKSLWLASGAFFVLTGIVVWLVERPVN
PEFQGSWGQQLSMMLWFGFSTIVFAHREKLQKMSSRFLVIVWVFVVLILTSSYSANLT
STKTISRMQLNHQMVFGGSTTSMTAKLGSINAVEAYAQLLRDGTLNHVINEIPYLSIL
IGNYPNDFVMTDRVTNTNGFGFMFQKGSDLVPKVSREIAKLRSLGMLKDMEKKWFQKL
DSLNVHSNTEEVASTNDDDEASKRFTFRELRGLFIIAGAAHVLVLALHLFHTRQEVSR
LCTKLQSFYK

Go to the NCBI website; enter the BLAST program

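• blastn (nucleotide query against a nucleotide database): good for high-scoring matches; not suited to comparing distantly related sequences
• blastp (protein query against a protein database): uses a substitution matrix to find distant relationships; can use SEG to filter low-complexity regions
• blastx (translated nucleotide query against a protein database): use for new DNA sequences and analysis of ESTs
• tblastn (protein query against a translated nucleotide database): search for coding regions that are not defined in the database
• tblastx (translated nucleotide query against a translated nucleotide database): use for analysis of ESTs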

Paste your sequence and choose one database

Nucleotide Database

Protein Database

Bit Score
The value S' is derived from the raw alignment score S by taking into account the statistical properties of the scoring system used. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.

E Value
Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.
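The relationship between the raw score, the bit score and the E value can be made concrete with the standard Karlin-Altschul formulas, S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'). The sketch below is illustrative only: λ and K depend on the scoring system, the values used in the example are assumptions chosen for demonstration, and m and n stand for the query length and the total database length.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score S into a bit score S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least this well."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Illustrative parameters (assumed, not taken from the slides).
    lam, k = 0.318, 0.134
    query_len, db_len = 250, 50_000_000
    for raw in (40, 80, 120):
        bits = bit_score(raw, lam, k)
        print(raw, round(bits, 1), e_value(bits, query_len, db_len))
```

Each additional bit of score halves the E value, and doubling the database size doubles it, which is why the same raw score becomes less significant in a larger database.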

Perform a blastp search using the predicted protein sequence

CDD Search
Compares protein sequences to the Conserved Domain Database. The CDD is a database containing a collection of functional and/or structural domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI.

Matrix
A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues.
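How a substitution matrix turns an alignment into a score can be shown with a toy example. The scores below are a small illustrative table covering only three residues, not a complete published matrix, and the function names are hypothetical. The alignment is ungapped, so the score is simply the sum over aligned residue pairs.

```python
# Toy substitution scores for three residues (illustrative values only).
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("L", "I"): 2,   # similar hydrophobic residues: mild positive score
    ("L", "K"): -2,  # dissimilar residues: penalty
    ("I", "K"): -3,
}


def pair_score(a, b):
    """Look up the score for aligning residues a and b (the matrix is symmetric)."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    if (b, a) in TOY_SCORES:
        return TOY_SCORES[(b, a)]
    raise KeyError(f"no score defined for pair {a}/{b}")


def alignment_score(seq1, seq2):
    """Score an ungapped alignment of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment requires equal lengths")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(alignment_score("LIK", "LLK"))
    print(alignment_score("LIK", "KIL"))
```

A conservative substitution (I for L) lowers the score only slightly, whereas swapping in a dissimilar residue costs much more; this is what lets blastp detect distant relationships.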

PSI-BLAST
Position-specific iterative BLAST refers to a feature of BLAST 2.0 in which a profile (or position-specific scoring matrix, PSSM) is constructed automatically from a multiple alignment of the highest-scoring hits in an initial BLAST search. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. The profile is used to perform a second (and further) BLAST search, and the results of each "iteration" are used to refine the profile. This iterative searching strategy results in increased sensitivity.

PSSM
Position-specific scoring matrix. Based on a profile (a table that lists the frequency of each amino acid at each position of a protein sequence; the frequencies are calculated from multiple alignments of sequences containing a domain of interest). The PSSM gives the log-odds score for finding a particular matching amino acid in a target sequence.
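The step from a frequency profile to log-odds scores can be sketched as follows. This is a minimal illustration under assumptions of my own: a reduced five-letter alphabet with uniform background frequencies, a simple pseudocount to avoid taking the logarithm of zero, and hypothetical function names. It is not the weighting scheme PSI-BLAST itself uses.

```python
import math
from collections import Counter


def build_pssm(aligned, background, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned sequences.

    For each column, score(aa) = log2(observed frequency / background frequency),
    with a pseudocount added to every residue count.
    """
    alphabet = sorted(background)
    total = len(aligned) + pseudocount * len(alphabet)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in alphabet
        })
    return pssm


def score(pssm, sequence):
    """Sum the position-specific scores of a sequence against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


if __name__ == "__main__":
    # Toy alignment over a reduced alphabet (assumed for illustration).
    alignment = ["LKA", "LRA", "LKG"]
    background = {aa: 0.2 for aa in "AGKLR"}
    pssm = build_pssm(alignment, background)
    print(round(pssm[0]["L"], 2), round(pssm[0]["A"], 2))
    print(score(pssm, "LKA") > score(pssm, "ARL"))
```

The fully conserved first column gives a high score to L and negative scores to everything else, while a sequence that matches the profile at every position outscores one that does not.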
