BIO4320 Lecture Materials, Prepared by Dr. Hon-Ming Lam
DNA Sequencing and Sequence Analysis
Further Readings:
• “Genome II” by T.A. Brown, Ch. 6
• “Gene Cloning and DNA Analysis” by T.A. Brown, Ch. 10
• “Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins” by A.D. Baxevanis and B.F. Francis Ouellette (1998), Ch. 7, 11
• www.ncbi.nlm.nih.gov
Two Major Methods
• Chemical degradation (Maxam and Gilbert, 1977)
  – double-stranded DNA
  – no primer is needed (i.e. prior sequence information is not necessary)
  – involves toxic chemicals
  – hard to automate
• Chain termination method (Sanger et al., 1977)
  – single-stranded DNA as template
  – based on enzymatic synthesis
  – random chain termination by dideoxynucleotides
  – relatively easy to automate
Chemical Degradation
• Double-stranded DNA
• End labeling
• Denature into single-stranded DNA; chemical cleavage
• Four separate cleavage reactions:
  – Cleave at C
  – Cleave at G
  – Cleave at G and A
  – Cleave at C and T
Chemical Degradation
[Figure: sequencing gel with lanes C, C&T, G and G&A, with the 3’ and 5’ ends marked. Sequence shown: C T C G G C G T A G T C T G A]
Assume 5’ end-labeling
Chain Termination
• Double-stranded DNA
• Denature or replicate (in filamentous phage) into single-stranded DNA
• Add primer, dNTPs and dideoxynucleotides
• New chain randomly terminated at A, C, G, or T
Chain Termination
[Figure: newly synthesized strands of different lengths, each terminated at its 3’ end by ddA or ddC]
• Add primer, dNTPs and dideoxynucleotides
• New chain randomly terminated at A, C, G, or T
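Chain Termination
[Figure: sequencing gel with lanes A, C, G and T, shown beside the template sequence and its complementary strand, with the 5’ and 3’ ends marked]
A C A G G A T C T T C A C G T
T G T C C T A G A A G T G C A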
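The relationship between the four termination reactions and the sequence read from the gel can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the strand orientation is an assumption (the slide does not make it recoverable), and fragment lengths are counted from the first added base, ignoring the primer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3_to_5: str) -> str:
    """Return the new strand (5'->3') copied from a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_lengths(new_strand: str) -> dict:
    """For each ddNTP reaction, list the lengths of the terminated fragments.

    A fragment of length n appears in lane X when position n of the new
    strand is X, because ddX can only stop the chain where X is added.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the gel from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Template from the slide; treating it as written 3'->5' is an assumption.
template = "ACAGGATCTTCACGT"
new_strand = synthesized_strand(template)
lanes = termination_lengths(new_strand)
print(lanes)
print(read_gel(lanes))
```

Reading the bands in order of increasing length recovers the newly synthesized strand, which is the complement of the template.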
Thermal Cycle Sequencing
[Figure: chain-terminated strands ending in ddA or ddC]
• Starting with ds DNA templates
• PCR with just one primer in the presence of ddNTPs
• The number of chain-terminated strands increases as more cycles are carried out
Automated DNA Sequencing
[Figure: fragments terminated by labeled ddC, ddT, ddG and ddA; sequence read: AGTGCCACGT]
• Use fluorescently labeled ddNTPs in chain termination reactions
• Detect by an imaging system
Pyrosequencing
[Figure: +A, +C, +G and +T are added in turn to a primed template; unincorporated dNTPs are degraded, and incorporation of a dNTP gives a flash of chemiluminescence]
• Rapid, no separation of products, no ddNTPs
• Addition of a dNTP releases a pyrophosphate molecule, which is converted by sulfurylase into a flash of chemiluminescence
• dNTPs added one by one
• Unused dNTPs degraded by nucleotidase
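The one-by-one addition of dNTPs can be mimicked with a short simulation. The sketch below assumes an idealized reaction in which the light signal equals the number of nucleotides incorporated in a dispensation (so a run of identical bases gives one stronger flash); the function name, the dispensation order and the example template are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrosequence(template_3_to_5: str, dispensation: str = "ACGT"):
    """Simulate cyclic dNTP addition on a template written 3'->5'.

    Returns (flashes, read): flashes is a list of (dNTP, signal) pairs,
    where signal is the number of bases incorporated; read is the
    sequence decoded from the flashes.
    """
    position = 0
    flashes = []
    while position < len(template_3_to_5):
        for nucleotide in dispensation:
            incorporated = 0
            # Keep extending while the next template base pairs with this dNTP.
            while (
                position < len(template_3_to_5)
                and COMPLEMENT[template_3_to_5[position]] == nucleotide
            ):
                incorporated += 1
                position += 1
            # Unused dNTP is degraded before the next one is added.
            flashes.append((nucleotide, incorporated))
            if position == len(template_3_to_5):
                break
    read = "".join(nucleotide * signal for nucleotide, signal in flashes)
    return flashes, read


flashes, read = pyrosequence("CAAG")
print(flashes)
print(read)
```

A signal of 0 means the dNTP was not incorporated and was simply degraded; no separation of products is needed because the order of flashes gives the sequence directly.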
DNA Sequencing by Gene Chips
ACTACCGATC
 CTACCGATCC
  TACCGATCCG
   ACCGATCCGA
ACTACCGATCCGA
• A gene chip carries every possible 10-mer oligonucleotide
• Hybridize with target DNA
• Align the sequences of the oligos that give positive signals
• For 10-mers: 1,048,576 (4^10) spots can sequence about 1 kb
• For 8-mers: 65,536 (4^8) spots can sequence about 256 bp
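The spot counts and the alignment of positive oligos can be checked with a small sketch. It assumes the simplest possible case: every k-mer of the target gives a signal, no k-mer occurs twice, and the oligos chain together through a unique (k-1)-base overlap. The function names are hypothetical.

```python
def probe_count(k: int) -> int:
    """Number of spots needed to carry every possible k-mer (4 bases)."""
    return 4 ** k


def assemble_from_kmers(kmers: list) -> str:
    """Rebuild a sequence from its overlapping k-mers.

    Assumes each k-mer occurs once and overlaps the next by k-1 bases.
    """
    k = len(kmers[0])
    by_prefix = {kmer[: k - 1]: kmer for kmer in kmers}
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the only one whose prefix is no other k-mer's suffix.
    starts = [kmer for kmer in kmers if kmer[: k - 1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting oligo; cannot assemble")
    sequence = starts[0]
    for _ in range(len(kmers) - 1):
        next_kmer = by_prefix.get(sequence[-(k - 1):])
        if next_kmer is None:
            break
        sequence += next_kmer[-1]
    return sequence


# Positive oligos from the slide, deliberately given out of order.
positives = ["TACCGATCCG", "ACTACCGATC", "ACCGATCCGA", "CTACCGATCC"]
print(probe_count(10), probe_count(8))
print(assemble_from_kmers(positives))
```

Repeated k-mers in the target break the unique-overlap assumption, which is one reason the readable length is far shorter than the number of spots.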
Analyzing the Sequenced Genes
• Structure prediction
  – Secondary structure of DNA and RNA
  – Possible 3-D structure of proteins
• Identity of the encoded gene/gene product
  – Prediction of general physical properties (e.g. M.W., pI; may be important for proteomic analysis)
  – Database (e.g. GenBank) search based on sequence homology
• Possible function of the encoded gene product
  – Search for signature domains or functional motifs using consensus patterns (based on statistics)
• Possible location of the encoded gene product
  – Prediction of subcellular localization by consensus patterns
• Prediction of evolutionary relationship
  – Multiple alignment, clustering, etc.
• Gene prediction from genomic sequences
  – Prediction of coding regions and location of introns
  – Prediction of promoter regions
• Prediction of regulatory sites
  – Prediction of consensus cis-acting regulatory elements
Homologues
• Paralogues: related genes arising from gene duplication in the same genome; may diverge to play different roles.
• Orthologues: homologues in different species.
Align and Compare Sequences
IAMABIOSTUDENTTAKINGMARINEANIMALPLANT
IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT
IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER
IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER
Align and Compare Sequences
Gp1  IAMABIOSTUDENTTAKINGMARINEANIMALPLANT
Gp1  IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT
Gp2  IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER
Gp2  IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER
[Consensus regions are marked in the original figure]
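The grouping and the consensus for these four toy sequences can be reproduced with a short sketch. It assumes the sequences are already aligned with no gaps (they are all 37 letters long) and uses the simplest definitions: identity is the count of matching columns, and a consensus position is one where every sequence agrees. The function names are hypothetical.

```python
SEQS = [
    "IAMABIOSTUDENTTAKINGMARINEANIMALPLANT",
    "IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT",
    "IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER",
    "IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER",
]


def identity(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences match."""
    return sum(x == y for x, y in zip(a, b))


def consensus(seqs: list, variable: str = "-") -> str:
    """Keep a letter where all sequences agree; mark other columns."""
    return "".join(
        column[0] if len(set(column)) == 1 else variable
        for column in zip(*seqs)
    )


# Pairwise identities: high within a group, lower between groups.
for i in range(len(SEQS)):
    for j in range(i + 1, len(SEQS)):
        print(i + 1, j + 1, identity(SEQS[i], SEQS[j]))

print(consensus(SEQS[:2]))  # Gp1
print(consensus(SEQS[2:]))  # Gp2
print(consensus(SEQS))      # all four sequences
```

Sequences 1 and 2 match at 32 of 37 positions and sequences 3 and 4 at 34, whereas sequences 2 and 3 match at only 23, which is why the four fall into two groups.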
BLAST Search
• www.ncbi.nlm.nih.gov/
• Basic Local Alignment Search Tool
• Uses a heuristic algorithm that seeks local (instead of global) alignments; able to detect relationships among sequences that share similarity only in isolated regions
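To see what a local alignment score means, the sketch below uses the Smith-Waterman dynamic programming recurrence. This is not BLAST's algorithm: BLAST is a faster heuristic that approximates this kind of search. The scoring values (match +2, mismatch -1, gap -2) and the function name are illustrative assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman).

    Scores are never allowed to fall below zero, so an alignment can
    start and end anywhere; this is what makes it local, not global.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# Two sequences that are similar only in one isolated region.
print(local_alignment_score("GGGGSTUDENTGGGG", "CCCSTUDENTCCC"))
```

The shared region STUDENT scores 14 (seven matches) even though the flanking regions are unrelated; a global alignment would be penalized for those flanks.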
Sample DNA Sequence
   1 aagcgaacgt tacagagcta tacaagaaat tatatggaga ttctgttttc tatttccatt
  61 cttgctcttc tcttttccgg agtagtagct gctccaagcg acgatgatgt tttcgaagag
 121 gttagggttg gattggtggt tgacttgagt tctattcaag gcaagattct ggaaacttct
 181 tttaacttag cgctttcaga tttctatggc atcaacaatg gataccgaac cagagtctct
 241 gttttggtca gagactccca aggagacccg atcattgctc ttgccgccgc tactgatctt
 301 ctcaaaaatg caaaagcgga agccattgtt ggtgcacaat cattacaaga ggcaaagctt
 361 ttggcgacga ttagcgaaaa agctaaagtt ccggtcatat ctactttctt gccaaacacg
 421 ttatctttga agaaatacga taactttatt caatggacgc atgatactac atcagaggct
 481 aagggaatta caagtctcat acaagatttc agttgtaaat cggttgtggt tatatacgag
 541 gatgctgatg attggagtga gagtttgcaa atattggttg agaattttca agataaagga
 601 atctatatcg ctcgttctgc ttcttttgca gtctcatcat caggagaaaa tcatatgatg
 661 aatcagctaa ggaagcttaa ggtctcaaga gcatcggttt ttgtggtgca tatgtccgag
 721 attcttgttt ctcgtctctt ccaatgtgta gagaagttag gtttgatgga agaagcgttc
 781 gcttggatcc tcactgcaag aaccatgaac tacttggaac attttgcaat aactaggtcg
 841 atgcaagggg tcattggttt caaatcttac atccctgtat ctgaagaagt taagaatttt
 901 acttcaagat tgaggaaacg tatgggagat gatacagaaa cagagcattc tagtgtaatc
 961 atcggtttac gcgcacacga tatcgcttgt attctagcaa atgcagtaga gaagttcagt
1021 gtaagtggta aagttgaagc atcttcgaat gtatcagctg atcttctgga tacaattaga
1081 catagtagat tcaagggttt gagtggtgac atccaaatct ctgacaacaa atttatctca
1141 gagacatttg aaatcgtgaa tattggaaga gaaaaacaga gaaggatagg attatggagt
1201 ggtggtagtt ttagccaaag aagacagatt gtttggcctg gcaggtctcg taagatccca
1261 agacaccgtg ttttggcaga gaaaggtgaa aagaaggtgc ttagggtctt agttaccgca
1321 ggaaacaagg tcccgcatct agtgtcggtg cgtcctgatc ctgaaacagg tgttaatact
1381 gtctctggat tctgcgtaga ggttttcaag acttgcattg ctccttttaa ctacgagctt
1441 gaattcatac cttaccgtgg aaacaatgac aatcttgctt atctactttc tactcagaga
1501 gacaagtatg atgcagcagt tggtgatatc accatcactt ccaacagatc tttgtatgtt
1561 gattttactt tgccgtacac tgacattggt attggaatcc tgacagtaaa aaagaaaagc
1621 caagggatgt ggactttctt tgatcctttt gaaaaatcct tgtggctagc gagtggagct
1681 ttcttcgtct tgaccgggat tgttgtttgg ttggttgaac ggcccgttaa tccggagttt
1741 caaggctctt ggggacaaca acttagtatg atgctctggt ttggattctc taccattgtg
1801 tttgctcaca gggagaagct acagaaaatg tcatcaagat tcttagtcat agtttgggtt
1861 tttgtggtgt taatattgac ttcaagttac agcgcaaact tgacatcaac caagaccatt
1921 tctcgcatgc aattaaatca tcagatggtt ttcgggggat ctacgacgtc aatgactgcg
1981 aagctcggat ccattaatgc agttgaggcc tatgcacaac ttttgcgaga tggaactctt
2041 aatcatgtca tcaatgaaat accttatctc agtatcctta tcggaaatta tccgaatgat
2101 ttcgtaatga cagatagagt gactaatacc aatggctttg gctttatgtt ccagaaaggt
2161 tcggatttgg ttcctaaagt atcgcgagaa atcgcgaagc taagatcatt gggaatgttg
2221 aaagacatgg agaaaaaatg gtttcaaaag ctggattcac taaatgtaca ttccaacact
2281 gaggaagttg cctctaccaa cgacgatgat gaggcatcta agcgattcac cttccgtgag
2341 ttgcgcggtt tgttcatcat tgcgggagct gctcatgttc tcgtactagc cctacatctc
2401 tttcatacgc gtcaagaggt atcacgacta tgcaccaaac ttcaaagctt ctataagtaa
2461 aaagtgatcc atcattcata agctctacta tagcaattga tggaggactc ataagtaaca
2521 acaaagtaca cttcgaaaca aatgtcacat gtaatacttg gttttttttc ccgtttaaat
2581 tcacatgtaa taatttaact cacgtaaata ctaaagtgat tcacccaaaa aaaaaaaaaa
2641 a
Sample Protein Sequence
MEILFSISILALLFSGVVAAPSDDDVFEEVRVGLVVDLSSIQGK
ILETSFNLALSDFYGINNGYRTRVSVLVRDSQGDPIIALAAATDLLKNAKAEAIVGAQ
SLQEAKLLATISEKAKVPVISTFLPNTLSLKKYDNFIQWTHDTTSEAKGITSLIQDFS
CKSVVVIYEDADDWSESLQILVENFQDKGIYIARSASFAVSSSGENHMMNQLRKLKVS
RASVFVVHMSEILVSRLFQCVEKLGLMEEAFAWILTARTMNYLEHFAITRSMQGVIGF
KSYIPVSEEVKNFTSRLRKRMGDDTETEHSSVIIGLRAHDIACILANAVEKFSVSGKV
EASSNVSADLLDTIRHSRFKGLSGDIQISDNKFISETFEIVNIGREKQRRIGLWSGGS
FSQRRQIVWPGRSRKIPRHRVLAEKGEKKVLRVLVTAGNKVPHLVSVRPDPETGVNTV
SGFCVEVFKTCIAPFNYELEFIPYRGNNDNLAYLLSTQRDKYDAAVGDITITSNRSLY
VDFTLPYTDIGIGILTVKKKSQGMWTFFDPFEKSLWLASGAFFVLTGIVVWLVERPVN
PEFQGSWGQQLSMMLWFGFSTIVFAHREKLQKMSSRFLVIVWVFVVLILTSSYSANLT
STKTISRMQLNHQMVFGGSTTSMTAKLGSINAVEAYAQLLRDGTLNHVINEIPYLSIL
IGNYPNDFVMTDRVTNTNGFGFMFQKGSDLVPKVSREIAKLRSLGMLKDMEKKWFQKL
DSLNVHSNTEEVASTNDDDEASKRFTFRELRGLFIIAGAAHVLVLALHLFHTRQEVSR
LCTKLQSFYK
Go to the NCBI website; enter the BLAST program
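• blastn (nucleotide query vs. nucleotide database): good for finding high-scoring matches; not suited to comparing distantly related sequences
• blastp (protein query vs. protein database): uses a substitution matrix to find distant relationships; can use SEG to filter low-complexity regions
• blastx (translated nucleotide query vs. protein database): useful for new DNA sequences and analysis of ESTs
• tblastn (protein query vs. translated nucleotide database): searches for coding regions that are not defined in the database
• tblastx (translated nucleotide query vs. translated nucleotide database): useful for analysis of ESTs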
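The translated searches (blastx, tblastn, tblastx) compare sequences at the protein level by conceptually translating nucleotide sequences in all six reading frames. A minimal sketch of that translation step is given below, assuming the standard genetic code; the function names are hypothetical, and this is not how the BLAST programs are implemented internally. The example input is a short stretch taken from the first line of the sample DNA sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


for name, protein in six_frames("tatatggagattctgttttc").items():
    print(name, protein)
```

Frame +1 of this fragment reads YMEILF, and MEILF is the start of the sample protein sequence; only one of the six frames matches, which is why all six must be searched.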
Paste your sequence and choose one database
Nucleotide Database
Protein Database
Bit Score
The value S' is derived from the raw alignment score S, taking into account the statistical properties of the scoring system used. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
E Value
Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.
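The two definitions above are linked by the Karlin-Altschul statistics: S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'), where m is the query length and n the database length. The sketch below simply evaluates these formulas; the values used for λ, K, m and n are illustrative assumptions, not values taken from the lecture.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_length: float, database_length: float) -> float:
    """Expected number of chance alignments scoring at least this well."""
    return query_length * database_length * 2 ** (-bits)


# Illustrative parameters (assumed): lambda and K depend on the scoring system.
LAMBDA, K = 0.267, 0.041
QUERY_LENGTH, DATABASE_LENGTH = 300, 1_000_000

for raw in (50, 100, 150):
    bits = bit_score(raw, LAMBDA, K)
    print(raw, round(bits, 1), e_value(bits, QUERY_LENGTH, DATABASE_LENGTH))
```

Each extra bit halves the E value, and the same bit score gives a larger E value against a larger database, which is why E values from different databases are not directly comparable while bit scores are.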
Perform a blastp search using the predicted protein sequence
CDD Search
Compares protein sequences to the Conserved Domain Database. The CDD is a database containing a collection of functional and/or structural domains derived from two popular collections, SMART and Pfam, plus contributions from colleagues at NCBI.
Matrix
A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues.
PSI-BLAST
Position-specific iterative BLAST refers to a feature of BLAST 2.0 in which a profile (or position-specific scoring matrix, PSSM) is constructed automatically from a multiple alignment of the highest-scoring hits in an initial BLAST search. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. The profile is used to perform a second (and subsequent) BLAST search, and the results of each "iteration" are used to refine the profile. This iterative searching strategy results in increased sensitivity.
PSSM
Position-specific scoring matrix. Based on a profile (a table that lists the frequencies of each amino acid at each position of a protein sequence; the frequencies are calculated from multiple alignments of sequences containing a domain of interest). The PSSM gives the log-odds score for finding a particular matching amino acid in a target sequence.
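How a profile of frequencies becomes a log-odds matrix can be sketched as follows. This is a simplified illustration, not the PSI-BLAST procedure: it assumes a uniform background frequency for all 20 amino acids and a simple pseudocount of 1, and the toy alignment and function names are made up for the example.

```python
import math
from collections import Counter

AMINO_ACID_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list, alphabet: str = AMINO_ACID_ALPHABET,
               pseudocount: float = 1.0) -> list:
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)  # uniform background (an assumption)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return pssm


def score(pssm: list, segment: str) -> float:
    """Sum the position-specific scores of a segment of the same length."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


# Toy ungapped alignment (hypothetical sequences).
alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
pssm = build_pssm(alignment)
print(round(pssm[0]["M"], 2), round(pssm[1]["K"], 2), round(pssm[0]["A"], 2))
print(round(score(pssm, "MKVLA"), 2), round(score(pssm, "AAAAA"), 2))
```

The fully conserved first column gives M a higher score than the partly conserved K in the second column, and residues never seen at a position score below zero.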