Genomics and Bioinformatics: The "new" biology
Brijesh Singh Yadav
Bioinformatics Research Cell, United Research Center, Allahabad, India
06/10/09
URC,Allahabad
What is genomics
Genome: All the DNA contained in the cell of an organism.
Genomics:
• The comprehensive study of the interactions and functional dynamics of whole sets of genes and their products. (NIAAA, NIH)
• A "scaled-up" version of genetics research in which scientists can look at all of the genes in a living creature at the same time. (NIGMS, NIH)
Genome sequencing chronology
Year  Organism                   Significance                Genome size (bp)  Number of genes
1977  Bacteriophage φX174        First genome ever!          5,386             11
1981  Human mitochondria         First organelle             16,500            37
1995  Haemophilus influenzae Rd  First free-living organism  1,830,137         ~3,500
1996  Saccharomyces cerevisiae   First eukaryote             12,086,000        ~6,000
Genome sequencing chronology (continued)
Year  Organism                Significance                  Genome size (bp)  Number of genes
1998  Caenorhabditis elegans  First multicellular organism  97,000,000        ~19,000
1999  Human chromosome 22     First human chromosome        49,000,000        673
2000  Arabidopsis thaliana    First plant genome            150,000,000       ~25,000
2001  Human                   First human genome            3,000,000,000     ~30,000
Genome sequencing projects (as of 1/26/2007)
Genome sequencing helps in:
• Identifying new genes (“gene discovery”)
• Looking at chromosome organization and structure
• Finding gene regulatory sequences
• Comparative genomics
These in turn lead to advances in:
• Medicine
• Agriculture
• Biotechnology
• Understanding evolution and other basic science questions
Information contents in a genome
Gene
• Protein coding genes
• RNA genes
Regulatory elements
• Gene expression control
• Chromatin remodeling
• Matrix attachment sites
“Non-functional” elements
• Selfish elements
• “Junk” DNA ??
The “central dogma” of molecular biology
Central dogma
DNA (replication) → [transcription] → RNA → [translation] → Protein
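The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is only an illustration: it assumes the standard genetic code, a coding-strand DNA input read in frame from its first base, and the function names (`reverse_complement`, `transcribe`, `translate`) are made up for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Replication step: return the complementary strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def transcribe(coding_dna):
    """Transcription step: the mRNA matches the coding strand with U for T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation step: read codons in frame until a stop codon ('*')."""
    protein = []
    mrna = mrna.upper()
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGAAATGGGTTTGGTAA"   # hypothetical toy gene
mrna = transcribe(dna)
print(mrna)                  # AUGAAAUGGGUUUGGUAA
print(translate(mrna))       # MKWVW
```

Real genes need more care (introns, alternative genetic codes, choice of reading frame), which this sketch deliberately ignores.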
Expanded “central dogma” of molecular biology
A more comprehensive view
DNA (replication) → [transcription] → RNA → [translation] → Protein → Metabolite
Together these levels shape the Phenotype.
New disciplines due to the advance in genomics
Omics
DNA (replication): Structural genomics
• Genomic DNA sequences
RNA (transcription): Transcriptomics
• Transcript seq
• Microarray data
• Cis-elements
• TF binding sites
• Epigenetic regulation
Protein (translation): Proteomics
• Shotgun protein seq
• Subcellular location
• Post-translational mod
• Protein interaction
• Protein structure
Metabolite: Metabolomics
• Metabolite concn
• Metabolic flux
Phenotype
• Genetic interactions
• Systematic KO
• Disease information
Transcription factors, binding sites, and target genes
Identify transcription factors
• Genetic screens
• One-hybrid assays
• Sequence motifs/homology
Identify binding motif
• Bioinformatics (e.g., Gibbs sampling on microarray data)
• Molecular biology using purified protein or protein extracts
Find all motifs in genome
• Computational searching
• ChIP-chip
Identify target genes
• Computational searching
• Microarrays
• Genetic screens
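The "computational searching" step for finding motif occurrences can be illustrated with a minimal Python sketch. It assumes the binding motif is given as an IUPAC consensus string and scans only the forward strand; the function name `find_motif_sites` and the example sequence are hypothetical, and real searches usually score sites with a matrix model and check both strands.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif_sites(sequence, consensus):
    """Return (0-based start, matched site) for every occurrence,
    including overlapping ones, on the forward strand."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets overlapping sites be reported.
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Toy sequence scanned for the E-box consensus CANNTG.
print(find_motif_sites("TTCACGTGAACAGCTGTT", "CANNTG"))
# [(2, 'CACGTG'), (10, 'CAGCTG')]
```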
Nature omics gateway
Three perspectives of our biological world
The cellular level, the individual, the tree of life
• ~3 × 10^4 genes
• ~10^14 cells per individual
• 2-100 × 10^6 species
Further complications
• Cell-cell interactions
• Cell types
• Environmental conditions
• Developmental programming
• Interactions at the organismal level
• Interactions at the population, ecosystem level
Impact of Genomics on Medicine
• How to characterize new diseases?
• What new treatments can be discovered?
• How do we treat individual patients? Tailoring treatments?
Bioinformatics
Conceptualizing biology in terms of molecules and then applying “informatics” techniques from math, computer science, and statistics to understand and organize the information associated with these molecules on a large scale.
How do we use Bioinformatics?
• Store/retrieve biological information (databases)
• Retrieve/compare gene sequences
• Predict function of unknown genes/proteins
• Search for previously known functions of a gene
• Compare data with other researchers
• Compile/distribute data for other researchers
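As a small illustration of storing and retrieving sequences, the sketch below reads FASTA-formatted text into a Python dictionary keyed by record name. It is a minimal example, not a full parser: the function name `parse_fasta` and the toy records are assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {record name: sequence}.

    The record name is the first word after '>'; sequence lines
    belonging to one record are joined together.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Fall back to a generated name if the header is empty.
            name = header.split()[0] if header else "record%d" % (len(records) + 1)
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Hypothetical toy input
fasta_text = """>seqA demo record
ACGT
AC
>seqB
GG
"""
sequences = parse_fasta(fasta_text)
print(sequences["seqA"])   # ACGTAC
```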
Example: Sequence alignment
Align retinol-binding protein and β-lactoglobulin

>RBP
MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRL
LNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPN
GLPPEAQKIVRQRQEELCLARQYRLIV
>lactoglobulin
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWEN
GECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALEKFDKALKA
LPMHIRLSFNPTQLEEQCHI

Alignment (| marks identical residues, . marks a gap):

RBP             1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50
                        |||  |         |         |     ||||  |
lactoglobulin   1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44

RBP            51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97
                    | |   |   |        |  |    ||  |    ||        |
lactoglobulin  45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93

RBP            98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136
                   || ||         |          ||||  |                |
lactoglobulin  94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135

RBP           137 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV 185
                    |       |     |      ||          | || |
lactoglobulin 136 QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI....... 178
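A global pairwise alignment of this kind can be computed by dynamic programming. The sketch below is a minimal Needleman-Wunsch implementation in Python, offered only as an illustration: it assumes simple match/mismatch scores and a linear gap penalty, whereas protein alignments like the one above are normally produced with a substitution matrix and affine gap penalties. The function name and default scores are assumptions for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# The first 16 residues of each protein from the slide
total, rbp_row, blg_row = needleman_wunsch("MKWVWALLLLAAWAAA", "MKCLLLALALTCGAQA")
print(total)
print(rbp_row)
print(blg_row)
```

With these toy scores the result will generally differ from the alignment shown on the slide, because the scoring scheme decides which residues are paired and where gaps are opened.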
Microarray data analysis
A simplified pipeline
Example: Microarray
A solid support (e.g. a membrane or glass slide) on which DNA of known sequence is deposited in a grid-like fashion
Example: Identification of cis-elements
The on-off switches and rheostats of a cell, operating at the gene level. They control whether, and how vigorously, genes are transcribed into RNA.
Motif model: Position Frequency Matrix (PFM)
f_{b,i}: frequency with which base b occurs at the i-th position of the motif
D’haeseleer (2006) Nature Biotech. 24:42
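A position frequency matrix can be built directly from a set of aligned binding sites by counting bases column by column. The Python sketch below is a minimal illustration: it assumes equal-length, gap-free sites and reports relative frequencies (raw counts are also commonly used). The function name and the example sites are hypothetical.

```python
from collections import Counter


def position_frequency_matrix(sites, alphabet="ACGT"):
    """Return {base: [f_{b,1}, f_{b,2}, ...]} for equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    pfm = {base: [] for base in alphabet}
    for i in range(length):
        # Count how often each base occurs in column i.
        counts = Counter(site[i].upper() for site in sites)
        for base in alphabet:
            pfm[base].append(counts[base] / len(sites))
    return pfm


# Hypothetical aligned binding sites
sites = ["TATAAT", "TATAAT", "TACAAT", "GATAAT"]
pfm = position_frequency_matrix(sites)
for base in "ACGT":
    print(base, pfm[base])
```

Each column of the matrix sums to 1, so row `pfm["T"]` gives the fraction of sites carrying T at each position.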
Final example: Relationships between sequences
Sanger and colleagues (1950s): 1st sequence
Insulin from various mammals
The END