Bioinformatics

  • Uploaded by: KamleshGolhani
  • June 2020

  • Words: 1,582
  • Pages: 33
CCS HAU

Bioinformatics

Bio(-)informatics

Dr. Sudhir Kumar, CCS HAU, Hisar


Bio = Biology/biological

Informatics = Information Science including technology


What is Bioinformatics? The use of mathematical, statistical and computing methods to solve biological problems using DNA and amino acid sequences and related information. Bioinformatics conceptualizes biology in terms of macromolecules and then applies “informatics” techniques to understand and organize the information associated with these molecules on a large scale.


Bioinformatics
• Bioinformatics is the application of information technology to analyze, process, and manage biological data.
• Bioinformatics provides computational tools to facilitate the progression Data → Information → Knowledge → Discovery.


Suggestive Biology-Language Homologies (cell ↔ human language)
• Nucleotide bases ↔ Alphabet
• Amino acids ↔ Words
• Exons ↔ Phrases
• Folding ↔ Syntax
• Proteins ↔ Word senses
• Protein circuits ↔ Sentences
• Biological functions ↔ Semantics
• Regulation of gene expression ↔ Language generation


Overview
• Biological data are being produced at a phenomenal rate.
• As a result, computers have become indispensable for biological research.
• Aims:
  1. Organize data
  2. Develop tools
  3. Use the tools to solve biological problems


• Genome and protein databases
• Aligning sequences
• Searching
• Visualizing protein structure
• Homology modeling
• Molecular mechanics and molecular dynamics
• Structure prediction
• Docking
• Drug design
• Metabolic pathways
• NMR and X-ray crystallography
… and many more.


Definitions:
• Biocomputing and computational biology are synonyms and describe the use of computers and computational techniques to analyze any type of biological system, from individual molecules to organisms to overall ecology.
• Bioinformatics describes using computational techniques to access, analyze, and interpret the biological information in any type of biological database.
• Sequence analysis is the study of molecular sequence data for the purpose of inferring the function, interactions, evolution, and perhaps structure of biological molecules.
• Genomics analyzes the context of genes or complete genomes (the total DNA content of an organism) within the same and/or across different genomes.
• Proteomics is the subdivision of genomics concerned with analyzing the complete protein complement, i.e. the proteome, of organisms, both within and between different organisms.
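Sequence analysis, as defined above, starts from simple operations on the raw sequence string. As a minimal illustration, the Python sketch below computes two basic properties of a DNA sequence: its GC content and its reverse complement (the opposite strand read in the same 5′→3′ direction). The function names are illustrative, not taken from any particular library, and only the unambiguous bases A, C, G and T are handled.

```python
# Minimal sketch: two elementary sequence-analysis operations on a DNA string.
# Assumes only the bases A, C, G, T (upper or lower case); other IUPAC codes are ignored.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement: swap A<->T and C<->G, then read backwards."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGGCGTACGCTTAG"  # made-up example sequence
    print(f"GC content: {gc_content(dna):.2f}")
    print("Reverse complement:", reverse_complement(dna))
```

For the made-up sequence above, 8 of the 15 bases are G or C, so the GC content is about 0.53.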

First, “behind the screen”
• Biological databases are largely devoted to search.
  – They must also ensure integrity, security, etc.
• Search means taking a query and retrieving the database entries that match it.
• Efficiency is key: we want to find things fast, regardless of how big the database gets.
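To make the efficiency point concrete, here is a minimal Python sketch, assuming exact-match queries, of one common way to avoid scanning every entry: index each sequence once by its fixed-length words (k-mers), then use the query's first k-mer to look up a small set of candidate entries. This is similar in spirit to the word-seeding step of tools such as BLAST, but it is only an illustration; the function names, entry IDs and sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(entries: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every length-k substring (k-mer) to the IDs of the entries containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for entry_id, seq in entries.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(entry_id)
    return dict(index)


def search(index: dict[str, set[str]], entries: dict[str, str], query: str, k: int) -> list[str]:
    """Return the sorted IDs of entries that contain `query` exactly.

    Only entries sharing the query's first k-mer are checked, so the work
    depends largely on the number of candidates, not on the database size."""
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    candidates = index.get(query[:k], set())
    return sorted(e for e in candidates if query in entries[e])


if __name__ == "__main__":
    # Hypothetical example entries; a real database would hold millions.
    entries = {"seq1": "ATGCGTACGTTAG", "seq2": "GGGCGTACGAAA", "seq3": "TTTTTTTT"}
    index = build_kmer_index(entries, k=4)
    print(search(index, entries, "CGTACG", k=4))  # ['seq1', 'seq2']
```

The index is built once and reused for every query, trading memory for speed; that trade-off is what lets lookups stay fast as the database grows.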


Rate of growth


Bioinformatics: post-genomic era
• High-throughput technologies generate petabytes of data: sequencing, microarrays, combinatorial chemistry, high-throughput screening, mass spectrometry, …
• Rapid growth of data and databases in the public and private domains: genomics, gene expression profiles, proteomics, pharmacogenomics, clinical trials, literature, …
• Proliferation of computational tools for data analysis and processing: statistical tools for sequence analysis and gene finding, clustering algorithms, protein folding and structure prediction, drug docking, visualization tools, data mining tools, …


The Promises
• Digitization of biological systems and processes: simulation and modeling of protein-protein interactions, protein pathways, genetic networks, biochemical and cellular processes, normal and disease physiological states, …
• Blurring of the boundary between experimentally generated data and computational data search and analysis
• In silico discovery complementing wet-lab experiments

The Landscape of Biological Data Sources
[Figure: map of biological data sources and tools, including PRINTS, PFAMA, PFAMB, BLOCKS, PIR, PROSITE, PROSITEDOC, LOCUS LINK, NRL3D, DOMO, SWISSFAM, GENEPEPT, TFCLASS, TFMATRIX, TFSITE, TFCELL, Medline, TREEMBL, PRODOM, UNIGENE, EMBL, DSSP, DDBJ, DBSTS, GSDB, TIGR, SWISSPROT, Entrez, PDB, GENBANK, RHDB, TAXONOMY, EBI, Celera, GENETICCODE, HUGO, GDB, SNP, WIT, Fly Base, OMIM, Clinical DB, KEGG, dbSNP, Contact, SNP Consortium, Microbial Genomes, STKE, ENZYME, FASTA, BLAST, dbSNP Population, SSEARCH, C. Elegans, CLUSTALW, and patent databases (USPTO, JPO, PCT).]


Databases are of two types: primary and secondary.

PRIMARY DATABASES
• The primary source of information; they can be considered reservoirs of sequence information.
• The primary repository for newly determined sequences, e.g. GenBank at NCBI, EMBL, DDBJ.

SECONDARY DATABASES
• These databases derive their information by analyzing the data in the primary databases.
• They capture particular attributes of the primary data (such as motifs and patterns).
• They add value to the information present in the primary databases, e.g. Pfam, BLOCKS, PRINTS.


Primary Nucleotide Repositories
• NCBI (http://www.ncbi.nlm.nih.gov)
• EMBL (http://www.ebi.ac.uk/embl)
• DDBJ (http://www.ddbj.nig.ac.jp/)

Primary Protein Repositories
• PIR (http://pir.georgetown.edu)
• Swissprot/Uniprot (http://www.ebi.ac.uk/swissprot)
• Protein Data Bank (http://www.rcsb.org/pdb)


Secondary ‘pattern’ databases (database: data source; type of stored information)
• PROSITE: SWISS-PROT; regular expressions (patterns)
• PRINTS: SWISS-PROT/TrEMBL; aligned motifs (fingerprints)
• Pfam: SWISS-PROT/TrEMBL; hidden Markov models (HMMs)
• Profiles: SWISS-PROT; weight matrices (profiles)
• BLOCKS: PRINTS/InterPro/Domo; weighted motifs (blocks)
• IDENTIFY: PRINTS/InterPro; permissive regular expressions
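PROSITE patterns are written in a compact, regular-expression-like syntax: x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the N- or C-terminus, with elements separated by "-". A minimal Python sketch, assuming only these elements, translates such a pattern into a Python regular expression; the function name is hypothetical and the protein fragment is made up. The example pattern N-{P}-[ST]-{P} is PROSITE's N-glycosylation site pattern (PS00001).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (e.g. 'N-{P}-[ST]-{P}') into a Python regex.

    Handles x, [..], {..}, (n) and (n,m) repeats, and the '<' / '>' terminal anchors.
    Sketch only: a '>' inside brackets (e.g. '[G>]') is not supported."""
    pattern = pattern.strip().rstrip(".")  # PROSITE pattern lines end with '.'
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty element in pattern: {pattern!r}")
        residue, low, high = m.group(1), m.group(2), m.group(3)
        if residue.lower() == "x":
            token = "."                          # any residue
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            token = residue                      # single residue or [..] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}")
    print(regex)  # N[^P][ST][^P]
    for hit in re.finditer(regex, "MANLTQNPSV"):
        print(hit.start() + 1, hit.group())  # 1-based position and matched residues
```

Note that `re.finditer` reports only non-overlapping matches; wrapping the expression in a lookahead, `f"(?=({regex}))"`, is one way to report overlapping sites as well.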


NUCLEOTIDE REPOSITORIES
• EMBL: European Molecular Biology Laboratory, at Cambridge, UK.
• GenBank: at NCBI, on the NIH campus, USA.
• DDBJ: DNA Data Bank of Japan, Mishima, Japan.

• They have worked in collaboration since 1982.
• Each collects information from its own region.
• They automatically update each other every 24 hours.

To organize this huge amount of information, the database has been split into numerous divisions (17), each with a specific 3-letter code, e.g.:
• Human: HUM
• Virus: VRL
• Fungi: FUN

[Figure: NCBI, EMBL and DDBJ (source: Bioinformatics Centre, BISR, Jaipur)]


The biological data and databases
• Complex data types: ranging from protein and nucleic acid sequences, texts and 3-dimensional molecular structures to images of cells and tissues
• Hierarchical data organization: ranging from molecules and biochemical pathways to cells, tissues, organisms and populations
• Heterogeneous database locations, storage formats and access methods
• Dynamic: data contents and database schemas are constantly changing


The computational tools and algorithms
• Input/output data formats: each application program requires specific I/O data formats, which may impede data flow from one program to the next
• Rapidly evolving: new algorithms are developed and old ones improved
• Results require graphical display or presentation: viewers for sequence alignments, 3-D structures, multidimensional plots, …
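FASTA, the format used by tools such as BLAST and ClustalW, is one of the simplest of these I/O formats: a ">" header line followed by one or more sequence lines. A minimal Python reader and writer, assuming plain (uncompressed) text input, might look like this; the function names are illustrative, not from a specific library.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    A record starts with a '>' header line; its sequence may be split over
    several following lines, which are joined. Blank lines are ignored."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records: Iterable[tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    text = ">seq1 example\nATGC\nGGTA\n>seq2\nMKV\n"  # made-up records
    for header, seq in read_fasta(io.StringIO(text)):
        print(header, seq)
    print(write_fasta([("seq1", "ATGCGGTA")], width=4), end="")
```

Reading from any file-like object (here `io.StringIO`) keeps the parser independent of where the data comes from, which eases passing records between programs.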

[Figure: Integration of databases and scientific algorithms in bioinformatics, with data sources and their formats: Medline (ASN.1), microarray data (RDBMS, Excel), BLAST (FASTA), OMIM (text file), KEGG (HTML text, binary images), Entrez/NCBI (ASN.1), ClustalW (FASTA), PDB (Oracle, 3D images).]


Examples of Bioinformatics
• Database interfaces: GenBank/EMBL/DDBJ, Medline, SwissProt, PDB, …
• Sequence alignment: BLAST, FASTA
• Multiple sequence alignment: Clustal, MultAlin, DiAlign
• Gene finding: Genscan, GenomeScan, GeneMark, GRAIL
• Protein domain analysis and identification: Pfam, BLOCKS, ProDom
• Pattern identification/characterization: Gibbs Sampler, AlignACE, MEME
• Protein folding prediction: PredictProtein, SWISS-MODEL
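BLAST and FASTA rely on fast heuristics, but the underlying idea of scoring an alignment can be shown with the classic Needleman-Wunsch dynamic-programming algorithm for global pairwise alignment. This Python sketch is not how BLAST or FASTA work internally; it assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) chosen only for illustration.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""

    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # (mis)match
                              score[i - 1][j] + gap,                         # gap in b
                              score[i][j - 1] + gap)                         # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```

Filling the table costs time proportional to len(a) × len(b); that quadratic cost against whole databases is why search tools such as BLAST and FASTA use heuristics instead.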


Five websites that all biologists should know
• NCBI (The National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
• EBI (The European Bioinformatics Institute): http://www.ebi.ac.uk/
• The Canadian Bioinformatics Resource: http://www.cbr.nrc.ca/
• SwissProt/ExPASy (Swiss Bioinformatics Resource): http://expasy.cbr.nrc.ca/sprot/
• PDB (The Protein Data Bank): http://www.rcsb.org/PDB/


Database Growth (cont.) The Human Genome Project and numerous smaller genome projects have kept the data coming at alarming rates. As of February 2001, 45 complete, finished genomes were publicly available for analysis, not counting the many virus and viroid genomes available. The International Human Genome Sequencing Consortium announced the completion of a "Working Draft" of the human genome in June 2000.


What are bioinformatics, genomics, sequence analysis, computational molecular biology, …? The Reverse Biochemistry Analogy. Biochemists no longer have to begin a research project by isolating and purifying massive amounts of a protein from its native organism in order to characterize a particular gene product. Instead, scientists can now amplify a section of a genome based on its similarity to other genomes, sequence that piece of DNA and, using sequence analysis tools, infer functional, evolutionary and, perhaps, structural insights into that stretch of DNA!

The computer and molecular databases are a necessary, integral part of this entire process.
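The step "using sequence analysis tools, infer functional insight" often begins by locating open reading frames (ORFs) in the new DNA and translating them into protein. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and searching only the three forward reading frames, returns the protein encoded by the longest ATG-to-stop ORF. The function names and the example sequence are illustrative; a fuller tool would also scan the reverse strand.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order,
# first base varying slowest, which matches the order of AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; '*' marks a stop codon, 'X' an unknown codon.
    A trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def longest_orf(dna: str) -> str:
    """Return the protein from the longest ATG...stop ORF in the three forward
    frames, without the stop symbol ('' if no complete ORF is found)."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        protein = translate(dna[frame:])
        # Pieces between stops; the last piece has no terminating stop, so skip it.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")  # first Met = longest ORF in this segment
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTGGTAACC"))  # MAW
```

Building the 64-codon table from the compact NCBI-style string avoids typing every codon by hand; `CODON_TABLE["ATG"]` is `"M"`.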

Vaccine development in the post-genomic era: the reverse vaccinology approach.


[Figure: excerpt of a PDB-format coordinate file (123.PDB) showing a COMPND record, HETATM records for atoms 1 to 12 (O, C, N, N, C, C, C, O, C, C, O, C) and hydrogen atoms 24 to 30 with their x, y, z coordinates, CONECT bond records, and END.]
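PDB coordinate files like the one excerpted on this slide are fixed-column text. In ATOM/HETATM records, columns 7-11 hold the atom serial number, 13-16 the atom name, 18-20 the residue name and 31-54 the x, y, z coordinates (in ångströms); CONECT records list the serial numbers of bonded atoms. The Python sketch below is a simplified illustration rather than a full parser: it reads only those fields, ignores other record types, and uses a made-up two-atom example. The `Atom` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    x: float
    y: float
    z: float


def parse_pdb(text: str) -> tuple[list[Atom], set[tuple[int, int]]]:
    """Return (atoms, bonds) from PDB-format text; bonds are (low, high) serial pairs."""
    atoms: list[Atom] = []
    bonds: set[tuple[int, int]] = set()
    for line in text.splitlines():
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            atoms.append(Atom(record=record,
                              serial=int(line[6:11]),
                              name=line[12:16].strip(),
                              res_name=line[17:20].strip(),
                              x=float(line[30:38]),
                              y=float(line[38:46]),
                              z=float(line[46:54])))
        elif record == "CONECT":
            # Simplification: split on whitespace instead of 5-column fields,
            # which works while adjacent serial numbers are separated by spaces.
            fields = [int(f) for f in line[6:].split()]
            origin, partners = fields[0], fields[1:]
            for p in partners:
                bonds.add((min(origin, p), max(origin, p)))
        elif record == "END":
            break
    return atoms, bonds


def format_atom(atom: Atom) -> str:
    """Write an Atom as a fixed-column record (columns 1-54 only; chain and residue number left blank)."""
    return (f"{atom.record:<6}{atom.serial:>5} {atom.name:<4} {atom.res_name:>3}"
            f"{'':10}{atom.x:8.3f}{atom.y:8.3f}{atom.z:8.3f}")


if __name__ == "__main__":
    o1 = Atom("HETATM", 1, "O1", "UNL", -1.25, 0.5, 2.0)   # made-up atoms
    c2 = Atom("HETATM", 2, "C2", "UNL", -0.4, 1.1, 2.3)
    text = "\n".join([format_atom(o1), format_atom(c2), "CONECT    1    2", "END"])
    atoms, bonds = parse_pdb(text)
    print(atoms)
    print(bonds)  # {(1, 2)}
```

Slicing by column rather than splitting on whitespace matters for real files, because adjacent coordinate fields can run together when values are large or negative.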


Challenges in bioinformatics
• Explosion of information
  – Need for faster, automated analysis to process large amounts of data
  – Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
  – Need for “smarter” software to identify interesting relationships in very large data sets
• Lack of “bioinformaticians”
  – Software needs to be easier to access, use and understand
  – Biologists need to learn about the software, its limitations, and how to interpret its results


New areas in Bioinformatics
• Microarrays
• Functional genomics
• Structural genomics
• Comparative genomics
• Pharmacogenomics
• Medical informatics

What is bioinformatics?


Your Turn: ANY Question(s)
