IN-SILICO TARGET IDENTIFICATION AND VALIDATION
Report submitted for the final project entitled “IN SILICO TARGET IDENTIFICATION AND VALIDATION” in the VIIth semester of our B.Tech program.
Submitted to: Dr. P.K. Nayak, Dept. of Bioinformatics, J.U.I.T
Submitted by: Hari Krishna.Y (31550), Praveen.G (031559)
Sign of acceptance: (Dr. P.K. Naik)
About in silico target identification and validation:
The pharmaceutical industry relies on numerous well-designed experiments involving high-throughput techniques and in silico approaches to analyze potential drug targets. These in silico methods are often predictive, yielding faster and less expensive analyses than traditional in vivo or in vitro procedures.
In Silico Technologies in Drug Target Identification and Validation addresses the challenge of testing a growing number of new potential targets and reviews currently available in silico approaches for identifying and validating these targets. This project emphasizes computational tools, public and commercial databases, mathematical methods, and software for interpreting complex experimental data.
Covering issues that range from prescreening target selection to genetic modeling and valuable data integration, In Silico Technologies in Drug Target Identification and Validation is a self-contained and practical guide to the various computational tools that can accelerate the identification and validation stages of drug target discovery and determine the biological functionality of potential targets more effectively.
Need of new targets:
• The continuous use of similar drugs (chemicals) has led target organisms to adapt and gain resistance to the drugs used.
• Present targets have led to side effects due to similarity between the host and the pathogen.
Advantages of in silico methods:
• The traditional drug design approach relies mostly on trial and error (hit and trial), which consumes a lot of time and resources. To keep pace and reduce both cost and development time, we take advantage of high-performance computers, that is, the in silico approach.
• The shotgun and clone-by-clone sequencing methods revolutionized molecular biology. As a result, large amounts of genomic data have been digitized and stored in various public-domain databases, but we do not yet have good ways to mine this data. In silico methods let us exploit the available digital genomic data; moreover, digital manipulation and storage are quick and cheap.
Primary AIM: Identification of therapeutic targets in bacterial pathogens and their validation using in silico methods.
Secondary AIM: Generalizing the process for various types of pathogenic organisms and developing the tool by simulating all the steps.
Databases Required:
• Human genome from the Human Genomic Database (H.G.D)
• Pathogen genome from TIGR and NCBI
• Database of Essential Genes (DEG) for bacteria. http://tubic.tju.edu.cu/deg1
• Genomic database of nematodes (nematode.net) and WormBase
• KEGG database for unique polypeptides w.r.t. organism
• Database of essential genes for fungi, nematodes, and protozoa, which we have to develop ourselves
Databases to be developed: Essential Genes Database for:
• Fungi
• Nematodes
• Protozoa
Tools required:
• Clustal X for multiple sequence alignment
• BLASTX server for searching the essential-gene databases (DEGs) developed
• Molecular modeling tools (MOE or Insight II)
• CASTp for pocket and active site identification
• TargetP, PSORT, and TMpred to predict transmembrane proteins
• 3D alignment tools like VAST
• Promoter prediction tools
• Gene-to-protein and protein-to-gene translators
Programming languages required:
• Java for coding and tool development
• HTML for presentation
• Piping languages: Perl/Java
Database software required:
• MySQL/Oracle
• PHP for connectivity
Approaches for target identification:
• Subtractive genomics: The availability of pathogen genome sequences has provided a tremendous amount of information that can be useful in drug target and vaccine target identification. One recently adopted strategy is the subtractive genomics approach, in which the subtraction dataset between the host and pathogen genomes yields a set of genes that are likely to be essential to the pathogen but absent in the host. This approach has recently been used successfully in Pseudomonas aeruginosa.
Ref: "In silico identification of potential therapeutic targets in the human pathogen Helicobacter pylori", In Silico Biology 6, 0005 (2006); ©2006, Bioinformation Systems e.V.
• Comparative Metabolomics (Unique polypeptides):
The KEGG database has a unique option that returns the proteins unique to an organism when queried with respect to that organism. Once retrieved from KEGG, these proteins can be treated as potential drug targets. After filtering, they are validated further for the drug target process.
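Both approaches above amount to a set subtraction: keep what the pathogen has and the host lacks. The following is a minimal sketch of that idea, assuming each pathogen protein has already been annotated with an ortholog-group identifier (for example, exported from KEGG). The function name, protein IDs, and group IDs are hypothetical, and keeping unannotated proteins is a design assumption of this sketch, not part of the protocol.

```python
def pathogen_only(pathogen_annotations, host_groups):
    """Return pathogen protein IDs whose ortholog group is absent from the host.

    pathogen_annotations: dict mapping protein ID -> ortholog group ID (or None)
    host_groups: set of ortholog group IDs present in the host
    """
    unique = []
    for protein_id, group in pathogen_annotations.items():
        # Assumption: a protein without any group assignment has no known
        # host counterpart, so it stays in the candidate set.
        if group is None or group not in host_groups:
            unique.append(protein_id)
    return sorted(unique)


# Hypothetical identifiers, for illustration only.
pathogen = {"hp0001": "K00001", "hp0002": "K99999", "hp0003": None}
host = {"K00001", "K00002"}
print(pathogen_only(pathogen, host))  # ['hp0002', 'hp0003']
```

The output of this step is only a list of candidates; it still has to pass the screening and validation steps.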
Screening and validation: Subtractive genomics and comparative metabolomics yield a huge set of probable targets, but we cannot afford to process all of them through to the final step of drug development. We therefore pass the whole set through various screening steps to retain only the most probable, high-potential targets, and then reduce the resulting set further by validating them.
Screening includes:
• Promoter analysis (search for strong promoters)
• Presence in symbiotic organisms (E. coli, Lactobacillus)
• Screening for GPCR receptors and enzymes
• Screening for membrane-bound proteins
Validation includes:
• Secondary structure feasibility
• 3D superimposition
• Cavity and pocket analysis
• Binding residues
• Gene networks can also be used
Protocol:
Step #1: Identification of putative therapeutic targets
Take the protein sequence (of the essential gene) → Compare with human genes → Unique
Unique proteins (KEGG Database) → Unique w.r.t. organism
Both routes lead to: Putative therapeutic targets
• We obtain the essential genes from the Database of Essential Genes (DEG) for bacteria. http://tubic.tju.edu.cu/deg1
• We translate these genes into proteins using the translation tools on the ExPASy server.
• We then run a BLAST search against the human genome to find the proteins that are unique to the pathogen.
• The same kind of search is also available on the KEGG database server to find the unique proteins w.r.t. organism.
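The translation and BLAST-filtering bullets of Step #1 can be sketched as follows. This is an illustrative sketch, not the project's tool: it translates with the standard genetic code instead of calling the ExPASy server, and it assumes the BLAST results are available in the 12-column tabular format (query ID in the first column, e-value in the eleventh). The function names, the sequence IDs, and the e-value cutoff of 1e-4 are assumptions made for the example.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def pathogen_specific(query_ids, blast_tab_lines, max_evalue=1e-4):
    """Return the queries that have no significant hit against the human set.

    blast_tab_lines: lines of tab-separated BLAST output with 12 columns.
    max_evalue: hypothetical significance cutoff; tune it for real data.
    """
    with_human_hit = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            with_human_hit.add(query_id)
    return [q for q in query_ids if q not in with_human_hit]


# Made-up example data.
print(translate("ATGGCCTAA"))  # MA
rows = [
    "geneA\thuman1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "geneB\thuman2\t25.0\t40\t30\t0\t1\t40\t5\t44\t5.0\t20",
]
print(pathogen_specific(["geneA", "geneB", "geneC"], rows))  # ['geneB', 'geneC']
```

Here geneA is dropped because it has a significant human hit, geneB is kept because its only hit is above the cutoff, and geneC is kept because it has no hit at all.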
Step #2: Screening
• Presence in symbiotic organisms
• Promoter analysis of the sequence
• GPCR and enzyme identification
• Membrane proteins
Presence in symbiotic organisms (E. coli, Lactobacillus): This screening step ensures that the selected targets are absent from symbiotic organisms, since we need those organisms for our normal functioning.
Promoter analysis of the sequence: Promoters are conserved regions of a sequence. They are useful to us because they mark functional sites: sequences that contain promoters are those that can be expressed. Promoters can be classified into two types:
• Strong promoter: Expression is high
• Weak promoter: Expression is low
We select genes with strong promoters, as their expression rate is higher, and carry these forward for validation.
GPCR Identification: Since 80% of drugs target GPCRs or other membrane receptors, targets that contain these receptors are more susceptible to drugs, and screening for them reduces our task load. The major classes of targets are:
• GPCR (80%)
• Enzymes
• DNA
• Regulatory
Of these classes, only GPCR receptors are considered for validation. DNA can be neglected as a potential drug target because it lies inside the nucleus or well inside the cytosol, where drugs can hardly reach it.
Membrane proteins: Screening for membrane-bound proteins ensures that the screened set includes only proteins that are membrane bound. These are potential targets because they are easily accessible to drug attack.
Screening → Validation → Final set of probable targets
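The four screening criteria can be combined into a single filter over the candidate list, as in the rough sketch below. It assumes the per-candidate flags have already been produced by the external tools (promoter prediction, localization prediction, and so on). The record fields, the names, and the choice to require all four criteria at once are assumptions of this sketch. Enzymes are accepted alongside GPCRs here because the screening list names both.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    in_symbiont: bool      # homolog found in E. coli / Lactobacillus
    strong_promoter: bool  # result of promoter analysis
    target_class: str      # "GPCR", "enzyme", "DNA" or "regulatory"
    membrane_bound: bool   # result of transmembrane prediction


# Assumption: classes kept by the "GPCR and enzyme identification" step.
ACCEPTED_CLASSES = {"GPCR", "enzyme"}


def passes_screening(candidate):
    """True if the candidate survives all four screening steps."""
    return (
        not candidate.in_symbiont
        and candidate.strong_promoter
        and candidate.target_class in ACCEPTED_CLASSES
        and candidate.membrane_bound
    )


def screen(candidates):
    """Return the names of the candidates to pass on to validation."""
    return [c.name for c in candidates if passes_screening(c)]


# Made-up candidates, for illustration only.
candidates = [
    Candidate("t1", False, True, "GPCR", True),
    Candidate("t2", True, True, "GPCR", True),     # present in a symbiont
    Candidate("t3", False, False, "enzyme", True), # weak promoter
    Candidate("t4", False, True, "DNA", True),     # rejected target class
]
print(screen(candidates))  # ['t1']
```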
Validation:
Validation is an essential part of the protocol, as it reduces the load of dealing with unwanted data. We have chosen four methods to perform validation:
• Secondary structure feasibility
• 3D superimposition
• Cavity and pocket analysis
• Binding residues
→ Final set of probable targets
Method of proceeding with user input sequence:
User input sequence
• Small sequence: search only against the validated therapeutic targets
• Complete genome: search against the putative therapeutic proteins
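The routing of a user input sequence could look like the sketch below. The report does not say where a "small sequence" ends and a "complete genome" begins, so the length threshold is purely a placeholder, and the function and set names are hypothetical.

```python
# Hypothetical cutoff between a "small sequence" and a "complete genome".
SMALL_SEQUENCE_MAX_LENGTH = 10_000


def choose_search_set(sequence, max_small=SMALL_SEQUENCE_MAX_LENGTH):
    """Pick which target set the user's input should be searched against."""
    cleaned = "".join(sequence.split())  # drop whitespace and line breaks
    if len(cleaned) <= max_small:
        return "validated_therapeutic_targets"
    return "putative_therapeutic_proteins"


print(choose_search_set("ATGGCC\nTAA"))  # validated_therapeutic_targets
```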
Scheduling:
Start of 7th Sem.: 17th July 2006
End of Sem.: 15th Nov 2006
Slot #1: July 20th to Aug 20th
• Planning phase
• Needs (things to be added)
• Target Specifications
• Concept generation
Slot #2: Aug 20th to Sep 20th
• Modeling and prototyping
• Discussion with faculty
• Approval
Slot #3: Sep 20th to Oct 20th
• System level design
• Detail design I
Slot #4: Sep 20th to Nov 15th
• Detail design II
• Detail design III