CHAPTER - 1
INTRODUCTION
Chronic Myelogenous Leukaemia (CML) is a myeloproliferative disorder. It is characterized by a biphasic or triphasic clinical course in which a benign chronic phase is followed by transformation into an accelerated and blast phase. At the cytogenetic and molecular level, most patients with CML carry BCR-ABL fusion genes in hematopoietic progenitor cells. These result from a reciprocal translocation between chromosomes 9 and 22, which leads to a shortened chromosome 22 called the Philadelphia chromosome. Translation of the fusion products yields chimeric proteins of variable size that have increased tyrosine kinase activity.
Conventional chemotherapy with hydroxyurea or busulfan can achieve hematologic control but cannot modify the natural disease course, which inevitably terminates in a rapidly fatal blastic phase. Since its introduction in the 1980s, allogeneic stem-cell transplantation has provided the groundwork for a cure of CML. However, few patients are eligible for this treatment because of donor availability and age restrictions. Therapy with interferon-α, alone or in combination with cytarabine, suppresses the leukemic clone, produces cytogenetic remissions, and prolongs survival. It is an effective alternative first-line treatment for patients ineligible for transplantation. New drugs active against CML may show increased activity in the transformed phases of the disease. Novel therapies and concepts are developing rapidly; the molecular targets include tyrosine kinases, ras, and messenger RNA (through antisense oligonucleotides). Alternative transplantation options, such as stem cells from autologous sources and matched unrelated donors, are expanding. Immunomodulation by adoptive immunotherapy and vaccine strategies holds significant promise for the cure of CML.
The development of the Bcr-Abl-targeted drug Imatinib represents a paradigm shift in the treatment of CML, because treatment with Imatinib resulted in significantly better patient outcomes, response rates, and overall survival compared with previous standards. Despite this advance, not all patients benefit from Imatinib, because of resistance and intolerance. Resistance to Imatinib can develop through a number of mechanisms, which can be classed as Bcr-Abl-dependent (most commonly point mutations in the Abl kinase domain) or Bcr-Abl-independent (including constitutive activation of downstream signalling molecules such as Src family kinases, which can activate the pathway regardless of Bcr-Abl inhibition).
Imatinib can be dose-escalated in patients who demonstrate resistance, but new treatment approaches are clearly required for patients who remain resistant to or intolerant of the drug.
In this study, we used an in silico approach to study the effect of point mutations on the observed drug resistance. First, point-mutation data were collected from a client, who had derived them by PCR amplification and sequencing of nucleotide sequences from patients. Using these data, mutant protein models were created by homology modelling. Finally, the mutant protein models were docked against the drug Imatinib, and the results were compared with those for the wild-type protein.
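As an illustration of the first step of this workflow, the sketch below applies a single amino-acid substitution, written in the usual notation such as T315I, to a protein sequence before it is passed on to homology modelling. The function name, the `offset` parameter and the ten-residue fragment are hypothetical and chosen only for illustration; they are not the sequences or tools used in this study.

```python
import re

# A substitution such as "T315I": wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(sequence: str, mutation: str, offset: int = 0) -> str:
    """Return a copy of `sequence` with one amino-acid substitution applied.

    `offset` is the residue number just before the first residue of `sequence`
    (for example 310 if the fragment starts at residue 311).
    """
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - offset - 1  # convert to a 0-based index into the fragment
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    if sequence[index] != wild:
        raise ValueError(
            f"expected {wild} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Hypothetical fragment numbered 311-320 with threonine at position 315.
fragment = "AAAATAAAAA"
mutated = apply_point_mutation(fragment, "T315I", offset=310)
print(mutated)
```

Checking the wild-type residue before substituting guards against numbering mismatches between the mutation list and the reference sequence, which is a common source of silent errors when mutant models are built in bulk.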
Figure 1. Peripheral blood (left) and bone marrow (right) smears of a CML patient in chronic phase, showing leukocytosis in the peripheral blood and hypercellularity in the bone marrow, due mainly to neutrophils in different stages of maturation. In CML bone marrow, typical megakaryocytes are smaller than normal and have hypolobulated nuclei.
CHAPTER - 2
REVIEW OF LITERATURE
2.1 CHRONIC MYELOID LEUKAEMIA - THE DISEASE:
Chronic Myelogenous (or myeloid) Leukaemia (CML), also known as chronic granulocytic leukaemia (CGL), is a form of leukaemia characterized by the increased and unregulated growth of predominantly myeloid cells in the bone marrow and the accumulation of these cells in the blood. CML is a clonal bone marrow stem cell disorder in which proliferation of mature granulocytes (neutrophils, eosinophils, and basophils) and their precursors is the main finding. It is a type of myeloproliferative disease associated with a characteristic chromosomal translocation called the Philadelphia chromosome. Historically, it has been treated with chemotherapy, interferon and bone marrow transplantation, although targeted therapies introduced at the beginning of the 21st century have radically changed the management of CML.
Normally, the bone marrow makes blood stem cells (immature cells) that develop into mature blood cells over time. A blood stem cell may become a myeloid stem cell or a lymphoid stem cell. The lymphoid stem cell develops into a white blood cell. The myeloid stem cell develops into one of three types of mature blood cells:
• Red blood cells that carry oxygen and other materials to all tissues of the body.
• Platelets that help prevent bleeding by causing blood clots to form.
• Granulocytes (white blood cells) that fight infection and disease.
Figure 2. Normal blood stem cell development.
In CML, too many blood stem cells develop into a type of white blood cell called granulocytes. These granulocytes are abnormal and do not become healthy white blood cells. They may also be called leukemic cells. The leukemic cells can build up in the blood and bone marrow so that there is less room for healthy white blood cells, red blood cells, and platelets. When this happens, infection, anaemia, or easy bleeding may occur. Most people with CML have a gene mutation (change) called the Philadelphia chromosome. Every cell in the body contains DNA (genetic material) that determines how the cell looks and acts. DNA is contained inside chromosomes. In CML, part of the DNA from one chromosome moves to another chromosome. This change is called the "Philadelphia chromosome." It results in the bone marrow making an enzyme, called tyrosine kinase, that causes too many stem cells to develop into white blood cells (granulocytes or blasts). The Philadelphia chromosome is not passed from parent to child.
Figure 3. Formation of the Philadelphia chromosome.
Figure 4 (a and b). Structure of the c-Bcr, c-Abl and Bcr-Abl proteins. c-Bcr comprises an oligomerization domain, a domain thought to mediate binding to SH2-domain-containing proteins, a serine/threonine kinase domain, a region with homology to Rho guanine-nucleotide-exchange factor (Rho-GEF), a region thought to facilitate calcium-dependent lipid binding (CaLB) and a region showing homology to Rac GTPase activating protein (Rac-GAP). The main phosphorylation site of Bcr (Tyr 177) is indicated. c-Abl comprises an SH3 and SH2 domain, an SH1 tyrosine kinase domain, several proline-rich domains (P), a nuclear localization signal (NLS), several DNA-binding domains (DNA BD) and an actin-binding domain. The Bcr-Abl fusion protein comprises the first four domains of c-Bcr and all the c-Abl domains except the N-terminal SH3 domain.
Figure 5. Mechanisms responsible for Bcr-Abl-induced malignant transformation in Ph+ cells. As a consequence of the t(9;22) translocation, the regulatory regions at the NH2-terminus of c-Abl are lost and replaced by the oligomerization domain of c-Bcr. This induces constitutive dimerization and autophosphorylation of Bcr-Abl, whose uncontrolled activity is responsible for alterations in the physiological processes regulated by c-Abl: proliferation, apoptosis and adherence to marrow stroma.
Symptoms of CML:
• Splenomegaly
• Susceptibility to infections
• Anaemia
• Thrombocytopenia
• Enlargement of the liver, etc.
Diagnosis of CML:
• Physical exam and history: An exam of the body to check general signs of health, including checking for signs of disease such as an enlarged spleen. A history of the patient's health habits and past illnesses and treatments will also be taken.
• Complete blood count (CBC): A procedure in which a sample of blood is drawn and checked for the following:
  - The number of red blood cells, white blood cells, and platelets.
  - The amount of haemoglobin (the protein that carries oxygen) in the red blood cells.
  - The portion of the sample made up of red blood cells.
• Blood chemistry studies: A procedure in which a blood sample is checked to measure the amounts of certain substances released into the blood by organs and tissues in the body. An unusual (higher or lower than normal) amount of a substance can be a sign of disease in the organ or tissue that makes it.
• Cytogenetic analysis: A test in which cells in a sample of blood or bone marrow are viewed under a microscope to look for certain changes in the chromosomes, such as the Philadelphia chromosome.
• Bone marrow aspiration and biopsy: The removal of bone marrow, blood, and a small
piece of bone by inserting a needle into the hipbone or breastbone. A pathologist views the bone marrow, blood, and bone under a microscope to look for abnormal cells.
A small proportion of patients have a clinical picture consistent with CML, but no Ph chromosome can be observed cytogenetically. In these cases the chromosomal aberrations are sub-microscopic, and in conventional cytogenetic studies the cases appear to be Ph chromosome negative. These aberrations are also called cryptic translocations or masked Ph chromosomes. Even though no abnormality is observed cytogenetically, the pathogenic BCR-ABL fusion gene characteristic of CML is detectable at the molecular level. This condition is called Ph negative, BCR-ABL positive CML. Ph negative, BCR-ABL positive cases do not otherwise differ from standard Ph positive cases, except that the fusion gene is most often formed not by translocation but by insertion of 3´ ABL or 5´ BCR sequences into chromosome 22 or 9, respectively. The "real" Ph negative cases, which also lack the BCR-ABL molecular rearrangement, are regarded as separate entities: chronic neutrophilic leukaemia or atypical CML. These disorders are classified under the WHO classification as either other chronic myeloproliferative diseases or myelodysplastic/myeloproliferative diseases. They are usually unresponsive to tyrosine kinase inhibitors and have a poor prognosis. Because of this unresponsiveness, the name CML (even with the prefix "atypical") is slightly misleading.
CML is often divided into three phases based on clinical characteristics and laboratory findings. In the absence of intervention, CML typically begins in the chronic phase, and over the course of several years progresses to an accelerated phase and ultimately to a blast crisis. Blast crisis is the terminal phase of CML and clinically behaves like an acute leukaemia. One of the drivers of the progression from chronic phase through acceleration and blast crisis is the acquisition of new chromosomal abnormalities (in addition to the Philadelphia chromosome). Some patients may already be in the accelerated phase or blast crisis by the time they are diagnosed.
Chronic phase
Approximately 85% of patients with CML are in the chronic phase at the time of diagnosis. During this phase, patients are usually asymptomatic or have only mild symptoms of fatigue or abdominal fullness. The duration of chronic phase is variable and depends on how early the disease was diagnosed as well as the therapies used. Ultimately, in the absence of curative treatment, the disease progresses to an accelerated phase.
Accelerated phase
Criteria for diagnosing transition into the accelerated phase are somewhat variable; the most widely used criteria are those put forward by investigators at M.D. Anderson Cancer Center, by Sokal et al., and by the World Health Organization (WHO). The WHO criteria are perhaps most widely used, and include:
• 10–19% myeloblasts in the blood or bone marrow
• >20% basophils in the blood or bone marrow
• Platelet count <100,000, unrelated to therapy
• Platelet count >1,000,000, unresponsive to therapy
• Cytogenetic evolution with new abnormalities in addition to the Philadelphia chromosome
• Increasing splenomegaly or white blood cell count, unresponsive to therapy
The patient is considered to be in the accelerated phase if any of the above are present. The accelerated phase is significant because it signals that the disease is progressing and transformation to blast crisis is imminent.
Blast crisis
Blast crisis is the final phase in the evolution of CML, and behaves like an acute leukaemia, with rapid progression and short survival. Blast crisis is diagnosed if any of the following are present in a patient with CML:
• >20% myeloblasts or lymphoblasts in the blood or bone marrow
• Large clusters of blasts in the bone marrow on biopsy
• Development of a chloroma (solid focus of leukaemia outside the bone marrow)
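The phase criteria listed above can be read as a simple decision rule: any blast-crisis criterion takes precedence, otherwise any accelerated-phase criterion applies, otherwise the disease is in chronic phase. The sketch below encodes that reading for illustration only. The class and field names are hypothetical, the therapy-related criteria are reduced to yes/no flags, and blast percentages between 19% and 20%, which the listed criteria leave unspecified, are assumed here to count as accelerated phase.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Simplified findings; names are illustrative, not a clinical data model."""
    blast_pct: float                                   # myeloblasts/lymphoblasts, %
    basophil_pct: float                                # basophils, %
    platelets_low_unrelated_to_therapy: bool = False   # < 100,000
    platelets_high_unresponsive: bool = False          # > 1,000,000
    cytogenetic_evolution: bool = False
    progressive_splenomegaly_or_wbc: bool = False
    blast_clusters_on_biopsy: bool = False
    chloroma: bool = False


def classify_phase(f: Findings) -> str:
    # Blast crisis: any one criterion is sufficient.
    if f.blast_pct > 20 or f.blast_clusters_on_biopsy or f.chloroma:
        return "blast crisis"
    # Accelerated phase: any one criterion is sufficient.
    # Assumption: 10% <= blasts <= 20% is treated as accelerated.
    accelerated = (
        f.blast_pct >= 10
        or f.basophil_pct > 20
        or f.platelets_low_unrelated_to_therapy
        or f.platelets_high_unresponsive
        or f.cytogenetic_evolution
        or f.progressive_splenomegaly_or_wbc
    )
    return "accelerated phase" if accelerated else "chronic phase"


print(classify_phase(Findings(blast_pct=15, basophil_pct=3)))
```

This is a teaching sketch of how the criteria combine and is not intended for clinical use.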
2.2 IMATINIB - THE DRUG:
Imatinib (Glivec®, Gleevec™, formerly STI571 or CGP57148B, also called Imatinib Mesylate) is a selective small-molecule tyrosine kinase inhibitor used in the targeted treatment of CML and Ph chromosome positive ALL. Imatinib is a 2-phenylaminopyrimidine compound that in preclinical studies produced a 92-98% decrease in the formation of BCR-ABL-positive colonies but did not inhibit normal colonies. This observation suggested the potential utility of the compound in the treatment of BCR-ABL-positive leukaemia. The high specificity of Imatinib for its target tyrosine kinases is achieved by its ability to bind the kinase molecule in its closed (inactive) conformation. In the closed conformation the centrally located activation loop of the kinase is not phosphorylated, and the kinase is therefore inactive. When phosphorylated, the activation loop extends into the open (active) conformation, which enables substrate molecules to bind to the kinase and subsequently be phosphorylated. The active conformation is very similar in all known kinases. In contrast, the inactive conformation varies greatly among protein kinases, which explains the specificity of Imatinib. Imatinib occupies the ATP binding site of the BCR-ABL kinase domain and acts as a competitive inhibitor of BCR-ABL with respect to ATP. The side chain of the threonine residue at position 315 (T315) forms a hydrogen bond with the Imatinib molecule. In many kinases this residue is replaced by methionine, which cannot form such a bond; this makes T315 a key element in the inhibition of BCR-ABL by Imatinib. When Imatinib occupies the ATP binding pocket it stabilizes the inactive form of BCR-ABL, thus preventing autophosphorylation of the kinase itself and subsequent phosphorylation of its substrates. This results in inhibition of the signalling cascades downstream of BCR-ABL, inhibition of cell proliferation, and eventually apoptosis.
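For readers unfamiliar with the term, competitive inhibition with respect to ATP can be summarised by the textbook Michaelis-Menten rate law below. It is a generic illustration and not a kinetic model fitted to Imatinib or taken from this study; here [S] stands for the ATP concentration, [I] for the inhibitor concentration, K_m for the Michaelis constant for ATP and K_i for the inhibitor dissociation constant.

```latex
v = \frac{V_{\max}\,[S]}{K_m\left(1 + \dfrac{[I]}{K_i}\right) + [S]}
```

The inhibitor raises the apparent K_m by the factor (1 + [I]/K_i) while leaving V_max unchanged, so in this idealised picture a sufficiently high ATP concentration can overcome the inhibition.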
Some facts about Imatinib:
Primary Accession Number: DB00619
Secondary Accession Number: APRD01028
Name: Imatinib
Drug Type: Small Molecule; Approved
Synonyms: 1. Imatinib Mesylate 2. Imatinib Methanesulfonate
Brand Names: 1. Gleevec 2. Glivec
Chemical IUPAC Name: 4-[(4-methylpiperazin-1-yl)methyl]-N-[4-methyl-3-[(4-pyridin-3-ylpyrimidin-2-yl)amino]phenyl]benzamide
Chemical Formula: C29H31N7O
Chemical Structure: [structure diagram]
CAS Registry Number: 152459-95-5
Average Molecular Weight: 493.6027
Monoisotopic Molecular Weight: 493.2590
State: Solid
Melting Point: 226 °C (mesylate salt)
Experimental Water Solubility: Very soluble in water at pH < 5.5 (mesylate salt). Source: PhysProp
Predicted Water Solubility: 1.46e-02 mg/mL (calculated using ALOGPS)
Absorption: Imatinib is well absorbed, with a mean absolute bioavailability of 98% and maximum levels achieved within 2-4 hours of dosing.
Toxicity: Side effects include nausea, vomiting, diarrhea, loss of appetite, dry skin, hair loss, swelling (especially in the legs or around the eyes) and muscle cramps.
Protein Binding: Very high (95%)
Biotransformation: Primarily hepatic via CYP3A4. Other cytochrome P450 enzymes, such as CYP1A2, CYP2D6, CYP2C9, and CYP2C19, play a minor role in its metabolism. The main circulating active metabolite in humans is the N-demethylated piperazine derivative, formed predominantly by CYP3A4.
Half Life: 18 hours for Imatinib, 40 hours for its major active metabolite, the N-desmethyl derivative
Dosage Forms (Form, Route): Capsule, Oral; Tablet, Oral
Food Interactions:
• Take with food to reduce the incidence of gastric irritation. Follow with a large glass of water. A lipid-rich meal will slightly reduce and delay absorption. Avoid grapefruit and grapefruit juice throughout treatment; grapefruit can significantly increase serum levels of this product.
Organisms Affected:
• Humans and other mammals
Phase 1 Metabolizing Enzymes: 1. Cytochrome P450 3A4 (CYP3A4)
Targets:
• Proto-oncogene tyrosine-protein kinase ABL1
• Beta platelet-derived growth factor receptor
• Mast/stem cell growth factor receptor
• Alpha platelet-derived growth factor receptor
• Macrophage colony-stimulating factor 1 receptor
• Multidrug resistance protein 1
• High affinity nerve growth factor receptor
• ATP-binding cassette sub-family G member 2
• RET proto-oncogene
• Epithelial discoidin domain-containing receptor 1
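The two molecular weights quoted for Imatinib can be cross-checked against its chemical formula, C29H31N7O. The sketch below sums atomic masses over the formula; the atomic mass values in the two tables are assumptions taken from commonly tabulated standard and monoisotopic masses, not from the DrugBank entry, and the parser only handles simple formulas without brackets.

```python
import re

# Assumed atomic masses (commonly tabulated values), limited to C, H, N and O.
AVERAGE_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.0078250319,
    "N": 14.0030740052,
    "O": 15.9949146221,
}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C29H31N7O' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_mass(formula: str, table: dict) -> float:
    """Sum atomic masses from `table` over all atoms in `formula`."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())


imatinib = "C29H31N7O"
average = molecular_mass(imatinib, AVERAGE_MASS)
monoisotopic = molecular_mass(imatinib, MONOISOTOPIC_MASS)
print(f"average: {average:.4f}, monoisotopic: {monoisotopic:.4f}")
```

Both results agree with the tabulated values (493.6027 and 493.2590) to within rounding.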
2.3 BIOLOGICAL DATABASES:
2.3.1 NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION:
The National Center for Biotechnology Information (NCBI) provides a comprehensive website for biologists that includes biology-related databases, and tools for viewing and analyzing the data held in those databases. A division of the National Library of Medicine at the National Institutes of Health, NCBI is the agency responsible for creating automated systems for storing and analyzing the rapidly growing profusion of genetic and molecular data. One of the most difficult challenges faced in the field of bioinformatics is how to store, in an easily accessible manner, the overwhelming abundance of new information, including the sequences of entire genomes, the ongoing discoveries of new genes and gene products, and the determinations of their functions and structures. NCBI was established as the government's response to the need for more and better information processing methods to deal with this challenge.
View the NCBI home page. A good overview of the tools and databases that can be accessed through NCBI is provided in the list along the left border of the home page. Clicking on the link entitled "About NCBI" produces a second menu containing the topics "A Science Primer" and "Databases and Tools", among others. Selecting "A Science Primer" gives access to general definitions and introductory information on the branches of science included in bioinformatics. Many bioinformatics terms are defined in this section in a clear and basic manner, making this Primer an excellent first resource. Selecting "Databases and Tools" from the "About NCBI" menu yields a complete and well-ordered listing of accessible information. This web page is a good one to bookmark.
The first item under the "Databases and Tools" menu is "Literature Databases". PubMed is the most heavily used of the literature databases and can be used to access MEDLINE citations from biological and medical scientific journals, dating back to articles written in the mid-1960s. The second item under the "Databases and Tools" menu is "Entrez Databases". Entrez is a search and retrieval system developed by NCBI that can access integrated information by searching many of the NCBI databases with a single query (instead of searching one database per query and then repeating the query to find information on the same topic in another NCBI database). The NCBI databases that are included in the search when you launch an Entrez query are shown when you click on this link. The "Nucleotide Databases" link under the "Databases and Tools" menu lists all the sequence databases available through NCBI. These sequence databases contain annotated collections of publicly available DNA, RNA and protein sequences.
The evolution of bioinformatics data mining methods has been largely driven by the prodigious amount of sequence information collected by scientists in recent years. New sequences of unknown function can be compared with sequences of well-characterized genes and proteins. Similarities can be identified between the new, unknown sequences and the well-characterized sequences, and used to form hypotheses about function or structure. Among the tools listed under the NCBI "Databases and Tools" menu are "Tools for Data Mining". Selecting this topic shows a list of data retrieval tools, including Entrez, mentioned above, and BLAST, the Basic Local Alignment Search Tool. BLAST is the predominant sequence alignment tool for performing rapid searches of nucleotide and protein sequence databases and detecting local sequence alignments between the query sequence and the database sequences.
This is a brief glimpse at some of the more widely used tools and databases provided by NCBI, intended to help the novice get some feel for the number and types of bioinformatics tools that are available on the internet today. Several of these tools are covered in more detail in subsequent modules included in this bioinformatics course. Before proceeding to the next module, take a moment to return to the "About NCBI" menu and glance through some of the interesting web pages linked under the topics "A Science Primer", "Outreach and Education", and "News".
The NCBI has had responsibility for making available the GenBank DNA sequence database since 1992. GenBank coordinates with individual laboratories and other sequence databases such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Database of Japan (DDBJ). Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI provides Online Mendelian Inheritance in Man, the Molecular Modeling Database (3D protein structures), dbSNP (a database of Single Nucleotide Polymorphisms), the Unique Human Gene Sequence Collection, a Gene Map of the Human Genome, and a Taxonomy Browser, and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy Project. The NCBI assigns a unique identifier (Taxonomy ID number) to each species of organism.
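Entrez searches can also be issued programmatically. The sketch below only builds search URLs for NCBI's E-utilities `esearch` service, which is not mentioned in the text above and is included here as an assumption about how such a query might be automated; the search term is purely illustrative, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (an addition, not taken from the text above).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for one Entrez database; the request is not sent."""
    query = urlencode({"db": database, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# The same term against several Entrez databases (illustrative term).
term = "BCR-ABL imatinib"
urls = {db: esearch_url(db, term) for db in ("pubmed", "nucleotide", "protein")}
for db, url in urls.items():
    print(db, url)
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before any data are downloaded.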
2.3.2 PROTEIN DATA BANK:
The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, can be accessed at no charge on the internet. The PDB is overseen by an organization called the Worldwide Protein Data Bank (wwPDB). The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals, and some funding agencies, such as the NIH in the USA, now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relations; GO categorizes structures based on genes.
The PDB originated as a grassroots effort. In 1971, Walter Hamilton of the Brookhaven National Laboratory agreed to set up the data bank at Brookhaven. Upon Hamilton's death in 1973, Tom Koetzle took over direction of the PDB. In January 1994, Joel Sussman was appointed head of the PDB. In October 1998, the transfer of the PDB to the Research Collaboratory for Structural Bioinformatics (RCSB) began; the transfer was completed in June 1999. The new director was Helen M. Berman of Rutgers University (one of the member institutions of the RCSB). In 2003, with the formation of the wwPDB, the PDB became an international organization. Each of the four members of wwPDB can act as a deposition, data processing and distribution centre for PDB data. Data processing means that wwPDB staff review and annotate each submitted entry. The data are then automatically checked for plausibility.
The PDB database is updated weekly. Likewise, the PDB Holdings List is also updated weekly. As of 28 April 2009, the breakdown of current holdings was as follows:
Experimental Method    Proteins   Nucleic Acids   Protein/Nucleic Acid complexes   Other   Total
X-ray diffraction      45825      1141            2110                             17      49093
NMR                    6815       850             144                              7       7816
Electron microscopy    155        16              59                               0       230
Other                  110        4               4                                9       127
Total:                 52905      2011            2317                             33      57266
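The row totals in this table can be checked, and converted into the share of structures determined by each method, with a few lines of code. The sketch below uses only the "Total" column of the table; the variable names are illustrative.

```python
# Row totals from the holdings table (28 April 2009).
holdings = {
    "X-ray diffraction": 49093,
    "NMR": 7816,
    "Electron microscopy": 230,
    "Other": 127,
}

total = sum(holdings.values())

# Percentage of all structures determined by each experimental method.
shares = {method: 100 * count / total for method, count in holdings.items()}

for method, share in shares.items():
    print(f"{method:<20} {share:5.1f}%")
print(f"{'Total structures':<20} {total}")
```

The computed total matches the grand total in the table, and X-ray diffraction accounts for the large majority of entries.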
38,249 structures in the PDB have a structure factor file, and 4,496 structures have an NMR restraint file. These data show that most structures are determined by X-ray diffraction, but about 14% of structures are now determined by protein NMR, and a few are even determined by cryo-electron microscopy. The significance of the structure factor files is that, for PDB structures determined by X-ray diffraction that have a structure factor file, the electron density map may be viewed. The data for such structures are stored on the "electron density server", where the electron density maps can be viewed.
In the past, the number of structures in the PDB has grown nearly exponentially. In 2007, 7263 structures were added. However, in 2008, only 7073 structures were added, so the rate of production of structures has started to decrease.
The file format initially used by the PDB was called the PDB file format. This original format was restricted by the width of computer punch cards to 80 characters per line. Around 1996, the "macromolecular Crystallographic Information File" format, mmCIF, started to be phased in. An XML version of this format, called PDBML, was described in 2005. The structure files can be downloaded in any of these three formats. In fact, individual files are easily downloaded into graphics packages using web addresses:
• For PDB format files, use, e.g., http://www.pdb.org/pdb/files/4hhb.pdb.gz
• For PDBML (XML) files, use, e.g., http://www.pdb.org/pdb/files/4hhb.xml.gz
The "4hhb" is the PDB identifier. Each structure published in PDB receives a four-character alphanumeric identifier, its PDB ID. (This cannot be used as an identifier for biomolecules, because often several structures for the same molecule (in different environments or conformations) are contained in PDB with different PDB IDs.) The structure files may be viewed using one of several open source computer programs. Some other free, but not open source programs include VMD, MDL Chime, Swiss-PDB Viewer, StarBiochem (a Java-based interactive molecular viewer with integrated search of protein databank) and Sirius. The RCSB PDB website contains an extensive list of both free and commercial molecule visualization programs and web browser plugins.
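The download addresses quoted above follow a regular pattern, so they can be generated from a PDB ID. The sketch below validates a four-character alphanumeric identifier and fills in that pattern for the two formats for which an address is given. The function name is hypothetical, and the address pattern is simply the one quoted in this section; it may have changed since the time of writing.

```python
import re

# A PDB ID is described above as a four-character alphanumeric identifier.
PDB_ID_RE = re.compile(r"^[0-9A-Za-z]{4}$")

# File extensions for the two formats whose addresses are quoted in the text.
EXTENSIONS = {"pdb": "pdb", "pdbml": "xml"}


def pdb_file_url(pdb_id: str, file_format: str = "pdb") -> str:
    """Return the download address for `pdb_id`, using the quoted URL pattern."""
    if not PDB_ID_RE.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if file_format not in EXTENSIONS:
        raise ValueError(f"unsupported format: {file_format!r}")
    return f"http://www.pdb.org/pdb/files/{pdb_id.lower()}.{EXTENSIONS[file_format]}.gz"


print(pdb_file_url("4HHB"))
print(pdb_file_url("4hhb", "pdbml"))
```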
2.3.3 DRUGBANK:
The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs. Additionally, more than 2,500 non-redundant protein (i.e. drug target) sequences are linked to these FDA-approved drug entries. Each DrugCard entry contains more than 100 data fields, with half of the information devoted to drug/chemical data and the other half devoted to drug target or protein data. DrugBank is supported by David Wishart, Departments of Computing Science & Biological Sciences, University of Alberta.
Users may query DrugBank in any number of ways. The simple text query supports general text queries of the entire textual component of the database. Clicking on the Browse button (on the DrugBank navigation panel) generates a tabular synopsis of DrugBank's content. This browse view allows users to casually scroll through the database or re-sort its contents. Clicking on a given DrugCard button brings up the full data content for the corresponding drug. A complete explanation of all the DrugCard fields and sources is given on the DrugBank website. The PharmaBrowse button allows users to browse through drugs grouped by their indication. This is particularly useful for pharmacists and physicians, but also for pharmaceutical researchers looking for potential drug leads. The ChemQuery button allows users to draw (using a MarvinSketch applet or a ChemSketch applet) or write (as a SMILES string) a chemical compound and to search DrugBank for chemicals similar or identical to the query compound. The TextQuery button supports a more sophisticated text search (partial word matches, case sensitivity, misspellings, etc.) of the text portion of DrugBank. The SeqSearch button allows users to conduct BLASTP (protein) sequence searches of the 18,000 sequences contained in DrugBank. Both single and multiple sequence (i.e. whole proteome) BLAST queries are supported. The Data Extractor button opens an easy-to-use relational query search tool that allows users to select or search over various combinations of subfields. The Data Extractor is the most sophisticated search tool for DrugBank. Users may download selected text components and sequence data from DrugBank and track the latest DrugBank statistics by clicking on the Download button.
2.4 WORKING TOOLS:
2.4.1 ORF FINDER:
The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database. This tool identifies all open reading frames using the standard or alternative genetic codes. The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the WWW BLAST server. The ORF Finder helps in preparing complete and accurate sequence submissions. It is also packaged with the Sequin sequence submission software.
Link: www.ncbi.nlm.nih.gov/orf_finder.html
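The idea behind an ORF search can be sketched in a few lines: scan each of the six reading frames (three on each strand) for a start codon followed in frame by a stop codon, and keep the ORFs that reach a minimum length. The code below is a simplified illustration and not the algorithm used by NCBI's ORF Finder. It assumes the standard genetic code, accepts only ATG as a start codon, ignores ORFs that run off the end of the sequence, and reports coordinates on the strand being scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_length: int = 50) -> list:
    """Return (strand, frame, start, end) for each ORF of at least `min_length` nt.

    `start` and `end` are 0-based positions on the scanned strand, `end` is
    exclusive and the ORF includes its stop codon. Frames are numbered 1-3.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop codon
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_length:
                        orfs.append((strand, frame + 1, start, i + 3))
                    start = None
    return orfs


# Toy sequence: ATG AAA TAG in the third frame of the forward strand.
print(find_orfs("CCATGAAATAGGG", min_length=9))
```

Raising `min_length` plays the same role as the minimum ORF size setting described above: it filters out short ORFs that are likely to occur by chance.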
[Figure: ORF Finder input page. 1. Paste sequence here. 2. Click ORF Find.]
To use ORF Finder, enter the accession or GI number of the sequence of interest, or enter your query sequence directly into the text box in FASTA format. ORF Finder will identify all open reading frames using the standard genetic code or an alternative one for translation. Users can limit the search for open reading frames to a portion of the query sequence by specifying the positions (in base pairs) in the "From" and "To" boxes. Press the ORF Find button to retrieve a graphic display of ORFs and their location in the sequence in all 6 reading frames. Users have the option to change the minimum ORF length to 50 or 300 nucleotides and redraw the query sequence. The Six Frames option shows a graphic of all start and stop codons. Select a particular ORF by clicking on it to see the amino acid sequence with all alternative start codons. After selecting an ORF of interest, click the Accept button to view the ORF in various formats: GenBank flatfile, FASTA nucleotide, or FASTA amino acid sequence. Selecting View retrieves the full GenBank record with its annotated sequence information. For scientists submitting sequence data, ORF Finder is also packaged with the Sequin sequence submission software. ORF Finder can be used in conjunction with Sequin's Sequence Editor to annotate new coding regions on the record, perform basic editing, and translate nucleotide sequences. The Sequin program can be downloaded from NCBI's FTP site, accessible from the NCBI WWW home page.
2.4.2 BASIC LOCAL ALIGNMENT SEARCH TOOL:
The BLAST algorithm was developed as a new way to perform a sequence similarity search by an algorithm that is faster than FASTA while being as sensitive. A powerful computer system dedicated to running BLAST has been established at NCBI, National Library of Medicine. Access to this BLAST system is possible through the Internet (http://www.ncbi.nlm.nih.gov/) as a Web site and through a BLAST E-mail server. There are also numerous other Web sites that provide a BLAST database search. In addition to the BLAST programs developed at the NCBI, an independent set of BLAST programs has been developed at Washington University. These programs perform similarity searches using the same methods as NCBI-BLAST and produce gapped local alignments. The statistical methods used to evaluate sequence similarity scores are different, and thus WU-BLAST and NCBI-BLAST can produce different results. The BLAST Web server at http://www.ncbi.nlm.nih.gov/ is the most widely used one for sequence database searches and is backed up by a powerful computer system so that there is usually very little wait. Like FASTA, the BLAST algorithm increases the speed of sequence alignment by searching first for common words or k-tuples in the query sequence and each database sequence. Whereas FASTA searches for all possible words of the same length, BLAST confines the search to the words that are the most significant. For proteins, significance is determined by evaluating these word matches using log odds scores in the BLOSUM62 amino acid substitution matrix. For the BLAST algorithm, the word length is fixed at 3 (formerly 4) for proteins and 11 for nucleic acids (3 if the sequences are translated in all six reading frames). This length is the minimum needed to achieve a word score that is high enough to be significant but not so long as to miss short but significant patterns. FASTA theoretically provides a more sensitive search of DNA sequence databases because a shorter word length may be used. 
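To make the idea of "most significant words" concrete, the sketch below scores every candidate three-letter word against a query word and keeps those whose summed substitution score reaches a threshold. To stay short and self-contained it uses only the BLOSUM62 entries for proline (P), glutamine (Q) and glycine (G), so the candidate alphabet is restricted to these three residues; a real search considers all 20 amino acids. The function names and the threshold values are illustrative assumptions.

```python
from itertools import product

# BLOSUM62 scores for the three residues used in this toy example only.
BLOSUM62_SUBSET = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("P", "Q"): -1, ("P", "G"): -2, ("Q", "G"): -2,
}


def pair_score(a: str, b: str) -> int:
    """Look up a symmetric substitution score."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def neighbourhood(word: str, threshold: int, alphabet: str = "PQG") -> dict:
    """Return every candidate word scoring at least `threshold` against `word`."""
    hits = {}
    for candidate in product(alphabet, repeat=len(word)):
        score = sum(pair_score(q, c) for q, c in zip(word, candidate))
        if score >= threshold:
            hits["".join(candidate)] = score
    return hits


# Of the 27 possible words over {P, Q, G}, only a few score highly against PQG.
print(neighbourhood("PQG", threshold=11))
```

Lowering the threshold admits more words and makes the search more sensitive but slower, which is the trade-off the word-score cut-off controls.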
Like FASTA, the BLAST algorithm has gone through several developmental stages. The most recent gapped BLAST, or BLAST2, is recommended, as older versions of BLAST are reported to overestimate the significance of database matches (Brenner et al. 1998). The most important recent change is that BLAST reports the significance of a gapped alignment of the query and database sequences. Former versions reported several ungapped alignments, and it was more difficult to evaluate their overall significance. Steps for searching a protein sequence database by a query protein sequence include the following:
The sequence is optionally filtered to remove low-complexity regions that are not useful for producing meaningful sequence alignments. A list of words of length 3 in the query protein sequence is made, starting with positions 1, 2, and 3; then 2, 3, and 4; and so on, until the last 3 available positions in the sequence are reached (word length 11 for DNA sequences, 3 for programs that translate DNA sequences). Using the BLOSUM62 substitution scores, each query word in this list is evaluated for an exact match with a word in any database sequence. The words are also evaluated for matches with any other combination of three amino acids, the object being to find the scores for aligning the query word with any other three-letter word found in a database sequence. A cut-off score called the neighbourhood word score threshold (T) is selected to reduce the number of possible matches to a given query word (for example, PQG) to the most significant ones. This procedure is repeated for each three-letter word in the query sequence. The remaining high-scoring words that comprise possible matches to each three-letter position in the query sequence are organized into an efficient search tree for comparing them rapidly to the database sequences.
Each database sequence is scanned for an exact match to one of the approximately 50 words corresponding to the first query sequence position, then for the words corresponding to the second position, and so on. If a match is found, it is used to seed a possible ungapped alignment between the query and database sequences; the seed is extended in both directions for as long as the alignment score keeps increasing, and the resulting segment is called a high-scoring segment pair (HSP). The next step is to determine whether each HSP score is greater than a cut-off score S. A suitable value for S is determined empirically by examining the range of scores found by comparing random sequences and choosing a value that is significantly greater. The HSPs matched in the entire database are identified and listed. BLAST next determines the statistical significance of each HSP score.
A probability that two random sequences, one the length of the query sequence and the other the entire length of the database (which is approximately equal to the sum of the lengths of all of the database sequences), could achieve the HSP score is calculated. Sometimes, two or more HSP regions that can be made into a longer alignment will be found, thereby providing additional evidence that the query and database sequences are related. In such cases, a combined assessment of the significance will be made.
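The word-list and seeding steps described above can be illustrated with a minimal Python sketch. It is not the NCBI implementation: the four-letter alphabet, the match/mismatch scores (used here in place of BLOSUM62) and the function names are assumptions chosen to keep the example short, and a dictionary lookup stands in for the search tree. The extension of seeds into HSPs is not shown.

```python
from itertools import product

# Toy alphabet and scoring scheme (assumptions for illustration only;
# BLAST itself uses the 20 amino acids and BLOSUM62 log-odds scores).
ALPHABET = "ACDE"
MATCH, MISMATCH = 5, -2


def score(a: str, b: str) -> int:
    """Score one aligned residue pair under the toy scheme."""
    return MATCH if a == b else MISMATCH


def query_words(query: str, w: int = 3) -> list[tuple[int, str]]:
    """Overlapping words of length w with their 0-based start positions."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def neighbourhood(word: str, threshold: int) -> set[str]:
    """All words whose score against `word` reaches the threshold T."""
    hits = set()
    for cand in product(ALPHABET, repeat=len(word)):
        if sum(score(a, b) for a, b in zip(word, cand)) >= threshold:
            hits.add("".join(cand))
    return hits


def find_seeds(query: str, subject: str, threshold: int, w: int = 3):
    """Return (query_pos, subject_pos) pairs where a neighbourhood word
    of the query occurs exactly in the subject sequence."""
    index: dict[str, list[int]] = {}
    for qpos, word in query_words(query, w):
        for nb in neighbourhood(word, threshold):
            index.setdefault(nb, []).append(qpos)
    seeds = []
    for spos in range(len(subject) - w + 1):
        for qpos in index.get(subject[spos:spos + w], []):
            seeds.append((qpos, spos))
    return seeds


# With T equal to the exact-match score only identical words seed;
# lowering T admits words with one mismatch.
strict_seeds = find_seeds("ACDE", "EEACDA", threshold=15)
relaxed_seeds = find_seeds("ACDE", "EEACDA", threshold=8)
```

Raising the threshold shrinks the neighbourhood of each query word, which makes the scan faster but less sensitive; this is the trade-off that T controls.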
Smith-Waterman local alignments are shown for the query sequence with each of the matched sequences in the database. The score of the alignment is obtained and the expect value for that score is calculated. When the expect value for a given database sequence satisfies the user-selectable threshold parameter E, the match is reported.
2.4.3 ICM MOLSOFT PRO:
ICM-Pro is an easy-to-use and complete desktop modelling environment for a biologist or chemist interested in molecular structure and function. Platforms available: Windows Vista/XP/NT/2000, Linux/i386/AMD64, SGI IRIX, Mac OS X.
ICM gives a biologist or chemist fast access to, and high-quality interactive 3D views of, the entire structural database. In a few seconds one can browse hundreds of structures of interest, load them, and analyze and visualize sequences, structures, alignments and sites; study pockets, bound ligands and drugs, surfaces, electrostatics, mutations and sequence conservation; and perform docking of small molecules as well as protein-protein docking. ICM supports multiple input formats. The structural database can be searched by field or sequence pattern, and the results are returned as an interactive table for instant viewing. ICM offers a rich graphical environment and powerful views for producing professional-quality images and molecular animation videos.
The ICM ('Internal Coordinate Mechanics') software project was originally designed around a new molecular mechanics approach and optimization algorithm for peptide prediction, homology modelling, loop simulations, flexible macromolecular docking and refinement, and was then extended to graphics, molecular animations, chemistry, sequence analysis, database searches, mathematics, statistics and plotting. ICM-Pro contains an all-atom internal coordinate force field and an efficient algorithm to perform local and global energy optimization of small or large molecules with respect to an arbitrary subset of variables. In addition, ICM contains the MMFF94 force field for energy optimization in Cartesian space for any organic molecule. ICM-Pro allows users to read, build, convert, refine, analyze and superimpose molecules.
Molecular graphics: ICM-Pro provides a full array of graphics tools, all accessible from a graphical user interface. Molecules can be displayed in wire, CPK, ball-and-stick, worm, ribbon, accessible surface, transparent molecular surface, and smooth or rugged solid surface representations, with perspective viewing and depth cueing. Both hardware and side-by-side stereo are supported. A screen image can be saved and printed as a compact vectorized PostScript file (also in stereo) in addition to a compressed bitmap. Movies can be created featuring molecules in solid representations such as CPK, smooth molecular surface and ball-and-stick, and any 3D object in the Wavefront format can be read, displayed, reshaped and written.
Key molecular graphics features of ICM-Pro:
• Export publication-quality molecular images at high resolution and as vector images (metafile)
• Annotate atoms, residues and sites
• 2D and 3D user-defined labels
• Hydrogen bond and distance labels
• Display atom clashes and distance restraints
• High-quality molecular surface, skin, wire, xstick and ribbon representations
• Easy control of thickness, colour and type in molecular graphics. Colour by atom type, residue side-chain, molecule, unique carbon atom colouring for multiple objects, B-factor, occupancy, accessibility, hydrophobicity, polarity, secondary structure, alignment colour, or user-defined values
• Visual effects: dynamic shadows, fog, hardware and side-by-side stereo, clipping planes, full screen
• Export coloured and annotated sequence alignments
• Easy-to-use animation effects: rotations, rocking, zooming
• Store current views/viewpoints, layers and slides
• Two kinds of stereo, including a high-quality “in-window” mode as well as a stereo mode that does not require any special hardware
Protein structure analysis: ICM-Pro provides a direct link to the PDB. Once a structure has been downloaded it can be analysed: problem regions can be flagged, multiple structures superimposed, and distances and electrostatic properties examined.
Key protein structure analysis features of ICM-Pro:
• Dynamic link to the PDB
• One-click search and download of PDB structures
• Tabulated PDB data for easy manipulation, sorting and searching
• Extract PDB sequence
• PDB file preparation, detecting and fixing problems, optimization of H, His, Asn, Gln and Pro
• Superimpose multiple structures and calculate RMSD
• Calculate contact area, surface area
• Measure and display distances and angles
• Fully linked and dynamic structure-sequence environment
• Drug binding pocket prediction
• Protein-protein interaction prediction
• One-click ligand pocket display and h-bond optimization with ligand
• One-click analysis of protein-ligand interactions
• Predict protein flexibility
• Build electrostatic surfaces
• Interactive Ramachandran plots
Crystallographic Analysis Tools: The key to understanding a protein structure is to fully evaluate the underlying crystallographic information contained within a PDB file. For example it is important to understand the full biological unit of a protein to identify if crystal-crystal contacts have influenced the structure. The crystallographic analysis features include:
• View crystallographic cell
• Generate crystallographic neighbors
• Build biological units and apply transformations
• Direct link to electron density map server
• Contour electron density maps
• Convert electron density map to grid energy map for real space refinement
Protein structure prediction: ICM-Pro predicts low-energy conformations for chemical compounds, peptides, nucleic acids, etc. A peptide sequence can be taken and its three-dimensional structure predicted. Success is not guaranteed, especially if the peptide is longer than about 25 residues, but some preliminary tests are encouraging. Local secondary structure preferences can be evaluated directly from the simulation, and the folding of the peptide can be watched as a movie.
Protein modelling: ICM-Pro has a good record in protein model building. There are procedures that regularize or build the backbone and shake up the side-chains and loops by global energy optimization. The model can also be coloured by local reliability to identify potentially wrong parts. This does not include, however, the fast routine for building a complete model by homology with loops combined with the database search (ICM-Homology is a separate add-on to ICM-Pro).
Loop modelling and protein design: ICM-Pro was used to design two new 7-residue loops, and in both cases the designs were successful. Moreover, the predicted conformations turned out to be accurate to within 0.5 Å once the crystallographic structures of the designed proteins were determined by Rik Wierenga and his co-workers.
Key structure prediction features:
• A variety of different energy terms and grids are available
• Define distance restraints and tethers
• Local minimization
• Protein structure prediction and optimization
• Prediction of the effect of a mutation
• Generation of multiple receptor conformations
• Model using restraints and symmetry
Bioinformatics tools: ICM-Bioinformatics, which is included in the ICM-Pro package, allows users to search a sequence database with high-quality global pairwise and multiple alignment algorithms. It also allows pattern, PROSITE and profile searches. Multiple sequence alignments are fast, and the package provides evolutionary trees, principal component views, annotation transfer from sequence to structure, threading and alignment visualization tools.
Sequence analysis: Alternative alignments and repeats can be found using a filtered, probability-based dot-plot. An accurate pairwise sequence alignment can be made with a double affine gap penalty, and the probability that the two aligned sequences share the same structural fold can be evaluated. Users can build multiple sequence alignments, construct and plot evolutionary trees, visualize sequence clustering in two and three dimensions, and predict protein secondary structure. A sequence can be searched interactively or in batch through any database to generate a list of possible homologues that are sorted and evaluated by probability of structural significance. The sensitive and rigorous Zega alignment is used for each comparison. This search may give more homologues than a BLAST search. The output may be presented as a linked table. Text sequence databases can be indexed and queried with ICM.
Key bioinformatics features include:
• Read in sequences and alignments in FASTA and other formats
• Fast sequence searching in BLAST databases
• High-quality pairwise and multiple alignment generation
• Interactive alignment editing
• Predict sequence secondary structure content
• Structure-linked sequence alignments and alignment annotation
• Drag-and-drop alignment generation
Small molecule docking: ICM-Docking provides a set of tools for modelling protein/ligand interactions. It performs fast and accurate docking of fully and continuously flexible small-molecule ligands to a protein represented by grid interaction potentials. Users can also dock the ligand to an explicit full-atom representation of the receptor with an arbitrarily selected subset of flexible side chains. Docking is performed by the ICM stochastic global optimization procedure, which combines pseudo-Brownian positional and torsional steps with fast local gradient minimization. Continuously differentiable grid potentials ensure rapid convergence of the local minimizations. The simulation trajectory is tracked to avoid trapping in sub-optimal conformations, which allows an efficient search of the conformational space. Tools are provided for automatic conversion of 2D chemical structures to 3D, atom type assignment, charge assignment and recognition of rotatable bonds. Parts of the ligand can be automatically constrained to a pre-defined position during docking. Multiple conformations of the free or docked ligand can be generated. Special Monte Carlo steps allow sampling of stereoisomers for racemic compounds. The protein surface is analyzed for potential binding pockets, and the interaction properties are displayed on the 'skin' representation of the surface. Simulations are set up through a graphical user interface. The docking scripts, which are written in the ICM molecular modelling scripting language, can be modified to meet specific project requirements. Protein-protein docking is performed with a fast global rigid-body search with grid potentials, and the best docked configurations are refined with flexible side chains to allow for induced fit.
Small molecule docking features include:
• Drug pocket identification, analysis and visualization tools
• Small molecule docking
• Sample racemic centers and double bond cis/trans
• Relax covalent geometry
• Keep carboxyls neutral and set charges for amino groups
• Template docking
• Incorporation of flexibility into the ligand and receptor side chains/backbone
• Multiple-receptor conformation docking
• Automated model building into density (docking to electron density)
• Tabulated and easy-to-visualize docking results
• Multiple solutions ranked by energy values
Protein-protein docking: The ICM protein-protein docking procedure has continually led the pack in docking accuracy in the worldwide CAPRI protein-protein docking competition. In the past, ICM has been used to dock ab initio a full-atom model of lysozyme to an antibody with 1.6 Å accuracy (Nature Struc.Biol., 1994, 1,259). Later, Maxim Totrov and Ruben Abagyan correctly predicted the association of beta-lactamase and its protein inhibitor in the Docking Challenge (Nature Struc.Biol., 1996,3,290) using ICM pseudo-Brownian docking with subsequent ICM side-chain refinement.
2.4.4 MOLEGRO VIRTUAL DOCKER:
Molegro Virtual Docker (MVD) is an integrated environment for studying and predicting how ligands interact with macromolecules. Ligand binding modes are identified by iteratively evaluating a number of candidate solutions (ligand conformations) and estimating the energy of their interactions with the macromolecule. The highest-scoring solutions are returned for further analysis. MVD requires a three-dimensional structure of both protein and ligand (usually derived from X-ray/NMR experiments or homology modelling). MVD performs flexible ligand docking, so the optimal geometry of the ligand is determined during the docking. The system requirements for Molegro Virtual Docker are: Windows Vista, 2003, XP, or 2000; Linux (most standard distributions); Mac OS X 10.4 or later. Molegro Virtual Docker contains a built-in version checker that makes it easy to check for program updates, including new features and bug fixes. To check for updates, select Help | Check for Updates. A window showing available updates and details about the changes made will appear.
The MolDock scoring function (MolDock Score) used by MVD is derived from the PLP scoring functions originally proposed by Gehlhaar et al. [GEHLHAAR 1995,1998] and later extended by Yang et al. [YANG 2004]. The MolDock scoring function further improves these scoring functions with a new hydrogen bonding term and new charge schemes. The docking scoring function, Escore, is defined by the following energy terms:
Escore = Einter + Eintra
where Einter is the ligand-protein interaction energy and Eintra is the internal energy of the ligand. After MVD has predicted one or more promising poses using the MolDock score, it calculates several additional energy terms. All of these terms are stored in the 'DockingResults.mvdresults' file at the end of the docking run. The 'rerank score' is a linear combination of these terms, weighted by the coefficients given in the 'RerankingCoefficients.txt' file. A '.mvdresults' file is not meant to be interpreted or inspected manually. Instead, it should be opened in MVD, either by dragging it onto the workspace or by selecting 'File | Import Docking Results (*.mvdresults)...'. It is also possible to open the file in the Data Analyzer in order to create new regression models based on the energy terms in the file.
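The idea of a rerank score as a weighted linear combination of energy terms can be sketched in a few lines of Python. The term values and coefficients below are hypothetical and are not taken from 'RerankingCoefficients.txt'; the function name is likewise an assumption, and the sketch does not read MVD's file formats.

```python
def rerank_score(terms: dict[str, float], coefficients: dict[str, float]) -> float:
    """Weighted sum of energy terms.

    Every term named in `coefficients` must be present in `terms`;
    a missing term raises KeyError rather than being silently skipped.
    """
    return sum(weight * terms[name] for name, weight in coefficients.items())


# Hypothetical energy terms for one pose (arbitrary units).
pose_terms = {"Steric": -120.0, "HBond": -8.0, "Electro": -2.0, "E-Intra (tors)": 4.0}

# Hypothetical weights, for illustration only.
weights = {"Steric": 0.5, "HBond": 1.0, "Electro": 0.25, "E-Intra (tors)": 1.0}

example_score = rerank_score(pose_terms, weights)
```

Because the score is linear in the terms, a new set of coefficients can be fitted by ordinary regression against experimental data, which is presumably what the Data Analyzer option mentioned above supports.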
The following table explains the different terms in a '.mvdresults' file:
Textual information
• Ligand: The name of the ligand the pose was created from.
• Name: The internal name of the pose (a concatenation of the pose id and ligand name).
• Filename: The file containing the pose.
• Workspace: The workspace (.mvdml file) containing the protein.
• Run: When running multiple docking runs for each ligand, this field contains the docking run number.
Energy terms (total)
• Energy: The MolDock score (arbitrary units).
• RerankScore: The reranking score (arbitrary units).
• PoseEnergy: The score actually assigned to the pose during the docking.
• SimilarityScore: Similarity score (if docking templates are enabled).
• LE1: Ligand Efficiency 1, the MolDock Score divided by the heavy atom count.
• LE3: Ligand Efficiency 3, the Rerank Score divided by the heavy atom count.
Energy terms (contributions)
• E-Total: The total MolDock Score energy, which is the sum of internal ligand energies, protein interaction energies and soft penalties.
• E-Inter total: The total MolDock Score interaction energy between the pose and the target molecule(s).
• E-Inter (cofactor - ligand): The total MolDock Score interaction energy between the pose and the cofactors (the sum of the steric interaction energies calculated by PLP and the electrostatic and hydrogen bonding terms below).
• Cofactor (VdW): The steric interaction energy between the pose and the cofactors calculated using a LJ12-6 approximation.
• Cofactor (elec): The electrostatic interaction energy between the pose and the cofactors.
• Cofactor (hbond): The hydrogen bonding interaction energy between the pose and the cofactors (calculated by PLP).
• E-Inter (protein - ligand): The MolDock Score interaction energy between the pose and the protein (equal to Steric + HBond + Electro + ElectroLong below).
• Steric: Steric interaction energy between the protein and the ligand (calculated by PLP).
• HBond: Hydrogen bonding energy between protein and ligand (calculated by PLP).
• Electro: The short-range (r < 4.5 Å) electrostatic protein-ligand interaction energy.
• ElectroLong: The long-range (r > 4.5 Å) electrostatic protein-ligand interaction energy.
• NoHBond90: The hydrogen bonding energy (protein-ligand) as calculated if the directionality of the hydrogen bond was not taken into account.
• VdW (LJ12-6): Protein steric interaction energy from a LJ 12-6 VdW potential approximation.
• E-Inter (water - ligand): The MolDock Score interaction energy between the pose and the water molecules.
• E-Intra (tors, ligand atoms): The total internal MolDock Score energy of the pose.
• E-Intra (steric): Steric self-interaction energy for the pose (calculated by PLP).
• E-Intra (hbond): Hydrogen bonding self-interaction energy for the pose (calculated by PLP).
• E-Intra (elec): Electrostatic self-interaction energy for the pose.
• E-Intra (tors): Torsional energy for the pose.
• E-Intra (sp2-sp2): Additional sp2-sp2 torsional term for the pose.
• E-Intra (vdw): Steric self-interaction energy for the pose (calculated by a LJ12-6 VdW approximation).
• E-Solvation: The energy calculated from the implicit solvation model.
• E-Soft Constraint Penalty: The energy contributions from soft constraints.
Static terms
• Torsions: The number of (chosen) rotatable bonds in the ligand.
• HeavyAtoms: Number of heavy atoms.
• MW: Molecular weight (in daltons).
• C0: Obsolete constant term. This value is always 1.
• CO2minus: Number of carboxyl groups in ligand.
• Csp2: Number of sp2-hybridized carbon atoms in ligand.
• Csp3: Number of sp3-hybridized carbon atoms in ligand.
• DOF: Degrees of internal rotational freedom. As of now this is the number of chosen rotatable bonds in the ligand and is thus equal to the 'Torsions' term. It is supposed to reflect how many rotational degrees of freedom are lost upon binding. Future work may include a more advanced model where the actual conformation is inspected in order to determine whether rotational degrees of freedom are lost.
• N: Number of nitrogen atoms in ligand.
• Nplus: Number of positively charged nitrogen atoms in ligand.
• OH: Number of hydroxyl groups in ligand.
• OPO32minus: Number of PO4 2- groups in ligand.
• OS: Number of ethers and thioethers in ligand.
• Carbonyl: Number of carbonyl groups in ligand.
• Halogen: Number of halogen groups in ligand.
Other terms
• RMSD: The RMS deviation from a reference ligand.
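Two relationships stated in the table lend themselves to a quick check: the ligand efficiencies (LE1 and LE3) are scores divided by the heavy atom count, and E-Inter (protein - ligand) equals Steric + HBond + Electro + ElectroLong. The Python sketch below applies both to one row of values. The row is a hypothetical dictionary, not the parsed contents of a real '.mvdresults' file, and the function names are assumptions.

```python
import math


def ligand_efficiency(score: float, heavy_atoms: int) -> float:
    """Score per heavy atom, as in the LE1 and LE3 columns."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return score / heavy_atoms


def protein_ligand_sum(row: dict[str, float]) -> float:
    """Steric + HBond + Electro + ElectroLong, per the table above."""
    return row["Steric"] + row["HBond"] + row["Electro"] + row["ElectroLong"]


# Hypothetical values for a single pose (arbitrary units).
row = {
    "Energy": -110.0,
    "RerankScore": -88.0,
    "HeavyAtoms": 22,
    "Steric": -95.0,
    "HBond": -10.0,
    "Electro": -3.0,
    "ElectroLong": -1.0,
    "E-Inter (protein - ligand)": -109.0,
}

le1 = ligand_efficiency(row["Energy"], int(row["HeavyAtoms"]))
le3 = ligand_efficiency(row["RerankScore"], int(row["HeavyAtoms"]))

# The reported interaction energy should match the sum of its parts.
consistent = math.isclose(
    protein_ligand_sum(row), row["E-Inter (protein - ligand)"], abs_tol=1e-6
)
```

Normalizing by heavy atom count makes scores comparable between ligands of different size, since larger ligands tend to obtain lower (better) raw scores simply by making more contacts.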
The docking search algorithm (MolDock Optimizer) used in MVD is based on an evolutionary algorithm [MICHALEWICZ 1992, 2000]. Evolutionary algorithms (EAs) are iterative optimization techniques inspired by Darwinian evolution theory. In EAs, the evolutionary process is simplified and thus it has very little in common with real world evolution. Nevertheless, during the last fifty years EAs have proved their worth as powerful optimization techniques that can assist or replace traditional techniques when these fail or are inadequate for the task to be solved. Basically, an EA consists of a population of individuals (candidate solutions), which is exposed to random variation by means of variation operators, such as mutation and recombination. The individual being altered is often referred to as the parent and the resulting solution after modification is called the offspring. Sometimes more than one parent is used to create the offspring by recombination of solutions, which is also referred to as crossover.
The guided differential evolution algorithm (MolDock Optimizer) used in MVD is based on an EA variant called differential evolution (DE). The DE algorithm was introduced by Storn and Price in 1995 [STORN 1995]. Compared to more widely known EA-based techniques (e.g. genetic algorithms, evolutionary programming, and evolution strategies), DE uses a different approach to select and modify candidate solutions (individuals). The main innovative idea in DE is to create offspring from a weighted difference of parent solutions. DE works as follows: First, all individuals are initialized and evaluated according to the MolDock Score (fitness function). Afterwards, the following process is executed as long as the termination condition is not fulfilled: For each individual in the population, an offspring is created by adding a weighted difference of parent solutions, which are randomly selected from the population. The offspring then replaces the parent if and only if it is more fit. Otherwise, the parent survives and is passed on to the next generation (iteration of the algorithm). Additionally, guided differential evolution may use a cavity prediction algorithm to constrain predicted conformations (poses) during the search process. More specifically, if a candidate solution is positioned outside the cavity, it is translated so that a randomly chosen ligand atom will be located within the region spanned by the cavity.
Naturally, this strategy is only applied if a cavity has been found. If no cavities are reported, the search procedure does not constrain the candidate solutions. One of the reasons why DE works so well is that the variation operator exploits the population diversity in the following manner: Initially, when the candidate solutions in the population are randomly generated, the diversity is large. Thus, when offspring are created, the differences between parental solutions are big, resulting in large step sizes. As the algorithm converges to better solutions, the population diversity is lowered, and the step sizes used to create offspring are lowered correspondingly. Therefore, by using the differences between other individuals in the population, DE automatically adapts the step sizes used to create offspring as the search process converges toward good solutions.
Only ligand properties are represented in the individuals, since the protein remains rigid during the docking simulation. Thus, a candidate solution is encoded by an array of real-valued numbers representing ligand position, orientation, and conformation: Cartesian coordinates for the ligand translation, four variables specifying the ligand orientation (encoded as a rotation vector and a rotation angle), and one angle for each flexible torsion angle in the ligand (if any). Each individual in the initial population is assigned a random position within the search space region (defined by the user). Initializing the orientation is more complicated: if uniform random numbers were simply chosen for the orientation axis (between -1.0 and 1.0, followed by normalization of the values to form a unit vector) and for the angle of rotation (between -180° and +180°), the initial population would be biased towards the identity orientation (i.e. no rotation). To avoid this bias, the algorithm by Shoemake et al.
[SHOEMAKE 1992] for generating uniform random quaternions is used and the random quaternions are then converted to their rotation axis/rotation angle representation. The flexible torsion angles (if any) are assigned a random angle between -180° and +180°.
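A common formulation of the uniform random quaternion construction, followed by conversion to a rotation axis and angle, is sketched below in Python. Whether MVD uses exactly this formulation is not stated in the text; the function names and the (x, y, z, w) component order are assumptions, and angles are in radians.

```python
import math
import random


def random_unit_quaternion(rng: random.Random) -> tuple[float, float, float, float]:
    """Uniformly distributed unit quaternion (x, y, z, w) built from
    three uniform random numbers, following the subgroup construction
    usually attributed to Shoemake."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (
        a * math.sin(2.0 * math.pi * u2),
        a * math.cos(2.0 * math.pi * u2),
        b * math.sin(2.0 * math.pi * u3),
        b * math.cos(2.0 * math.pi * u3),
    )


def quaternion_to_axis_angle(q):
    """Convert a unit quaternion to (unit axis, angle in radians).

    The angle is mapped into (-pi, pi], matching the -180 to +180
    degree range used in the text.
    """
    x, y, z, w = q
    w = max(-1.0, min(1.0, w))           # guard against rounding error
    angle = 2.0 * math.acos(w)           # in [0, 2*pi]
    s = math.sqrt(max(0.0, 1.0 - w * w)) # length of the vector part
    if s < 1e-12:
        # No rotation (or a full turn): the axis is arbitrary.
        return (1.0, 0.0, 0.0), 0.0
    if angle > math.pi:
        angle -= 2.0 * math.pi           # same rotation, equivalent angle
    return (x / s, y / s, z / s), angle


rng = random.Random(42)  # fixed seed so the example is repeatable
axis, angle = quaternion_to_axis_angle(random_unit_quaternion(rng))
```

Sampling the quaternion first and converting afterwards avoids the bias described above, because uniformity is imposed on the space of rotations rather than on the axis and angle separately.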
In MVD, the following default parameters are used for the guided differential evolution algorithm: population size = 50, crossover rate = 0.9, and scaling factor = 0.5. These settings have been found by trial and error, and are generally found to give the best results across a test set of 77 complexes.
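The differential evolution loop can be sketched in Python using the default parameters quoted above (population size 50, crossover rate 0.9, scaling factor 0.5). This is a generic DE variant minimizing a toy function, not the MolDock Optimizer: the cavity guidance is omitted, and the fitness function, the bounds, the number of generations and the treatment of lower values as fitter are all assumptions made for illustration.

```python
import random


def differential_evolution(fitness, bounds, pop_size=50, crossover_rate=0.9,
                           scaling_factor=0.5, generations=200, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic DE scheme.

    Lower fitness is treated as better, as with docking energies.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals other than the current parent.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene always changes
            child = []
            for d in range(dim):
                if d == forced or rng.random() < crossover_rate:
                    # Base solution plus a weighted difference of two others.
                    value = pop[a][d] + scaling_factor * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)
                else:
                    value = pop[i][d]
                child.append(value)
            child_score = fitness(child)
            # The offspring replaces the parent only if it is fitter.
            if child_score < scores[i]:
                pop[i], scores[i] = child, child_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy fitness function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_solution, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

The step size is never set explicitly: it comes from the differences between population members, so it shrinks on its own as the population converges.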
In order to determine the potential binding sites, a grid-based cavity prediction algorithm has been developed. The cavity prediction algorithm works as follows: First, a discrete grid with a resolution of 0.8 Å covering the protein is created. At every grid point a probe sphere of radius 1.4 Å is placed, and it is checked whether this sphere overlaps with any of the spheres determined by the Van der Waals radii of the protein atoms. Grid points where the probe clashes with the protein atom spheres are referred to as part of the inaccessible volume; all other points are referred to as accessible. Second, each accessible grid point is checked for whether it is part of a cavity using the following procedure: From the current grid point a random direction is chosen, and this direction (and the opposite direction) is followed until the grid boundaries are hit, checking whether an inaccessible grid point is hit on the way. This is repeated a number of times, and if the percentage of lines hitting an inaccessible volume is larger than a given threshold, the point is marked as being part of a cavity. By default 16 different directions are tested, and a grid point is assumed to be part of a cavity if 12 or more of these lines hit an inaccessible volume. The threshold can be tuned according to how enclosed the found cavities should be. A value of 0% is only possible far from the protein, whereas a value of 100% corresponds to a binding site buried deep inside the protein. The final step is to determine the connected regions. Two grid points are connected if they are neighbours. Regions with a volume below 10.0 Å³ are discarded as irrelevant (the volume of a connected set of grid points is estimated as the number of grid points times the volume of a unit grid cell). The cavities found are then ranked according to their volume.
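The enclosure test at the heart of the cavity prediction can be sketched in Python as follows. The sketch starts from a set of grid cells already classified as inaccessible, so the probe-sphere step and the final grouping into connected regions are not shown. Several details are assumptions, because the text does not specify them: a line counts as a hit when an inaccessible cell is met in either of its two directions, rays are sampled at half-cell steps, and all names are hypothetical.

```python
import math
import random


def random_unit_vector(rng: random.Random) -> list[float]:
    """Uniform random direction obtained by normalizing Gaussian samples."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]


def ray_hits(start, direction, blocked, size, step=0.5) -> bool:
    """Walk from `start` along `direction` until leaving the cubic grid;
    return True as soon as an inaccessible cell is met."""
    t = step
    while True:
        cell = tuple(math.floor(start[k] + t * direction[k] + 0.5) for k in range(3))
        if any(c < 0 or c >= size for c in cell):
            return False  # reached the grid boundary without a hit
        if cell in blocked:
            return True
        t += step


def is_cavity_point(point, blocked, size, n_lines=16, min_hits=12, rng=None) -> bool:
    """Mark a grid point as cavity if enough random lines are obstructed.

    Assumption: a line counts as a hit if either of its two directions
    meets an inaccessible cell.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_lines):
        d = random_unit_vector(rng)
        opposite = [-c for c in d]
        if ray_hits(point, d, blocked, size) or ray_hits(point, opposite, blocked, size):
            hits += 1
    return hits >= min_hits


# Toy example: a closed cubic shell of inaccessible cells around the centre
# of a 9 x 9 x 9 grid, so the centre point is completely enclosed.
SIZE = 9
shell = {
    (x, y, z)
    for x in range(SIZE) for y in range(SIZE) for z in range(SIZE)
    if max(abs(x - 4), abs(y - 4), abs(z - 4)) == 3
}
centre_enclosed = is_cavity_point((4, 4, 4), shell, SIZE)
empty_grid = is_cavity_point((4, 4, 4), set(), SIZE)
```

With the default values the threshold is 12 of 16 lines, i.e. 75%; raising `min_hits` selects more deeply buried pockets.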
Clustering algorithm: The multiple poses returned from a docking run are identified using the following procedure: During the docking run, new candidate solutions (poses) scoring better than parental solutions are added to a temporary pool of docking solutions. If the number of poses in the pool is higher than 300, a clustering algorithm is used to cluster all the solutions in the pool. The clustering is performed on-line during the docking search and when the docking run terminates. Because of the limit of 300 poses, the clustering process is fast. The members of the pool are replaced by the new cluster representatives found (limited by the Max number of poses returned option).
The clustering procedure works as follows:
1. The pool of solutions is sorted according to energy scores (starting with the best-scoring pose).
2. The first member of the sorted pool of solutions is added to the first initial cluster and the member is assigned to be the cluster representative.
3. The remainder of the pool members are added to the most similar cluster available (using the common RMSD measure) if and only if the RMSD between the representative of the most similar cluster and the member is below a user-specified RMSD threshold. Otherwise, a new cluster is created and the member is assigned to be the cluster representative.
4. The clustering procedure is terminated when the total number of clusters created exceeds Max number of poses returned (user-defined parameter) or when all members of the pool have been assigned to a cluster.
5. When the clustering procedure has terminated, the set of representatives (one from each cluster) is returned.
MVD accepts the following molecular structure formats:
• PDB (Protein Data Bank). Supported file extensions: pdb/ent.
• Mol2 (Sybyl Mol2 format). Supported file extensions: mol2.
• SDF (MDL format). Supported file extensions: sdf/sd (for multiple structures) and mol/mdl (for a single molecular structure).
Additionally, Molegro Virtual Docker uses its own MVDML file format. MVDML stands for Molegro Virtual Docker Markup Language and is an XML-based file format. In general, MVDML can be used to store the following information:
• Molecular structures (atom coordinates, atom types, partial charges, bond orders, hybridization states, ...)
• Constraints (location, type, and constraint parameters)
• Search space (center and radius)
• State information (workspace properties)
• Cavities (location, cavity grid points)
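Returning to the five clustering steps listed earlier in this section, a minimal Python sketch of such a greedy RMSD clustering is given below. It is an illustration rather than MVD's code: poses are assumed to be (score, coordinates) pairs with identical atom ordering, the RMSD is computed without superposition, lower scores are treated as better, and the function names are hypothetical.

```python
import math


def rmsd(a, b) -> float:
    """Root-mean-square deviation between two poses given as equal-length
    lists of (x, y, z) atom coordinates in the same atom order."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def cluster_poses(poses, rmsd_threshold: float, max_poses: int):
    """Greedy clustering: return one representative per cluster.

    `poses` is a list of (score, coords) pairs; lower score is better.
    """
    representatives = []
    # Step 1: sort by score so the best pose seeds the first cluster.
    for score, coords in sorted(poses, key=lambda p: p[0]):
        if representatives:
            nearest = min(rmsd(coords, rep[1]) for rep in representatives)
            if nearest < rmsd_threshold:
                continue  # joins an existing cluster; representative unchanged
        if len(representatives) >= max_poses:
            break  # a further cluster would exceed the allowed number
        representatives.append((score, coords))
    return representatives


# Toy single-atom "poses" along the x axis (hypothetical data).
pool = [
    (-9.0, [(0.5, 0.0, 0.0)]),
    (-10.0, [(0.0, 0.0, 0.0)]),
    (-7.0, [(5.2, 0.0, 0.0)]),
    (-8.0, [(5.0, 0.0, 0.0)]),
    (-6.0, [(10.0, 0.0, 0.0)]),
]
reps = cluster_poses(pool, rmsd_threshold=1.0, max_poses=5)
top_two = cluster_poses(pool, rmsd_threshold=1.0, max_poses=2)
```

Because the pool is sorted first, every representative is the best-scoring member of its cluster, so the returned poses are both structurally distinct and locally optimal in score.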
2.5 HOMOLOGY MODELLING:
Homology modelling is the prediction of the three-dimensional structure of a given protein sequence (the target) based on an alignment to one or more known protein structures (the templates). If similarity between the target sequence and the template sequence is detected, structural similarity can be assumed. In general, a sequence identity of at least 30% is required to generate a useful model. The model can be used to understand function, activity, specificity, etc., and it is of interest to drug companies wishing to do structure-aided drug design.
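The 30% rule of thumb can be checked for a candidate template once the target and template sequences have been aligned. The Python sketch below computes percent identity over the aligned columns that contain no gap; this is one of several conventions in use and is an assumption here, as are the function names and the example sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence
    has a gap. Both strings must come from the same alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def is_usable_template(identity: float, threshold: float = 30.0) -> bool:
    """Apply the rule of thumb quoted above (threshold in percent)."""
    return identity >= threshold


# Hypothetical aligned fragments of a target and a template.
identity = percent_identity("ACDE", "ACDQ")
usable = is_usable_template(identity)
```

The threshold is a guideline rather than a hard limit: alignment length and coverage also matter, so a short alignment just above 30% may still give a poor model.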
Structure prediction by homology modelling: Homology modelling makes two fundamental assumptions. First, the structure of a protein is determined by its primary amino acid sequence (Anfinsen). Second, during evolution the structure of a protein has changed much more slowly than its sequence, so that similar sequences adopt practically identical structures and distantly related sequences fold into similar structures.
Homology Modelling Steps:
1) Template recognition & initial alignment: Select the best template from a library of known protein structures derived from the PDB. Templates can be found by using the target sequence as a query in a FASTA or BLAST search.
To find a template or template structures from the Protein Data Bank:
2) Alignment correction: Alignments are scored (substitution score) in order to define the similarity between two amino acid residues in the sequences. A substitution score is calculated for each aligned pair of letters. Substitution matrices:
- Reflect the true probabilities of mutations occurring through a period of evolution.
- PAM family: based on global alignments of closely related proteins; derived from a mutation probability matrix.
- BLOSUM family: based on observed alignments; not extrapolated from comparisons of closely related proteins.
3) Backbone generation: Uses known structurally conserved regions (SCRs) to generate coordinates for the unknown structure. For SCRs, copy coordinates from known structures.
For variable regions (VRs), copy from the known structure if the residue types are similar; otherwise, use databases of loop fragment sequences.
4) Loop modelling: Loops arise as a result of substitutions, insertions and deletions within the same family.
Loop modelling is done by:
1. Database search for segments from known protein structures fitting fixed endpoints
2. Molecular mechanics/molecular dynamics
3. A combination of 1 and 2
For missing loops, ab initio rebuilding is done.
5) Side-chain modelling:
- Use of rotamer libraries (backbone dependent)
- Molecular mechanics optimization:
  - Dead-end elimination (heuristic)
  - Monte Carlo (heuristic)
  - Branch & Bound (exact)
- Mean-field methods
6) Model optimization: Done by molecular mechanics methods.
7) Model validation: The model should be evaluated for:
- Correctness of the overall fold/structure
- Errors over localized regions
- Stereochemical parameters: bond lengths, angles, etc.
Some software tools for model verification:
- Procheck http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
- WHAT IF http://swift.cmbi.kun.nl/whatif
- PROSA II http://www.came.sbg.ac.at/Services/prosa.html
- Profile 3D & Verify 3D http://shannon.mbi.ucla.edu/DOE/Services
Frequently used servers and software for homology modelling:
- Online servers: CPH protein model server; PS2 protein model server; 3D JIGSAW; PHYRE
- Offline tools: SPDBV
- Commercial tool: ICM Molsoft
Applications of homology modelling:
- Structure-based assessment of target druggability
- Structure-guided design of mutagenesis experiments
- Tool compound design for probing biological function
- Homology model based ligand design
- Design of in vitro test assays
- Structure-based prediction of drug metabolism and toxicity
2.6 DOCKING: Docking is the computer simulation of the binding interaction between two molecules. These two molecules may be:
1. Two proteins
2. A protein and a drug
3. A nucleic acid and a drug
The first docking program was described by Kuntz [1982].
Docking strategy:
Types of docking:
1. Rigid docking: In this type of docking, both molecules are kept rigid; that is, their side chains are not allowed to move. This is an approximation that does not occur in nature and is made only in software.
2. Semi-flexible docking: In this type of docking, the larger protein molecule is kept rigid, whereas the smaller ligand is kept flexible. This is usually done in protein-drug docking. It is also known as quasi-flexible docking.
3. Flexible docking: In this type of docking, both molecules are kept flexible. This is the only type of docking that corresponds to what is seen under natural conditions.
Search algorithms in docking: Every docking program follows a method, i.e. an algorithm, to perform the docking process and to identify the best-binding ligand pose for a particular protein. This method is known as the search algorithm or search strategy. There are four types of search algorithms:
1. Random search algorithms:
- Genetic algorithm: e.g. AUTODOCK, GOLD
- Monte Carlo method: e.g. PRODOCK, MC-DOCK, ICM, DOCKVISION, GLIDE
- Tabu search
2. Systematic search algorithms:
- Fragment-based method: e.g. DOCK, FLEXX, ADAM
- Point complementarity method / conformational method
- Distance geometry method
- Database method: e.g. FLOG, EUDOC
3. Simulation search algorithms:
- Molecular dynamics
- Energy minimization
4. Multiple-method algorithms
Scoring functions used in docking: Scoring functions are used to score the protein-ligand complexes and to identify the complex with the lowest energy value. For every docking run, a particular score is given by the scoring function. Scoring functions vary from tool to tool. There are three types of scoring functions:
1. Force-field-based scoring functions: atomic structure, valency, bond angle, bond length etc.
• GOLD score
• G score
• D score
• AMBER score
• CHARMM
• GROMOS
2. Empirical scoring functions: statistics of regression coefficients etc.
• Chem score
• Böhm's scoring function
• F score
• X score
3. Knowledge-based scoring functions: experimental data such as X-ray crystallography, R-values and NMR values.
• Drug score
• SMOG score
• Potential of mean force [PMF] score
CHAPTER - 3
MATERIALS AND METHODS
3.1 TABLE OF DATABASES USED
NAME | URL | USED FOR
NCBI | www.ncbi.nlm.nih.gov | mRNA and protein sequence of abl1 gene of human
PDB | www.rcsb.org/pdb/home/home | Blastp against PDB can search for structural similarity of the protein
DrugBank | www.drugbank.ca | Download the drug Imatinib in mol format
Exonic mutation information | Clients' research | Preparing the mutant nucleotides [mRNA]
3.2 TABLE OF TOOLS, THEIR SOURCES AND THE WORKING METHOD THEY ARE USED IN:
NAME | URL/ SOURCE | USED FOR/ WORKING METHOD
ORF Finder | http://www.ncbi.nlm.nih.gov/gorf/gorf.html | Finding the correct reading frame of a given nucleotide sequence
Blast p | http://blast.ncbi.nlm.nih.gov/Blast.cgi | Pair-wise sequence alignment to find similarity against PDB and find the structure of abl1 protein
ICM Molsoft Pro | Purchased from ICM makers and installed the software on the system | Homology modelling of the mutant protein sequences using the template searched by blastp
Molegro Virtual Docker | Purchased from Molegro makers and installed the software on the system | Docking the drug imatinib on to the mutant protein sequences
Argus lab 4.0.1 | Free download from argus lab site | Open and view PDB files
NOTE: The protocol followed while using the above databases and tools for the above-mentioned working methods is detailed in the next chapter.
CHAPTER - 4
THE EXPERIMENT
4.1 mRNA and Protein sequence of ABL1 (c-abl oncogene 1, receptor tyrosine kinase) Download: The NCBI website was opened by entering the following URL in the Internet Explorer address bar:
www.ncbi.nlm.nih.gov
"Nucleotide" was chosen from the database scroll list, < abl1 AND human> was typed in the search box, and search was clicked.
Among the many results returned, the required result was chosen, and its mRNA and protein sequences were opened through the link given [marked by arrow]. Both sequences were opened in FASTA format and saved. Two transcripts are shown here; we chose the second one at random.
4.2 Creating mutant mRNA sequence: The client has sequenced the Abl tyrosine kinase domain in CML patients and provided us with the point mutation data. Using this data, we created mutant mRNA sequences by editing, in Notepad, the base at which each mutation occurred in the mRNA sequence downloaded from NCBI. While doing so, any spaces were removed. Each mutant nucleotide sequence was saved as a separate file.
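The manual edit described above could also be scripted. The following is a minimal sketch under stated assumptions: the helper `apply_point_mutation` is hypothetical, `position` is a 1-based index into the mRNA string (mapping the client's coordinates onto the mRNA is not shown), and the reference base is checked before it is replaced so that a wrong position is caught early.

```python
def apply_point_mutation(sequence, position, ref_base, alt_base):
    """Return the sequence with one base substituted.

    Whitespace is stripped first, mirroring the removal of spaces
    described in the text. `position` is 1-based within the cleaned string.
    """
    cleaned = "".join(sequence.split()).upper()
    index = position - 1
    if not 0 <= index < len(cleaned):
        raise ValueError("position lies outside the sequence")
    if cleaned[index] != ref_base:
        raise ValueError(
            f"expected {ref_base} at position {position}, found {cleaned[index]}"
        )
    return cleaned[:index] + alt_base + cleaned[index + 1:]


# Toy example, not ABL1 data: change the G at position 4 into a T.
mutant = apply_point_mutation("ATG GCC\nTAA", 4, "G", "T")
```

Each mutant string returned this way could then be written to its own file, as was done by hand.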
4.3 ORF finder: To convert these mutant nucleotide sequences into protein sequences, we used the ORF Finder [Open Reading Frame Finder] provided by the NCBI, at the following link:
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
The mutant nucleotide sequence was copied and pasted into the box given, and the ORF find button was clicked. A screenshot of the same is shown below.
The result retrieved from the ORF find is shown below. We chose the sequence and frame in which our mutation is likely to be present.
The screenshot below shows the protein sequence of the chosen frame and length.
After clicking on the accept button, we chose to view the sequence as FASTA protein from the view scroll list and saved the result in Notepad.
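To make the idea behind this step concrete, the sketch below translates the three forward reading frames of a nucleotide sequence with the standard genetic code and returns the longest protein that starts at ATG and ends at a stop codon. It is an illustrative stand-in, not the NCBI ORF Finder itself: the function name `longest_orf_protein` is hypothetical, only the forward strand is scanned, and open reading frames without a stop codon are ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def longest_orf_protein(mrna):
    """Longest ATG-to-stop translation over the three forward frames."""
    seq = "".join(mrna.split()).upper().replace("U", "T")
    best = ""
    for frame in range(3):
        protein = None  # None means no start codon has been seen yet
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unknown codon
            if protein is None:
                if aa == "M":
                    protein = "M"
            elif aa == "*":
                if len(protein) > len(best):
                    best = protein
                protein = None
            else:
                protein += aa
    return best
```

A point mutation that creates a premature stop codon shows up here as a shorter returned protein, which corresponds to the truncated cases reported in the results.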
4.4 BLASTp: Having obtained the mutant protein sequences, we needed to find their structures by homology modelling, and for that we needed a template. This template was obtained using the Basic Local Alignment Search Tool. The BLASTp feature, provided by the NCBI, allows us to search a protein database using a protein query.
For this, the following link was opened:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
The link to BLASTp [encircled in red] was then opened. This is shown below in two screenshots. In the first, the sequence is pasted in FASTA format; in the second, the job title, the database and the algorithm are specified and the BLAST button is clicked.
The result was then obtained.
The results showed that the query sequence has 100% identity, over the aligned region, with the protein 2e2b [Crystal structure of the c-Abl kinase domain in complex with INNO-406] in the PDB database. The PDB file of this protein was downloaded from the PDB site.
4.5 HOMOLOGY MODELLING BY USING ICM MOLSOFT PRO: After opening the application, from the tool bar, was initiated. This opens up a dialogue box “new molecule/sequence/grob” in this, the protein sequence of the mutant nucleotide obtained from the ORF finder was pasted in its Fasta format and a sequence name was given and then clicked ok.
Now the protein sequence is uploaded in the work space. From the tool bar was chosen and the template i.e 2e2b was imported. And the screen shot of the same is shown below.
In this case, the template was already in object format. If the template is not in the object format, it won’t show itself in the work space. In such a case, the template can be converted into an object using under MolMechanics in the tool bar.
This will open a dialogue box “convert molecular object to.....” choose the options as shown and click ok.
When in the status window shows: that means the convert is completed.
Next step is to build the homology model. For this, “build model” under homology in the tools bar is clicked.
This opens up a dialogue box “build model by homology”
The sources: fields were chosen by us. The preferences: fields are default settings and the options were chosen as shown. Building the model takes a few minutes. But then the result is retrieved as shown below.
The mid box[marked with red] shows alignment between the template and the protein sequence uploaded for homology modelling. The molecule can be viewed properly by using the viewing tools on the right [marked with red]. The loop beneath the alignment box gives information about the loops. The work space is saved by right clicking the “icm” icon next to the protein sequence in the selection space and then “save as”. The file is saved as a PDB file.
4.6 DOCKING-MOLEGRO VIRTUAL DOCKER: After opening the application, the mutant protein model was imported by and then browsing and selecting the molecule from the folder.
This opened a dialogue box “import molecules” and the required options were chosen.
This imported the molecule into the model / docking visualization box. Now the protein molecule is to be prepared for docking. For this <preparation—prepare molecule> from tool bar was initiated. The screen shot of the same is shown below.
This opened a dialogue box “prepare molecules”. The appropriate fields were chosen.
Next, the protein surface is to be created. For this, a right click on the protein icon in the work space and subsequently choosing the “create surface” opened a dialogue box “create surface”. The appropriate fields were chosen. And the create surface was initiated. The screen shots of these steps are shown below.
Once the “probing of grid points” is done, the protein molecule’s surface is created. Now, the cavities [in green] of the protein molecule were detected as shown below.
While performing cavity detection, the application opens up a dialogue box “cavity prediction”. The fields of which were chosen as below.
Next, the drug “imatinib” was imported in the same way as of the protein [first step]. Now the docking wizard was opened.
Now a series of dialogue boxes were opened and the appropriate fields were selected as shown.
The fields that have been shown by arrows were set by us and the fields not marked are default settings. When start button was clicked, the docking was initiated. Some screen shots were taken during the process and are shown below.
The result obtained is shown below.
Now the MVD batch job dialogue box is closed, and the results are imported into the work space as shown below.
Each docked pose can now be visualized by selecting the pose of interest.
This now opens up as:
Drug docked in the cavity.
CHAPTER - 5
RESULTS AND DISCUSSIONS
5.1 CLIENT'S RESEARCH DATA ON THE EXONIC MUTATIONS IN ABL1 TYROSINE KINASE GENE IN HUMANS:
POSITION
Type of SNP
165970 deletion -G 166015 T/G 166024 C/T 166109 G/A 166200 A/C
5' flanking sequenc e
Patient’s codes used for sequencing
GGGG
3' flankin g sequen ce TTTT
AAAT ATGT TCTA TGGA
CTTA TATA ACTT TCCC
P268, P114,P 81, P171,P 98 P79 ON1, P56, P7, ON4 P30,
P264,P 69,P56, P265,P 106
166238 166248 166307 166307
G/C C/A T/G Insertion T 166369 A/C
CCAA GAAA GCAG CAGT
CCTT AATG GGGG GGGG
0N1, ON7, P136, P47 P274, P169
TGAA
GTTC
C/A C/A G/A A/C A/C G/A C/A, G/A
AGTT TTCA GGCA CACC CATT TCCA ACTA
ACAG AGAC AGGT CGCT TCCA CCCC ACAA
P95, ON9, P265, P259, P130, P88, P66, P257, P97, P250, P119, P257, 0N38, P115, P260, P119, P58 0N41, P260, P263, P56, ON21 P276, P25 P66 P102 P143
C/A G/A
CGGA GAAA
ATCA AAAT
149294 C/A 149251 C/A 149285 C/A
CCGT CCTG TTGA
TTAT TGTC TTTT
CCTG
TCTG
TTTT GCGG TCCC
CCTT TCAC CACG
158806 T/C 158896 C/T
GCCA AATC
CTCC TTCA
159110 G/A 159178 C/A
CTCA GCAG
ATCT CTGC
159229 C/A 160838 G/T 160847 G/T
AAAG CACT AATT
CCCC TTTT CCGT
160938 A/C 160935 T/G 160987 G/A
GTGA TTTG CTTA
GTGG GAAG AAAT
160983 C/T
CTTT
TTAG
166373 166375 166410 148967 148977 148982 14,90,29, 030 149056 149119
15 8108 158238 158417 158816
Insertion -A T/A C/A A/G
P179 P273, P208, P25, P242, P246, HO1*, FO1 *E1 FORWARD PRIMER P3, P265, P174, P256, P182, P255* (HETEROZYGOUS FAMILY DETAILS/HISTORY, P62, P110, P237 P70 P278,P214 P81, P20,P16,P83, P183, P200, P81, P59, P277, P104, P42 P116, P135, P115 ON48, ON46, P260, P117, P168, P38, P173 P115, P76, P43, P72, P9, P45, P194, P33, P92, P70, P90, P91 P173, P262, P261, ON45, P179, P182, ON35, ON7, P179, ON16, P128, P137, P257 ON45 P247, P79, P175, P201, P245, P158, P163, P246, P89, P128, P182, P100, P243, P176, P226, P214, ON45, P244, P184, P11, P174, P257, 0N30, P276 P235, P159, P233, P42, P244, P59, P51, P56, P76, P128, P126, P44, P7, P37, P74, ON50, P62, P176, P52, P280, P179, P133, P275, 0N45, P134, P261, ON21, P212, P228, P11, P119, P14,
P135, P21, P66, P29, ON38, P243, P86, P10, P41, ON17, P45
161068 161141 161136 161153 161196 161227
C/A C/A G/T C/A G/A T/A
ATGA CCTA CCTG TCTC TGAA TTTT
AGGG AACA CCTA ATCA TGGT CTGC
164404 A/C 164443 C/T
TGGT CCTT
AAAT TGAG
164469 164490 164488 164493 164530
T/G A/G A/C C/G T/G
CTGA TGAA AGTG AATG TTCT
TTTA TGCT AATG TACA TCAG
164607 164760 164762 164764 164818
T/G C/A C/A A/C A/G
CAGG TGTA TACA ACAC AGCT
GTAT ACAA AAAG AAGT ATGT
P200, P210 P213, P156 AO4 WELL E4 FORWARD P201, P182, P33, P72, P247,P43, P136, P200, P137, ON41, P174, P214, P102, P26, P148, P225,P214, P154, P123 P242, P261, P183, P137, P138, P207,P141, P182, P269, P158, P277, P195, P129, P133, P132, P124, P120, P119 ON9 P131 P132 0N38, ON35, ON16,P141, P142, P150, P153, P260, P122, P261, P142 P142 P146 P148, P260, P175, P245, P136, P36, P185
In the above table, the entries marked with stars belong to exonic regions and the others are intronic. Since introns are spliced out after transcription, the intronic mutations are not taken into account here. In all, there are 16 exonic mutations. Since P25 has a double mutation [148967+149119], there are 17 different mutation cases in the data given.
5.2 Database search for wild type mRNA and protein sequence: mRNA sequence : >gi|62362411|ref|NM_007313.2| Homo sapiens c-abl oncogene 1, receptor tyrosine kinase (ABL1), transcript variant b, mRNA
81 GGTTGGTGACTTCCACAGGAAAAGTTCTGGAGGAGTAGCCAAAGACCATCAGCGTTTCCT TTATGTGTGAGAATTGAAATGACTAGCATTATTGACCCTTTTCAGCATCCCCTGTGAATATTT CTGTTTAGGTTTTTCTTCTTGAAAAGAAATTGTTATTCAGCCCGTTTAAAACAAATCAAGA AACTTTTGGGTAACATTGCAATTACATGAAATTGATAACCGCGAAAATAATTGGAACTCCT GCTTGCAAGTGTCAACCTAAAAAAAGTGCTTCCTTTTGTTATGGAAGATGTCTTTCTGTGA TTGACTTCAATTGCTGACTTGTGGAGATGCAGCGAATGTGAAATCCCACGTATATGCCATTT CCCTCTACGCTCGCTGACCGTTCTGGAAGATCTTGAACCCTCTTCTGGAAAGGGGTACCTA TTATTACTTTATGGGGCAGCAGCCTGGAAAAGTACTTGGGGACCAAAGAAGGCCAAGCTT GCCTGCCCTGCATTTTATCAAAGGAGCAGGGAAGAAGGAATCATCGAGGCATGGGGGTCC ACACTGCAATGTTTTTGTGGAACATGAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAG CCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCC AGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACA CTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATG GTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAG TCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGT ATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTG GCCAGAGGTCCATCTCGCTGAGATACGAAGGGAGGGTGTACCATTACAGGATCAACACTG CTTCTGATGGCAAGCTCTACGTCTCCTCCGAGAGCCGCTTCAACACCCTGGCCGAGTTGGT TCATCATCATTCAACGGTGGCCGACGGGCTCATCACCACGCTCCATTATCCAGCCCCAAAG CGCAACAAGCCCACTGTCTATGGTGTGTCCCCCAACTACGACAAGTGGGAGATGGAACGC ACGGACATCACCATGAAGCACAAGCTGGGCGGGGGCCAGTACGGGGAGGTGTACGAGGG CGTGTGGAAGAAATACAGCCTGACGGTGGCCGTGAAGACCTTGAAGGAGGACACCATGG AGGTGGAAGAGTTCTTGAAAGAAGCTGCAGTCATGAAAGAGATCAAACACCCTAACCTG GTGCAGCTCCTTGGGGTCTGCACCCGGGAGCCCCCGTTCTATATCATCACTGAGTTCATGA CCTACGGGAACCTCCTGGACTACCTGAGGGAGTGCAACCGGCAGGAGGTGAACGCCGTG GTGCTGCTGTACATGGCCACTCAGATCTCGTCAGCCATGGAGTACCTGGAGAAGAAAAAC TTCATCCACAGAGATCTTGCTGCCCGAAACTGCCTGGTAGGGGAGAACCACTTGGTGAAG GTAGCTGATTTTGGCCTGAGCAGGTTGATGACAGGGGACACCTACACAGCCCATGCTGGA GCCAAGTTCCCCATCAAATGGACTGCACCCGAGAGCCTGGCCTACAACAAGTTCTCCATC AAGTCCGACGTCTGGGCATTTGGAGTATTGCTTTGGGAAATTGCTACCTATGGCATGTCCC CTTACCCGGGAATTGACCTGTCCCAGGTGTATGAGCTGCTAGAGAAGGACTACCGCATGG AGCGCCCAGAAGGCTGCCCAGAGAAGGTCTATGAACTCATGCGAGCATGTTGGCAGTGGA 
ATCCCTCTGACCGGCCCTCCTTTGCTGAAATCCACCAAGCCTTTGAAACAATGTTCCAGGA ATCCAGTATCTCAGACGAAGTGGAAAAGGAGCTGGGGAAACAAGGCGTCCGTGGGGCTG TGAGTACCTTGCTGCAGGCCCCAGAGCTGCCCACCAAGACGAGGACCTCCAGGAGAGCT GCAGAGCACAGAGACACCACTGACGTGCCTGAGATGCCTCACTCCAAGGGCCAGGGAGA GAGCGATCCTCTGGACCATGAGCCTGCCGTGTCTCCATTGCTCCCTCGAAAAGAGCGAGG TCCCCCGGAGGGCGGCCTGAATGAAGATGAGCGCCTTCTCCCCAAAGACAAAAAGACCA ACTTGTTCAGCGCCTTGATCAAGAAGAAGAAGAAGACAGCCCCAACCCCTCCCAAACGC AGCAGCTCCTTCCGGGAGATGGACGGCCAGCCGGAGCGCAGAGGGGCCGGCGAGGAAG AGGGCCGAGACATCAGCAACGGGGCACTGGCTTTCACCCCCTTGGACACAGCTGACCCA GCCAAGTCCCCAAAGCCCAGCAATGGGGCTGGGGTCCCCAATGGAGCCCTCCGGGAGTC CGGGGGCTCAGGCTTCCGGTCTCCCCACCTGTGGAAGAAGTCCAGCACGCTGACCAGCA GCCGCCTAGCCACCGGCGAGGAGGAGGGCGGTGGCAGCTCCAGCAAGCGCTTCCTGCGC TCTTGCTCCGCCTCCTGCGTTCCCCATGGGGCCAAGGACACGGAGTGGAGGTCAGTCACG CTGCCTCGGGACTTGCAGTCCACGGGAAGACAGTTTGACTCGTCCACATTTGGAGGGCAC AAAAGTGAGAAGCCGGCTCTGCCTCGGAAGAGGGCAGGGGAGAACAGGTCTGACCAGG TGACCCGAGGCACAGTAACGCCTCCCCCCAGGCTGGTGAAAAAGAATGAGGAAGCTGCT GATGAGGTCTTCAAAGACATCATGGAGTCCAGCCCGGGCTCCAGCCCGCCCAACCTGACT CCAAAACCCCTCCGGCGGCAGGTCACCGTGGCCCCTGCCTCGGGCCTCCCCCACAAGGA AGAAGCTGGAAAGGGCAGTGCCTTAGGGACCCCTGCTGCAGCTGAGCCAGTGACCCCCA CCAGCAAAGCAGGCTCAGGTGCACCAGGGGGCACCAGCAAGGGCCCCGCCGAGGAGTC CAGAGTGAGGAGGCACAAGCACTCCTCTGAGTCGCCAGGGAGGGACAAGGGGAAATTGT CCAGGCTCAAACCTGCCCCGCCGCCCCCACCAGCAGCCTCTGCAGGGAAGGCTGGAGGA AAGCCCTCGCAGAGCCCGAGCCAGGAGGCGGCCGGGGAGGCAGTCCTGGGCGCAAAGA
82 CAAAAGCCACGAGTCTGGTTGATGCTGTGAACAGTGACGCTGCCAAGCCCAGCCAGCCG GGAGAGGGCCTCAAAAAGCCCGTGCTCCCGGCCACTCCAAAGCCACAGTCCGCCAAGCC GTCGGGGACCCCCATCAGCCCAGCCCCCGTTCCCTCCACGTTGCCATCAGCATCCTCGGCC CTGGCAGGGGACCAGCCGTCTTCCACCGCCTTCATCCCTCTCATATCAACCCGAGTGTCTC TTCGGAAAACCCGCCAGCCTCCAGAGCGGATCGCCAGCGGCGCCATCACCAAGGGCGTG GTCCTGGACAGCACCGAGGCGCTGTGCCTCGCCATCTCTAGGAACTCCGAGCAGATGGCC AGCCACAGCGCAGTGCTGGAGGCCGGCAAAAACCTCTACACGTTCTGCGTGAGCTATGTG GATTCCATCCAGCAAATGAGGAACAAGTTTGCCTTCCGAGAGGCCATCAACAAACTGGAG AATAATCTCCGGGAGCTTCAGATCTGCCCGGCGACAGCAGGCAGTGGTCCAGCGGCCACT CAGGACTTCAGCAAGCTCCTCAGTTCGGTGAAGGAAATCAGTGACATAGTGCAGAGGTAG CAGCAGTCAGGGGTCAGGTGTCAGGCCCGTCGGAGCTGCCTGCAGCACATGCGGGCTCG CCCATACCCGTGACAGTGGCTGACAAGGGACTAGTGAGTCAGCACCTTGGCCCAGGAGCT CTGCGCCAGGCAGAGCTGAGGGCCCTGTGGAGTCCAGCTCTACTACCTACGTTTGCACCG CCTGCCCTCCCGCACCTTCCTCCTCCCCGCTCCGTCTCTGTCCTCGAATTTTATCTGTGGAG TTCCTGCTCCGTGGACTGCAGTCGGCATGCCAGGACCCGCCAGCCCCGCTCCCACCTAGT GCCCCAGACTGAGCTCTCCAGGCCAGGTGGGAACGGCTGATGTGGACTGTCTTTTTCATTT TTTTCTCTCTGGAGCCCCTCCTCCCCCGGCTGGGCCTCCTTCTTCCACTTCTCCAAGAATG GAAGCCTGAACTGAGGCCTTGTGTGTCAGGCCCTCTGCCTGCACTCCCTGGCCTTGCCCG TCGTGTGCTGAAGACATGTTTCAAGAACCGCATTTCGGGAAGGGCATGCACGGGCATGCA CACGGCTGGTCACTCTGCCCTCTGCTGCTGCCCGGGGTGGGGTGCACTCGCCATTTCCTCA CGTGCAGGACAGCTCTTGATTTGGGTGGAAAACAGGGTGCTAAAGCCAACCAGCCTTTGG GTCCTGGGCAGGTGGGAGCTGAAAAGGATCGAGGCATGGGGCATGTCCTTTCCATCTGTC CACATCCCCAGAGCCCAGCTCTTGCTCTCTTGTGACGTGCACTGTGAATCCTGGCAAGAA AGCTTGAGTCTCAAGGGTGGCAGGTCACTGTCACTGCCGACATCCCTCCCCCAGCAGAAT GGAGGCAGGGGACAAGGGAGGCAGTGGCTAGTGGGGTGAACAGCTGGTGCCAAATAGCC CCAGACTGGGCCCAGGCAGGTCTGCAAGGGCCCAGAGTGAACCGTCCTTTCACACATCTG GGTGCCCTGAAAGGGCCCTTCCCCTCCCCCACTCCTCTAAGACAAAGTAGATTCTTACAAG GCCCTTTCCTTTGGAACAAGACAGCCTTCACTTTTCTGAGTTCTTGAAGCATTTCAAAGCC CTGCCTCTGTGTAGCCGCCCTGAGAGAGAATAGAGCTGCCACTGGGCACCTGCGCACAGG TGGGAGGAAAGGGCCTGGCCAGTCCTGGTCCTGGCTGCACTCTTGAACTGGGCGAATGTC TTATTTAATTACCGTGAGTGACATAGCCTCATGTTCTGTGGGGGTCATCAGGGAGGGTTAG GAAAACCACAAACGGAGCCCCTGAAAGCCTCACGTATTTCACAGAGCACGCCTGCCATCT 
TCTCCCCGAGGCTGCCCCAGGCCGGAGCCCAGATACGGGGGCTGTGACTCTGGGCAGGG ACCCGGGGTCTCCTGGACCTTGACAGAGCAGCTAACTCCGAGAGCAGTGGGCAGGTGGC CGCCCCTGAGGCTTCACGCCGGGAGAAGCCACCTTCCCACCCCTTCATACCGCCTCGTGC CAGCAGCCTCGCACAGGCCCTAGCTTTACGCTCATCACCTAAACTTGTACTTTATTTTTCTG ATAGAAATGGTTTCCTCTGGATCGTTTTATGCGGTTCTTACAGCACATCACCTCTTTGCCCC CGACGGCTGTGACGCAGCCGGAGGGAGGCACTAGTCACCGACAGCGGCCTTGAAGACAG AGCAAAGCGCCCACCCAGGTCCCCCGACTGCCTGTCTCCATGAGGTACTGGTCCCTTCCTT TTGTTAACGTGATGTGCCACTATATTTTACACGTATCTCTTGGTATGCATCTTTTATAGACGC TCTTTTCTAAGTGGCGTGTGCATAGCGTCCTGCCCTGCCCCCTCGGGGGCCTGTGGTGGCT CCCCCTCTGCTTCTCGGGGTCCAGTGCATTTTGTTTCTGTATATGATTCTCTGTGGTTTTTTT TGAATCCAAATCTGTCCTCTGTAGTATTTTTTAAATAAATCAGTGTTTACATTAGAA
wild type protein sequence: >gi|62362412|ref|NP_009297.2| c-abl oncogene 1, receptor tyrosine kinase isoform b [Homo sapiens] MGQQPGKVLGDQRRPSLPALHFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQGLSEAARWN SKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYIT PVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVS
83 SESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGE VYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLL DYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDT YTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGVRGAVSTLLQAPELPT KTRTSRRAAEHRDTTDVPEMPHSKGQGESDPLDHEPAVSPLLPRKERGPPEGGLNEDERLLPKDKKTNL FSALIKKKKKTAPTPPKRSSSFREMDGQPERRGAGEEEGRDISNGALAFTPLDTADPAKSPKPSNGAGVP NGALRESGGSGFRSPHLWKKSSTLTSSRLATGEEEGGGSSSKRFLRSCSASCVPHGAKDTEWRSVTLPR DLQSTGRQFDSSTFGGHKSEKPALPRKRAGENRSDQVTRGTVTPPPRLVKKNEEAADEVFKDIMESSPG SSPPNLTPKPLRRQVTVAPASGLPHKEEAGKGSALGTPAAAEPVTPTSKAGSGAPGGTSKGPAEESRVRR HKHSSESPGRDKGKLSRLKPAPPPPPAASAGKAGGKPSQSPSQEAAGEAVLGAKTKATSLVDAVNSDAA KPSQPGEGLKKPVLPATPKPQSAKPSGTPISPAPVPSTLPSASSALAGDQPSSTAFIPLISTRVSLRKTRQPP ERIASGAITKGVVLDSTEALCLAISRNSEQMASHSAVLEAGKNLYTFCVSYVDSIQQMRNKFAFREAINK LENNLRELQICPATAGSGPAATQDFSKLLSSVKEISDIVQR
5.3 ORF finder results: With the help of the ORF finder, the reading frames of the mutant [edited] nucleotide sequences were found out. The results are as shown below:
MUTANT NAME | A.A.R. CHANGE WITH POSITION | VALIDITY
148967 | T231P | Protein created
148977 | Y234S | Protein created
148982 | A236T | Protein created
148929+148930 | STOP CODON | Protein truncated
149056 | D260E | Protein created
149119 | Q [NO CHANGE IN A.A.R] | Protein created
148967+149119[P25] | T231P; Q[NO CHANGE] | Protein created
159110 | N [NO CHANGE IN A.A.R] | Protein created
161068 | STOP CODON | Protein truncated
161136 | STOP CODON | Protein truncated
161141 | STOP CODON | Protein truncated
161153 | STOP CODON | Protein truncated
164607 | V467G | Protein created
166200 | N498T | Protein created
166238 | A511P | Protein created
166248 | T514K | Protein created
T=Threonine; P=Proline; Y=Tyrosine; S=Serine; A=Alanine; D=Aspartic acid; E=Glutamic acid; Q=Glutamine; N=Asparagine; V=Valine; G=Glycine; K=Lysine
Here in this table and hereafter, T514K means that T in the wild type has been replaced by K in the mutant at position 514. The above table shows that there are 5 mutation cases where the mutant protein is truncated. A truncated protein cannot form an active protein, and hence these mutation cases were omitted from the next steps. The table also shows 3 mutation cases in which, although there is a change at the nucleotide level, the changed codon codes for the same amino acid, so there is no change at the protein level.
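The notation explained above can be derived mechanically by comparing the wild-type and mutant protein sequences position by position. The sketch below is only illustrative; the function name `describe_mutation` and the toy sequences are hypothetical, and a mutant shorter than the wild type is simply flagged as truncated.

```python
def describe_mutation(wild_type, mutant):
    """Return (substitutions, truncated) for two protein sequences.

    Substitutions use the notation of the table, e.g. 'T514K' means T in
    the wild type is replaced by K at position 514 (1-based).
    """
    substitutions = [
        f"{wt}{position}{mt}"
        for position, (wt, mt) in enumerate(zip(wild_type, mutant), start=1)
        if wt != mt
    ]
    truncated = len(mutant) < len(wild_type)
    return substitutions, truncated


# Toy sequences, not ABL1 data.
changed = describe_mutation("MKTAY", "MKPAY")
silent = describe_mutation("MKTAY", "MKTAY")
stopped = describe_mutation("MKTAY", "MKT")
```

A silent mutation yields an empty substitution list, matching the "no change" rows, while a premature stop codon yields a truncated protein.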
5.4 BLAST p results: The BLASTp search of the wild-type Abl tyrosine kinase protein against the PDB database gave the following result. Only the top few hits are shown here.
gi|62362412|ref|NP_009297.2| c-abl oncogene...
Query ID: lcl|64796
Description: gi|62362412|ref|NP_009297.2| c-abl oncogene 1, receptor tyrosine kinase isoform b [Homo sapiens]
Molecule type: amino acid
Query Length: 1149
Database Name: pdb
Description: PDB protein database
Program: BLASTP 2.2.20+

Search Parameters
Program: blastp
Word size: 3
Expect value: 10
Hitlist size: 100
Gapcosts: 11,1
Matrix: BLOSUM62
Threshold: 11
Composition-based stats: 2
Filter string: F
Genetic Code: 1
Window Size: 40

Database
Posted date: May 17, 2009 5:41 PM
Number of letters: 9,422,204
Number of sequences: 41,234
Entrez query: none

Karlin-Altschul statistics
Params | Ungapped | Gapped
Lambda | 0.311071 | 0.267
K | 0.12901 | 0.041
H | 0.377932 | 0.14

Results Statistics
Length adjustment: 106
Effective length of query: 1043
Effective length of database: 5051400
Effective search space: 5268610200
Effective search space used: 5268610200
Descriptions
Sequences producing significant alignments: | Score (Bits) | E value
pdb|1OPL|A Chain A, Structural Basis For The Auto-Inhibition ... | 1128 | 0.0
pdb|1OPK|A Chain A, Structural Basis For The Auto-Inhibition ... | 1033 | 0.0
pdb|2FO0|A Chain A, Organization Of The Sh3-Sh2 Unit In Activ... | 1021 | 0.0
pdb|2E2B|A Chain A, Crystal Structure Of The C-Abl Kinase Dom... | 617 | 9e-177
pdb|2QOH|A Chain A, Crystal Structure Of Abl Kinase Bound Wit... | 611 | 5e-175
pdb|1FPU|A Chain A, Crystal Structure Of Abl Kinase Domain In... | 611 | 5e-175
pdb|2G1T|A Chain A, A Src-Like Inactive Conformation In The A... | 611 | 5e-175
pdb|2F4J|A Chain A, Structure Of The Kinase Domain Of An Imat... | 610 | 8e-175

>pdb|1OPL|A Chain A, Structural Basis For The Auto-Inhibition Of C-Abl Tyrosine Kinase
 pdb|1OPL|B Chain B, Structural Basis For The Auto-Inhibition Of C-Abl Tyrosine Kinase
Length=537
Score = 1128 bits (2917), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 528/534 (98%), Positives = 533/534 (99%), Gaps = 0/534 (0%)

Query  1    MGQQPGKVLGDQRRPSLPALHFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQG  60
            MGQQPGKVLGDQRRPSLPALHFIKGAGK++SSRHGGPHCNVFVEHEALQRPVASDFEPQG
Sbjct  1    MGQQPGKVLGDQRRPSLPALHFIKGAGKRDSSRHGGPHCNVFVEHEALQRPVASDFEPQG  60

Query  61   LSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCE  120
            LSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCE
Sbjct  61   LSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCE  120

Query  121  AQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQR  180
            AQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQR
Sbjct  121  AQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQR  180

Query  181  SISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRN  240
            SISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRN
Sbjct  181  SISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRN  240

Query  241  KPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVE  300
            KPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVE
Sbjct  241  KPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVE  300

Query  301  EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLL  360
            EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLL
Sbjct  301  EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLL  360

Query  361  YMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKF  420
            YMATQISSAMEYLEKKNFIHR+LAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKF
Sbjct  361  YMATQISSAMEYLEKKNFIHRNLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKF  420

Query  421  PIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP  480
            PIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP
Sbjct  421  PIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERP  480

Query  481  EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV  534
            EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGK+ +
Sbjct  481  EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKENL  534

>pdb|1OPK|A Chain A, Structural Basis For The Auto-Inhibition Of C-Abl Tyrosine Kinase
Length=495
Score = 1033 bits (2670), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 484/488 (99%), Positives = 488/488 (100%), Gaps = 0/488 (0%)

Query  46   EALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGE  105
            EALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGE
Sbjct  7    EALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGE  66

Query  106  KLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGIN  165
            KLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGIN
Sbjct  67   KLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGIN  126

Query  166  GSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVA  225
            GSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVA
Sbjct  127  GSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVA  186

Query  226  DGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSL  285
            DGLITTLHYPAPKRNKPT+YGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSL
Sbjct  187  DGLITTLHYPAPKRNKPTIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSL  246

Query  286  TVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY  345
            TVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY
Sbjct  247  TVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDY  306

Query  346  LRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSR  405
            LRECNRQEV+AVVLLYMATQISSAMEYLEKKNFIHR+LAARNCLVGENHLVKVADFGLSR
Sbjct  307  LRECNRQEVSAVVLLYMATQISSAMEYLEKKNFIHRNLAARNCLVGENHLVKVADFGLSR  366

Query  406  LMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS  465
            LMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS
Sbjct  367  LMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS  426

Query  466  QVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEV  525
            QVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEV
Sbjct  427  QVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEV  486

Query  526  EKELGKQG  533
            EKELGK+G
Sbjct  487  EKELGKRG  494

>pdb|2E2B|A Chain A, Crystal Structure Of The C-Abl Kinase Domain In Complex With Inno-406
 pdb|2E2B|B Chain B, Crystal Structure Of The C-Abl Kinase Domain In Complex With Inno-406
Length=293
Score = 617 bits (1590), Expect = 9e-177, Method: Compositional matrix adjust.
Identities = 287/287 (100%), Positives = 287/287 (100%), Gaps = 0/287 (0%)

Query  248  SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  307
            SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA
Sbjct  7    SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  66

Query  308  VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS  367
            VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS
Sbjct  67   VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS  126

Query  368  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  427
            SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP
Sbjct  127  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  186

Query  428  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  487
            ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV
Sbjct  187  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  246

Query  488  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV  534
            YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV
Sbjct  247  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQGV  293

>pdb|2QOH|A Chain A, Crystal Structure Of Abl Kinase Bound With Ppy-A
 pdb|2QOH|B Chain B, Crystal Structure Of Abl Kinase Bound With Ppy-A
Length=288
Score = 611 bits (1575), Expect = 5e-175, Method: Compositional matrix adjust.
Identities = 284/286 (99%), Positives = 286/286 (100%), Gaps = 0/286 (0%)

Query  248  SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  307
            SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA
Sbjct  2    SPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAA  61

Query  308  VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQIS  367
            VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEV+AVVLLYMATQIS
Sbjct  62   VMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQIS  121

Query  368  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  427
            SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP
Sbjct  122  SAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAP  181

Query  428  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  487
            ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV
Sbjct  182  ESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKV  241

Query  488  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQG  533
            YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGK+G
Sbjct  242  YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKRG  287
Among the above alignment results, the protein 2E2B [Crystal Structure Of The C-Abl Kinase Domain In Complex With Inno-406] gave 100% identity over the aligned region. Its structure file was downloaded from the PDB and then used as the template for homology modelling in ICM Molsoft.
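The identity figures quoted in the alignment headers can be checked by hand. The sketch below is only an illustration of the counting, not the BLAST implementation: it assumes two already-aligned rows of equal length, and the function name percent_identity is hypothetical. The two strings are the last Query and Sbjct rows of the 2QOH alignment.

```python
def percent_identity(query_row, subject_row):
    """Count identical positions in two aligned rows of equal length.

    Returns (matches, aligned_length, percent). Gap characters ('-') are
    counted as mismatching columns, as in a BLAST identity ratio.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    matches = sum(
        1 for q, s in zip(query_row, subject_row) if q == s and q != "-"
    )
    length = len(query_row)
    return matches, length, 100.0 * matches / length


# Last aligned row of the 2QOH hit (Query 488-533, Sbjct 242-287).
query_row = "YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKQG"
subject_row = "YELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKRG"

matches, length, pct = percent_identity(query_row, subject_row)
print(f"{matches}/{length} identical ({pct:.1f}%)")
```

Summing the matches over all rows of a hit gives the numerator of the "Identities" field; the single Q/R difference in this row is one of the two mismatches that separate 2QOH from the query.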
5.5 Molegro Virtual Docker – Docking Results:
The mutant models created in ICM were docked with Imatinib by MVD. The table below lists the best docking energies of the mutant models and the energy deviation of each from the wild type.

WILD TYPE DOCKING ENERGY: -5271.97

MUTANT NAME          DOCKING ENERGY     DEVIATION
T231P                -5515.16           -243.19
Y234S                -5762.53           -490.56
A236T                -5504.64           -232.67
D260E                -5167.09           +104.88
Q271Q                -5271.41           +0.56
N355N                -5269.39           +2.58
V467G                -5153.63           +118.34
N498T                -5147.55           +124.42
A511P                -5156.21           +115.76
T514K                -5136.71           +135.26
T231P+Q271Q[P25]     -5515.16           -243.19
The energy deviations in the above table show that, among the 11 mutant types, the largest deviation is for Y234S (-490.56). This deviation is negative, which suggests that this mutation “might” in fact help Imatinib binding and, consequently, effective drug action. On the other hand, the highest positive energy deviation in the table is for the mutant model T514K (+135.26), followed by N498T (+124.42) and V467G (+118.34). This indicates that a patient carrying a mutation of this kind who is given Imatinib to combat CML “may” need to be monitored for drug resistance. The two mutations N355N and Q271Q showed no appreciable docking energy deviation from the wild type, since there was no change in the amino acid residue in these two cases.
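The deviation column is simply the mutant docking energy minus the wild-type docking energy. The following is a minimal sketch of that arithmetic using the values from the table. The names deviation and interpret, and the tolerance of 5.0 energy units used to call a deviation "not appreciable", are assumptions made for illustration and are not part of the MVD output.

```python
WILD_TYPE_ENERGY = -5271.97

# Best docking energies as listed in the table above.
MUTANT_ENERGIES = {
    "T231P": -5515.16,
    "Y234S": -5762.53,
    "A236T": -5504.64,
    "D260E": -5167.09,
    "Q271Q": -5271.41,
    "N355N": -5269.39,
    "V467G": -5153.63,
    "N498T": -5147.55,
    "A511P": -5156.21,
    "T514K": -5136.71,
    "T231P+Q271Q": -5515.16,
}

TOLERANCE = 5.0  # hypothetical cut-off, chosen only for this sketch


def deviation(energy, reference=WILD_TYPE_ENERGY):
    """Mutant energy minus wild-type energy, rounded to two decimals."""
    return round(energy - reference, 2)


def interpret(dev, tolerance=TOLERANCE):
    """Map a deviation to the hedged reading used in the text."""
    if dev < -tolerance:
        return "might favour binding"
    if dev > tolerance:
        return "might reduce binding"
    return "no appreciable change"


for name, energy in MUTANT_ENERGIES.items():
    dev = deviation(energy)
    print(f"{name:<12} {energy:>9.2f} {dev:>+8.2f}  {interpret(dev)}")
```

A more negative energy than the wild type is read here as tighter predicted binding, which is the same convention the paragraph above uses; the labels are deliberately hedged because a docking score alone does not establish clinical resistance.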
CHAPTER - 6
CONCLUSIONS AND SCOPE
6.1 CONCLUSIONS: Bioinformatics makes it possible to form working assumptions about a particular case in a time-, cost- and labour-efficient way. This in silico approach has helped many researchers to rule out certain possibilities early in a large project and so to finish it in less time and at lower cost. Of course, the results of an in silico approach cannot be taken as final; they need to be tested, at least to some extent, in vitro. The same applies to this project: the results presented here are only assumptions, intended to help researchers choose the direction in which the project should proceed.
The work consisted of evaluating the mutation data sent by the client, creating the mutant nucleotide sequences, finding their reading frames, modelling the mutant proteins in ICM Molsoft, and docking the drug in question onto these mutant models with Molegro Virtual Docker. From this we conclude that, out of the 16 mutation cases, 5 mutant proteins are truncated and cannot form an active protein. Of the remaining 11, there are 2 mutant cases that show no change at the protein level and hence no appreciable docking energy deviation; there are 4 mutant cases where the mutation “might” act favourably for Imatinib to bind and act effectively against CML; and there are 5 mutant cases where the mutation “might” cause slight, if not severe, Imatinib resistance.
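The distinction drawn above between truncated, unchanged and altered proteins can be sketched as a translation check. This is a minimal illustration under stated assumptions, not the procedure used in the project: it uses the standard genetic code, assumes the sequence is already in frame, and the toy sequences and the function names translate and classify are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def classify(wild_cds, mutant_cds):
    """Label a mutant as 'truncated', 'silent' or 'missense'."""
    wild, mutant = translate(wild_cds), translate(mutant_cds)
    if len(mutant) < len(wild):
        return "truncated"
    if mutant == wild:
        return "silent"
    return "missense"


# Toy sequences for illustration only (not patient data).
wild = "ATGTACCAGGGC"      # M Y Q G
silent = "ATGTACCAAGGC"    # CAG -> CAA, still Q
nonsense = "ATGTAACAGGGC"  # TAC -> TAA, premature stop
missense = "ATGTACCCGGGC"  # CAG -> CCG, Q -> P

for label, seq in [("silent", silent), ("nonsense", nonsense), ("missense", missense)]:
    print(label, translate(seq), classify(wild, seq))
```

Cases such as Q271Q and N355N correspond to the "silent" outcome, which is why their docking energies stay close to the wild type, while a premature stop codon gives the "truncated" outcome that was excluded from docking.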
6.4 SCOPE OF THE PROJECT: The scope of this project remains large. About 250 patients were screened in this project and their DNA was sequenced to obtain the mutation data; screening more individuals is the most direct extension. In addition, we have considered only one mutation in each case [except P25], as given in the individual details list. The relationship between mutation and drug resistance can be tested further with combinations of these mutations. One more aspect that may be looked into is designing a new drug, or testing other drugs in silico and in clinical trials, for patients with confirmed Imatinib resistance.
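As an indication of the size of the combination study suggested above, the sketch below enumerates all pairs of the eight residue-changing mutations from the docking table. Only the enumeration is shown; building and docking each double-mutant model is not covered. The list name MISSENSE_MUTATIONS and the "A+B" naming, which follows the T231P+Q271Q entry in the table, are assumptions for illustration.

```python
from itertools import combinations

# The eight mutations in the docking table that change the amino acid.
MISSENSE_MUTATIONS = [
    "T231P", "Y234S", "A236T", "D260E",
    "V467G", "N498T", "A511P", "T514K",
]

# Every unordered pair of distinct mutations, named like "T231P+Y234S".
double_mutants = ["+".join(pair) for pair in combinations(MISSENSE_MUTATIONS, 2)]

print(len(double_mutants), "double-mutant models to build and dock")
for name in double_mutants[:3]:
    print(name)
```

Pairs alone already multiply the modelling and docking work several times over, so in practice the combinations would probably be restricted to those actually observed together in patients.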