Prediction of Catalytic Residues through ANN
K.P. Mishra (Director), Brijesh Singh Yadav (Senior Research Associate), Sweta Gupta (Research Associate)
United Research Center, UIT campus, Allahabad.
E-mail: [email protected]
AIM Develop a new method that helps identify the surface chemistry of active-site residues in a protein. This method helps in ligand design, molecular docking, de novo drug design, and the structural identification and comparison of functional sites.
Introduction Computer-aided drug design is of two types: • Ligand design • Active-site drug design
About Neural Network A neural network is a set of connected input/output units in which each connection has a weight associated with it. Applications of neural networks include speech recognition.
About Neural Network Neural network methods can be used for classification, clustering, modelling, and prediction of biological data.
• Neural networks can learn by two methods: • Supervised learning
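WORKING OF NEURAL NETWORK
[Figure: a single neuron j. The inputs x0, x1, ..., xn are multiplied by the weights w0j, w1j, ..., wnj; the weighted sum (∑) is combined with the bias θj and passed through the activation function f to give the output.]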
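The neuron in the diagram above can be sketched in a few lines of Python. This is only an illustration: the slide does not say which activation function f was used, so the logistic sigmoid here is an assumption, and the input, weight, and bias values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic activation, assumed here as the function f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through f."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

# Hypothetical values, chosen only for illustration.
x = [1.0, 0.0, 1.0]        # inputs x0..xn
w_j = [0.5, -0.3, 0.2]     # weights w0j..wnj
theta_j = -0.7             # bias of neuron j

out = neuron_output(x, w_j, theta_j)
print(out)
```

With these values the weighted sum cancels the bias, so the sigmoid is evaluated at zero and the neuron sits at the midpoint of its output range.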
How Does the Neural Network Learn
1. The network receives a training example and, using its existing weights, calculates the output.
2. Backpropagation then calculates the error as the difference between the calculated result and the expected (actual) result.
3. The error is fed back through the network and the weights are adjusted to minimize the error.
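The three learning steps can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB program: the sigmoid activation, the 3-2-1 layout without bias terms, the random initial weights, and the single training example are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Step 1: compute the output with the current weights."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr):
    """One backpropagation update for a single training example."""
    hidden, output = forward(x, w_hidden, w_out)
    # Step 2: error = expected result - calculated result.
    error = target - output
    # Step 3: feed the error back and adjust the weights (gradient descent
    # on the squared error; y * (1 - y) is the sigmoid derivative).
    delta_out = error * output * (1.0 - output)
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    for j in range(len(hidden)):
        w_out[j] += lr * delta_out * hidden[j]
        for i in range(len(x)):
            w_hidden[j][i] += lr * delta_hidden[j] * x[i]
    return error

random.seed(0)
x, target = [1.0, 0.0, 1.0], 1.0   # hypothetical training example
w_hidden = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(2)]

errors = [abs(train_step(x, target, w_hidden, w_out, lr=0.5)) for _ in range(200)]
print(errors[0], errors[-1])   # the error shrinks as the weights are adjusted
```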
About Active Site of Proteins Proteins are polymers of amino acids linked by peptide bonds. The region of a protein that interacts with a ligand is generally referred to as the “active site.” Ligands can be proteins, DNA, or smaller molecules such as pharmaceutical compounds. The active site generally lies on the surface of the protein, although in some cases it is buried within the protein. Residues with reactive groups (Asp, Glu, Ser, Cys, His, Lys, Arg) tend to be abundant in protein active sites. The Ser-His-Asp (sometimes Ser-His-Glu) “catalytic triad” is a motif commonly found in enzyme active sites.
Levels of Protein Property Residue properties that are responsible for function and structure:
• Polar or nonpolar
• Aromatic or aliphatic
• Acidic or basic
• Charged (either positive or negative) or uncharged
• Contains sulfur
• Forms hydrogen bonds
• Essential or nonessential
• Cyclic
Why predict protein function and structure? A protein's structural properties help to identify various anomalies and diseases and to rectify them at the genetic level. Prediction identifies the surface chemistry of ligand-binding-site residues in a protein. It helps in ligand design, molecular docking, de novo drug design.
Early methods
• Statistical methods
• Homology modeling methods
• Physicochemical methods
• Evolutionary conservation
• Sequence pattern methods
Approach to structure prediction The input is encoded in binary form using 15 different properties for each residue of a catalytic or non-catalytic triad (3 × 15 = 45 bits per triad, giving a 45 × 122 input matrix), covering 70 active-site and 52 non-active-site residues of proteins. The network training program uses the backpropagation method, with the input, output, hidden layer, learning rate, and epoch fixed within the code.
Network Design
Methodology used
1. Collect structural proteins containing active-site residues from the PDB.
2. Search for active-site residues with Ligplot.
3. Search for non-active-site residues with Surface Racer.
4. Map the protein residues to binary digits using their 15 properties.
5. Create a neural network (a computer program).
6. “Train” it using proteins with known active-site and non-active-site residue properties.
7. Test the network with unknown protein residues.
Methodology used
PDB database
→ Collect protein-ligand complexes (hetero atoms)
→ Search active-site residues with Ligplot; search non-active-site residues using Surface Racer
→ Distinguish amino acids by 15 different properties
→ Map all 20 amino acids to binary digits
→ Create and train the neural network on the above data set using MATLAB
→ Test the neural network on unknown protein residues
→ Results checked and verified using MATLAB
PDB data selected We selected about 100 proteins from the PDB. Some example protein-ligand complexes are shown below: 1a4k.pdb, 1a4q.pdb, 1a5g.pdb, 1a42.pdb, 1a46.pdb, 1a50.pdb
Protein-ligand interaction shown by Ligplot
Some proteins with their active site residues
Protein: 1a4k, 1a5g, 1a42, 1a46, 1a50, 1a94
Active site residue: Glu81A, Asp102, His199, Gly216, His86, Ala6c
Asn35B, His57, Ser195, Gly216, Gln137, Thr199, Glu205, Lys375, Ser227, Asn236, Gly230, Asp29A
Amino Acid Encoding Scheme
Active site residues (output 1 0)
1dih.pdb   Arg, Gly, Thr   000001010010000, 000010001010001, 100000100101000
1ecv.pdb   Arg, Gln, His   000001010010000, 100010100010000, 000001010010000
1fkb.pdb   Asp, Glu, Ile   000010001010000, 000010001010000, 011000000100000
Non-active-site residues (output 0 1)
9rub.pdb   Arg, Glu, Phe   000001010010000, 000010001010000, 010100000100000
8cpa.pdb   Arg, Asn, Lys   000001010010000, 100000100010000, 000001010100000
8atc.pdb   Ala, Asn, Glu   011000000010000, 100000100010000, 000010001010000
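The encoding scheme above can be sketched as a simple lookup in Python. The 15-bit codes are copied from the six example rows on this slide, so the table covers only the residues that appear there. The meaning of each bit position is not given in the slides and is not assumed here. The names `RESIDUE_CODES` and `encode_triad` are hypothetical.

```python
# 15-bit property codes copied from the example rows of the slide.
RESIDUE_CODES = {
    "ARG": "000001010010000",
    "GLY": "000010001010001",
    "THR": "100000100101000",
    "GLN": "100010100010000",
    "HIS": "000001010010000",
    "ASP": "000010001010000",
    "GLU": "000010001010000",
    "ILE": "011000000100000",
    "PHE": "010100000100000",
    "ASN": "100000100010000",
    "LYS": "000001010100000",
    "ALA": "011000000010000",
}

# Two-unit target patterns, as labelled on the slide.
ACTIVE = [1, 0]
NON_ACTIVE = [0, 1]

def encode_triad(residues):
    """Concatenate the codes of three residues into 3 x 15 = 45 input bits."""
    if len(residues) != 3:
        raise ValueError("expected exactly three residues")
    bits = "".join(RESIDUE_CODES[name.upper()] for name in residues)
    return [int(b) for b in bits]

# The 1dih.pdb row (active site) and the 9rub.pdb row (non-active site).
x_active = encode_triad(["Arg", "Gly", "Thr"])
x_non_active = encode_triad(["ARG", "GLU", "PHE"])
print(len(x_active), ACTIVE)
print(len(x_non_active), NON_ACTIVE)
```

Stacking one such 45-bit vector per example as a column would give the 45 × 122 input matrix mentioned in the approach slide.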
Training & testing data network We created a program using MATLAB functions for training the neural network. The program is developed with the backpropagation method and contains variables such as train data, train output, all node, epoch, learning rate, and error. A typical architecture is a fully connected network (122 inputs, 5 hidden, 2 outputs). We trained the network with different values of the learning rate and hidden layer, and stopped the training when the minimum error was obtained. For the testing of result we also generate
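The idea of trying several learning rates and hidden-layer sizes and keeping the setting with the lowest error can be sketched as below. This is a plain-Python stand-in, not the MATLAB program described above: the toy 4-bit data set, the `train` helper, the sigmoid units, and the candidate values are assumptions. Only the learning rate 0.05, the 100 epochs, and the value 5 for the hidden layer come from the slides, and reading that 5 as a number of hidden units is itself an assumption.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, n_hidden, lr, epochs, seed=0):
    """Train a one-hidden-layer network by backpropagation.

    Returns the mean squared error accumulated over the last epoch."""
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # The extra weight in each row multiplies a constant 1 and acts as the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    mse = 0.0
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            xb = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
            d_out = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(n_out)]
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w2[k][j] for k in range(n_out))
                     for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w2[k][j] += lr * d_out[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w1[j][i] += lr * d_hid[j] * xb[i]
            total += sum((t[k] - y[k]) ** 2 for k in range(n_out))
        mse = total / len(data)
    return mse

# Hypothetical toy patterns with two-unit targets (1 0 / 0 1).
toy_data = [
    ([1, 0, 1, 0], [1, 0]),
    ([1, 1, 0, 0], [1, 0]),
    ([0, 0, 1, 1], [0, 1]),
    ([0, 1, 0, 1], [0, 1]),
]

results = {}
for lr in (0.01, 0.05, 0.5):
    for n_hidden in (2, 5):
        results[(lr, n_hidden)] = train(toy_data, n_hidden, lr, epochs=100)

best = min(results, key=results.get)
print("lowest error with (learning rate, hidden units) =", best)
```

Selecting on training error alone can favour settings that overfit, so in practice the comparison is usually made on held-out data.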
Results and Data
Performance on training set: (116/122) × 100 = 95.0820% correct prediction
Performance on testing set: (38/40) × 100 = 95.00% correct prediction
Total number of epochs: 100
Learning rate: 0.05
False positives: 2 out of 122
False negatives: 3 out of 122
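The two percentages above follow directly from the counts of correct predictions. A short Python check, with `percent_correct` as a hypothetical helper name:

```python
def percent_correct(correct, total):
    """Percentage of residues classified correctly."""
    return 100.0 * correct / total

train_accuracy = percent_correct(116, 122)
test_accuracy = percent_correct(38, 40)

print(f"training: {train_accuracy:.4f}%")   # training: 95.0820%
print(f"testing:  {test_accuracy:.2f}%")    # testing:  95.00%
```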
Performance Measurement
p = number of correctly classified catalytic residues.
n = number of correctly classified non-catalytic residues.
o = number of non-catalytic residues incorrectly predicted to be catalytic (over-predictions).
u = number of catalytic residues incorrectly predicted to be non-catalytic (under-predictions).
t = total residues (p + n + o + u).
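The slide defines the counts but not the measures computed from them. The sketch below uses standard classification measures (accuracy, sensitivity, precision, and the Matthews correlation coefficient) written in terms of p, n, o, and u. Which of these the authors actually reported is an assumption, and the example counts are hypothetical.

```python
import math

def performance(p, n, o, u):
    """Standard measures from the four counts defined on the slide."""
    t = p + n + o + u
    accuracy = (p + n) / t        # fraction of all residues classified correctly
    sensitivity = p / (p + u)     # fraction of catalytic residues recovered
    precision = p / (p + o)       # fraction of catalytic predictions that are correct
    denom = math.sqrt((p + o) * (p + u) * (n + o) * (n + u))
    mcc = (p * n - o * u) / denom if denom else 0.0
    return {
        "total": t,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "mcc": mcc,
    }

# Hypothetical counts, not taken from the study.
scores = performance(p=8, n=9, o=1, u=2)
for name, value in scores.items():
    print(name, round(value, 4))
```

The Matthews coefficient is often preferred when catalytic and non-catalytic residues are unevenly represented, because accuracy alone can look high for a predictor that favours the larger class.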
Discussion and Conclusion The neural network architecture developed predicts active-site residues of proteins with a performance of almost 95%, which is well above the performance reported so far.
The analysis of the optimal subset selected from the initial 15 residue properties indicates that the algorithm learns to distinguish catalytic from non-catalytic residues based on the structural and functional properties of protein residues. This method helps in ligand design.
Acknowledgement I would like to express my sincere thanks to
• Dr. (Smt.) Navita Shrivastava, Head, Dept. of Computer Science, A.P.S. University, Rewa (MP)
• Mr. Pritish Kumar Varadwaj, Lecturer, Indian Institute of Information Technology, Allahabad (UP)
• Mr. Rajeev Prithyani, Lecturer, Dept. of Computer Science, A.P.S. University, Rewa (MP)
• Mr. Sandeep Kushwaha, Lecturer, Dept. of Computer Science, A.P.S. University, Rewa (MP)
for their kind supervision and keen interest during the preparation of this project.