Quantum Chemistry in Drug Design and Discovery: Where We Are and Where We Are Going
Outline: Motivation; Linear-Scaling QM; QM-Based Protein/Small Molecule Scoring Function; Spectroscopy (NMR, Electron Density and X-Ray)
Status of Theoretical Approaches to Problems in Biology
• Fundamental problems remain unsolved: water, the hydrophobic effect, protein folding, protein/small molecule interactions (drug design), etc.
• Hence, current theoretical approaches are insufficient.
Current Theoretical Approaches to Problems in Biology
• Classical mechanics (standard approach): molecular mechanical potentials, purely empirical potentials, QSAR analysis
• Statistical mechanics (standard approach): analysis of trajectories (g(r), correlation functions, etc.), free energy methods
• Mathematical tools (standard for all potentials): energy minimization, molecular dynamics, etc.
• Quantum mechanics (less common approach): cluster models (continuum solvation), QM/MM, linear-scaling QM
Strengths and Weaknesses of Classical and Quantum Potentials
• Classical mechanics (standard approach): highly approximate models (e.g., Coulombic electrostatics); rapidly evaluated; a good approach for ensemble generation; quality of the potentials is highly dependent on parameterization.
• Quantum mechanics (less common approach): fewer approximations (in the limit, very accurate models); expensive calculations; good for examining single snapshots; quality of the potentials is well understood; used to build classical models; highly successful in organic and inorganic chemistry.
Hence, applying a QM approach to biological problems is the logical next step.
What Are the Hurdles to a QM Model in (Structural) Biology?
• Very computationally expensive: linear-scaling algorithms, parallel computing
• What model to use: exploit model chemistries (semiempirical Hamiltonians, density functional theory, Hartree-Fock theory, quantum Monte Carlo)
• Ensemble generation: novel sampling approaches; use classical models to generate ensembles
• Spectroscopy: NMR, X-ray, etc.
• Computational biology approach: leverage the repetitive nature of biology; bioinformatics databases
Our Vision of Quantum Biology
• Exploit linear-scaling algorithms and parallel computing
• Exploit model chemistries: semiempirical Hamiltonians, density functional theory, Hartree-Fock theory, quantum Monte Carlo
• Exploit ensemble generation protocols: novel sampling approaches; use classical models to generate ensembles
• Spectroscopy: NMR, X-ray
• Exploit statistical approaches: leverage the repetitive nature of biology; bioinformatics databases
Why Can We Think About Using Quantum Mechanics?
Divide and Conquer
• Divides the QM system into a set of smaller subsystems.
• "Solves" the matrix diagonalization problem, the step whose cost grows as N^3 in a conventional calculation.
• Parallelizable.
• Uses standard energy expressions.
• Gradients are obtained using standard methods.
References: S. L. Dixon and K. M. Merz, Jr., J. Chem. Phys. 104, 6643-6649 (1996); S. L. Dixon and K. M. Merz, Jr., J. Chem. Phys. 107, 879-893 (1997); A. van der Vaart, D. Suarez and K. M. Merz, Jr., J. Chem. Phys. 113, 10512-10523 (2000).
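Divide and Conquer “Onion-Skin” Strategy
[Figure: each subsystem (e.g., subsystem k and subsystem k+4) consists of a core region surrounded by buffer region 1 and buffer region 2, illustrated along the peptide sequence LYS-ASP-GLY-PRO-CYS-ASN-TRP-GLY-ALA-VAL-GLN-GLU-ALA-LEU-GLY-CYS-ARG-LYS-SER-ASN-GLU-TYR.]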
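Divide and Conquer “Onion-Skin” Strategy: Equations
The global density matrix is assembled from subsystem density matrices weighted by a partition matrix:
$$P_{\mu\nu} = \sum_\alpha D^\alpha_{\mu\nu} P^\alpha_{\mu\nu}$$
$$D^\alpha_{\mu\nu} = \begin{cases} 1, & \chi_\mu \in \mathrm{core}(\alpha) \text{ and } \chi_\nu \in \mathrm{core}(\alpha) \\ \tfrac{1}{2}, & \text{one of } \chi_\mu, \chi_\nu \in \mathrm{core}(\alpha), \text{ the other in } \mathrm{buffer}(\alpha) \\ 0, & \text{otherwise} \end{cases}$$
$$P^\alpha_{\mu\nu} = \sum_i f_\beta(\varepsilon_F - \varepsilon_i^\alpha)\, C^\alpha_{\mu i} C^\alpha_{\nu i}, \qquad f_\beta(\varepsilon_F - \varepsilon_i^\alpha) = \frac{2}{1 + \exp[\beta(\varepsilon_i^\alpha - \varepsilon_F)]}$$
$$N_{\mathrm{electrons}} = \sum_\mu P_{\mu\mu} = \sum_\alpha \sum_\mu D^\alpha_{\mu\mu} P^\alpha_{\mu\mu}$$
Here χ_μ are basis functions, C^α_{μi} and ε_i^α are the eigenvectors and eigenvalues of subsystem α, and the common Fermi level ε_F is fixed by the electron-count condition.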
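A minimal NumPy sketch of the equations for P, D^α and the electron-count condition, assuming an orthogonal basis (as in NDDO-type semiempirical methods), the closed-shell Fermi function, and the 1 / 1/2 / 0 partition scheme. The names divide_and_conquer_density and fermi_occupation, and the bisection search for ε_F, are hypothetical illustration choices; this is not the DivCon code.

```python
import numpy as np


def fermi_occupation(eps, e_fermi, beta):
    """Closed-shell Fermi function: 2 / (1 + exp[beta * (eps - e_F)])."""
    x = np.clip(beta * (eps - e_fermi), -500.0, 500.0)  # keep exp() finite
    return 2.0 / (1.0 + np.exp(x))


def divide_and_conquer_density(H, subsystems, n_electrons, beta=50.0, n_bisect=200):
    """Sketch of P = sum_alpha D^alpha * P^alpha in an orthogonal basis.

    H: (n, n) symmetric Fock/Hamiltonian matrix
    subsystems: list of (core, buffer) index lists; each basis function
        must belong to exactly one core
    n_electrons: total electron count that fixes the common Fermi level
    """
    n = H.shape[0]
    pieces = []
    for core, buffer in subsystems:
        idx = np.array(sorted(set(core) | set(buffer)))
        eps, C = np.linalg.eigh(H[np.ix_(idx, idx)])  # local diagonalization only
        in_core = np.isin(idx, core).astype(float)
        # Partition matrix: 1 core-core, 1/2 core-buffer, 0 buffer-buffer
        D = 0.5 * (in_core[:, None] + in_core[None, :])
        pieces.append((idx, eps, C, D))

    def assemble(e_fermi):
        P = np.zeros((n, n))
        for idx, eps, C, D in pieces:
            f = fermi_occupation(eps, e_fermi, beta)
            P_sub = (C * f) @ C.T  # P^alpha_{mu nu} = sum_i f_i C_{mu i} C_{nu i}
            P[np.ix_(idx, idx)] += D * P_sub
        return P

    # trace(P) increases monotonically with e_F, so bisection locates the Fermi level
    lo = min(eps.min() for _, eps, _, _ in pieces) - 10.0
    hi = max(eps.max() for _, eps, _, _ in pieces) + 10.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.trace(assemble(mid)) < n_electrons:
            lo = mid
        else:
            hi = mid
    return assemble(0.5 * (lo + hi))
```

Each subsystem is diagonalized independently, so the cost grows with the number of subsystems rather than with the cube of the full matrix size, and the subsystem loop is the natural place to parallelize.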
Divide & Conquer ("DivCon") vs. Standard Calculation: Linear vs. Cubic Scaling
[Figure: CPU seconds required to complete one SCF cycle (0 to 3000 s) versus number of atoms per molecule (0 to 600). The cost of the standard calculation grows steeply (roughly as N^3, because of matrix diagonalization), rendering it unsuitable for routinely analyzing large biomolecules; Divide & Conquer scales linearly. Annotated size ranges: small-molecule drug candidates (50-150 atoms); drug targets, i.e., large biomolecules (proteins, ~2,500 atoms).]
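A back-of-the-envelope sketch of why the two curves diverge, assuming the diagonalization step dominates, that diagonalizing an m × m matrix costs about m^3, and that matrix size is proportional to atom count. The core and buffer sizes are hypothetical values; real timings also include Fock-matrix construction and other overheads.

```python
def scf_cycle_cost(n_atoms, core_atoms=20, buffer_atoms=40):
    """Relative diagonalization cost per SCF cycle (arbitrary units).

    Assumes an m x m diagonalization costs ~m**3 and that matrix size is
    proportional to atom count; core_atoms and buffer_atoms are hypothetical.
    """
    standard = n_atoms ** 3
    n_subsystems = -(-n_atoms // core_atoms)  # ceiling division
    divide_and_conquer = n_subsystems * (core_atoms + buffer_atoms) ** 3
    return standard, divide_and_conquer


for n in (100, 500, 2500):
    std, dc = scf_cycle_cost(n)
    print(f"{n:5d} atoms: standard {std:.2e}, D&C {dc:.2e}")
```

Doubling the system size multiplies the standard cost by 8 but the D&C cost only by 2; for very small molecules the overlapping buffers make D&C the more expensive option, which is why the crossover matters. At 2,500 atoms these assumptions give about 1.6e10 versus 2.7e7 units.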
Errors in Heat of Formation Using D&C
Implicit Solvation in Biological Systems
• Use Poisson-Boltzmann (PB) theory in conjunction with Divide and Conquer.
• CM1/CM2 charges were key to making this approach successful.
• The nonpolar term of the model was fit to simultaneously reproduce the solvation free energies of small molecules and the logP values of a wide range of compounds.
PB: Tannor, Marten, Murphy, Friesner, Sitkoff, Nicholls, Honig, Ringnalda, Goddard, J. Am. Chem. Soc. 1994, 116(26), 11875-11882. CM1 and CM2: Li, Zhu, Cramer, Truhlar, J. Phys. Chem. A 1998, 102, 1820-1831; Storer, Giesen, Cramer, Truhlar, J. Computer-Aided Molecular Design 1995, 9, 87-110.
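Implicit Solvation in Biological Systems: Proteins
Solvation free energies of proteins in water calculated by the DivCon-PB methodology (energies in kcal/mol):

Protein      | Atoms/Residues/Charge | ΔG_RF   | ΔG_reorg | ΔG_np | ΔG_sol  | SCRF iterations
Crambin      | 642/46/0              | -316.7  | 23.4     | 19.7  | -273.5  | 11
BPTI         | 888/58/+6             | -1336.3 | 69.7     | 26.6  | -1239.8 | 14
CspA         | 1010/69/0             | -1175.5 | 109.3    | 28.6  | -1073.5 | 15
Lysozyme     | 1960/129/+8           | -1936.3 | 129.3    | 45.3  | -1761.7 | 13
Subtilisin E | 3854/275/-2           | -1856.3 | 166.8    | 74.8  | -1614.7 | 15

ΔG_RF: reaction-field energy; ΔG_reorg: electronic reorganization energy; ΔG_np: nonpolar contribution; ΔG_sol: total solvation free energy.
Gogonea and Merz, J. Phys. Chem. A 1999, 103, 5171-5188.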
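A quick arithmetic check of the protein solvation table, assuming that ΔG_sol is the sum of the three listed components and that the minus signs lost in extraction belong on ΔG_RF and ΔG_sol (the proteins are strongly stabilized by water). Values are transcribed from the Gogonea and Merz table; the units (kcal/mol) are likewise an assumption.

```python
# (Delta G_RF, Delta G_reorg, Delta G_np, tabulated Delta G_sol), assumed kcal/mol,
# transcribed from Gogonea and Merz (1999) with reconstructed minus signs
TABLE = {
    "Crambin": (-316.7, 23.4, 19.7, -273.5),
    "BPTI": (-1336.3, 69.7, 26.6, -1239.8),
    "Lysozyme": (-1936.3, 129.3, 45.3, -1761.7),
    "Subtilisin E": (-1856.3, 166.8, 74.8, -1614.7),
}


def solvation_free_energy(g_rf, g_reorg, g_np):
    """Delta G_sol = Delta G_RF + Delta G_reorg + Delta G_np (assumed additive)."""
    return g_rf + g_reorg + g_np


for protein, (g_rf, g_reorg, g_np, g_sol) in TABLE.items():
    total = solvation_free_energy(g_rf, g_reorg, g_np)
    print(f"{protein:13s} sum = {total:8.1f}  table = {g_sol:8.1f}")
```

The favorable reaction-field term is partly offset by the cost of polarizing the solute's electrons (ΔG_reorg) and by the nonpolar term; for these rows the sums agree with the tabulated totals to within rounding.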
Do We Understand Intermolecular Interactions between Biomolecules?
Our current understanding is at the classical level, but intermolecular (and, in biomolecules, intramolecular) interactions are inherently quantum mechanical in nature. Can we use quantum chemistry to better understand interactions in biomolecular systems?
[Figure: minimum and maximum computed partial charges (min, max) on polar atoms of glutamic acid, histidine, glycine, asparagine, aspartic acid and alanine residues in different protein environments.]
Variations in Point Charges
• The partial charge on polar atoms varies by about ±0.3e (Mulliken, CM1 or CM2 charges).
• The variation arises from differences in the local environment of the atoms.
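Of the charge models named on this slide, Mulliken charges are the simplest to write down: q_A = Z_A − Σ_{μ on A} (PS)_μμ. The sketch below computes only these, assuming a density matrix P and overlap matrix S are already available. CM1 and CM2 apply parameterized mappings on top of charges like these, and those mappings are not reproduced here; the function name is hypothetical.

```python
import numpy as np


def mulliken_charges(P, S, basis_atom, core_charges):
    """Mulliken charges q_A = Z_A - sum_{mu on A} (P S)_{mu mu}.

    P: density matrix (n_basis x n_basis)
    S: overlap matrix (the identity for NDDO-type semiempirical methods)
    basis_atom: atom index that owns each basis function
    core_charges: Z_A (the core/valence charge for semiempirical Hamiltonians)
    """
    gross_populations = np.diag(P @ S)
    q = np.array(core_charges, dtype=float)
    np.subtract.at(q, np.asarray(basis_atom), gross_populations)  # sum per atom
    return q
```

Because P depends on the surroundings through the SCF procedure, the same atom type picks up different charges in different protein environments, which is the variation the slide quantifies.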
Charge Transfer Effects: HIV-1 Protease
[Figure: charge transfer Δq (axis range -0.25 to +0.1) between HIV-1 protease and the inhibitors A76889, A76982, A78791, AcePep, indinavir, SB203386 and XK263. Positive Δq: charge transferred from inhibitor to protease; negative Δq: charge transferred from protease to inhibitor.]
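A minimal sketch of how a charge-transfer value like Δq can be extracted from atomic charges of the complex, assuming Δq is the change in the inhibitor's net charge on binding. Under that assumption a positive value means electron density left the inhibitor for the protease, matching the sign convention in the figure legend. The function name and inputs are hypothetical.

```python
import numpy as np


def charge_transfer(q_complex, ligand_mask, ligand_formal_charge):
    """Delta q = (sum of atomic charges on inhibitor atoms in the complex)
    minus (net charge of the isolated inhibitor).

    Positive: inhibitor lost electron density to the protease.
    Negative: inhibitor gained electron density from the protease.
    """
    q = np.asarray(q_complex, dtype=float)
    mask = np.asarray(ligand_mask, dtype=bool)
    return q[mask].sum() - ligand_formal_charge
```

A fixed-charge force field returns exactly zero here by construction, since each fragment keeps its assigned charges; a QM calculation on the whole complex can give a nonzero value.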
How Well Do We Understand Biomolecular Intermolecular Interactions?
Our current understanding is limited by the neglect of polarization and charge-transfer effects. QM models can therefore contribute significantly to our understanding of these effects.
QM-Based Protein/Ligand Scoring Function
• A quantum mechanics-based approach for a more fundamental understanding of ligand/drug-protein interactions.
• The scoring function includes charge-transfer (CT) and polarization effects, which standard scoring functions generally ignore.
• The scoring function can be systematically improved via appropriate parameterization.
• Pose generation via empirical or classical approaches.
• Primary screen via empirical or classical approaches.
• QM-based scoring for the final selection of compounds, i.e., a secondary computational screen.
• Medicinal chemistry feedback: validate, validate, and validate some more.
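The screening workflow in these bullets (fast empirical or classical primary screen, QM-based secondary screen) can be sketched as a two-stage funnel. In this sketch cheap_score and expensive_score are hypothetical stand-ins for, say, an empirical scoring function and a QM score, lower scores are assumed to be better, and keep_fraction and n_final are illustrative parameters.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tiered_screen(
    compounds: Sequence[T],
    cheap_score: Callable[[T], float],
    expensive_score: Callable[[T], float],
    keep_fraction: float = 0.1,
    n_final: int = 5,
) -> List[T]:
    """Rank all compounds with a fast score, then rescore only the top fraction.

    Lower scores are treated as better (more favorable predicted binding).
    """
    ranked = sorted(compounds, key=cheap_score)  # primary screen
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    return sorted(shortlist, key=expensive_score)[:n_final]  # secondary (e.g. QM) screen
```

The design point is that the expensive function is called only on the shortlist; with 100 compounds and keep_fraction = 0.1 it runs ten times.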
Protein-Ligand Binding (Docking)
[Figure: I. The unbound state (protein P and ligand L apart); II. Ligand recognition; III. The protein-ligand complex.]
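Methodology: Thermodynamic Cycle to Calculate the Free Energy of Binding
[Figure: thermodynamic cycle linking P + S → PS in the gas phase and in solvent, where P is the protein, S the small molecule (ligand) and PS the complex.]
The binding free energy in solution is calculated as:
$$\Delta G_b^s = \Delta G_b^g + \Delta G_{solv}^{PS} - \Delta G_{solv}^{P} - \Delta G_{solv}^{S}$$
$$\Delta G_b^g = \Delta H_b^g - T\Delta S_b^g$$
$$\Delta H_b^g = \Delta H_f^g + LJ\left(\frac{1}{R^6}\right)$$
$$\Delta S_b^g = \Delta S_{SA}^{C,N,O,S} + \mathrm{num(rot\_bonds)}$$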
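The cycle can be closed numerically as a sketch, assuming all component energies are already in hand. Here dH_f is taken as H_f(complex) − H_f(protein) − H_f(ligand), lj_dispersion is the attractive 1/R^6 Lennard-Jones term, and dS is a precomputed entropy estimate; the surface-area and rotatable-bond entropy model itself is not reproduced. Function names and the default temperature are hypothetical.

```python
def gas_phase_binding_free_energy(dH_f, lj_dispersion, dS, temperature=298.15):
    """Delta G_b^g = Delta H_b^g - T * Delta S_b^g, with Delta H_b^g = Delta H_f + LJ(1/R^6).

    dH_f: H_f(complex) - H_f(protein) - H_f(ligand), e.g. from AM1 (kcal/mol)
    lj_dispersion: attractive 1/R^6 Lennard-Jones term (kcal/mol)
    dS: binding entropy estimate (kcal/(mol K))
    """
    return (dH_f + lj_dispersion) - temperature * dS


def solution_binding_free_energy(dG_gas, dG_solv_complex, dG_solv_protein, dG_solv_ligand):
    """Close the cycle: Delta G_b^s = Delta G_b^g + Delta G_solv^PS - Delta G_solv^P - Delta G_solv^S."""
    return dG_gas + dG_solv_complex - dG_solv_protein - dG_solv_ligand


# Hypothetical inputs in kcal/mol
dG_gas = gas_phase_binding_free_energy(dH_f=-20.0, lj_dispersion=-5.0, dS=-0.02, temperature=300.0)
print(solution_binding_free_energy(dG_gas, -500.0, -480.0, -30.0))
```

Because the solvation terms are large and nearly cancel, small relative errors in them can dominate the final binding free energy; that is why the solvation model is parameterized as carefully as the gas-phase term.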
40 Protein-Ligand Complexes
[Figure: R² (0% to 60%) for scoring functions evaluated on 40 protein-ligand complexes. Score functions compared: QM Score variants (including an unparameterized version), X-Score, DrugScore, SYBYL/D-Score, SYBYL/ChemScore, SYBYL/G-Score, SYBYL/F-Score, Cerius2/LigScore, Cerius2/PLP, Cerius2/PMF, Cerius2/LUDI and AutoDock.]
(a) Parameterized on this data set; (b) parameterized on other data sets. Source: Renxiao Wang, Yipin Lu and Shaomeng Wang, "Comparative Evaluation of 11 Scoring Functions for Molecular Docking," J. Med. Chem. 2003, 46, 2287-2303. QM Score data: Kaushik Raha, Merz lab, Pennsylvania State University, unpublished study.
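A minimal sketch of the statistic on the chart's axis, assuming the bars report the squared Pearson correlation between predicted scores and experimental binding affinities. Because r is squared, a score that becomes more negative as binding strengthens still yields a high R².

```python
import numpy as np


def squared_correlation(predicted, experimental):
    """Squared Pearson correlation R^2 between predicted scores and measured affinities."""
    r = np.corrcoef(predicted, experimental)[0, 1]
    return r * r
```

R² measures ranking consistency only: an unparameterized score can correlate well even though its absolute values (like the large negative TotalScore values on the following slides) are not on a free-energy scale.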
HIV-1 Protease with XK263 (PDB 1hvr)
[Figure panels: TotalScore (about -1000 to -2400) versus pose RMSD (0 to 20 Å) for docked poses; native rank and RMSD of the best-ranked pose for each score function: X-Score, AutoDock, DrugScore, TotalScore, Cerius2/PLP, Cerius2/PMF, Cerius2/LUDI, SYBYL/G-Score, SYBYL/F-Score, SYBYL/D-Score, Cerius2/LigScore and SYBYL/ChemScore.]
FKBP with Rapamycin (PDB 1fkb)
[Figure panels: TotalScore (0 to about -1200) versus pose RMSD (0 to 20 Å) for docked poses; native rank and RMSD of the best-ranked pose for each score function: X-Score, AutoDock, DrugScore, TotalScore, Cerius2/PLP, Cerius2/PMF, Cerius2/LUDI, SYBYL/G-Score, SYBYL/F-Score, SYBYL/D-Score, Cerius2/LigScore and SYBYL/ChemScore.]
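The two per-function quantities in these panels can be sketched as follows, assuming "native rank" means the rank of the crystallographic (native) pose among all scored poses and "best rank RMSD" means the RMSD of the top-scoring pose. Lower scores are treated as better, as for TotalScore. The function name and the native_index convention are hypothetical.

```python
import numpy as np


def rank_poses(scores, rmsds, native_index=0):
    """Return (rank of the native pose, RMSD of the top-scoring pose).

    Lower scores are treated as better; ranks start at 1.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")  # best score first
    native_rank = int(np.flatnonzero(order == native_index)[0]) + 1
    best_rmsd = float(np.asarray(rmsds, dtype=float)[order[0]])
    return native_rank, best_rmsd


# Hypothetical poses: index 0 is the native pose (RMSD 0)
print(rank_poses([-1800.0, -2000.0, -1500.0, -1900.0], [0.0, 3.5, 8.0, 1.2]))
```

An ideal scoring function ranks the native pose first, so the best-ranked pose has an RMSD near zero. Here the native pose ranks third and the top-scoring decoy is 3.5 Å away.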
Conclusions and Future Directions
• First-generation (AM1-based) results are very promising and can be readily refined.
• Explore further parameterization to improve predictive capability.
• Use QM geometry optimization (ligand only) to further refine structures.
Preliminary Studies of Semiempirical Electron Densities of Biomolecules and Potential Applications
• Can we compute reasonable electron densities (EDs) of biomolecules using semiempirical Hamiltonians?
• How good are they with respect to experimental EDs? With respect to ab initio EDs?
• What are their potential uses in X-ray studies of macromolecules?
Experimental X-Ray Crystallography
X-ray experiments measure the intensities I(hkl) of the diffraction peaks and derive the structure factors F(hkl):
$$I(hkl) \propto |F(hkl)|^2$$
Fourier transformation is then used to obtain the electron density distribution ρ(xyz) in molecular crystals:
$$\rho(xyz) = \frac{1}{V}\sum_h \sum_k \sum_l |F(hkl)| \exp[-2\pi i(hx + ky + lz) + i\alpha(hkl)]$$
Because the phase angles α(hkl) are not measured, special techniques (heavy-atom methods, anomalous scattering, molecular replacement, etc.) must be applied, and structure determination involves an iterative process called refinement.
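A one-dimensional NumPy sketch of the phase problem, assuming point scatterers with constant scattering factors f_j in a unit cell of length 1 (V = 1) and a hypothetical reflection limit hmax. It computes F(h) = Σ_j f_j exp(2πi h x_j), then rebuilds ρ(x) = (1/V) Σ_h |F(h)| exp[−2πi h x + iα(h)]. Using the true phases puts the density peaks back on the atoms; using intensities alone (phases set to zero) does not.

```python
import numpy as np


def structure_factors(positions, scattering, hmax):
    """F(h) = sum_j f_j exp(+2 pi i h x_j) for point scatterers in a 1D unit cell."""
    h = np.arange(-hmax, hmax + 1)
    phase = np.exp(2j * np.pi * np.outer(h, positions))
    return h, phase @ np.asarray(scattering, dtype=float)


def fourier_synthesis(h, amplitudes, phases, x):
    """rho(x) = (1/V) sum_h |F(h)| exp[-2 pi i h x + i alpha(h)], with V = 1."""
    terms = amplitudes * np.exp(-2j * np.pi * np.outer(x, h) + 1j * phases)
    return terms.sum(axis=1).real


positions = np.array([0.2, 0.5])  # fractional coordinates (hypothetical)
f = np.array([6.0, 8.0])          # scattering factors (hypothetical)
h, F = structure_factors(positions, f, hmax=20)
intensities = np.abs(F) ** 2      # what the experiment records
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
rho = fourier_synthesis(h, np.abs(F), np.angle(F), x)
rho_no_phases = fourier_synthesis(h, np.sqrt(intensities), np.zeros(h.shape), x)
```

The truncation at hmax plays the role of finite resolution: it broadens the peaks and adds ripples but leaves them centered on the atoms.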
A Typical Diffraction Spectrum from an XRD Experiment
Reflections appear only at discrete angles, indexed by (hkl). Peak intensities are related to structure factors by:
$$I(hkl) \propto |F(hkl)|^2$$
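Theoretical Studies of Electron Density Distributions
Ab initio or semiempirical calculation of the electron density:
$$\rho(\mathbf{r}_1) = n \int |\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n, s_1, s_2, \ldots, s_n)|^2 \, d\mathbf{r}_2 \cdots d\mathbf{r}_n \, ds_1 \cdots ds_n = \sum_\mu \sum_\nu P_{\mu\nu}\, \phi_\mu(\mathbf{r}_1)\, \phi_\nu(\mathbf{r}_1)$$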
Theoretical structure factors can be simulated by Fourier transformation of theoretical densities, and methods have been described to handle/model temperature factors. Periodic Hartree-Fock and density functional calculations of small molecules are now feasible with, for example, the program CRYSTAL. With our linear-scaling technologies, we can evaluate the ED of macromolecules. CRYSTAL: de Vries, Feil and Tsirelson, Acta Cryst. 1999, B56, 118-123.
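A minimal sketch of evaluating ρ(r) = Σ_μ Σ_ν P_μν φ_μ(r) φ_ν(r) on a real-space grid, which is the starting point for computing theoretical structure factors. It assumes normalized s-type Gaussian basis functions for simplicity; real semiempirical and ab initio codes use Slater or contracted Gaussian sets including p and d functions. The two-center, two-electron example and all function names are hypothetical.

```python
import numpy as np


def s_gaussian(points, center, alpha):
    """Normalized s-type Gaussian (2 alpha/pi)^(3/4) exp(-alpha |r - A|^2)."""
    r2 = np.sum((points - np.asarray(center)) ** 2, axis=-1)
    return (2.0 * alpha / np.pi) ** 0.75 * np.exp(-alpha * r2)


def density_on_points(P, centers, exponents, points):
    """rho(r) = sum_{mu,nu} P_{mu nu} phi_mu(r) phi_nu(r)."""
    phi = np.stack([s_gaussian(points, c, a) for c, a in zip(centers, exponents)], axis=-1)
    return np.einsum("...m,mn,...n->...", phi, P, phi)


# Hypothetical two-center bonding orbital holding two electrons (bohr units)
S12 = np.exp(-0.49)  # overlap of two alpha = 0.5 s-Gaussians 1.4 bohr apart
P = np.full((2, 2), 1.0 / (1.0 + S12))
centers = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]]
ax = np.linspace(-6.0, 6.0, 61)
az = np.linspace(-6.0, 7.4, 68)
X, Y, Z = np.meshgrid(ax, ax, az, indexing="ij")
pts = np.stack([X, Y, Z], axis=-1)
rho = density_on_points(P, centers, [0.5, 0.5], pts)
n_el = rho.sum() * (ax[1] - ax[0]) ** 2 * (az[1] - az[0])  # should be close to 2
```

Integrating the gridded density recovers the electron count, which is a useful sanity check before Fourier-transforming the grid to structure factors.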
QM ED Calculations of Macromolecules with Semiempirical Hamiltonians
Typical semiempirical models employ the core approximation (only valence electrons are treated explicitly), but the core electron density is needed to match experiment. Full EDs can be obtained by augmenting the QM-derived valence EDs with spherical core EDs. The main question remains, though: how good are these EDs?
AM1 EDs: Ho, Schmider, Edgecombe and Smith, Jr., Int. J. Quantum Chem. 1994, S28, 215. Core model: Cioslowski and Piskorz, Chem. Phys. Lett. 1996, 255, 315-319.
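The idea of adding spherical core densities to a valence-only density can be sketched under a deliberately simple assumption. Each atom heavier than helium gets a Slater-type 1s² core with exponent from Slater's rules (ζ = Z − 0.30), which integrates to exactly two electrons. This is a stand-in for illustration, not the Cioslowski-Piskorz core model; the function names are hypothetical.

```python
import numpy as np


def slater_1s_core_density(r, Z):
    """Spherical 1s^2 core density 2 * zeta^3/pi * exp(-2 zeta r), with zeta = Z - 0.30."""
    zeta = Z - 0.30  # Slater's rules: the other 1s electron screens 0.30
    return 2.0 * zeta**3 / np.pi * np.exp(-2.0 * zeta * r)


def full_density(points, valence_density, atom_positions, atom_numbers):
    """Valence density from the semiempirical calculation plus spherical core densities."""
    rho = np.array(valence_density, dtype=float)  # copy; do not modify the input
    for pos, Z in zip(atom_positions, atom_numbers):
        if Z > 2:  # H and He have no core removed by the valence-only model
            r = np.linalg.norm(points - np.asarray(pos), axis=-1)
            rho += slater_1s_core_density(r, Z)
    return rho


# Radial check for carbon: integral of 4 pi r^2 rho_core dr should be 2 electrons
r = np.linspace(0.0, 5.0, 500001)
n_core = np.sum(4.0 * np.pi * r**2 * slater_1s_core_density(r, 6)) * (r[1] - r[0])
```

Because the core density is sharply peaked at the nucleus, it dominates the high-angle structure factors, which is why it cannot be left out when comparing with experiment.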
Quantum Mechanical Electron Densities of p-Nitropyridine N-Oxide
[Figure panels: AM1 (DivCon) and HF/6-31G* (Gaussian 98).]
Quantum Mechanical Electron Density of a Protein: Crambin
• Ultrahigh-resolution structure (0.54 Å; Teeter et al., 2000).
• 46 residues, 648 atoms.
• The QM ED map currently describes the electron distribution of a single static structure rather than a time and space average, but otherwise agrees well with the experimental map.
A Small-Molecule Test Case
• Recent work by Perpetuo et al. (Acta Cryst. B55, 70-77, 1999).
• Three molecules studied: N-(trifluoromethyl)formamide, N-(2,2,2-trifluoroethyl)formamide and 2,2,2-trifluoroethyl isocyanide.
• 1170 independent reflections; 70 parameters used in refinement; R = 0.0498.
Preliminary Results
[Figure: structure factors from QM without temperature factors versus raw (experimental) structure factors; least-squares fit y = 0.6991x, R² = 0.8753.]
Preliminary Results (cont'd)
[Figure panels: structure factors, QM vs. raw; structure factors, atomic vs. raw. Reported fits: y = 0.5594x (R² = 0.9221) and y = 0.8213x (R² = 0.9291); reported R values: 0.196 and 0.173.]
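A minimal sketch of the two comparison statistics, assuming the quoted R values are the conventional crystallographic R factor, R = Σ||F_obs| − k|F_calc|| / Σ|F_obs|, and that the quoted slopes come from least-squares lines forced through the origin. Whether and how the slides applied a scale factor k is not stated, so it is an option here; the names are hypothetical.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope k of y = k x (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.dot(x, y) / np.dot(x, x)


def r_factor(f_obs, f_calc, scale=True):
    """R = sum | |F_obs| - k |F_calc| | / sum |F_obs|.

    k is the least-squares scale of |F_calc| onto |F_obs| if scale=True, else 1.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    k = slope_through_origin(f_calc, f_obs) if scale else 1.0
    return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)
```

A slope well below 1, as on these plots, signals an overall scale mismatch; rescaling removes that before the R factor is computed, so the remaining R reflects differences in the shape of the density.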
Current Status and Future Directions
• We are currently further validating computed EDs on small molecules.
• Application areas we are pursuing by providing aspherical ED descriptions:
  - Aid the macromolecular refinement process by introducing another constraint.
  - Allow deconvolution of anisotropic density distributions from anisotropic temperature factors.
  - Study macromolecules with Atoms in Molecules (AIM) theory.
Summary: Our Vision of Quantum Biology
• Exploit linear-scaling algorithms and parallel computing
• Exploit model chemistries: semiempirical Hamiltonians, density functional theory, Hartree-Fock theory, quantum Monte Carlo
• Exploit ensemble generation protocols: use classical models to generate ensembles; novel sampling approaches
• Spectroscopy: NMR, X-ray
• Exploit statistical approaches: leverage the repetitive nature of biology; bioinformatics databases
General Conclusions
• The application of QM to large biomolecular systems is opening up new avenues for understanding biomolecular solvation, inhibition, etc.
• QM gives a better account of electrostatic interactions than typical classical models.
• Quantum mechanics and classical mechanics can work synergistically toward our goal of understanding biomolecular structure, function and inhibition.
Acknowledgements • Steve Dixon • Arjan van der Vaart • Dimas Suarez • Lance Westerhoff • Martin Peters • Kaushik Raha • Ed Brothers • Andrew Wollacott • Ken Ayers • Bryan Op’t Holt • Ning Liao • Xiadong Zhang • Bing Wang • Guille Estiu
Acknowledgements • DOE • NIH • NSF • AMBER Development Team • Pharmacopeia, Inc. • QuantumBio Inc.