Analysis of Binding Interactions: Obtaining Meaningful Values with Monte Carlo Simulations
Simon Chiang
Thursday, April 16, 2009
Theoretical Studies of Protein-DNA Binding Interactions
Goal of studies:
• Measurement of Energetic Parameters
• Model Effect of Proteins on Gene Expression
“Questionable” Resolution:
- Poorly reproducible values
- Large standard deviations
“Meaningful” Resolution:
- Reproducible values
- Small standard deviations
Monte Carlo Simulations:
- Identification
- Rational Design
Resolving Parameters from Data
• Results of a single-site protein-DNA binding experiment
[Figure: Single Site Binding Isotherm; Fractional Saturation (0.0 to 1.0) vs. Log [Protein] (-12 to -8)]
[Scheme: Protein + DNA binding equilibrium with constant Keq]
ΔG = -RT ln Keq
Parameter to Resolve: ΔGbinding
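The relation between ΔG, Keq and the isotherm can be sketched in a few lines of Python. This is a minimal illustration, not the program described in the talk: the temperature (20 °C), the function names, and the assumption that free protein concentration equals total protein concentration are all choices made here.

```python
import math

R_KCAL = 1.987e-3    # gas constant in kcal/(mol*K)
T_KELVIN = 293.15    # assumed temperature (20 C); not stated in the slides


def keq_from_dg(dg_kcal):
    """Association constant (1/M) from ΔG = -RT ln Keq."""
    return math.exp(-dg_kcal / (R_KCAL * T_KELVIN))


def fractional_saturation(log_conc, dg_kcal):
    """Single-site isotherm Y = Keq*[X] / (1 + Keq*[X]).

    log_conc is log10 of the free protein concentration in mol/L.
    """
    kx = keq_from_dg(dg_kcal) * 10.0 ** log_conc
    return kx / (1.0 + kx)


# Example: tabulate the curve for a hypothetical ΔG of -10.2 kcal/mol
dg = -10.2
for log_x in (-12, -10, -8):
    print(log_x, round(fractional_saturation(log_x, dg), 3))
```

The curve passes through Y = 0.5 where [X] = 1/Keq, so a shift in ΔG moves the whole isotherm along the Log [Protein] axis.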
Fitting Program Detail
Fitting program loop: ΔGguess → Calculate Curve → Evaluate Fit of Data to Curve (Least-Squares) → Refine Guess Values → repeat
Output: Fit Value ± Uncertainty (ΔGfit ± σGfit)
[Figure: Model for Data; Fractional Saturation (0.0 to 1.0) vs. Log [X] (-12 to -8), shown for the guess curve (ΔGguess) and the fit curve (ΔGfit ± σGfit)]
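The calculate, evaluate, refine loop above can be sketched for the one-parameter case. This is an illustrative stand-in, not the fitting program from the talk: it refines the guess with a golden-section search, assumes 20 °C, and assumes the sum of squares has a single minimum inside the search bracket. All names are hypothetical.

```python
import math

RT = 1.987e-3 * 293.15   # kcal/mol; 20 C is an assumed temperature


def model(log_x, dg):
    """Single-site fractional saturation at log10 concentration log_x."""
    kx = math.exp(-dg / RT) * 10.0 ** log_x
    return kx / (1.0 + kx)


def sum_sq(dg, log_xs, ys):
    """Least-squares criterion: sum of squared residuals for a trial ΔG."""
    return sum((y - model(lx, dg)) ** 2 for lx, y in zip(log_xs, ys))


def fit_dg(log_xs, ys, lo=-16.0, hi=-4.0, tol=1e-6):
    """Shrink the bracket [lo, hi] around the ΔG that minimises sum_sq."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sum_sq(c, log_xs, ys) < sum_sq(d, log_xs, ys):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return 0.5 * (a + b)


# Noise-free test data generated from a hypothetical ΔG of -10.2 kcal/mol
log_xs = [-10.0 + 0.5 * i for i in range(9)]
ys = [model(lx, -10.2) for lx in log_xs]
dg_fit = fit_dg(log_xs, ys)
print(round(dg_fit, 3))
```

With real, noisy data the same loop returns a value that depends on the particular data set, which is the point the following slides develop.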
Notes on fitting
The fit value only optimizes the fit curve to a single, imperfect data set.
Data Set 1: ΔGfit1 = -10.2 kcal/mol
Different Data Set, Different Curve, Different Fit Value
Data Set 2: ΔGfit2 = -10.1 kcal/mol
Criteria for a Meaningfully Resolved Parameter
[Figure: Fit Program run x5000 gives ΔGfit1, ΔGfit2, ΔGfit3, ΔGfit4, ΔGfit5, ...; histogram of Number of Occurrences vs. Fit ΔG Value showing a single peak of width σ (resolution of a single value)]
• Fit values over many data sets follow a distribution with a single peak in probability
• Distribution limited near this value (small uncertainty)
Ideally Distributions are Gaussian
[Figure: Gaussian histogram of Number of Occurrences vs. Fit Value, with σ marked on either side of the peak]
• Fit programs generally report uncertainties as standard deviations (σ)
• Standard deviations are only rigorously accurate for Gaussian distributions, and can be wildly inaccurate for other distributions.
Not all parameters follow Gaussian Distributions
[Figure: Fit Program run x5000; histogram of Number of Occurrences vs. Fit Value with no single peak in probability]
Perilous to trust the results of a fit program:
• No resolution of a single value
• Reported σ could be very inaccurate (non-Gaussian)
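A small numerical illustration of why σ misleads for a distribution without a single peak. The "fit values" below are synthetic, made up for this sketch (two clusters at -9.0 and -3.0 kcal/mol); they are not results from the talk.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic fit values: half the fits land near -9.0, half near -3.0
values = ([rng.gauss(-9.0, 0.1) for _ in range(2500)]
          + [rng.gauss(-3.0, 0.1) for _ in range(2500)])

mean = statistics.mean(values)
sigma = statistics.stdev(values)

# Fraction of fit values that actually fall within 1 kcal/mol of the mean
near_mean = sum(abs(v - mean) < 1.0 for v in values) / len(values)

print(f"mean = {mean:.2f}, sigma = {sigma:.2f}, fraction near mean = {near_mean:.3f}")
```

Reporting "mean ± σ" here would describe a value that essentially never occurs, with an uncertainty that says nothing about the width of either cluster.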
Triplicate is not enough.
[Figure: histogram of Number of Occurrences vs. Fit Value; results of 3 experiments given by dots]
The 3 results alone suggest a Gaussian with low σ, resolving a single value.
Repeating an experiment many times is necessary to establish that a parameter is meaningfully resolved (i.e., reproducible).
Monte Carlo Simulations: Repeating experiments computationally
Basic Procedure:
Experimental Data Sets and Model for Data → Simulated Data Sets → Fit Parameter Value → Parameter Distributions
[Figure: histogram of Number of Occurrences vs. Fit Parameter Value]
• Simulated data points are made to resemble the experimental data points and to have uncertainties equal in magnitude to theirs
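The basic procedure can be sketched for the single-site case. This is a minimal illustration under stated assumptions, not the author's program: simulated points are drawn as the best-fit curve plus Gaussian noise with a single known standard deviation `sigma_y` (one of several possible ways to generate simulated data), the temperature is taken as 20 °C, and the one-parameter fit uses a golden-section search. All names and numbers are hypothetical.

```python
import math
import random
import statistics

RT = 1.987e-3 * 293.15   # kcal/mol; 20 C is an assumed temperature


def model(log_x, dg):
    kx = math.exp(-dg / RT) * 10.0 ** log_x
    return kx / (1.0 + kx)


def fit_dg(log_xs, ys, lo=-16.0, hi=-4.0, tol=1e-5):
    """Least-squares fit of ΔG by golden-section search (single minimum assumed)."""
    def sse(dg):
        return sum((y - model(lx, dg)) ** 2 for lx, y in zip(log_xs, ys))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def monte_carlo(log_xs, dg_best, sigma_y, n_sets=500, seed=1):
    """Simulate n_sets noisy data sets around the best-fit curve and refit each."""
    rng = random.Random(seed)
    ideal = [model(lx, dg_best) for lx in log_xs]
    fits = []
    for _ in range(n_sets):
        noisy = [y + rng.gauss(0.0, sigma_y) for y in ideal]
        fits.append(fit_dg(log_xs, noisy))
    return fits


log_xs = [-10.0 + 0.5 * i for i in range(9)]
fits = monte_carlo(log_xs, dg_best=-10.2, sigma_y=0.03)
print("mean fit:", round(statistics.mean(fits), 3))
print("spread of fits:", round(statistics.stdev(fits), 3))
```

The list `fits` is the parameter distribution; a histogram of it is what should be inspected for a single peak and a Gaussian shape before a reported σ is trusted.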
Generating Simulated Data: A non-trivial task
Several methods exist; subtle in approach, sometimes long in statistical acceptance.
A Universal Requirement: Uncertainties on experimental data points must be well-resolved
Generally requires repeating an experiment several times.
Utility of Monte Carlo Simulations
[Figure: two histograms of Number of Occurrences vs. Fit Parameter Value, labeled "Well-resolved uncertainties on the data" vs. "σ"]
Simulations allow an assessment of the parameter distributions from several rather than thousands of repetitions.
Generally ~ a few hours of CPU time for 10k data sets
Overview of my program:
• Able to study multi-site, arbitrarily configured systems binding multiple ligands undergoing linked reactions
Binding Simulator Program
• Binding Curves & Derivatives
• Experiments that monitor Fractional Saturation
+
Monte Carlo Simulations
• Experimental/Computer-generated Data
Bacteriophage λ Right Operator (OR)
[Figure: cro gene (lytic state), operator sites OR1, OR2, OR3 with bound λ repressor dimer, cI gene (lysogenic state)]
(Ptashne, Mark. A Genetic Switch, 1986)
OR Statistical Mechanical Model
• Complex set of equations, determined by five free energy parameters. (Ackers, G. et al. PNAS (1982) 79, 1129-1133)
3 intrinsic site binding free energies: ΔG1, ΔG2, ΔG3
2 pairwise interaction free energies: ΔG12, ΔG23
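How five free energies determine the individual-site binding curves can be sketched with a small partition function. This is an illustration only: the list of configurations, and in particular the assumption that the fully occupied operator carries only the ΔG12 interaction, is a choice made here and may differ from the model used in the talk. The temperature (20 °C) and all names are also assumptions; the example free energies are the Set 1 values quoted later in the slides.

```python
import math

RT = 1.987e-3 * 293.15   # kcal/mol; 20 C is an assumed temperature

# (occupied sites, interaction terms) for each assumed configuration
CONFIGS = [
    ((), ()),
    ((1,), ()),
    ((2,), ()),
    ((3,), ()),
    ((1, 2), ("12",)),
    ((1, 3), ()),
    ((2, 3), ("23",)),
    ((1, 2, 3), ("12",)),   # assumption: only the 1-2 interaction when all sites are full
]


def site_saturations(log_conc, dg_site, dg_coop):
    """Fractional saturation of sites 1, 2 and 3 at log10 repressor dimer concentration."""
    conc = 10.0 ** log_conc
    weights = []
    for sites, coops in CONFIGS:
        dg = sum(dg_site[s] for s in sites) + sum(dg_coop[c] for c in coops)
        weights.append(math.exp(-dg / RT) * conc ** len(sites))
    z = sum(weights)   # partition function
    return {
        s: sum(w for (sites, _), w in zip(CONFIGS, weights) if s in sites) / z
        for s in (1, 2, 3)
    }


dg_site = {1: -12.5, 2: -10.5, 3: -9.4}   # kcal/mol
dg_coop = {"12": -2.9, "23": -2.9}        # kcal/mol
for log_c in (-11, -9, -7):
    y = site_saturations(log_c, dg_site, dg_coop)
    print(log_c, {s: round(v, 3) for s, v in y.items()})
```

Each site's curve is a ratio of sums over configurations, so all five parameters enter every curve, which is why they have to be fit together.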
Why not to blindly trust fit programs: A dramatic OR example
[Figure: Fractional Saturation (0.0 to 1.0) vs. Log [λ repressor dimer] (-12 to -6); curves for Site 1, Site 2, Site 3]
Fit Values for two data sets (kcal/mol):
         ΔG1     ΔG2     ΔG3     ΔG12    ΔG23
Set 1   -12.5   -10.5    -9.4    -2.9    -2.9
Set 2   -12.5   -10.5     0      -2.9   -12.9
Erroneous Conclusions:
Resolved Parameters: ΔG1, ΔG2*, ΔG12*
Unresolved Parameters: ΔG3, ΔG23
Monte Carlo Analysis of OR
[Figure: histograms of number of occurrences vs. fit value for each parameter; σ in kcal/mol. Panels: ΔG1 (σ = 0.1), ΔG2 (‘σ’ = 0.5), ΔG12 (‘σ’ = 0.5), ΔG3 (‘σ’ = 0.5), ΔG23 (‘σ’ = 0.5)]
• Fit values and uncertainties are only meaningful for ΔG1
• ‘Standard deviations’ on the other parameters are inaccurate
Resolution problems endemic to cooperative systems
Root of poor resolution is a signal-to-noise issue.
[Figure: Fractional Saturation (0.0 to 1.0) vs. Log [λ repressor dimer] (-12 to -6); curves for Site 1, Site 2, Site 3]
Example: Sites 1 and 2 are fully saturated, due to cooperativity, at concentrations where site 3 begins to fill. Therefore there is little direct data on ΔG23, the pairwise interaction between just site 2 and site 3.
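The signal-to-noise argument can be made concrete by asking how much of the operator population ever sits in the configuration with only sites 2 and 3 occupied. The sketch below is an illustration under assumptions, not the author's calculation: it uses the Set 1 free energies quoted on the earlier slide, a temperature of 20 °C, and an assumed set of eight configurations in which the fully occupied operator carries only the ΔG12 interaction. Under that assumption the "2+3" configuration is the only one whose weight contains ΔG23.

```python
import math

RT = 1.987e-3 * 293.15   # kcal/mol; 20 C is an assumed temperature
DG = {"1": -12.5, "2": -10.5, "3": -9.4, "12": -2.9, "23": -2.9}   # kcal/mol


def populations(log_conc):
    """Probability of each assumed configuration at log10 dimer concentration."""
    x = 10.0 ** log_conc
    k = {name: math.exp(-dg / RT) for name, dg in DG.items()}
    w = {
        "empty": 1.0,
        "1": k["1"] * x,
        "2": k["2"] * x,
        "3": k["3"] * x,
        "1+2": k["1"] * k["2"] * k["12"] * x ** 2,
        "1+3": k["1"] * k["3"] * x ** 2,
        "2+3": k["2"] * k["3"] * k["23"] * x ** 2,
        "1+2+3": k["1"] * k["2"] * k["3"] * k["12"] * x ** 3,   # assumed form
    }
    z = sum(w.values())
    return {name: v / z for name, v in w.items()}


# Scan the experimental concentration range for the peak "2+3" population
grid = [-12.0 + 0.05 * i for i in range(121)]
peak_23 = max(populations(lx)["2+3"] for lx in grid)
print(f"largest population of the 2+3-only configuration: {peak_23:.4f}")
```

With these numbers the configuration never reaches 1 % of the population, so noise of a few percent in the measured saturation can absorb large changes in ΔG23.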
OR parameter resolution problem recognized/addressed in the literature
Published solution uses empirically chosen mutant OR operators that emphasize poorly resolved parameters:
OR+, OR-1, OR-3, OR-1 -3
• Global analysis of wild-type OR and 3 mutant operators was required for resolution. (Brenowitz, M., et al. PNAS (1986) 83, 8462-8466)
MC Analysis of published OR resolution technique:
[Figure: histograms of number of occurrences vs. fit value for each parameter; σ in kcal/mol. Panels: ΔG1 (σ = 0.1), ΔG2 (σ = 0.1), ΔG12 (σ = 0.1), ΔG3 (‘σ’ = 0.2), ΔG23 (‘σ’ = 0.2)]
• Standard deviations for ΔG3, ΔG23 still somewhat inaccurate
• Overall resolution is much better (more Gaussian, lower σ)
Improved resolution with rational choice of mutants
• Study of distributions clarifies how different mutants resolve specific parameters.
OR+, OR-2, OR-1 -3
• Global analysis of wild-type OR and 2 mutant operators predicted to improve resolution
MC Analysis of rationally designed resolution technique:
[Figure: histograms of number of occurrences vs. fit value for each parameter; σ in kcal/mol. Panels: ΔG1 (σ = 0.05), ΔG2 (σ = 0.06), ΔG12 (σ = 0.1), ΔG3 (σ = 0.06), ΔG23 (σ = 0.2)]
• Rational design results in meaningful fit values and standard deviations for all parameters; resolution in fact improves.
Summary
• Monte Carlo Simulations examine the resolution of fit parameters without literally repeating experiments thousands of times.
• Analysis of distributions can assist in the rational design of experiments.
• In the future, my program and the insights obtained from analyzing OR will be applied to study the 4-site PRE system.
Acknowledgements
UCHSC Dept. of Pharmaceutical Sciences
Dr. David Bain
Bain Laboratory
Dr. Aaron Heneghan, Nancy Berton, Michael Miura