Compute the alanine dipeptide [\(\Phi,\Psi\)] transition using OpenMM and OPES-eABF

In this brief tutorial, we calculate the Ramachandran [\(\Phi\),\(\Psi\)] plot of alanine dipeptide in vacuum using the OPES-eABF method.

Alanine dipeptide is a popular test system for enhanced sampling algorithms because it is a minimal example of sampling challenges that are posed by many biological systems. The slow motions of the molecule are largely governed by the \(\Psi\) (backbone N-C-C-N) and \(\Phi\) (backbone C-N-C-C) dihedrals.

An interactive visualization of the alanine dipeptide molecule can be shown with nglview. Atoms involved in the [\(\Phi\),\(\Psi\)] collective variable (CV) are shown in ball and stick representation, while other atoms are transparent.

[2]:

import pytraj as pt
import nglview as ngl

traj = pt.load("../data/alanine-dipeptide.pdb", top='../data/alanine-dipeptide.prmtop')

view = ngl.show_pytraj(traj)
view.clear_representations()
view.add_ball_and_stick('@6,8,14,16', opacity=1.0) # CV atoms of phi torsion
view.add_ball_and_stick('@4,6,8,14', opacity=1.0)  # CV atoms of psi torsion
#view.add_ball_and_stick('@1,4,6,8', opacity=1.0)   # CV atoms of theta torsion

view.add_licorice(opacity=0.5)

# uncomment for interactive visualization in a notebook
#view

While the interactive visualization is not possible on the web page, this is what you should see in a notebook:

Importance sampling of the [\(\Phi,\Psi\)] plane with OPES-eABF

Below, the OPES-eABF sampling algorithm is applied to enhance sampling in the [\(\Phi,\Psi\)] plane. For this purpose, an OpenMM simulation object is created using AMBER parameter and coordinate files and the OpenMM interface to the adaptive sampling package.

[3]:

from sys import stdout
import numpy as np

from openmm import *
from openmm.app import *
from openmm.unit import *

from adaptive_sampling.interface.interface_openmm import AdaptiveSamplingOpenMM

# ------------------------------------------------------------------------------------
# define collective variables
cv_atoms_psi    = [6, 8, 14, 16]  # backbone N-C-C-N torsion
cv_atoms_phi    = [4, 6, 8, 14]   # backbone C-N-C-C torsion
minimum         = -180.0          # minimum of the CV
maximum         = 180.0           # maximum of the CV
bin_width       = 5.0             # bin width along the CV
periodicity     = [               # define periodicity of CVs (needs to be given in radians as units are not converted)
    [-np.pi, np.pi],
    [-np.pi, np.pi],
]

collective_var = [
    ["torsion", cv_atoms_psi, minimum, maximum, bin_width],
    ["torsion", cv_atoms_phi, minimum, maximum, bin_width],
]

# ------------------------------------------------------------------------------------
# Setup OpenMM
prmtop = AmberPrmtopFile(f"../data/alanine-dipeptide.prmtop")
crd = AmberInpcrdFile(f"../data/alanine-dipeptide.crd")
system = prmtop.createSystem(
    nonbondedMethod=NoCutoff,
    constraints=HBonds,
)

# Initialize the `AdaptiveSamplingOpenMM` interface to couple the OpenMM simulation to a bias potential
# The OpenMM `simulation` object is set up internally, but can still be modified by calling `the_md.simulation` or `the_md.integrator`
the_md = AdaptiveSamplingOpenMM(
    crd.positions,
    prmtop.topology,
    system,
    dt=2.0,                                       # timestep in fs
    equil_temp=300.0,                             # temperature of simulation
    langevin_damping=1.0,                         # langevin damping in 1/ps
    cv_atoms=np.unique(cv_atoms_phi+cv_atoms_psi) # specifying CV atoms significantly speeds up simulation of large systems, as the bias force will only be calculated for those
)
the_md.integrator.setConstraintTolerance(0.00001)

# Append OpenMM reporters to simulation for output
the_md.simulation.reporters.append(DCDReporter('alanine-dipeptide.dcd', 100))
the_md.simulation.reporters.append(StateDataReporter(
    stdout,
    100,
    step=True,
    time=True,
    potentialEnergy=True,
    kineticEnergy=True,
    totalEnergy=True,
    temperature=True,
    speed=False,
    separator='\t')
)

The OPES-eABF sampling algorithm is attached to the OpenMM simulation.

[4]:

from adaptive_sampling.sampling_tools import OPESeABF

eabf_ext_sigma    = 5.0           # thermal width of coupling between CV and extended variables in degrees
eabf_ext_mass     = 100.0         # mass of extended variable
abf_nfull         = 100           # number of samples per bin when the ABF force is fully applied

opes_kernel_std   = [5.0, 5.0]    # kernel standard deviations of Phi and Psi in degrees
opes_frequency    = 500           # frequency of kernel creation in MD steps
opes_barrier      = 50.0          # Barrier parameter in kJ/mol
opes_adaptive     = False         # Adaptive standard deviation of kernels, useful for sampling along bad CVs
opes_gamma        = None          # Bias factor for Well-Tempered distribution, if None, calculated from barrier parameter

the_bias = OPESeABF(
    the_md,
    collective_var,
    # eABF parameters
    ext_sigma=eabf_ext_sigma,
    ext_mass=eabf_ext_mass,
    nfull=abf_nfull,
    # OPES parameters
    kernel_std=opes_kernel_std,
    update_freq=opes_frequency,
    bias_factor=opes_gamma,
    adaptive_std=opes_adaptive,
    energy_barr=opes_barrier,
    # general parameters
    output_freq=10,               # frequency of writing outputs
    f_conf=0.0,                   # confinement force of CV at boundaries
    equil_temp=300.0,             # equilibrium temperature of simulation
    periodicity=periodicity,      # periodicity of CVs
    verbose=True,                 # print verbose output
)
the_md.set_sampling_algorithm(the_bias) # to take effect, the sampling algorithm has to be set in the AdaptiveSamplingOpenMM interface


 >>> INFO: Initialize torsion as collective variable:
         Minimum0:      -180.0 Degree
         Maximum0:      180.0 Degree
         Bin width0:    5.0 Degree

 >>> INFO: Initialize torsion as collective variable:
         Minimum1:      -180.0 Degree
         Maximum1:      180.0 Degree
         Bin width1:    5.0 Degree
        ----------------------------------------------
         Total number of bins:          5184

 >>> INFO: OPES Parameters:
         ---------------------------------------------
         Kernel_std:    [0.08726646 0.08726646]
         Rescaling:     True
         Adaptive:      False   (5000 steps)
         Normalize:     True    (approximated: True)
         Explore:       False
         Barrier:       11.9503 kcal/mol
         Bias factor:   20.045403709021212
         Read force:    True
         Kernel merge:  True    (threshold: 1.0)
         ---------------------------------------------
 >>> INFO: Extended-system Parameters:
         ---------------------------------------------
         Coupling:      [0.08726646 0.08726646]
         Masses:        [100. 100.]
         ---------------------------------------------
 >>> INFO: ABF enabled for OPES-eABF (N_full=100)

>>> adaptive-sampling: Module 'ase' not found, will not import 'FENEB'
/home/robert/Bachelor_Thesis/Code/adaptive_sampling/adaptive_sampling/colvars/colvars.py:215: UserWarning: Using torch.cross without specifying the dim arg is deprecated.
Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at /opt/conda/conda-bld/pytorch_1720538438750/work/aten/src/ATen/native/Cross.cpp:62.)
  self.cv = torch.atan2(torch.dot(torch.cross(q23_u, n1), n2), torch.dot(n1, n2))

[5]:

# WARNING: For long simulations, this can become expensive, and it is recommended to perform the computation on an HPC cluster.
if os.path.isfile('CV_traj.dat'):
    os.system("rm CV_traj.dat opeseabf.out")
the_md.run(nsteps=1000) # 500000 * 2 fs = 1 ns

#"Step" "Time (ps)"     "Potential Energy (kJ/mole)"    "Kinetic Energy (kJ/mole)"      "Total Energy (kJ/mole)"        "Temperature (K)"
100     0.20000000000000015     -14.69729698585465      81.81212348484868       67.11482649899403       385.87202990466926
200     0.4000000000000003      -17.35861527983259      85.32225493601415       67.96363965618156       402.4277858317756
300     0.6000000000000004      -46.42447199266495      87.90057858385677       41.47610659119182       414.58861160386914
400     0.8000000000000006      -16.564945457376268     77.15117856536602       60.586233107989756      363.888389818763
500     1.0000000000000007      -42.760607527450546     77.7586562795857        34.99804875213515       366.75359669426
600     1.2000000000000008      -35.78178557575954      43.68505452738508       7.9032689516255346      206.0433093403335
700     1.400000000000001       -52.89746176077644      69.58132670195477       16.683864941178328      328.1847070369183
800     1.6000000000000012      -48.167972714775914     76.24319766493682       28.0752249501609        359.6058407509813
900     1.8000000000000014      -46.69024601386794      76.35103121618545       29.660785202317506      360.11444448279667
1000    2.0000000000000013      -46.917748252927744     62.86855137403911       15.950803121111363      296.52347968158017

Analysis of Results

Visualize the trajectory with NGlView

The following will create an interactive visualization of the trajectory:

[6]:

path = '.'
traj = pt.iterload(f"{path}/alanine-dipeptide.dcd", top="../data/alanine-dipeptide.pdb")

view = ngl.show_pytraj(traj)
view.clear_representations()
view.add_ball_and_stick('@6,8,14,16', opacity=1.0) # CV atoms of phi torsion
view.add_ball_and_stick('@4,6,8,14', opacity=1.0)  # CV atoms of psi torsion
#view.add_ball_and_stick('@1,4,6,8', opacity=1.0)   # CV atoms of theta torsion

view.add_licorice(opacity=0.5)
#view

Trajectory of CVs

All important information for post/processing of the trajectory is stored in CV_traj.dat.

[7]:

cv_traj = np.loadtxt(f'{path}/CV_traj.dat', skiprows=1)
cv_phi = cv_traj[:,1] # Phi trajectory (first CV)
cv_psi = cv_traj[:,2] # Psi trajectory (second CV))
la_phi = cv_traj[:,3] # extended-system Phi trajectory
la_psi = cv_traj[:,4] # extended-system Psi trajectory

In the following, the trajectory of the extended system is plotted for \(\Phi\) (blue) and \(\Psi\) (orange). The physical trajectory of the corresponding CVs is plotted in light colors, which is tightly coupled to the extended system.

As the extended system trajectory is biased with OPES and ABF, the system immediatly starts diffusing along both reaction coordinates, such that in the long run the (\(\Phi,\Psi\)) plane is uniformely sampled.

[ ]:

import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 1, sharey=False, figsize=(8,6))
axs.scatter(cv_traj[:,0]/1000, cv_phi, s=0.1, alpha=0.5)
axs.scatter(cv_traj[:,0]/1000, la_phi, s=0.5, color='C0', linewidth=2, label='$\Phi$')

axs.scatter(cv_traj[:,0]/1000, cv_psi, s=0.1, alpha=0.5)
axs.scatter(cv_traj[:,0]/1000, la_psi, s=0.5, color='C1', linewidth=2, label='$\Psi$')

axs.set_xlim([0,2])
axs.set_ylim([-181,181])

axs.set_yticks([-180,0,180])
axs.set_xlabel('time / ps', fontsize=30)
axs.set_ylabel('CV / Degree', fontsize=30)
axs.tick_params(axis='y',length=6,width=3,labelsize=25, pad=10, direction='in')
axs.tick_params(axis='x',length=6,width=3,labelsize=25, pad=10, direction='in')
axs.spines['bottom'].set_linewidth(3)
axs.spines['top'].set_linewidth(3)
axs.spines['left'].set_linewidth(3)
axs.spines['right'].set_linewidth(3)
axs.legend(fontsize=20)
fig.tight_layout()

Use the MBAR estimator to compute ensemble properties

Now we will use the MBAR estimator to calculate the unbiased weights for the simulation frames. From those, periodic PMFs as well as other ensemble properties can be computed. Note that to converge PMFs, much longer trajectories are required!

WARNING: For long simulations, this can become expensive, and it is recommended to perform the computation on an HPC cluster.

[ ]:

from adaptive_sampling import units
from adaptive_sampling.processing_tools import mbar

ext_sigma = np.asarray([5.0,5.0])

# create grid for PMF
minimum   = -180.0
maximum   = 180.0
bin_width = 5.0
grid_1d = np.arange(minimum, maximum, bin_width)
xx, yy = np.meshgrid(grid_1d, grid_1d)
grid = np.vstack([xx.flatten(),yy.flatten()])

# trajectories of CVs and extended system
cv = np.vstack([cv_phi,cv_psi])
la = np.vstack([cv_phi,cv_psi])

recalc = True
if recalc:
    print("==================")
    print(" Running the MBAR ")
    print("==================\t")

    # run MBAR to obtain unbiased weights of frames
    print("\nBuilding state windows from the continuous trajectory:")
    traj_list, indices, meta_f = mbar.get_windows(
        grid.T,
        cv.T,
        la.T,
        ext_sigma,
        dx=np.asarray([bin_width,bin_width]),
        equil_temp=300.0,
        progress_bar=True,
    )
    print("\nBuilding Boltzmann factors:")
    exp_U, frames_per_traj = mbar.build_boltzmann(
        traj_list,
        meta_f,
        equil_temp=300.0,
        periodicity=[-180.,180.],
        progress_bar=True,
    )
    weights = mbar.run_mbar(
        exp_U,
        frames_per_traj,
        max_iter=10000,
        conv=1.0e-2,     # usually 1.0e-4
        conv_errvec=1.0,
        outfreq=10,
        device='cpu',
    )

    np.savez(f"{path}/results.npz", W=weights, idx=indices)
else:
    data = np.load(f'{path}/results.npz')
    weights = data['weigths']
    indices = data['idx']

Compute PMFs from frame weights

[ ]:

# 1D PMFs along phi and psi in kJ/mol
pmf_psi, rho_psi = mbar.pmf_from_weights(grid_1d, cv_psi[indices], weights, equil_temp=300.0)
pmf_phi, rho_phi = mbar.pmf_from_weights(grid_1d, cv_phi[indices], weights, equil_temp=300.0)
pmf_psi -= pmf_psi.min()
pmf_phi -= pmf_phi.min()

# 2D (phi,psi) PMF (Ramachandran plot) in kJ/mol
pmf_2d, rho = mbar.pmf_from_weights(
    grid.T,
    cv.T[indices],
    weights,
    dx=np.asarray([bin_width,bin_width]),
    equil_temp=300.0,
)
pmf_2d -= pmf_2d.min()
pmf_2d *= units.kJ_to_kcal # convert to kcal/mol

Below, an example is shown of how the PMF should evolve over the course of 2 ns, quickly converging for the full [\(\Phi,\Psi\)] plane.

title

[ ]: