{ "cells": [ { "cell_type": "markdown", "id": "28f73de5-b88f-465f-ac6a-768f7ad70dba", "metadata": {}, "source": [ "# Compute the alanine dipeptide [$\\Phi,\\Psi$] transition using OpenMM and OPES-eABF\n", "\n", "In this brief tutorial, we calculate the Ramachandran [$\\Phi$,$\\Psi$] plot of alanine dipeptide in vacuum using the OPES-eABF method." ] }, { "cell_type": "markdown", "id": "e6e82f00-549d-4daf-aceb-0d645de7567e", "metadata": {}, "source": [ "Alanine dipeptide is a popular test system for enhanced sampling algorithms because it is a minimal example of sampling challenges that are posed by many biological systems. The slow motions of the molecule are largely governed by the $\\Psi$ (backbone N-C-C-N) and $\\Phi$ (backbone C-N-C-C) dihedrals." ] }, { "cell_type": "markdown", "id": "ae8fbc05-6630-4a10-83cb-99252cc75f51", "metadata": {}, "source": [ "An interactive visualization of the alanine dipeptide molecule can be shown with nglview. Atoms involved in the [$\\Phi$,$\\Psi$] collective variable (CV) are shown in ball and stick representation, while other atoms are transparent. " ] }, { "cell_type": "code", "execution_count": 2, "id": "6892a570-7bcc-48c9-9ec9-f0e47399d186", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7ca5a1ff6fcf417da518b2e1cde45fb2", "version_major": 2, "version_minor": 0 }, "text/plain": [] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pytraj as pt\n", "import nglview as ngl\n", "\n", "traj = pt.load(\"../data/alanine-dipeptide.pdb\", top='../data/alanine-dipeptide.prmtop')\n", "\n", "view = ngl.show_pytraj(traj)\n", "view.clear_representations()\n", "view.add_ball_and_stick('@6,8,14,16', opacity=1.0) # CV atoms of phi torsion\n", "view.add_ball_and_stick('@4,6,8,14', opacity=1.0) # CV atoms of psi torsion\n", "#view.add_ball_and_stick('@1,4,6,8', opacity=1.0) # CV atoms of theta torsion\n", "\n", "view.add_licorice(opacity=0.5)\n", "\n", "# uncomment for interactive visualization in a notebook\n", "#view" ] }, { "cell_type": "markdown", "id": "23a07d94-39c8-447a-954d-71fec9326cb3", "metadata": {}, "source": [ "While the interactive visualization is not possible on the web page, this is what you should see in a notebook:" ] }, { "cell_type": "markdown", "id": "7735adcf-997e-498d-a3e1-7c765994e9f3", "metadata": {}, "source": [ "![](nglview_snapshot.png)" ] }, { "cell_type": "markdown", "id": "2415f7cc-c021-4442-ae81-4c2187d2eb60", "metadata": {}, "source": [ "## Importance sampling of the [$\\Phi,\\Psi$] plane with OPES-eABF\n", "\n", "Below, the OPES-eABF sampling algorithm is applied to enhance sampling in the [$\\Phi,\\Psi$] plane. \n", "For this purpose, an OpenMM simulation object is created using AMBER parameter and coordinate files and the OpenMM interface to the adaptive sampling package. " ] }, { "cell_type": "code", "execution_count": 3, "id": "b2ad231f-9d4d-401b-a944-0b454216f044", "metadata": { "scrolled": true }, "outputs": [], "source": [ "from sys import stdout\n", "import numpy as np\n", "\n", "from openmm import *\n", "from openmm.app import *\n", "from openmm.unit import *\n", "\n", "from adaptive_sampling.interface.interface_openmm import AdaptiveSamplingOpenMM\n", "\n", "# ------------------------------------------------------------------------------------\n", "# define collective variables\n", "cv_atoms_psi = [6, 8, 14, 16] # backbone N-C-C-N torsion\n", "cv_atoms_phi = [4, 6, 8, 14] # backbone C-N-C-C torsion\n", "minimum = -180.0 # minimum of the CV\n", "maximum = 180.0 # maximum of the CV\n", "bin_width = 5.0 # bin width along the CV\n", "periodicity = [ # define periodicity of CVs (needs to be given in radians as units are not converted)\n", " [-np.pi, np.pi],\n", " [-np.pi, np.pi],\n", "]\n", "\n", "collective_var = [\n", " [\"torsion\", cv_atoms_psi, minimum, maximum, bin_width],\n", " [\"torsion\", cv_atoms_phi, minimum, maximum, bin_width],\n", "]\n", "\n", "# ------------------------------------------------------------------------------------\n", "# Setup OpenMM\n", "prmtop = AmberPrmtopFile(f\"../data/alanine-dipeptide.prmtop\")\n", "crd = AmberInpcrdFile(f\"../data/alanine-dipeptide.crd\")\n", "system = prmtop.createSystem(\n", " nonbondedMethod=NoCutoff,\n", " constraints=HBonds,\n", ")\n", "\n", "# Initialize the `AdaptiveSamplingOpenMM` interface to couple the OpenMM simulation to a bias potential\n", "# The OpenMM `simulation` object is set up internally, but can still be modified by calling `the_md.simulation` or `the_md.integrator`\n", "the_md = AdaptiveSamplingOpenMM(\n", " crd.positions,\n", " prmtop.topology,\n", " system,\n", " dt=2.0, # timestep in fs\n", " equil_temp=300.0, # temperature of simulation\n", " langevin_damping=1.0, # langevin damping in 1/ps\n", " cv_atoms=np.unique(cv_atoms_phi+cv_atoms_psi) # specifying CV atoms significantly speeds up simulation of large systems, as the bias force will only be calculated for those\n", ")\n", "the_md.integrator.setConstraintTolerance(0.00001)\n", "\n", "# Append OpenMM reporters to simulation for output \n", "the_md.simulation.reporters.append(DCDReporter('alanine-dipeptide.dcd', 100))\n", "the_md.simulation.reporters.append(StateDataReporter(\n", " stdout, \n", " 100,\n", " step=True,\n", " time=True,\n", " potentialEnergy=True,\n", " kineticEnergy=True,\n", " totalEnergy=True,\n", " temperature=True,\n", " speed=False,\n", " separator='\\t')\n", ")" ] }, { "cell_type": "markdown", "id": "5d896eed-2476-4f5b-8b4e-d5e0e30d5a5e", "metadata": {}, "source": [ "The OPES-eABF sampling algorithm is attached to the OpenMM simulation." ] }, { "cell_type": "code", "execution_count": 4, "id": "a5a9fcef-7d41-42fb-be06-a66feb2c96da", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " >>> INFO: Initialize torsion as collective variable:\n", "\t Minimum0:\t-180.0 Degree\n", "\t Maximum0:\t180.0 Degree\n", "\t Bin width0:\t5.0 Degree\n", "\n", " >>> INFO: Initialize torsion as collective variable:\n", "\t Minimum1:\t-180.0 Degree\n", "\t Maximum1:\t180.0 Degree\n", "\t Bin width1:\t5.0 Degree\n", "\t----------------------------------------------\n", "\t Total number of bins:\t\t5184\n", "\n", " >>> INFO: OPES Parameters:\n", "\t ---------------------------------------------\n", "\t Kernel_std:\t[0.08726646 0.08726646]\n", "\t Rescaling:\tTrue\n", "\t Adaptive:\tFalse\t(5000 steps)\n", "\t Normalize:\tTrue\t(approximated: True)\n", "\t Explore:\tFalse\n", "\t Barrier:\t11.9503 kcal/mol\n", "\t Bias factor:\t20.045403709021212\n", "\t Read force:\tTrue\n", "\t Kernel merge:\tTrue\t(threshold: 1.0)\n", "\t ---------------------------------------------\n", " >>> INFO: Extended-system Parameters:\n", "\t ---------------------------------------------\n", "\t Coupling:\t[0.08726646 0.08726646]\n", "\t Masses:\t[100. 100.]\n", "\t ---------------------------------------------\n", " >>> INFO: ABF enabled for OPES-eABF (N_full=100)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ ">>> adaptive-sampling: Module 'ase' not found, will not import 'FENEB'\n", "/home/robert/Bachelor_Thesis/Code/adaptive_sampling/adaptive_sampling/colvars/colvars.py:215: UserWarning: Using torch.cross without specifying the dim arg is deprecated.\n", "Please either pass the dim explicitly or simply use torch.linalg.cross.\n", "The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at /opt/conda/conda-bld/pytorch_1720538438750/work/aten/src/ATen/native/Cross.cpp:62.)\n", " self.cv = torch.atan2(torch.dot(torch.cross(q23_u, n1), n2), torch.dot(n1, n2))\n" ] } ], "source": [ "from adaptive_sampling.sampling_tools import OPESeABF\n", "\n", "eabf_ext_sigma = 5.0 # thermal width of coupling between CV and extended variables in degrees\n", "eabf_ext_mass = 100.0 # mass of extended variable \n", "abf_nfull = 100 # number of samples per bin when the ABF force is fully applied\n", "\n", "opes_kernel_std = [5.0, 5.0] # kernel standard deviations of Phi and Psi in degrees\n", "opes_frequency = 500 # frequency of kernel creation in MD steps\n", "opes_barrier = 50.0 # Barrier parameter in kJ/mol \n", "opes_adaptive = False # Adaptive standard deviation of kernels, useful for sampling along bad CVs\n", "opes_gamma = None # Bias factor for Well-Tempered distribution, if None, calculated from barrier parameter\n", "\n", "the_bias = OPESeABF(\n", " the_md, \n", " collective_var, \n", " # eABF parameters \n", " ext_sigma=eabf_ext_sigma,\n", " ext_mass=eabf_ext_mass,\n", " nfull=abf_nfull, \n", " # OPES parameters\n", " kernel_std=opes_kernel_std,\n", " update_freq=opes_frequency,\n", " bias_factor=opes_gamma,\n", " adaptive_std=opes_adaptive,\n", " energy_barr=opes_barrier,\n", " # general parameters\n", " output_freq=10, # frequency of writing outputs\n", " f_conf=0.0, # confinement force of CV at boundaries\n", " equil_temp=300.0, # equilibrium temperature of simulation\n", " periodicity=periodicity, # periodicity of CVs\n", " verbose=True, # print verbose output\n", ")\n", "the_md.set_sampling_algorithm(the_bias) # to take effect, the sampling algorithm has to be set in the AdaptiveSamplingOpenMM interface" ] }, { "cell_type": "code", "execution_count": 5, "id": "807819d2-caf4-4608-8a9f-c77d789015d1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#\"Step\"\t\"Time (ps)\"\t\"Potential Energy (kJ/mole)\"\t\"Kinetic Energy (kJ/mole)\"\t\"Total Energy (kJ/mole)\"\t\"Temperature (K)\"\n", "100\t0.20000000000000015\t-14.69729698585465\t81.81212348484868\t67.11482649899403\t385.87202990466926\n", "200\t0.4000000000000003\t-17.35861527983259\t85.32225493601415\t67.96363965618156\t402.4277858317756\n", "300\t0.6000000000000004\t-46.42447199266495\t87.90057858385677\t41.47610659119182\t414.58861160386914\n", "400\t0.8000000000000006\t-16.564945457376268\t77.15117856536602\t60.586233107989756\t363.888389818763\n", "500\t1.0000000000000007\t-42.760607527450546\t77.7586562795857\t34.99804875213515\t366.75359669426\n", "600\t1.2000000000000008\t-35.78178557575954\t43.68505452738508\t7.9032689516255346\t206.0433093403335\n", "700\t1.400000000000001\t-52.89746176077644\t69.58132670195477\t16.683864941178328\t328.1847070369183\n", "800\t1.6000000000000012\t-48.167972714775914\t76.24319766493682\t28.0752249501609\t359.6058407509813\n", "900\t1.8000000000000014\t-46.69024601386794\t76.35103121618545\t29.660785202317506\t360.11444448279667\n", "1000\t2.0000000000000013\t-46.917748252927744\t62.86855137403911\t15.950803121111363\t296.52347968158017\n" ] } ], "source": [ "# WARNING: For long simulations, this can become expensive, and it is recommended to perform the computation on an HPC cluster.\n", "if os.path.isfile('CV_traj.dat'):\n", " os.system(\"rm CV_traj.dat opeseabf.out\")\n", "the_md.run(nsteps=1000) # 500000 * 2 fs = 1 ns" ] }, { "cell_type": "markdown", "id": "f87064b7-2090-4a6a-a6b6-2b63370fc60a", "metadata": {}, "source": [ "## Analysis of Results\n", "\n", "### Visualize the trajectory with NGlView\n", "\n", "The following will create an interactive visualization of the trajectory:" ] }, { "cell_type": "code", "execution_count": 6, "id": "fd888874-3b8e-4ea6-8d6d-aa33a22a0805", "metadata": {}, "outputs": [], "source": [ "path = '.' \n", "traj = pt.iterload(f\"{path}/alanine-dipeptide.dcd\", top=\"../data/alanine-dipeptide.pdb\")\n", "\n", "view = ngl.show_pytraj(traj)\n", "view.clear_representations()\n", "view.add_ball_and_stick('@6,8,14,16', opacity=1.0) # CV atoms of phi torsion\n", "view.add_ball_and_stick('@4,6,8,14', opacity=1.0) # CV atoms of psi torsion\n", "#view.add_ball_and_stick('@1,4,6,8', opacity=1.0) # CV atoms of theta torsion\n", "\n", "view.add_licorice(opacity=0.5)\n", "#view" ] }, { "cell_type": "markdown", "id": "286ab551-b83a-44d0-980c-97a1adef21d5", "metadata": {}, "source": [ "### Trajectory of CVs\n", "\n", "All important information for post/processing of the trajectory is stored in `CV_traj.dat`." ] }, { "cell_type": "code", "execution_count": 7, "id": "0b24cd68-2155-4d1f-8bf3-2cca030ce8a1", "metadata": {}, "outputs": [], "source": [ "cv_traj = np.loadtxt(f'{path}/CV_traj.dat', skiprows=1)\n", "cv_phi = cv_traj[:,1] # Phi trajectory (first CV)\n", "cv_psi = cv_traj[:,2] # Psi trajectory (second CV))\n", "la_phi = cv_traj[:,3] # extended-system Phi trajectory\n", "la_psi = cv_traj[:,4] # extended-system Psi trajectory" ] }, { "cell_type": "markdown", "id": "06afb641-2e4b-4e49-93da-0d16d0f21551", "metadata": {}, "source": [ "In the following, the trajectory of the extended system is plotted for $\\Phi$ (blue) and $\\Psi$ (orange). The physical trajectory of the corresponding CVs is plotted in light colors, which is tightly coupled to the extended system. \n", "\n", "As the extended system trajectory is biased with OPES and ABF, the system immediatly starts diffusing along both reaction coordinates, such that in the long run the ($\\Phi,\\Psi$) plane is uniformely sampled. " ] }, { "cell_type": "code", "execution_count": null, "id": "7694614b-b08a-49aa-983e-4fa09e9f89cc", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "fig, axs = plt.subplots(1, 1, sharey=False, figsize=(8,6))\n", "axs.scatter(cv_traj[:,0]/1000, cv_phi, s=0.1, alpha=0.5)\n", "axs.scatter(cv_traj[:,0]/1000, la_phi, s=0.5, color='C0', linewidth=2, label='$\\Phi$')\n", "\n", "axs.scatter(cv_traj[:,0]/1000, cv_psi, s=0.1, alpha=0.5)\n", "axs.scatter(cv_traj[:,0]/1000, la_psi, s=0.5, color='C1', linewidth=2, label='$\\Psi$')\n", "\n", "axs.set_xlim([0,2])\n", "axs.set_ylim([-181,181])\n", "\n", "axs.set_yticks([-180,0,180])\n", "axs.set_xlabel('time / ps', fontsize=30)\n", "axs.set_ylabel('CV / Degree', fontsize=30)\n", "axs.tick_params(axis='y',length=6,width=3,labelsize=25, pad=10, direction='in')\n", "axs.tick_params(axis='x',length=6,width=3,labelsize=25, pad=10, direction='in')\n", "axs.spines['bottom'].set_linewidth(3)\n", "axs.spines['top'].set_linewidth(3)\n", "axs.spines['left'].set_linewidth(3)\n", "axs.spines['right'].set_linewidth(3)\n", "axs.legend(fontsize=20)\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "id": "8017486a-3c82-43cc-a3c0-920d0e7d9e2f", "metadata": {}, "source": [ "### Use the MBAR estimator to compute ensemble properties\n", "\n", "Now we will use the MBAR estimator to calculate the unbiased weights for the simulation frames. From those, periodic PMFs as well as other ensemble properties can be computed. Note that to converge PMFs, much longer trajectories are required!\n", "\n", "WARNING: For long simulations, this can become expensive, and it is recommended to perform the computation on an HPC cluster." ] }, { "cell_type": "code", "execution_count": null, "id": "b48d293e-7b0c-4b49-9b2b-04c6b930b37e", "metadata": {}, "outputs": [], "source": [ "from adaptive_sampling import units\n", "from adaptive_sampling.processing_tools import mbar\n", "\n", "ext_sigma = np.asarray([5.0,5.0])\n", "\n", "# create grid for PMF\n", "minimum = -180.0 \n", "maximum = 180.0\n", "bin_width = 5.0\n", "grid_1d = np.arange(minimum, maximum, bin_width)\n", "xx, yy = np.meshgrid(grid_1d, grid_1d)\n", "grid = np.vstack([xx.flatten(),yy.flatten()]) \n", "\n", "# trajectories of CVs and extended system\n", "cv = np.vstack([cv_phi,cv_psi])\n", "la = np.vstack([cv_phi,cv_psi])\n", "\n", "recalc = True\n", "if recalc:\n", " print(\"==================\")\n", " print(\" Running the MBAR \")\n", " print(\"==================\\t\")\n", " \n", " # run MBAR to obtain unbiased weights of frames\n", " print(\"\\nBuilding state windows from the continuous trajectory:\")\n", " traj_list, indices, meta_f = mbar.get_windows(\n", " grid.T,\n", " cv.T,\n", " la.T,\n", " ext_sigma,\n", " dx=np.asarray([bin_width,bin_width]),\n", " equil_temp=300.0,\n", " progress_bar=True,\n", " )\n", " print(\"\\nBuilding Boltzmann factors:\")\n", " exp_U, frames_per_traj = mbar.build_boltzmann(\n", " traj_list,\n", " meta_f,\n", " equil_temp=300.0,\n", " periodicity=[-180.,180.],\n", " progress_bar=True,\n", " )\n", " weights = mbar.run_mbar(\n", " exp_U,\n", " frames_per_traj,\n", " max_iter=10000,\n", " conv=1.0e-2, # usually 1.0e-4 \n", " conv_errvec=1.0,\n", " outfreq=10,\n", " device='cpu',\n", " )\n", " \n", " np.savez(f\"{path}/results.npz\", W=weights, idx=indices)\n", "else:\n", " data = np.load(f'{path}/results.npz')\n", " weights = data['weigths']\n", " indices = data['idx']" ] }, { "cell_type": "markdown", "id": "8f0b8847-1166-4b0d-b0e1-c633a8804e24", "metadata": {}, "source": [ "### Compute PMFs from frame weights" ] }, { "cell_type": "code", "execution_count": null, "id": "b4a1e552-6566-420f-aec7-1b2bd01d37af", "metadata": {}, "outputs": [], "source": [ "# 1D PMFs along phi and psi in kJ/mol\n", "pmf_psi, rho_psi = mbar.pmf_from_weights(grid_1d, cv_psi[indices], weights, equil_temp=300.0)\n", "pmf_phi, rho_phi = mbar.pmf_from_weights(grid_1d, cv_phi[indices], weights, equil_temp=300.0)\n", "pmf_psi -= pmf_psi.min()\n", "pmf_phi -= pmf_phi.min()\n", "\n", "# 2D (phi,psi) PMF (Ramachandran plot) in kJ/mol\n", "pmf_2d, rho = mbar.pmf_from_weights(\n", " grid.T,\n", " cv.T[indices],\n", " weights,\n", " dx=np.asarray([bin_width,bin_width]),\n", " equil_temp=300.0,\n", ")\n", "pmf_2d -= pmf_2d.min()\n", "pmf_2d *= units.kJ_to_kcal # convert to kcal/mol" ] }, { "cell_type": "markdown", "id": "c871e8d2-2d25-42b6-a909-225007eb647c", "metadata": {}, "source": [ "Below, an example is shown of how the PMF should evolve over the course of 2 ns, quickly converging for the full [$\\Phi,\\Psi$] plane. " ] }, { "cell_type": "markdown", "id": "c6bdc330-6be1-420b-8a59-84045024f429", "metadata": {}, "source": [ "![title](opeseabf_2D.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "e51b2374-aefb-4ef9-9037-45c645dacca0", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "hiwi", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.20" } }, "nbformat": 4, "nbformat_minor": 5 }