In this entry I'd like to document something that I get asked by few people and for which I assembled a script several years ago. I'd like to put it here for everyone to use (and perhaps for some to contribute, just a wish).
In the past I've worked on something called "Path Collective Variables" (PCVs) which are some useful descriptors that are used in enhanced sampling (umbrella sampling, metadynamics, steered-MD and others).
The essence of the PCVs is that if you have a trajectory, being it true or fictitious, of a reactive event you can use this as a blueprint, a sort of "track" and ask the system to walk over it using some enhanced sampling method so that you will be able to measure the thermodynamic force along it and calculate the free energy.
Getting a rough reactive trajectory can be simple in many cases. One can put a harmonic restraint on the coordinate of your target (say "state B") structure and pull the system toward it with some beastly method like Steered MD (SMD), dragging it from your initial state "A" to state "B". What you will get is probably a reactive path which is not great but it is a start, and sometimes you can also improve on it rather easily (something me and Alessio Lodola have shown in the past).
PCVs need a "template path" to draw the initial track. So what is needed once you have the reactive trajectory is a set of snapshot which are equally spaced in the PCVs metrics.
If you're using SMD most of the time you're in luck and the snapshots are almost equally spaced between the state "A" (the starting structure) and state "B" (the target structure), but they're not homogeneously spaced in the metrics used by PCVs and this is a sort of requirement for the most naive version of PCVs (this is not the case with Bernd Ensing geometric path formulation for example but there are other implications there).
The reason why the snapshots are inhomogeneously spaced may be due to the fact that you've been steering for example with a distance between two centers of mass but the PCVs you want to use are defined say in RMSD or contact map.
You may ask "If I can do everything with a distance why should I bother doing it with PCVs which are complex and tedious to setup". Good question.
My personal answer is that when restraining a distance you can control only that degree-of-freedom (DOF). With PCVs you can restrain the change of multiple coordinated degrees of freedom at the same time and therefore have a thermodynamic view on a specific realization of a given event.
This is an overkill when you have a single pathway between two states, but if you suspect you have multiple pathways, once you manage to harvest them (by multiple SMD for example), then you can use PCVs to calculate the barrier of each individual pathway without the need of finding a specific set of CVs per each pathway that would be able to drive each of them (because, if you're unlucky as I am, you may need different CVs to drive different pathways :-/ )
A way to homogenize the snapshot was formulated a while ago by Luca Maragliano and Eric Vanden-Eijnden in the context of the "string method" and I took inspiration from them to do put this together.
Assume I want to extract N "milestones" to form my PCVs. What I do here is first read a PDB file containing the reactive trajectory. For the sake of the discussion, s uppose that a representation of your initial set of trajectory snapshot is like this (here idealized as dots)
The line represent the interpolation between two frames acquired at consecutive times. One can get this by doing linear interpolation after optimal alignment. We'll see later how this is used.
Now, let's say you want a set of 6 equally distributed milestones for your PCVs. You first calculate the "total distance" as a sum of the red segments and you divide by 5 (6 snapshots correspond to 5 segments) and you calculate the "ideal" location of these along the red path.
Since I don't know how much I need to move, I just record the direction of motion and move for an arbitrary step I decide upfront (yup, this means there's a parameter involved). Note that I don't move the milestones at the edge (dark blue) otherwise the string will contract indefinitely.
./rmsd_mc.sh -f trajframes.pdb -r 20 -t 1
DELTA from 0 to 1 : 0.1592 DELTA from 1 to 2 : 0.1083 DELTA from 2 to 3 : 0.1312 DELTA from 3 to 4 : 0.1455 DELTA from 4 to 5 : 0.1191 DELTA from 5 to 6 : 0.1723 DELTA from 6 to 7 : 0.3124 DELTA from 7 to 8 : 0.2749 DELTA from 8 to 9 : 0.2412 DELTA from 9 to 10 : 0.2464 DELTA from 10 to 11 : 0.1935 DELTA from 11 to 12 : 0.1608 DELTA from 12 to 13 : 0.2007 DELTA from 13 to 14 : 0.1774 DELTA from 14 to 15 : 0.2200 DELTA from 15 to 16 : 0.0971 DELTA from 16 to 17 : 0.1091 DELTA from 17 to 18 : 0.1365 DELTA from 18 to 19 : 0.0907 HARMONIC_ENERGY 0.6430 ****************************** >>>> MSD 0.03384385352114148 Ang^2 >>>> MSD_STDEV 0.023862404602339612 Ang^2 >>>> LAMBDA 67.9591642412482 [Ang ^ -2] >>>> SAFE_LIMIT 39.8570289391908 [Ang ^ -2] >>>> Note: if you use gromacs you have to increase these values of 100 since units are [ nm ^-2 ] >>>> Note2: LAMBDA and SAFE_LIMIT are truly different your matrix could be messed up: check MSD_MATRIX FILE ******************************
./rmsd_mc.sh -f trajframes.pdb -r 20 -t 1 -n 10 -s 0.1. | grep HARMONIC_ENERGY
HARMONIC_ENERGY 0.6511 HARMONIC_ENERGY 0.6430 HARMONIC_ENERGY 0.6357 HARMONIC_ENERGY 0.6225 HARMONIC_ENERGY 0.6128 HARMONIC_ENERGY 0.6050 HARMONIC_ENERGY 0.5978 HARMONIC_ENERGY 0.5923 HARMONIC_ENERGY 0.5880 HARMONIC_ENERGY 0.5838 HARMONIC_ENERGY 0.5797 HARMONIC_ENERGY 0.5763 HARMONIC_ENERGY 0.5736 HARMONIC_ENERGY 0.5708 HARMONIC_ENERGY 0.5674 HARMONIC_ENERGY 0.5645 HARMONIC_ENERGY 0.5624 HARMONIC_ENERGY 0.5610 HARMONIC_ENERGY 0.5604 HARMONIC_ENERGY 0.5605 HARMONIC_ENERGY 0.5609
DELTA from 0 to 1 : 0.1406 DELTA from 1 to 2 : 0.1035 DELTA from 2 to 3 : 0.1742 DELTA from 3 to 4 : 0.1386 DELTA from 4 to 5 : 0.1333 DELTA from 5 to 6 : 0.1873 DELTA from 6 to 7 : 0.2155 DELTA from 7 to 8 : 0.2389 DELTA from 8 to 9 : 0.2429 DELTA from 9 to 10 : 0.2279 DELTA from 10 to 11 : 0.1969 DELTA from 11 to 12 : 0.1740 DELTA from 12 to 13 : 0.1921 DELTA from 13 to 14 : 0.1816 DELTA from 14 to 15 : 0.1569 DELTA from 15 to 16 : 0.1119 DELTA from 16 to 17 : 0.1172 DELTA from 17 to 18 : 0.1210 DELTA from 18 to 19 : 0.0957 HARMONIC_ENERGY 0.5609 ****************************** >>>> MSD 0.029523467304599694 Ang^2 >>>> MSD_STDEV 0.015375323992993602 Ang^2 >>>> LAMBDA 77.90412881625407 [Ang ^ -2] >>>> SAFE_LIMIT 51.22632332695527 [Ang ^ -2] >>>> Note: if you use gromacs you have to increase these values of 100 since units are [ nm ^-2 ] >>>> Note2: LAMBDA and SAFE_LIMIT are truly different your matrix could be messed up: check MSD_MATRIX FILE ******************************
Comments
Post a Comment