/geodesic-interpolate

Interpolation of molecular geometries through geodesics in redundant internal coordinate hyperspace for complex transformations

Primary LanguagePython

Geodesic interpolations of reaction pathways

Constructing interpolation paths between molecular geometries to obtain reaction path.

Traditional interpolation methods encounter difficulty when it comes to redundant internal coordinate spaces, because the feasible physical space compose only a very small and highly curved subspace. In this method, we avoid the problem of feasibility by operating strictly in feasible space, and bring in the benefit of internal coordinates through proper application of the corresponding metric tensor. With this new formulation, we view the configuration space as a Riemannian manifold with a metric generated from a set of internal coordinates. The interpolation paths are defined as geodesic curves on such manifolds. In other words the integrated total coordinate change is minimized. Such a definition ensures that the constructed paths are smooth and well behaved. The package is also used for smoothing discontinuous or noisy trajectories obtained from MD simulations.

It has been shown that the method generate smooth paths with reasonable barrier height even for highly complex reactions, such as protein unfolding or concerted cycloaddition reactions with many simutaneous ring formations. The default coordinate system uses Morse scaled pair-wise distances. The lengths in such coordinate systems have the physical meaning of the total number of bond changes along the path.

This is a pure python implementation, so it is not optimized for speed, but rather is intended to serve as a reference implementation of the algorithms described in the paper. Still, interpolating systems with ~1000 atoms should not be a problem.

Directory Structure

  • geodesic_interpolate Python package for interpolation and smoothing by finding geodesic curves with redundant internal metrics
    • __init__.py Python package file
    • __main__.py Standalone script for performing interpolation and smoothings.
    • geodesic.py Computation and minimization of path length in redundant internal metrics. This is used to optimize a path way to find a geodesic. Cannot change the number of images during optimization
    • interpolation.py Generating approximate interpolation points to be used as starting guess for the geodesic optimizations. Recursively attempt to perform bisections on largest segment in internal space. Generally will create path with discontinuities, so a smoothing process need to follow this.
    • coord_utils.py Coordinate utilities. A simplified version of Nanoreactor coordinates module which provide a scaled interatomic distance coordinate, pair screening based on threshold and connectivity, as well as trans-rotational alignment based on Kabsch algorithm.
    • fileio.py XYZ file reading and writing
  • setup.py Installation script. This will install both the Python package, which can be imported by name geodesic_interpolate and the standalone script, which is also named geodesic_interpolate
  • test_cases A few test cases used to check the performance of the code. Note that the large ones may take a few minutes thanks to Python. Especially need to run them when testing out alternative coordinate scaling methods.
    • H+CH4_CH3+H2.xyz A simple test case. Should always work
    • DielsAlder.xyz Dehydro-Diels-Alder reaction. This is an important test case because the initial structure is planar symmetric and could access both the final structure and its mirror image, which as exactly the same internal coordinates. Proper monitoring of geodesic length during the interpolation process is therefore crucial for this to work, without which the raw interpolated path will jump between mirror images and the optimized path would contain some flopping. Also tests the non-sweeping global path optimization algo in a relatively large system.
    • TrpCage_unfold.xyz Unfolding a Trp-cage mini-protein. Need at least 10 images to work. Folded geometry taken from Stefan''s test directory which is instead taken from Nathan. Unfolded structure is generated by force navigated MD for 1ns under 1nN on C and N terminal with ReaxFF, then optimizing without force at 6-31g/b3lyp level. For testing many simultaneous large amplitude motions.
    • collagen.xyz Interpolate between the solution and crystal structure of collagen Kunitz domain 1kun - 1kth. Solvents removed from the PDB structures. Tests collision avoidance for large movements in the core of folded protein. Other groups should slightly breath to create room for the part that changes.
    • calcium_binding.xyz Binding two Ca2+ ions to the yeast Calmodulin N terminal domain 1f54 -1f55. The apo structure did not have ions so two of which were added by hand at random locations away from the protein. It seem to be hard to avoid large movements of the Ca2+ cations when they are very far away from the protein but once in contact they should move smoothly. This is to test if the interpolater can correctly route and find the entry point and connect the path.

Prerequisites

Python : >2.7 or >3.5

Numpy : Tested with 1.13

Scipy : Tested with 0.19

Installation

The package can be used without installation from the package directory with

python -m geodesic_interpolate filename ...

To use the script from arbitrary location or to use the Python module, use setup tools:

python setup.py install

This will install a Python package geodesic_interpolate and a standalone script geodesic_interpolate. The package can be involved from arbitrary location using the aforementioned command line after installation, and a standalone script with the same signature can also be used

geodesic_interpolate filename.xyz --output output.xyz --nimages 20 ...

Usage

usage: geodesic_interpolate [-h] [--nimages NIMAGES] [--sweep] [--no-sweep] [--output OUTPUT] [--tol TOL] [--maxiter MAXITER] [--microiter MICROITER] [--scaling SCALING] [--friction FRICTION] [--dist-cutoff DIST_CUTOFF] [--logging {DEBUG,INFO,WARNING,ERROR}] [--save-raw SAVE_RAW] filename

Interpolates between two geometries

positional arguments:

  • filename XYZ file containing geometries. If the number of images is smaller than the desired number, interpolation points will be added. If the number is greater, subsampling will be performed.

optional arguments:

  • -h, --help show this help message and exit
  • --nimages NIMAGES Number of images. (default: 17)
  • --sweep Sweep across the path optimizing one image at a time, instead of moving all images at the same time. Default is to perform sweeping updates if there are more than 30 atoms. (default: None)
  • --no-sweep Do not perform sweeping. (default: None)
  • --output OUTPUT Output filename. Default is interp.xyz (default: interpolated.xyz)
  • --tol TOL Convergence tolerance (default: 0.002)
  • --maxiter MAXITER Maximum number of minimization iterations (default: 15)
  • --microiter MICROITER Maximum number of micro iterations for sweeping algorithm. (default: 20)
  • --scaling SCALING Exponential parameter for morse potential (default: 1.7)
  • --friction FRICTION Size of friction term used to prevent very large change of geometry. (default: 0.01)
  • --dist-cutoff DIST_CUTOFF Cut-off value for the distance between a pair of atoms to be included in the coordinate system. (default: 3)
  • --logging {DEBUG,INFO,WARNING,ERROR} Logging level to adopt [ DEBUG | INFO | WARNING | ERROR ] (default: INFO)
  • --save-raw SAVE_RAW When specified, save the raw path after bisections be before smoothing. (default: None)