Use of executables for Windows or Linux is the easiest way to run SC algorithm because this does not require installation of any prerequisites. If you wish to try out the executable version, use the Google Drive link below to download archived program, test suite and examples of config file and input geometry.
For Windows: Google Drive link to executable
For Linux: Google Drive link to executable
Instructions in sections 1.2-1.3 are intended for users who wish to run rcrilib
as python-library and be able to modify any part of the conformational search algorithm.
However, explanations in section 1.4 are intended for all the users because these sections contain basic information about the algorithm, features of this implemetation and its limitations.
gcc; g++; gfortran; cmake; Python 3 with libararies numpy, scipy, networkx and rdkit (the last one is optional).
NumPy is essential for handling matricies and vectors. SciPy is required for kd-tree-based algorithm of detection of undesired overlaps of atoms. Networkx is needed for working with graphs. RDKit is used outside of the library intesting/smiles_to_mol.py
script for convenient generation of molecule's 3D geometry from its SMILES.
At this time the rcrilib
Python-library was tested out of the box on Linux machines only.
The core functionality of the library was implemented using Python. However, to generate cycle geometries it is necessary to execute the TLC algorithm, which is implemented in Fortran and C (see files tlc_src/*.f90
and tlc_src/sturm.c
). In order to pass input data to TLC from the main code and return the output, the C++ interface was realized for convenient data transfer between Python- and Fortran-programs (see tlc_src/main.cpp
). All the files in tlc_src
directory have to be compiled and linked into libtlc.so
library. Produced file has to be moved in rcrilib/Solvers
directory. All of these tasks were automated using CMake (see CMakeLists.txt
). We will explain how to use it in the following section.
- Clone the repository:
git clone https://github.com/nkrivoshchapov/RCRI.git
cd RCRI
- If you wish to use the library only from this directory, then execute the following
mkdir build && cd build
cmake ..
make
make install
cd ..
In this way, the library libtlc.so
will be built and copied into rcrilib/Solvers
. Which means that rcrilib
will become a functioning python-library loadable only from RCRI
directory.
If you wish to install this library as your Python-package for it to be accessable from any directory, then execute this instaed:
pip install .
This will build libtlc.so
, copy it into rcrilib/Solvers
and install the rcrilib
as Python-package.
- Check the correctness of the installation on test molecules:
cp testing/run_tests.py testing/default.ini .
python run_tests.py
This script generates a few conformations for each structure in testing
directory.
- (Optional) Install Anaconda and RDKit.
Python-library RDKit can be used to generate 3D geometry of a molecule from its SMILES, that can be taken from ChemDraw. So, input data can be quickly generated from molecule's drawing.
First, install conda by running instructions like (the commands below might be outdated):
wget https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
bash ./Anaconda3-2020.07-Linux-x86_64.sh
conda update --prefix /home/$USER/anaconda3 anaconda
And then install RDKit:
conda install -c conda-forge rdkit
The rcrilib
provides all necessary procedures to manupulate molecular geometry using inverse kinematics, but the algorithm of conformational search itself is stored in the driver-script outside of the library (see examples in confsearch_drivers
directory). Workflow is that user executes a driver-script that generates conformations by calling functions from rcrilib
.
1. Create an input MOL-file of the desired molecule, you want to run conformational search for.
In order to run conformational search, this method requires the connectivity of molecule (its graph), bond lengths and valence angles. However, data in XYZ is not enough, because it does not contain connectivity information. Therefore, MOL is chosen to be format for input geometry since these files (extension .sdf
or .mol
, see examples in directory testing
) contain all necessary information.
MOL-file of starting geometry can be drawn in any editor, like GaussView, ChemCraft, etc. Or, if you have carried out the instructions of step 4 in section 1.3, input data can be generated from molecule drawing. In order to do it, select the structure of interest in ChemDraw, press Ctrl+Alt+C to copy it as SMILES. Then run
cp testing/smiles_to_mol.py .
and change the 4th lines in the script by selecting the text in quotes and then pressing Ctrl+V. After that, input MOL-file can be generated by executing the script smiles_to_mol.py
: `
python smiles_to_mol.py
In order to keep some of the dihedral angles fixed during conformational search, open molfromsmiles.sdf
and set the type of corresponding bond to anything except single.
2. Set options of rcrilib
and conformational search driver in config-file.
The directory testing
already contains the config-file default.ini
as an example. Copy it in the directory from where you wish to run the conformational search script. The contents of config-file are given below:
[IK_All]
RunGeomCheck = True
MinDistance = 1.2
DoValidation = False
SmartAPS = True
[IK_Molecule]
DoValidation = True
[IK_FlappingSolver]
GenerationMode = intime
AngleThreshold = 10
[IK_TLCSolver]
AllowBondPerturbation = False
AllowAnglePerturbation = False
BondSDeviation = 0.1
AngleSDeviation = 10
[MCDriver]
InputFile = molfromsmiles.sdf
OutputFile = res_%%d.mol
MaxTries = 400
MaxIter = 100
NumberOfConfs = 10
PrintLevelConsole = warning
PrintLevelFile = warning
LogFile = montecarlo.log
AutoCleanup = True
[Testing]
NumberOfConfs = 5
TestMolecules = testing/*.sdf
Section IK_All
contains the parameters of rcrilib
-
RunGeomCheck
- whether to eliminate conformations with atomic overlaps. -
MinDistance
- the minimum distance between unbound atoms whenRunGeomCheck
is enabled. -
DoValidation
- whether to check if found conformations satisfy the geometric constrains (bond lengths, valence and dihedral angles). This feature is mainly for debugging and it is not compatible with neitherAllowBondPerturbation
or ``. -
SmartAPS
- performance-improving feature. Functions do not generate the conformations of the parts of the molecule again if their degrees of freedom have not changed since the last run.
Flags in the IK_All
section are accessible to all classes in the library. If it is desired to override the IK_All
parameter value for some particular class, an another section named as that class has to be created (see the IK_Molecule
section in the example above). In the example above the value of DoValidation
parameter was overwritten within IK_Molecule
class. As a result, geometry validation step is executed only for the final conformation, but not for intermediate structures on levels of IK_CyclicPart
, IK_GeomUnit
etc.
Some of options are targeted specifically for only one class. It is valid to set their values inside IK_All
section, however, these keys are set from section of their target class so that its easier to remember their purpose. Thus, IK_FlappingSolver
generates conformers by "flapping" atoms and bonds, this class has two associated parameters:
-
GenerationMode
-cached
orintime
- at which point "flapping" operations should be executed. Valueintime
sets library to generate conformers by "flapping" just in time when they are requered during iterations over degrees of freedom.intime
mode is more memory-efficient and more time-efficient in the short run and when the number of possible flapping operations is very large. Modecached
sets library to pregenerate and store conformations by iterating over all acceptable flapping combinations during initialization before these geometries are requested.cached
mode is faster in the long run but initialization may a lot of resources for large cycles with a lot of possibilities for "flapping". -
AngleThreshold
- parameter that determines the tolerance to valence angle distortion. Geometrically, flapping of a fragment of a cycle preserves valence angles if and only if bonds at the ends of the fragment are coplanar, meaning that dihedral angle on these four terminal atoms is zero. If this diheral angle deviates from zero, then valence angles will be distorted by flapping and the bigger the deviation - the larger the distortion.AngleThreshold
is the maximal absolute value of dihedral angle for which flapping is acceptable.
Class IK_TLCSolver
is responsible for generation of cycle geometries with TLC and it has 4 specific parameters associated with random perturbations in the lengths and bond angles during the generation of geometries in order to increase the variety of conformers:
-
AllowBondPerturbation
- whether to use random perturbations of bond lengths in cycles solved by TLC algorithm. This feature is not compatible withDoValidation
. -
AllowAnglePerturbation
- whether to use random perturbations of valence angles in cycles solved by TLC algorithm. -
BondSDeviation
- standard deviation of normally distributed perturbation in angstroms added to bond lengths. -
AngleSDeviation
- standard deviation of normally distributed perturbation in angles added to valence angles.
MCDriver
section contains settings of conformational search driver scripts (simple_mc.py
, recursive_mc.py
, ddof_mc.py
and run_tests.py
):
-
InputFile
- mol-file containing information about connectivity of the molecule, optimal bond lengths and valence angles.rcrilib
treats bonds of type "single" differently from all others - dihedral angles of single bonds are allowed to rotate and considered as degrees of freedom, while all others are fixed. This parameter is required by driver-scriptssimple_mc.py
,recursive_mc.py
andddof_mc.py
. -
OutputFile
- files with resulting geometries to be created (%%d will be substituted with count number). Extensions.mol
and.xyz
will be detected automatically. This parameter is required by driver-scriptssimple_mc.py
,recursive_mc.py
andddof_mc.py
. -
NumberOfConfs
- number of conformers to be found. This parameter is required by driver-scriptssimple_mc.py
,recursive_mc.py
andddof_mc.py
. -
MaxIter
- maximal number of attempts to generate a conformation. Program will make as many attempts as needed to reachNumberOfConfs
geometries ifMaxIter
is set to -1. -
MaxTries
- maximal allowed number of attempts to redefine the unsuccessfully chosen values of the degrees of freedom (eliminate VdW clash or make TLC to return non-zero number of solutions). The optimal value ofMaxTries
is larger for bigger molecules and vice versa. -
PrintLevelConsole
- parameter that controls the amount of output in console. Options:none
,error
,warning
,info
,debug
. -
PrintLevelFile
- parameter that controls the amount of output in log-file. Options:none
,error
,warning
,info
,debug
. In case if exception occurs, driver-script will automatically create a bugreport directory containing log-file and values of degrees of freedom. -
LogFile
- name of the log-file. -
AutoCleanup
- whether to remove log-file after successful termination of driver-script.
Section Testing
contains parameters of automated testing of the library on about 25 molecules (driver-script run_tests.py
).
-
TestMolecules
- input-files for testing. -
NumberOfConfs
- number of conformations to be generated for each test molecule.
For a routine conformational search run user should set the values of InputFile
, OutputFile
, NumberOfConfs
options in section MCDriver
and, optionally, MinDistance
in section IK_All
.
3. Execute the driver-script.
Directory confsearch_drivers
contains 3 examples of driver-scripts (difference between them is explained in section 2.2). For instance, to apply recursive_MC.py
one should run the following commands:
cp confsearch_drivers/recursive_MC.py .
python recursive_MC.py
Settings will be loaded from config-file of default name ./default.ini
or any other file, if its name was passed as an argument (for example, python recursive_MC.py myconfig.cfg
).
Limitations of this algorithm are shown on the figure below:
Papers on the TLC algorithm:
Coutsias E.A., Seok C., Jacobson M.P., Dill K.A. A kinematic view of loop closure // Journal of Computational Chemistry. – 2004. – Vol. 25, № 4. – P. 510–528. – DOI: https://doi.org/10.1002/jcc.10416
Coutsias E.A., Seok C., Wester M.J., Dill K.A. Resultants and loop closure // International Journal of Quantum Chemistry. – 2006. – Vol. 106, № 1. – P. 176–189. – DOI: https://doi.org/10.1002/qua.20751
Link to open-source implementation of TLC:
https://math.unm.edu/~vageli/codes/loop-closure.tar.gz
MacroModel testset (structures testing/mm*.sdf
)
Watts K.S., Dalal P., Tebben A.J., Cheney D.L., Shelley J.C. Macrocycle Conformational Sampling with MacroModel // J. Chem. Inf. Model. – American Chemical Society, 2014. – Vol. 54, No 10. – P. 2680–2696.
Earlier mentions of corner flapping and edge flipping procedures:
Goto H., Osawa E. Corner flapping: a simple and fast algorithm for exhaustive generation of ring conformations // J. Am. Chem. Soc. – American Chemical Society, 1989. – Vol. 111, № 24. – P. 8950–8951. – DOI: https://doi.org/10.1021/ja00206a046
Gotō H., Ōsawa E. An efficient algorithm for searching low-energy conformers of cyclic and acyclic molecules // J. Chem. Soc., Perkin Trans. 2. – The Royal Society of Chemistry, 1993. – № 2. – P. 187–198. – DOI: https://doi.org/10.1039/P29930000187