Command-line analysis routines of Molecular Dynamics data with aggregating molecules.
Here are the steps to install cluster_tools in ~/programs
, but you can of course change the installation directory.
-
Clone this repository on your computer.
mkdir -p ~/programs cd ~/programs git clone https://github.com/ahardiag/cluster_tools_md
-
Use a python environment to install the package on it, e.g. using
conda
:conda create -n cluster_tools_md python=3.10
-
Build the package using
pip
:cd ~/programs pip install ./cluster_tools_md
clustsize -h
eccentricity -h
multirdf -h
radialdens -h
To retrieve the same conda environment used during the development, use :
conda env create --file "~/programs/cluster_tools/cluster_tools_md.yml" -n cluster_tools_md
conda activate cluster_tools_md
Computes the size distributions of atomic clusters in gas phase for a set of MD trajectories.
This is equivalent to gmx clustsize
in GROMACS, but provides more options and editable outputs.
usage: clustsize [-h] -i INPUTDIR -m RESNAME [-o OUTDIRNAME]
[-r REFFILE] [-f TRJFILE] [-b BEGIN] -e END [-v]
--mins MINS --eps EPS [--with-distance-matrix]
[--with-double-labelling] [--ignore-warnings]
[--pattern-dirs PATTERN_DIRS]
Find all aggregate clusters for a set of MD trajectories.
For selection commands, see MDAnalysis doc :
https://docs.mdanalysis.org/stable/documentation_pagses/selections.html
optional arguments:
-h, --help show this help message and exit
-i INPUTDIR, --inputdir INPUTDIR
directory with all trajectories to analyse
-m RESNAME, --resname RESNAME
Resname or molecule name. e.g: HC8
-o OUTDIRNAME, --outdirname OUTDIRNAME
Output Sub-directory name.
-r REFFILE, --reffile REFFILE
Commmon name of reference file in all simulation directories.
-f TRJFILE, --trjfile TRJFILE
Commmon name of trajectory file in all simulation directories.
-b BEGIN, --begin BEGIN
Start frame of analysis.
-e END, --end END Stop frame of analysis.
-v, --verbose Turn on for more details during the calculation.
--mins MINS Minimum number of points (atoms) to define a cluster.
--eps EPS Maximum distance between two points (atoms) in the same cluster.
--with-distance-matrix
Use total matrix distance to treat periodic boundary conditions. Slow for large systems!
--with-double-labelling
Use double labelling method to perform clustering analysis on two shifted set of positions. Allow to search for clusters close to the box boundaries.
--ignore-warnings Avoid stdout all warnings from using clustering on sparse matrix.
--pattern-dirs PATTERN_DIRS
String pattern to select a serie of data directories with similar pattern in the input directory.
The script generates three CSV files located in OUTDIRNAME
directory (by default : output
) and these data are easily editables using, for instance, the DataFrame format from the pandas library.
-
cluster_histo.csv : size indices in columns and
$simulation$ key, aggregation number$n$ and size distribution$P(n)$ in rows. -
cluster_properties : time (in ns) in columns and
$simulation$ key, number of clusters$nclust$ and number of molecules in the biggest cluster$maxclust$ in rows. -
cluster_resids.csv : time (in ns) in columns and
$simulation$ key and$resid$ indices in rows. For a given simulation, time and resid index one have stored the label (positive or ) of the cluster where is found the molecule.
More insights of the output data is accesible in tests
directory.
Run sequentially Radial Distribution Function analysis for a set of MD trajectories.
First you need to create a config file with some default parameters, in a file on the root location : ~/.runRDFrc
:
[DefaultParameters]
BEGIN= 100 # Start time in ns for the analysis
END= 800
MAX= 20 # Maximum radial distance (in
# Angstrom)
SUB= 2 # Number of time intervals to
# divide the
# Trajectory total time
NBINS= 200 # Number of points for one RDF
# trace
PATH= ../Data_traj/ # Relative path to data from the
# current directory
EXCLUSION_BLOCK= (1,1)
Then you need to specify the parameters you want to change for each analysis in a input file, e.g. parameters.in
:
TASK SIM OUTDIRNAME FREQ SEL1 EXCLUSION_BLOCK
taks1 U01_3ch run0 1000 "resname U01 and name N12" (1,1)
task2 U02_3ch run0 1000 "resname U02 and name N4" (1,1)
Then you just have to run the main script in a directory where you want to store results:
multirdf
Compute the eccentricity of each cluster found after using clustsize
command.
usage: eccentricity [-h] -s SIM -i INPUTDIR -c CSVDIR [--nbins NBINS] [-m RESNAME] [-o OUTDIRNAME]
[--outfigdirname OUTFIGDIRNAME] [-r FILEREF] [-f FILETRJ]
[--method {double_loop,best_centering}] [--compound {residues,atoms}]
[--rename RENAME] [--in_memory] [-z SIZE [SIZE ...]] [--range] [-v]
Compute Spherical Radial Density around cluster COM.
options:
-h, --help show this help message and exit
-s SIM, --sim SIM Name of the Data trajectory.
-i INPUTDIR, --inputdir INPUTDIR
directory with all trajectories to analyse.
-c CSVDIR, --csvdir CSVDIR
directory with the dataframe `cluster_resids.csv`.
--nbins NBINS Number of bins of the density distribution.
-m RESNAME, --resname RESNAME
Resname or molecule name. e.g: HC8.
-o OUTDIRNAME, --outdirname OUTDIRNAME
Output Sub-directory name.
--outfigdirname OUTFIGDIRNAME
Output Sub-directory name for figures.
-r FILEREF, --fileref FILEREF
Commmon name of reference file in all
simulation directories.
-f FILETRJ, --filetrj FILETRJ
Commmon name of trajectory file in all
simulation directories.
--method {double_loop,best_centering}
Choose the method for computing the inertia tensor.
--compound {residues,atoms}
Choose the level at which is computed the inertia tensor.
`residues` is faster as it only consider the center of mass of the molecules.
--rename RENAME Set to True to consider sim name with no suffix `Data_`.
--in_memory Charge the whole trajectory in RAM memory.
Faster than reading from file.
-z SIZE [SIZE ...], --size SIZE [SIZE ...]
Choose specific size(s) of clusters.
--range Range of values provided by three values `start end step`by -z/--size option.
-v, --verbose Turn on for more details during the
calculation.
Compute the radial density of an atom group around each cluster, depending on its size.
usage: radialdens [-h] -s SIM -z SIZE [SIZE ...] [--range] -i INPUTDIR -c CSVDIR [--nbins NBINS]
[-m RESNAME] [-o OUTDIRNAME] [--outfigdirname OUTFIGDIRNAME] [-r FILEREF]
[-f FILETRJ] [--rename RENAME] [--in_memory] [-v]
Compute Spherical Radial Density around cluster COM.
options:
-h, --help show this help message and exit
-s SIM, --sim SIM Name of the Data trajectory.
-z SIZE [SIZE ...], --size SIZE [SIZE ...]
Size of the cluster.
--range Range of values provided by three values `start end step`by -z/--size option.
-i INPUTDIR, --inputdir INPUTDIR
directory with all trajectories to analyse.
-c CSVDIR, --csvdir CSVDIR
directory with the dataframe `cluster_resids.csv`.
--nbins NBINS Number of bins of the density distribution.
-m RESNAME, --resname RESNAME
Resname or molecule name. e.g: HC8.
-o OUTDIRNAME, --outdirname OUTDIRNAME
Output Sub-directory name.
--outfigdirname OUTFIGDIRNAME
Output Sub-directory name for figures.
-r FILEREF, --fileref FILEREF
Commmon name of reference file in all
simulation directories.
-f FILETRJ, --filetrj FILETRJ
Commmon name of trajectory file in all
simulation directories.
--rename RENAME Set to True to consider sim name with no suffix `Data_`.
--in_memory Charge the whole trajectory in RAM memory.
Faster than reading from file.
-v, --verbose Turn on for more details during the
calculation.
A set of scripts (test1.sh
,test2.sh
,...) are given as examples and control tests, it processes the data from the article cited in section Citation.
Run the two first tests
cd ~/programs/cluster_tools_md/tests/
./run_tests.sh test1 test2
If you use cluster_tools in your research, please cite the following article:
[coming soon]