
Command-line analysis routines for Molecular Dynamics data with aggregating molecules.





The steps below install cluster_tools in ~/programs; you can choose any other installation directory.

  1. Clone this repository on your computer.

    mkdir -p ~/programs
    cd ~/programs
    git clone https://github.com/ahardiag/cluster_tools_md
  2. Create a Python environment to install the package into, e.g. using conda:

    conda create -n cluster_tools_md python=3.10
    conda activate cluster_tools_md
  3. Build and install the package using pip:

    cd ~/programs
    pip install ./cluster_tools_md


clustsize -h
eccentricity -h
multirdf -h
radialdens -h
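
As a quick post-install check, the following Python sketch reports which of the four commands above are not yet on the PATH. The helper name `missing_commands` is hypothetical and not part of cluster_tools; it relies only on the standard-library `shutil.which`.

```python
import shutil

# The four command-line entry points listed above.
COMMANDS = ["clustsize", "eccentricity", "multirdf", "radialdens"]


def missing_commands(commands=COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All cluster_tools commands are available.")
```

If a command is reported as missing, check that the environment in which the package was installed is the active one.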


To recreate the conda environment used during development, use:

conda env create --file ~/programs/cluster_tools_md/cluster_tools_md.yml -n cluster_tools_md
conda activate cluster_tools_md


Computes the size distributions of atomic clusters in gas phase for a set of MD trajectories. This is equivalent to gmx clustsize in GROMACS, but provides more options and editable outputs.


usage: clustsize [-h]    -i INPUTDIR -m RESNAME [-o OUTDIRNAME]
                        [-r REFFILE] [-f TRJFILE] [-b BEGIN] -e END [-v]
                        --mins MINS --eps EPS [--with-distance-matrix]
                        [--with-double-labelling] [--ignore-warnings]
                        [--pattern-dirs PATTERN_DIRS]

Find all aggregate clusters for a set of MD trajectories.
For selection commands, see MDAnalysis doc :

optional arguments:
  -h, --help            show this help message and exit
  -i INPUTDIR, --inputdir INPUTDIR
                        directory with all trajectories to analyse
  -m RESNAME, --resname RESNAME
                        Resname or molecule name. e.g: HC8
  -o OUTDIRNAME, --outdirname OUTDIRNAME
                        Output Sub-directory name.
  -r REFFILE, --reffile REFFILE
                        Common name of reference file in all simulation directories.
  -f TRJFILE, --trjfile TRJFILE
                        Common name of trajectory file in all simulation directories.
  -b BEGIN, --begin BEGIN
                        Start frame of analysis.
  -e END, --end END     Stop frame of analysis.
  -v, --verbose         Turn on for more details during the calculation.
  --mins MINS           Minimum number of points (atoms) to define a cluster.
  --eps EPS             Maximum distance between two points (atoms) in the same cluster.
  --with-distance-matrix
                        Use total matrix distance to treat periodic boundary conditions. Slow for large systems!
  --with-double-labelling
                        Use double labelling method to perform clustering analysis on two shifted sets of positions. Allows searching for clusters close to the box boundaries.
  --ignore-warnings     Suppress all warnings printed to stdout when clustering on a sparse matrix.
  --pattern-dirs PATTERN_DIRS
                        String pattern to select a series of data directories with similar pattern in the input directory.
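
To illustrate how `--eps`, `--mins` and periodic boundaries interact, here is a minimal pure-Python sketch. It is a simplified distance-cutoff (single-linkage) grouping with the minimum-image convention in an orthorhombic box, not the algorithm implemented by clustsize; the function names and the brute-force pair search are assumptions made for illustration.

```python
import math
from collections import deque


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box."""
    total = 0.0
    for x, y, length in zip(a, b, box):
        d = abs(x - y) % length
        d = min(d, length - d)  # shortest separation across the boundary
        total += d * d
    return math.sqrt(total)


def find_clusters(points, eps, mins, box):
    """Group points linked by distances <= eps; keep groups with >= mins members."""
    n = len(points)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        members = [start]
        # Breadth-first search over all points within eps of a member.
        while queue:
            i = queue.popleft()
            for j in range(n):
                if not seen[j] and minimum_image_distance(points[i], points[j], box) <= eps:
                    seen[j] = True
                    queue.append(j)
                    members.append(j)
        if len(members) >= mins:
            clusters.append(sorted(members))
    return clusters


# Two atoms sit on opposite faces of a 10 x 10 x 10 box: they are neighbours
# only when the periodic image is taken into account.
points = [(0.5, 5.0, 5.0), (9.5, 5.0, 5.0), (5.0, 5.0, 5.0)]
print(find_clusters(points, eps=1.5, mins=2, box=(10.0, 10.0, 10.0)))
```

The all-pairs loop scales quadratically with the number of atoms, which gives a feel for why a full distance matrix is slow for large systems.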


The script generates three CSV files in the OUTDIRNAME directory (by default: output). These data are easy to edit using, for instance, the DataFrame format from the pandas library.

  • cluster_histo.csv : size indices in columns; $simulation$ key, aggregation number $n$ and size distribution $P(n)$ in rows.
  • cluster_properties.csv : time (in ns) in columns; $simulation$ key, number of clusters $nclust$ and number of molecules in the largest cluster $maxclust$ in rows.
  • cluster_resids.csv : time (in ns) in columns; $simulation$ key and $resid$ indices in rows. For a given simulation, time and resid index, the stored value is the label (positive or ) of the cluster in which the molecule is found.

More insight into the output data is available in the tests directory.
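
The quantities above can be related to one another with a short sketch. Given the per-molecule cluster labels of one frame, it derives $nclust$, $maxclust$ and a size distribution $P(n)$. This is an illustration only: the function name is hypothetical, any non-positive label is assumed to mean "not in a cluster", and $P(n)$ is assumed to be normalised as the fraction of clusters of size $n$, which may differ from the normalisation used by clustsize.

```python
from collections import Counter


def summarize_frame(labels):
    """Return (nclust, maxclust, P(n)) from per-molecule cluster labels."""
    # Number of molecules per cluster label (assumption: labels <= 0 are unclustered).
    sizes = Counter(label for label in labels if label > 0)
    nclust = len(sizes)
    maxclust = max(sizes.values(), default=0)
    # Number of clusters for each aggregation number n.
    histo = Counter(sizes.values())
    p_n = {n: count / nclust for n, count in sorted(histo.items())}
    return nclust, maxclust, p_n


# Eight molecules: one trimer, two dimers and one unclustered molecule.
labels = [1, 1, 1, 2, 2, 3, 3, -1]
nclust, maxclust, p_n = summarize_frame(labels)
print(nclust, maxclust, p_n)
```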


Sequentially runs Radial Distribution Function (RDF) analyses for a set of MD trajectories.


First, create a config file with some default parameters in your home directory, at ~/.runRDFrc:

    BEGIN=              100             # Start time in ns for the analysis
    END=                800
    MAX=                20              # Maximum radial distance (in
                                        # Angstrom)
    SUB=                2               # Number of time intervals to
                                        # divide the
                                        # Trajectory total time 
    NBINS=              200             # Number of points for one RDF 
                                        # trace        
    PATH=               ../Data_traj/   # Relative path to data from the
                                        # current directory 
    EXCLUSION_BLOCK=    (1,1)           
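
For readers who want to inspect such a file from Python, the sketch below parses the `KEY= value  # comment` layout shown above into a dictionary of strings. It is an assumption about how the format can be read, not the parser used by cluster_tools, and the function name `parse_rc` is hypothetical.

```python
def parse_rc(text):
    """Parse 'KEY= value  # comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        # Drop trailing comments; comment-only continuation lines become empty.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


example = """
BEGIN=              100             # Start time in ns for the analysis
END=                800
MAX=                20              # Maximum radial distance (in
                                    # Angstrom)
PATH=               ../Data_traj/   # Relative path to data
EXCLUSION_BLOCK=    (1,1)
"""
print(parse_rc(example))
```

Values are kept as strings; converting `BEGIN` or `NBINS` to numbers is left to the caller.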

Then specify the parameters you want to change for each analysis in an input file, e.g. parameters.in:

TASK   SIM      OUTDIRNAME  FREQ  SEL1                        EXCLUSION_BLOCK
task1  U01_3ch  run0        1000  "resname U01 and name N12"  (1,1)
task2  U02_3ch  run0        1000  "resname U02 and name N4"   (1,1)
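
Because the selection column contains spaces inside double quotes, a plain whitespace split would break it apart. The sketch below reads such a table with the standard-library `shlex.split`, which keeps each quoted selection as one field. The function name `parse_parameters` is hypothetical and the sketch is not the reader used by cluster_tools.

```python
import shlex


def parse_parameters(text):
    """Parse a whitespace-separated table whose first line is the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = shlex.split(line)  # honours the double quotes around selections
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}: {line!r}")
        rows.append(dict(zip(header, fields)))
    return rows


example = '''
TASK   SIM      OUTDIRNAME  FREQ  SEL1                        EXCLUSION_BLOCK
task1  U01_3ch  run0        1000  "resname U01 and name N12"  (1,1)
task2  U02_3ch  run0        1000  "resname U02 and name N4"   (1,1)
'''
for row in parse_parameters(example):
    print(row["TASK"], "->", row["SEL1"])
```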

Then run the main script in the directory where you want to store the results.



Computes the eccentricity of each cluster found by the clustsize command.

usage: eccentricity [-h] -s SIM -i INPUTDIR -c CSVDIR [--nbins NBINS] [-m RESNAME] [-o OUTDIRNAME]
                    [--outfigdirname OUTFIGDIRNAME] [-r FILEREF] [-f FILETRJ]
                    [--method {double_loop,best_centering}] [--compound {residues,atoms}]
                    [--rename RENAME] [--in_memory] [-z SIZE [SIZE ...]] [--range] [-v]

Compute Spherical Radial Density around cluster COM.

  -h, --help            show this help message and exit
  -s SIM, --sim SIM     Name of the Data trajectory.
  -i INPUTDIR, --inputdir INPUTDIR
                        directory with all trajectories to analyse.
  -c CSVDIR, --csvdir CSVDIR
                        directory with the dataframe `cluster_resids.csv`.
  --nbins NBINS         Number of bins of the density distribution.
  -m RESNAME, --resname RESNAME
                        Resname or molecule name. e.g: HC8.
  -o OUTDIRNAME, --outdirname OUTDIRNAME
                        Output Sub-directory name.
  --outfigdirname OUTFIGDIRNAME
                        Output Sub-directory name for figures.
  -r FILEREF, --fileref FILEREF
                        Common name of reference file in all
                        simulation directories.
  -f FILETRJ, --filetrj FILETRJ
                        Common name of trajectory file in all
                        simulation directories.
  --method {double_loop,best_centering}
                        Choose the method for computing the inertia tensor.
  --compound {residues,atoms}
                        Choose the level at which the inertia tensor is computed.
                        `residues` is faster as it only considers the center of mass of the molecules.
  --rename RENAME       Set to True to consider sim name with no suffix `Data_`.
  --in_memory           Load the whole trajectory into RAM.
                        Faster than reading from file.
  -z SIZE [SIZE ...], --size SIZE [SIZE ...]
                        Choose specific size(s) of clusters.
  --range               Interpret the three values given to -z/--size as a range `start end step`.
  -v, --verbose         Turn on for more details during the calculation.
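
For reference, the inertia tensor that `--method` and `--compound` refer to is conventionally defined as below, for particles of mass $m_i$ at positions $\mathbf{r}_i$ relative to the cluster center of mass. With `--compound residues`, the particles would be the molecular centers of mass. The eccentricity expression is an assumption (a definition commonly used for aggregates), not taken from the cluster_tools source:

$$
I_{\alpha\beta} = \sum_i m_i \left( |\mathbf{r}_i|^2 \, \delta_{\alpha\beta} - r_{i,\alpha} \, r_{i,\beta} \right),
\qquad
e = 1 - \frac{I_{\min}}{I_{\mathrm{avg}}}
$$

Here $I_{\min}$ is the smallest eigenvalue (principal moment) of $I$ and $I_{\mathrm{avg}}$ is the mean of the three eigenvalues, so that $e = 0$ for a perfectly spherical cluster and $e$ approaches 1 for a strongly elongated one.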


Compute the radial density of an atom group around each cluster, depending on its size.

usage: radialdens [-h] -s SIM -z SIZE [SIZE ...] [--range] -i INPUTDIR -c CSVDIR [--nbins NBINS]
                  [-m RESNAME] [-o OUTDIRNAME] [--outfigdirname OUTFIGDIRNAME] [-r FILEREF]
                  [-f FILETRJ] [--rename RENAME] [--in_memory] [-v]

Compute Spherical Radial Density around cluster COM.

  -h, --help            show this help message and exit
  -s SIM, --sim SIM     Name of the Data trajectory.
  -z SIZE [SIZE ...], --size SIZE [SIZE ...]
                        Size of the cluster.
  --range               Interpret the three values given to -z/--size as a range `start end step`.
  -i INPUTDIR, --inputdir INPUTDIR
                        directory with all trajectories to analyse.
  -c CSVDIR, --csvdir CSVDIR
                        directory with the dataframe `cluster_resids.csv`.
  --nbins NBINS         Number of bins of the density distribution.
  -m RESNAME, --resname RESNAME
                        Resname or molecule name. e.g: HC8.
  -o OUTDIRNAME, --outdirname OUTDIRNAME
                        Output Sub-directory name.
  --outfigdirname OUTFIGDIRNAME
                        Output Sub-directory name for figures.
  -r FILEREF, --fileref FILEREF
                        Common name of reference file in all
                        simulation directories.
  -f FILETRJ, --filetrj FILETRJ
                        Common name of trajectory file in all
                        simulation directories.
  --rename RENAME       Set to True to consider sim name with no suffix `Data_`.
  --in_memory           Load the whole trajectory into RAM.
                        Faster than reading from file.
  -v, --verbose         Turn on for more details during the calculation.
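
A spherical radial density can be sketched as a histogram of distances to the cluster center of mass, with each bin divided by the volume of its spherical shell. The pure-Python example below shows that idea for one set of distances; the function name, its arguments and the normalisation (number of atoms per unit volume) are assumptions and not the radialdens implementation.

```python
import math


def radial_density(distances, r_max, nbins):
    """Number density in nbins spherical shells between 0 and r_max."""
    width = r_max / nbins
    counts = [0] * nbins
    for d in distances:
        if 0.0 <= d < r_max:
            # Guard against rounding putting d in a bin past the last one.
            counts[min(int(d / width), nbins - 1)] += 1
    density = []
    for k, count in enumerate(counts):
        r_in, r_out = k * width, (k + 1) * width
        shell_volume = 4.0 / 3.0 * math.pi * (r_out**3 - r_in**3)
        density.append(count / shell_volume)
    return density


# Distances (in Angstrom) of three atoms from the cluster center of mass.
print(radial_density([0.5, 0.5, 1.5], r_max=2.0, nbins=2))
```

Dividing by the shell volume matters because outer shells are much larger than inner ones, so raw counts alone would overstate the density far from the center.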


A set of scripts (test1.sh, test2.sh, ...) is provided as examples and control tests; they process the data from the article cited in the Citation section.


Run the first two tests:

cd ~/programs/cluster_tools_md/tests/
./run_tests.sh test1 test2


If you use cluster_tools in your research, please cite the following article:

[coming soon]