The commonnn Python package provides a flexible interface to the common-nearest-neighbour (CommonNN) clustering procedure. While the method can be applied to arbitrary data, this implementation was developed with the processing of trajectories from Molecular Dynamics (MD) simulations in mind. In this context, the cluster result can serve as a suitable basis for constructing a core-set Markov-state model (cs-MSM) that captures the essential dynamics of the underlying molecular processes.
The package provides a main module:
- cluster: User interface to (hierarchical) CommonNN clustering
Further, it contains, among others, the modules:
- plot: Convenience functions to evaluate cluster results
- _types: Direct access to generic types representing needed cluster components
- _fit: Direct access to generic clustering procedures
Features:
- Flexible: The clustering can be done for data sets in different input formats. Internal parts of the procedure can be exchanged. Interfacing with external methods is made easy.
- Convenient: Integrates functionality that is handy in the context of MD data analysis.
- Fast: Core functionalities have been implemented in Cython.
Please refer to the following papers for the scientific background (and consider citing them if you find the method useful):
- B. Keller, X. Daura, W. F. van Gunsteren, J. Chem. Phys., 2010, 132, 074110.
- O. Lemke, B. G. Keller, J. Chem. Phys., 2016, 145, 164104.
- O. Lemke, B. G. Keller, Algorithms, 2018, 11, 19.
The package documentation is available online or under docs/index.html.
The sources for the documentation can be found under docsrc/ and can be built using Sphinx.
Refer to the documentation for more details.
Install from PyPI
$ pip install commonnn-clustering
or clone the development version and install from a local branch
$ git clone https://github.com/bkellerlab/CommonNNClustering.git
$ cd CommonNNClustering
$ pip install .
>>> from commonnn import cluster
>>> # 2D data points (list of lists, 12 points in 2 dimensions)
>>> data_points = [ # point index
... [0, 0], # 0
... [1, 1], # 1
... [1, 0], # 2
... [0, -1], # 3
... [0.5, -0.5], # 4
... [2, 1.5], # 5
... [2.5, -0.5], # 6
... [4, 2], # 7
... [4.5, 2.5], # 8
... [5, -1], # 9
... [5.5, -0.5], # 10
... [5.5, -1.5], # 11
... ]
>>> clustering = cluster.Clustering(data_points)
>>> clustering.fit(radius_cutoff=1.5, similarity_cutoff=1, v=False)
>>> clustering.labels
array([1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2])
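The two parameters in the call above can be read roughly as follows: two points end up in the same cluster if they lie within radius_cutoff of each other and share at least similarity_cutoff common neighbours; points that join no cluster are labelled 0 (noise). The following is a minimal pure-Python sketch of that criterion, not the package's Cython implementation. The function name commonnn_sketch, the use of `<=` for the radius check, the exclusion of a point from its own neighbourhood, and the union-find bookkeeping are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def commonnn_sketch(points, radius_cutoff, similarity_cutoff):
    """Illustrative CommonNN-style clustering (hypothetical helper).

    Returns one label per point; 0 marks noise, clusters are numbered
    from 1 in order of decreasing size.
    """
    n = len(points)
    # Neighbourhood of each point: all *other* points within radius_cutoff
    # (assumption: the point itself is not counted, and `<=` is used).
    neighbours = [
        {j for j in range(n)
         if j != i and dist(points[i], points[j]) <= radius_cutoff}
        for i in range(n)
    ]

    # Simple union-find to merge points that satisfy the criterion.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        shared = len(neighbours[i] & neighbours[j])
        if j in neighbours[i] and shared >= similarity_cutoff:
            parent[find(i)] = find(j)

    # Collect connected groups; groups of a single point count as noise.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(
        (g for g in groups.values() if len(g) > 1), key=len, reverse=True
    )

    labels = [0] * n
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Same 12 points as in the example above.
data_points = [
    [0, 0], [1, 1], [1, 0], [0, -1], [0.5, -0.5], [2, 1.5],
    [2.5, -0.5], [4, 2], [4.5, 2.5], [5, -1], [5.5, -0.5], [5.5, -1.5],
]
labels = commonnn_sketch(data_points, radius_cutoff=1.5, similarity_cutoff=1)
print(labels)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2]
```

For this small data set the sketch yields the same labels as the package call. Points 7 and 8, for instance, are within the radius of each other but have no third point in common, so both remain noise.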
We provide an alternative approach to CommonNN clustering in the spirit of the scikit-learn project within scikit-learn-extra.
This development repository has diverged from the original one at github.com/janjoswig/CommonNNClustering.
A previous implementation of the clustering can be found at github.com/bettinakeller/CNNClustering.