Common-nearest-neighbour clustering

The commonnn Python package provides a flexible interface to the common-nearest-neighbour (CommonNN) clustering procedure. While the method can be applied to arbitrary data, this implementation was developed with the processing of trajectories from Molecular Dynamics (MD) simulations in mind. In this context, the cluster result can serve as a suitable basis for the construction of a core-set Markov-state model (cs-MSM) that captures the essential dynamics of the underlying molecular processes.

The commonnn package

The package provides a main module:

cluster: User interface to (hierarchical) CommonNN clustering

It also contains, among others, the modules:

plot: Convenience functions to evaluate cluster results
_types: Direct access to generic types representing needed cluster components
_fit: Direct access to generic clustering procedures

Features:

Flexible: The clustering can be done for data sets in different input formats. Internal parts of the procedure can be exchanged. Interfacing with external methods is made easy.
Convenient: Integrates functionality that is handy in the context of MD data analysis.
Fast: Core functionalities have been implemented in Cython.

Please refer to the following papers for the scientific background (and consider citing if you find the method useful):

B. Keller, X. Daura, W. F. van Gunsteren, J. Chem. Phys., 2010, 132, 074110.
O. Lemke, B. G. Keller, J. Chem. Phys., 2016, 145, 164104.
O. Lemke, B. G. Keller, Algorithms, 2018, 11, 19.

Documentation

The package documentation is available online or locally under docs/index.html. The sources for the documentation can be found under docsrc/ and can be built using Sphinx.

Install

Refer to the documentation for more details. Install from PyPI:

$ pip install commonnn-clustering

or clone the development version and install from a local branch

$ git clone https://github.com/bkellerlab/CommonNNClustering.git
$ cd CommonNNClustering
$ pip install .

Quickstart

>>> from commonnn import cluster

>>> # 2D data points (list of lists, 12 points in 2 dimensions)
>>> data_points = [   # point index
...     [0, 0],       # 0
...     [1, 1],       # 1
...     [1, 0],       # 2
...     [0, -1],      # 3
...     [0.5, -0.5],  # 4
...     [2,  1.5],    # 5
...     [2.5, -0.5],  # 6
...     [4, 2],       # 7
...     [4.5, 2.5],   # 8
...     [5, -1],      # 9
...     [5.5, -0.5],  # 10
...     [5.5, -1.5],  # 11
...     ]

>>> clustering = cluster.Clustering(data_points)
>>> clustering.fit(radius_cutoff=1.5, similarity_cutoff=1, v=False)
>>> clustering.labels
array([1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2])
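To illustrate what the two parameters control, here is a minimal plain-NumPy sketch of the common-nearest-neighbour idea. It is not the package's Cython implementation, and the function name commonnn_sketch is hypothetical. The sketch assumes that a point's neighbours are all other points within radius_cutoff (the point itself excluded), that two points are joined when they are neighbours of each other and share at least similarity_cutoff common neighbours, and that label 0 marks noise, as the quickstart output suggests.

```python
import numpy as np


def commonnn_sketch(points, radius_cutoff, similarity_cutoff):
    """Illustrative CommonNN clustering (assumption-laden sketch, O(n^2) memory)."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Full pairwise Euclidean distance matrix
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Neighbour sets within the radius, excluding the point itself (assumption)
    neighbours = [
        set(np.flatnonzero(dist[i] <= radius_cutoff).tolist()) - {i}
        for i in range(n)
    ]

    labels = np.zeros(n, dtype=int)  # 0 = noise (assumption)
    current = 0
    for start in range(n):
        if labels[start] != 0:
            continue
        # Grow a cluster by following pairs that pass the similarity check
        members = [start]
        visited = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbours[i]:
                if j in visited or labels[j] != 0:
                    continue
                if len(neighbours[i] & neighbours[j]) >= similarity_cutoff:
                    visited.add(j)
                    members.append(j)
                    stack.append(j)
        # Points that could not be joined with any other point stay noise
        if len(members) > 1:
            current += 1
            labels[members] = current
    return labels


data_points = [
    [0, 0], [1, 1], [1, 0], [0, -1], [0.5, -0.5], [2, 1.5],
    [2.5, -0.5], [4, 2], [4.5, 2.5], [5, -1], [5.5, -0.5], [5.5, -1.5],
]
labels = commonnn_sketch(data_points, radius_cutoff=1.5, similarity_cutoff=1)
print(labels)
```

Under these assumptions the sketch assigns the quickstart points to the same groups as shown above: points 7 and 8, for example, are neighbours of each other but share no further neighbour, so they remain noise. The sketch numbers clusters in order of discovery, which need not match the package's numbering on other data sets.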

Alternative scikit-learn implementation

We provide an alternative approach to CommonNN clustering in the spirit of the scikit-learn project within scikit-learn-extra.

Development history

The present development repository has diverged from the original one under github.com/janjoswig/CommonNNClustering.

A previous implementation of the clustering can be found under github.com/bettinakeller/CNNClustering.