COPAC clustering

High dimensional data clustering with COPAC

We implement COPAC (Correlation Partition Clustering), which

  1. computes the local correlation dimensionality of each point from the largest eigenvalues of the covariance matrix of its k nearest neighbors
  2. partitions the data set by this dimensionality
  3. calculates a variant of the Euclidean distance weighted with the correlation dimension, called the correlation distance
  4. further clusters the objects within each partition with Generalized DBSCAN, requiring each core point to have at least mu objects within distance eps.
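
The first three steps can be sketched in plain numpy. This is a minimal illustration based on our reading of the original publication, not the code used by this package: the function names local_pca and correlation_distance are hypothetical, the brute-force neighbor search only suits small data sets, and the distance is assumed to be the projection of the difference vector onto the low-variance ("weak") eigenvectors, made symmetric by taking the maximum over both points.

```python
import numpy as np


def local_pca(X, k=10, alpha=0.85):
    """Return per-point correlation dimension and weak-eigenvector projector.

    Hypothetical helper for illustration; not part of the copac package.
    """
    n, d = X.shape
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest neighbors; each point is its own neighbor.
    knn = np.argsort(sq_dists, axis=1)[:, :k]

    lam = np.empty(n, dtype=int)
    M = np.empty((n, d, d))
    for i in range(n):
        cov = np.atleast_2d(np.cov(X[knn[i]], rowvar=False))
        eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
        eigvals = np.clip(eigvals[::-1], 0.0, None)  # descending, non-negative
        eigvecs = eigvecs[:, ::-1]
        total = max(eigvals.sum(), np.finfo(float).tiny)
        explained = np.cumsum(eigvals) / total
        # Smallest number of leading eigenvalues explaining a share alpha.
        lam[i] = min(int(np.searchsorted(explained, alpha)) + 1, d)
        # Projector onto the remaining low-variance directions.
        weak = eigvecs[:, lam[i]:]
        M[i] = weak @ weak.T
    return lam, M


def correlation_distance(X, M, i, j):
    """Symmetric distance: maximum of the two local weighted distances."""
    diff = X[i] - X[j]
    d_i = np.sqrt(max(diff @ M[i] @ diff, 0.0))
    d_j = np.sqrt(max(diff @ M[j] @ diff, 0.0))
    return max(d_i, d_j)


def partition_by_dimension(lam):
    """Group point indices by their local correlation dimension (step 2)."""
    return {int(dim): np.flatnonzero(lam == dim) for dim in np.unique(lam)}
```

Under these assumptions, two points lying on the same line get a correlation distance close to zero even if they are far apart in Euclidean terms, which is what allows the density-based step to connect elongated correlation clusters.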

Installation

Make sure you have a working Python 3 environment (version 3.6 or later) with the numpy, scipy, and scikit-learn packages installed. Consider using Anaconda. You can then install COPAC from within the cloned directory with

python3 setup.py install

COPAC is then available through the copac package.

Example

COPAC usage follows scikit-learn's cluster API.

from copac import COPAC
# load some X here ...
copac = COPAC(k=10, mu=5, eps=.5, alpha=.85)
y_pred = copac.fit_predict(X)
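
To try the example without a data set at hand, X can be generated synthetically. The sketch below is only an illustration: the data layout (one cluster along a line, one on a plane, both in three dimensions) and all constants are arbitrary assumptions, and the commented-out call merely repeats the usage shown above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cluster A: 100 points scattered along a line (1-dimensional correlation).
t = rng.uniform(-1, 1, size=(100, 1))
line = t @ np.array([[1.0, 2.0, -1.0]])
line += rng.normal(scale=0.01, size=(100, 3))

# Cluster B: 100 points on a shifted plane (2-dimensional correlation).
uv = rng.uniform(-1, 1, size=(100, 2))
plane = uv @ np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]) + 5.0
plane += rng.normal(scale=0.01, size=(100, 3))

# Stack both clusters into one array of shape (n_samples, n_features).
X = np.vstack([line, plane])

# With the package installed, X can be passed to the estimator as above:
# y_pred = COPAC(k=10, mu=5, eps=.5, alpha=.85).fit_predict(X)
```

Suitable values for k, mu, eps and alpha depend on the data, so the parameters from the example above may need tuning for this synthetic set.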

Implementation

Published on GitHub: https://github.com/VarIr/copac

Citation

COPAC was originally published in:

@article{Achtert2007,
         author = {Achtert, E and B{\"o}hm, C and Kriegel, H P and Kr{\"o}ger, P and Zimek, A},
         title = {{Robust, Complete, and Efficient Correlation Clustering}},
         journal = {Proceedings of the Seventh SIAM International Conference on Data Mining},
         year = {2007},
         pages = {413--418}
}

License

This work is free and open-source software, licensed under the GPLv3.