An implementation of the Density Core Finding clustering method.

DCFcluster

DCFcluster is a Python implementation of the Density Core Finding (DCF) clustering method, introduced in:

Tobin J, Zhang M. (2021) DCF: An Efficient and Robust Density-Based Clustering Method. DOI 10.1109/ICDM51629.2021.00074 (To appear in ICDM 2021)

[Figure: Demo of the DCF method]

Set Up

DCFcluster requires NumPy, SciPy and scikit-learn, and it also uses the itertools module from the Python standard library. Running the included example additionally requires Matplotlib.
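Before running anything, it can help to confirm that these modules are importable. The snippet below is a small standalone check, not part of the repository. Note that scikit-learn is imported as `sklearn`, and `itertools` ships with Python, so it is listed only for completeness. The names `REQUIRED`, `OPTIONAL` and `missing_modules` are illustrative.

```python
import importlib.util

# Module names as they are imported in Python (scikit-learn is "sklearn").
REQUIRED = ["numpy", "scipy", "sklearn", "itertools"]
OPTIONAL = ["matplotlib"]  # only needed for the bundled example


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required modules:", ", ".join(missing))
    else:
        print("All required modules found.")
    if missing_modules(OPTIONAL):
        print("Matplotlib not found: the example script will not run.")
```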

Run

To run DCFcluster on the synthetic datasets that scikit-learn uses as clustering benchmarks:

python3 run_synthetic.py 

This will save a PDF in the directory demonstrating the ability of DCF to correctly cluster data.
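For readers who want to generate comparable inputs themselves, the sketch below builds toy 2-D datasets in the style of scikit-learn's "Comparing different clustering algorithms on toy datasets" example (noisy circles, noisy moons, varied-variance blobs, anisotropic blobs, plain blobs and uniform noise). The parameter values follow that example as we recall it and are assumptions; `run_synthetic.py` may use different sizes, seeds or noise levels. The function name `toy_datasets` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_circles, make_moons


def toy_datasets(n_samples=500, seed=30):
    """Return {name: (X, y)} for toy 2-D datasets; y is None when there are no true clusters."""
    rng = np.random.RandomState(seed)
    circles = make_circles(n_samples=n_samples, factor=0.5, noise=0.05, random_state=seed)
    moons = make_moons(n_samples=n_samples, noise=0.05, random_state=seed)
    blobs = make_blobs(n_samples=n_samples, random_state=seed)
    # Uniform noise: no cluster structure, so there are no true labels.
    no_structure = (rng.rand(n_samples, 2), None)
    # Anisotropically stretched blobs, obtained by a linear transformation.
    X, y = make_blobs(n_samples=n_samples, random_state=170)
    aniso = (X @ np.array([[0.6, -0.6], [-0.4, 0.8]]), y)
    # Three blobs with different standard deviations.
    varied = make_blobs(n_samples=n_samples, cluster_std=[1.0, 2.5, 0.5], random_state=170)
    return {
        "noisy_circles": circles,
        "noisy_moons": moons,
        "varied": varied,
        "aniso": aniso,
        "blobs": blobs,
        "no_structure": no_structure,
    }


if __name__ == "__main__":
    for name, (X, y) in toy_datasets().items():
        n_true = "n/a" if y is None else len(np.unique(y))
        print(f"{name}: X shape {X.shape}, true clusters {n_true}")
```

Each `X` is an array of shape `(n_samples, 2)` that could be passed to DCF as described below.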

To call DCF on a dataset X, use:

from DCFcluster import DCFcluster
result = DCFcluster.train(X, k, beta)

The returned object contains the computed values for the peak-finding criterion (peak_values), the indices of points belonging to the cluster cores (core_sets) and the final clustering of the data (labels).
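The README does not say whether these fields are attributes or dictionary keys, nor their exact types. The helper below is a hypothetical sketch that assumes `labels` is a 1-D sequence of integer cluster labels and `core_sets` is a sequence of index arrays, one per core; adapt how they are read from `result` to the actual object. It reports cluster sizes, the fraction of points lying in some core and, when ground-truth labels are available, the adjusted Rand index from scikit-learn.

```python
from collections import Counter

import numpy as np
from sklearn.metrics import adjusted_rand_score


def summarize_clustering(labels, core_sets, y_true=None):
    """Summarize a clustering given final labels and core index sets.

    Assumption: `labels` is 1-D with one integer label per point, and each
    element of `core_sets` is an array of point indices belonging to one core.
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    # Number of points assigned to each cluster label.
    sizes = dict(Counter(labels.tolist()))
    # Distinct points that belong to at least one core.
    core_points = set()
    for core in core_sets:
        core_points.update(np.asarray(core).ravel().tolist())
    summary = {
        "n_clusters": len(sizes),
        "cluster_sizes": sizes,
        "n_cores": len(core_sets),
        "core_fraction": len(core_points) / n if n else 0.0,
    }
    if y_true is not None:
        # Agreement with ground truth, invariant to label permutations.
        summary["ari"] = adjusted_rand_score(y_true, labels)
    return summary
```

For example, `summarize_clustering(result.labels, result.core_sets, y)` would work if the fields are exposed as attributes (an assumption).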

DCF Applied to Synthetic Datasets

To replicate the experiments included in 'DCF: An Efficient and Robust Density-Based Clustering Method', run the following:

python3 download_data.py
python3 run_downloaded.py

License

MIT

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Questions or Comments

Please contact Joshua Tobin (tobinjo@tcd.ie).

Future additions to the repository will provide ways to pass arguments to DCF from the command line.
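Until then, a thin wrapper of one's own can expose `k` and `beta` on the command line. The sketch below is hypothetical and not part of the repository: the argument names, the supported file formats (`.npy` or comma-separated text) and the assumption that the returned object exposes a `labels` attribute are all illustrative choices.

```python
import argparse

import numpy as np


def build_parser():
    """Hypothetical command-line interface for running DCF on a saved dataset."""
    parser = argparse.ArgumentParser(description="Run DCF clustering on a dataset.")
    parser.add_argument("data", help="path to a .npy file or a comma-separated text file")
    parser.add_argument("--k", type=int, required=True, help="k passed to DCFcluster.train")
    parser.add_argument("--beta", type=float, required=True, help="beta passed to DCFcluster.train")
    parser.add_argument("--out", default="labels.txt", help="file to write the cluster labels to")
    return parser


def load_data(path):
    """Load a 2-D array from a .npy file, otherwise from comma-separated text."""
    if path.endswith(".npy"):
        return np.load(path)
    return np.loadtxt(path, delimiter=",", ndmin=2)


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported lazily so the parser can be used without the package installed.
    from DCFcluster import DCFcluster

    X = load_data(args.data)
    result = DCFcluster.train(X, args.k, args.beta)
    # Assumption: the final labels are exposed as an attribute named `labels`.
    np.savetxt(args.out, np.asarray(result.labels), fmt="%d")
```

To use it as a script, call `main()` under the usual `if __name__ == "__main__":` guard, then run e.g. `python3 run_dcf.py data.csv --k 10 --beta 0.4` (the script name and values are placeholders).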