DCFcluster is a Python library for implementing the Density Core Finding (DCF) clustering method.
Tobin J, Zhang M. (2021) DCF: An Efficient and Robust Density-Based Clustering Method. DOI 10.1109/ICDM51629.2021.00074 (To appear in ICDM 2021)
DCFcluster requires the NumPy, SciPy and scikit-learn libraries to operate. It also uses itertools, which ships with the Python standard library and needs no separate installation. To run the included example, Matplotlib is also required.
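As a quick check before running anything, the sketch below reports which of the dependencies listed above cannot be found in the current environment. It is not part of DCFcluster; the helper name missing_modules is hypothetical, and the list uses import names (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names, not package names: scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "scipy", "sklearn"]
OPTIONAL = ["matplotlib"]  # only needed for the included example


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

An empty list for the required modules means the library's stated dependencies are importable.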
To run DCFcluster on the synthetic datasets used as benchmarks in scikit-learn:
python3 run_synthetic.py
This will save a PDF in the directory demonstrating the ability of DCF to correctly cluster data.
To run DCF on a dataset, use:
from DCFcluster import DCFcluster
result = DCFcluster.train(X, k, beta)
The returned object contains the computed values for the peak-finding criterion (peak_values), the indices of points belonging to the cluster cores (core_sets) and the final clustering of the data (labels).
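One simple way to inspect the returned labels is to count the points in each cluster. The helper below is a minimal sketch, not part of DCFcluster: cluster_sizes is a hypothetical name, and it assumes labels is a sequence (for example a NumPy array) of integer cluster labels.

```python
from collections import Counter


def cluster_sizes(labels):
    """Map each cluster label to the number of points assigned to it."""
    # int() turns NumPy integer scalars into plain Python ints for clean keys.
    return dict(Counter(int(label) for label in labels))


# Hypothetical usage, assuming `result` came from DCFcluster.train(X, k, beta)
# and that result.labels holds one integer label per data point:
#     sizes = cluster_sizes(result.labels)
#     print(sizes)
```

A cluster with very few points, or a single cluster containing almost all of them, can be a sign that k or beta needs adjusting for the dataset.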
To replicate the experiments included in 'DCF: An Efficient and Robust Density-Based Clustering Method', run the following:
python3 download_data.py
python3 run_downloaded.py
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
For questions, please contact Joshua Tobin (tobinjo@tcd.ie).
Future additions to the repository will provide ways to pass arguments to DCF from the command line.