This repository contains several clustering methods, including hierarchical clustering, CURE (Clustering Using REpresentatives), k-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Clustering is a technique in data mining and statistical data analysis that groups similar data points together and helps identify patterns in their distribution.
The clustering algorithms above have been separated into two scripts: k-means can be found in kmeans.q, while the other algorithms can be found in clust.q. Additionally, example notebooks have been provided to show how the algorithms perform on a variety of datasets.
A k-dimensional tree (k-d tree) is used by the single and centroid linkage hierarchical algorithms, as well as by CURE, which can use either the q or the C implementation of the k-d tree.
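To illustrate why a k-d tree helps here, the following is a minimal sketch of a k-d tree with a nearest-neighbour query, the kind of lookup that clustering algorithms commonly use it to speed up. It is written in Python for illustration only and is not the q or C implementation shipped in this repository; the names `build_kdtree` and `nearest` and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested dicts; returns None for no points."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    if best is None or dist(node["point"], target) < dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    # Descend first into the side of the splitting plane that contains target.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, best)
    return best


# Hypothetical sample data, not taken from the repository's datasets.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

The pruning test in `nearest` is what makes the tree worthwhile: whole subtrees are skipped whenever the splitting plane is already farther away than the best candidate found so far.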
- embedPy
- Matplotlib 2.1.1
- Matplot3d
- scikit-learn
- PyClustering (data samples)
The clustering library is still in development; further improvements will be made in the coming months.