Automatic Clustering of Hierarchical Clustering Representations Library Dependencies: numpy, if graphing is desired - matplotlib OPTICS implementation used has dependencies include numpy, scipy, and hcluster An implementation of the following algorithm, with some minor add-ons: J. Sander, X. Qin, Z. Lu, N. Niu, A. Kovarsky, K. Whang, J. Jeon, K. Shim, J. Srivastava, Automatic extraction of clusters from hierarchical clustering representations. Advances in Knowledge Discovery and Data Mining (2003) Springer Berlin / Heidelberg. 567-567 available from http://dx.doi.org/10.1007/3-540-36175-8_8 Implemented in Python by Amy X. Zhang, while at Cambridge Computer Laboratory in March 2012. Now at MIT. axz@mit.edu http://people.csail.mit.edu/axz This algorithm assumes you have already obtained a hierarchical clustering representation of your data, through OPTICS or other means. The clustering algorithm I use is OPTICS and an implementation in python can be found here: http://chemometria.us.edu.pl/download/optics.py. See the demo on how to use the OPTICS code with this. OPTICS is a hierarchical clustering algorithm for finding density-based clusters in spatial data. It returns a list of points linearly ordered so that spatially close points are neighbors as well as an associated reachability value for every point. This makes up a reachability plot, a special kind of dendrogram. For this algorithm, the input is a reachability plot. If necessary, dendrograms can be converted to reachability plots and vice versa, as outlined in the above paper. The algorithm takes in a reachability plot and list of points ordered by OPTICS. It returns the root node of the hierarchical clustering tree created by finding the local maximas in the reachability plot and choosing whether to use or discard them. Please see the paper for a more detailed description. There are a few changes in the algorithm given in the paper, and they are noted in the code. For functionality, I have added helper functions to print out, graph, write to file, or extract only leaves from the tree. See demo for examples on using them. This code was used for the following publication (Bibtex format): @inproceedings{zhang2013hoodsquare, title={Hoodsquare: Modeling and recommending neighborhoods in location-based social networks}, author={Zhang, Amy X and Noulas, Anastasios and Scellato, Salvatore and Mascolo, Cecilia}, booktitle={Social Computing (SocialCom), 2013 International Conference on}, pages={69--74}, year={2013}, organization={IEEE} }
amyxzhang/OPTICS-Automatic-Clustering
automatic hierarchical clustering of OPTICS reachability plots implemented in Python
Python