Unsupervised metric learning via K-ISOMAP for high-dimensional data clustering
Clustering does not assume knowledge of the class labels, defining an unsupervised learning paradigm relevant to many pattern recognition and machine learning problems. One limitation of clustering is that most algorithms rely heavily on a distance function to compute the dissimilarity between samples. The usual choice is the Euclidean distance, which is known to have poor discriminant power in high-dimensional spaces and is also quite sensitive to the presence of outliers. In this paper, we investigate how unsupervised metric learning via the curvature-based ISOMAP algorithm (K-ISOMAP) can influence clustering performance, as measured by quantitative evaluation metrics. The computation of the local curvature during dimensionality reduction allows an adaptive, intrinsic distance metric to be incorporated into clustering, making it more aware of the geometry of the underlying data manifold. Computational experiments with real-world datasets indicate that applying K-ISOMAP prior to a clustering algorithm may produce superior Rand, Calinski-Harabasz and Fowlkes-Mallows indices compared with clustering the raw data or regular ISOMAP embeddings, suggesting that learning a suitable metric can be a relevant pre-processing step before clustering.
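The evaluation protocol outlined above (embed the data, cluster, then compute the Rand, Calinski-Harabasz and Fowlkes-Mallows indices) can be sketched with standard tools. The snippet below is a minimal illustration only: it uses scikit-learn's stock Isomap as a stand-in for the curvature-based K-ISOMAP, which is not part of scikit-learn, and an arbitrary benchmark dataset (Wine); the parameter choices are assumptions, not the paper's experimental setup.

```python
# Minimal sketch of the evaluation protocol: embed, cluster, score.
# NOTE: scikit-learn's standard Isomap is a stand-in for the curvature-based
# K-ISOMAP described in the paper; the dataset and parameters are illustrative.
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans
from sklearn.metrics import (rand_score, calinski_harabasz_score,
                             fowlkes_mallows_score)

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
k = len(set(y))  # number of ground-truth classes

# Compare clustering on the raw features against an ISOMAP embedding.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

for name, data in [("raw", X), ("isomap", X_iso)]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    print(name,
          "Rand: %.3f" % rand_score(y, labels),
          "CH: %.1f" % calinski_harabasz_score(data, labels),
          "FM: %.3f" % fowlkes_mallows_score(y, labels))
```

In the paper's setting, the Isomap step would be replaced by the K-ISOMAP embedding, so that the clustering operates on distances that reflect the local curvature of the data manifold rather than raw Euclidean distances.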