GNCC

Geodesic nearest centroid classifier

Supervised classification is a fundamental task in pattern recognition and machine learning. Many different classifiers have been proposed to overcome the limitations of supervised learning in practical problems. Despite these efforts, due to the curse of dimensionality, classifying high-dimensional, small-sample-size data still poses a challenge to researchers and practitioners in several fields of science. In this paper, we propose the geodesic nearest centroid classifier (GNCC), a graph-based classification method that builds a discrete approximation to the data manifold and uses shortest paths to approximate the geodesic distances between sample points, replacing the regular Euclidean distance with a more meaningful metric. A complexity analysis reveals that GNCC is log-linear in the number of samples and linear in the number of edges, showing that the method is quite efficient in comparison to modern supervised classification techniques. We performed several computational experiments with real datasets to demonstrate the effectiveness of GNCC. The results show that the proposed GNCC algorithm can improve the mean accuracy of high-dimensional data classification, especially when the training set contains few samples, in comparison to the regular nearest centroid classifier (NCC), support vector machines (SVM), the k-nearest neighbors classifier (k-NN) and XGBoost. Given our findings, the proposed method can be considered a viable and promising alternative for supervised classification of high-dimensional data.
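The idea described above (approximate the manifold with a graph, measure distances along shortest paths, then assign each sample to the nearest class centroid) can be illustrated with a minimal sketch. This is not the code from this repository, and several details are assumptions made only for illustration: the graph is a symmetrized k-nearest-neighbor graph over training and test points together, each class "centroid" is taken to be the training sample with the smallest total geodesic distance to the other members of its class (a medoid), and the names `knn_graph`, `dijkstra` and `gncc_predict` are hypothetical. The brute-force neighbor search is quadratic and does not reflect the complexity stated in the abstract.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetrized k-NN graph as {i: {j: euclidean_weight}} (brute force)."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d  # keep the graph undirected
    return adj


def dijkstra(adj, source):
    """Shortest-path lengths from source; unreachable nodes stay at inf."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def gncc_predict(X_train, y_train, X_test, k=3):
    """Assign each test point to the class of its geodesically nearest centroid."""
    points = list(X_train) + list(X_test)
    n_train = len(X_train)
    adj = knn_graph(points, k)

    # Approximate geodesic distances from every training sample.
    geo = {i: dijkstra(adj, i) for i in range(n_train)}

    # Assumption: the class centroid is the medoid under the geodesic distance.
    centroids = {}
    for label in sorted(set(y_train)):
        members = [i for i in range(n_train) if y_train[i] == label]
        centroids[label] = min(
            members, key=lambda i: sum(geo[i][j] for j in members)
        )

    return [
        min(centroids, key=lambda c: geo[centroids[c]][t])
        for t in range(n_train, len(points))
    ]


# Toy usage: two well-separated rows of points, one test point near each row.
X_train = [(float(x), 0.0) for x in range(5)] + [(float(x), 10.0) for x in range(5)]
y_train = [0] * 5 + [1] * 5
X_test = [(2.2, 0.1), (1.8, 9.9)]
preds = gncc_predict(X_train, y_train, X_test, k=3)
print(preds)
```

In this toy case the two rows end up in separate connected components of the graph, so the geodesic distance from a test point to the centroid of the other class is infinite and the assignment is unambiguous. On real data the choice of `k` controls how well the graph follows the manifold, and a sparse-graph library routine would normally replace the hand-written Dijkstra.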