kkNN

kk-NN: an adaptive curvature-based nearest neighbor classifier

Non-parametric classification does not assume a particular functional form for the underlying data distribution, which makes it a suitable approach for many datasets. One of the most popular non-parametric classifiers is the k-nearest neighbors (k-NN) algorithm. A key limitation, however, is the choice of the number of neighbors $k$. This parameter directly affects several properties of the classifier: (i) the bias-variance trade-off; (ii) underfitting and overfitting; (iii) the smoothness of decision boundaries; (iv) robustness to noise, since a larger $k$ yields a classifier that is less sensitive to outliers; and (v) the handling of class imbalance, since a poorly chosen $k$ can bias predictions towards the dominant class when some classes are underrepresented.

In this paper, we propose kk-NN, a new adaptive k-nearest neighbors algorithm that uses the local curvature at a sample to automatically set the neighborhood size. The intuition is that points in low-curvature regions can afford larger neighborhoods, because the tangent space approximates the data manifold well over a wider region, whereas points in high-curvature regions require smaller neighborhoods, because the tangent approximation holds only locally. The method computes an approximation to the local shape operator from the local covariance matrix and the local Hessian matrix, from which an approximate local Gaussian curvature is estimated.

Results on real-world datasets suggest that the proposed kk-NN algorithm can produce superior balanced accuracy compared to the regular k-NN, particularly when the number of training samples is limited, indicating that kk-NN is capable of learning more discriminant functions with less data in some relevant cases.
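
To make the idea concrete, below is a minimal sketch of a curvature-adaptive k-NN in Python. It is not the paper's implementation: the shape-operator-based Gaussian curvature is replaced by a simple residual-variance proxy computed from the local covariance matrix via local PCA, and the parameters `n_base`, `k_min`, and `k_max` are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the curvature-adaptive k-NN idea, NOT the authors'
# implementation. Curvature is approximated here by the fraction of local
# variance lying outside the leading principal direction (a stand-in for
# the paper's shape-operator / Gaussian-curvature estimate).
import numpy as np
from sklearn.neighbors import NearestNeighbors


def curvature_proxy(patch):
    """Residual-variance curvature proxy for a centered local patch:
    ~0 on a flat patch, larger where the data bends sharply."""
    cov = patch.T @ patch / max(len(patch) - 1, 1)  # local covariance matrix
    w = np.sort(np.linalg.eigvalsh(cov))[::-1]      # eigenvalues, descending
    return 1.0 - w[0] / (w.sum() + 1e-12)


def kknn_predict(X_train, y_train, X_test, n_base=15, k_min=3, k_max=31):
    """Adaptive-k nearest-neighbor prediction: flat (low-curvature) regions
    get neighborhoods near k_max, sharply curved regions near k_min."""
    nn = NearestNeighbors(n_neighbors=max(n_base, k_max)).fit(X_train)
    _, idx = nn.kneighbors(X_test)

    # Curvature proxy at each test point from its n_base nearest neighbors.
    curv = np.array([
        curvature_proxy(X_train[row[:n_base]]
                        - X_train[row[:n_base]].mean(axis=0))
        for row in idx
    ])

    # Map normalized curvature monotonically onto [k_min, k_max]:
    # low curvature -> large k, high curvature -> small k.
    c = (curv - curv.min()) / (np.ptp(curv) + 1e-12)
    ks = np.round(k_max - c * (k_max - k_min)).astype(int)

    # Majority vote within each point's adaptive neighborhood.
    preds = []
    for row, k in zip(idx, ks):
        labels, counts = np.unique(y_train[row[:k]], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

Usage follows the plain k-NN pattern, e.g. `preds = kknn_predict(X_train, y_train, X_test)`. The monotone mapping from normalized curvature to $k$ encodes the core intuition of the paper; any decreasing map would do, and the linear one above is simply the most transparent choice for a sketch.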