Incremental Semi-Supervised Kernel Density Estimator K-Means

Python implementation of the Incremental Semi-Supervised Kernel Density Estimator K-Means by Bortoloti et al.

Frederico Damasceno Bortoloti, Elias de Oliveira, Patrick Marques Ciarelli, Supervised kernel density estimation K-means, Expert Systems with Applications, 2020, 114350, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2020.114350. (http://www.sciencedirect.com/science/article/pii/S0957417420310344)

Abstract: K-means is a well-known unsupervised-learning algorithm. It assigns data points to k clusters, the centers of which are termed centroids. However, these centroids have a structure usually represented by a list of quantized vectors, so that kernel density estimation models can better represent complex data distributions. This paper proposes a k-means-based supervised-learning clustering method termed supervised kernel-density-estimation k-means. The proposed approach uses kernel density estimation for class examples inside each cluster to obtain a better representation of the data distribution. The algorithm constructs an initial model using supervised k-means with an equal seed distribution among the classes so that a balance between majority and minority classes is achieved. We also incorporate incremental semi-supervised learning into the proposed method. Experiments were conducted using publicly available benchmark datasets. The results demonstrated that, compared with state-of-the-art supervised methods, the proposed algorithm, which can also perform incremental semi-supervised learning, achieved highly satisfactory performance.

Keywords: K-means; Supervised learning; Kernel density estimation; Incremental semi-supervised learning