The repository includes a modular implementation of Fuzzy K-Means, based on numpy, with an sklearn-like interface.
The algorithm iteratively computes two values until convergence:
- the centroid of the ith cluster
- the membership value, w: the degree to which a data point belongs to the ith cluster. Note that every membership value lies between 0 and 1.
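Given a fuzzification index, m, and the number of clusters, n, the membership of a data point in a cluster is computed from its distance to that cluster's centroid relative to its distances to all n centroids.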
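In the standard fuzzy c-means formulation, which this description appears to follow, the membership update can be written as below. The symbols $x_j$ (the jth data point), $c_i$ (the centroid of the ith cluster), and $w_{ij}$ (the membership of $x_j$ in cluster i) are notation assumed here for illustration, not taken from the module's source.

```latex
w_{ij} = \frac{1}{\sum_{k=1}^{n} \left( \frac{\lVert x_j - c_i \rVert}{\lVert x_j - c_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad w_{ij} \in [0, 1], \quad \sum_{i=1}^{n} w_{ij} = 1
```

A point that sits much closer to $c_i$ than to any other centroid gets $w_{ij}$ close to 1; a point equidistant from all centroids gets $1/n$ for each.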
Likewise, the cluster centroid is a weighted mean of all the data points, where each point's weight reflects how much it belongs to this cluster.
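In the standard fuzzy c-means formulation, the weighted mean takes the form below. Here $x_j$ is the jth data point, $c_i$ the centroid of the ith cluster, and $w_{ij}$ the membership of $x_j$ in cluster i; this notation is assumed for illustration.

```latex
c_i = \frac{\sum_{j} w_{ij}^{m} \, x_j}{\sum_{j} w_{ij}^{m}}
```

The weights are the memberships raised to the power m, so a larger m reduces the influence of points that belong only weakly to the cluster.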
We therefore keep recomputing these two values until convergence.
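The iteration can be sketched in plain numpy as follows. This is a minimal illustration of standard fuzzy c-means, not the module's actual code: the function names, the `init_centers` and `max_iter` parameters, the random initialization, and the choice to stop when the largest membership change drops below `eps` are all assumptions.

```python
import numpy as np


def membership_sketch(X, centers, m=2.0):
    """Membership of every point in every cluster, shape (n_samples, n_clusters)."""
    # Pairwise Euclidean distances between points and centroids.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    dist = np.fmax(dist, 1e-12)  # guard against division by zero
    inv = dist ** (-2.0 / (m - 1.0))
    # Normalizing the inverse distances makes each row sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)


def centers_sketch(X, w, m=2.0):
    """Centroids as means of the points weighted by membership ** m."""
    wm = w ** m
    return (wm.T @ X) / wm.sum(axis=0)[:, None]


def fuzzy_kmeans_sketch(X, n_clusters, m=2.0, eps=1e-3, max_iter=300,
                        init_centers=None, seed=0):
    """Alternate the two updates until memberships change by less than eps."""
    X = np.asarray(X, dtype=float)
    if init_centers is None:
        # Assumed initialization: pick distinct data points at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    else:
        centers = np.asarray(init_centers, dtype=float)
    w = membership_sketch(X, centers, m)
    for _ in range(max_iter):
        centers = centers_sketch(X, w, m)
        new_w = membership_sketch(X, centers, m)
        converged = np.abs(new_w - w).max() < eps
        w = new_w
        if converged:
            break
    return centers, w
```

Calling `fuzzy_kmeans_sketch(X, n_clusters=3)` returns the centroids and the membership matrix; taking the `argmax` of each membership row gives hard labels.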
Our module has an interface similar to that of the standard KMeans provided by sklearn. The initializer accepts the parameters of KMeans, plus:
- m: the fuzziness index used in the above equations
- eps: the threshold value used to recognize convergence. The lower the value, the more accurate the results. Its default value is 0.001
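To get a feel for the fuzziness index, the sketch below evaluates the standard fuzzy c-means membership formula for a single point that lies at distance 1 from one centroid and distance 2 from another. The helper `membership_for_distances` is hypothetical and written only for this illustration; it is not part of the module.

```python
import numpy as np


def membership_for_distances(distances, m):
    """Membership of one point in each cluster, given its distances to the centroids."""
    d = np.asarray(distances, dtype=float)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()


# The same point, evaluated with increasing fuzziness.
for m in (1.5, 2.0, 4.0):
    print(m, membership_for_distances([1.0, 2.0], m).round(3))
```

With m = 2 the memberships are 0.8 and 0.2. A smaller m (1.5) pushes them toward a hard assignment, roughly 0.94 and 0.06, while a larger m (4) pulls them toward an even split, roughly 0.61 and 0.39.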
Given that, the below code demonstrates how to use the module:
# ==============================================================================
# We assume that <X> holds the data samples to be clustered
# ------------------------------------------------------------------------------
# We initialize the fuzziness index, m, with 2
# As well, we would like to have 3 clusters
# ==============================================================================
fkm = FuzzyKMeans(m=2, n_clusters=3)
# ==============================================================================
# Fit the model to the training data <X>
# ==============================================================================
fkm = fkm.fit(X)
# ==============================================================================
# Get the fitting results
# - cluster_centers_: the centroids of the clusters
# - labels_: the data point labels, where each point belongs to the cluster
#   having the highest membership value <w>
# - fmm_: the fuzzy membership value of each data point to each cluster, w
# ==============================================================================
fitted_centroids = fkm.cluster_centers_
X_labels = fkm.labels_
fmm = fkm.fmm_
# ==============================================================================
# You can also predict the labels of new data <new_X> and compute its
# membership values
# ==============================================================================
new_labels = fkm.predict(new_X)
new_fmm = fkm.compute_membership(new_X)
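The relation between the membership values and the hard labels described in the comments above can be illustrated with a small, made-up membership matrix. The layout (one row per data point, one column per cluster) is an assumption about `fmm_`, and the numbers are invented for the example.

```python
import numpy as np

# Hypothetical membership matrix: 4 data points (rows) x 3 clusters (columns).
fmm_example = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.34, 0.33, 0.33],
    [0.05, 0.05, 0.90],
])

# Hard label = index of the cluster with the highest membership.
hard_labels = fmm_example.argmax(axis=1)

# The highest membership itself tells how clear-cut the assignment is.
confidence = fmm_example.max(axis=1)
```

The third point is labeled 0 but with a confidence of only 0.34, which is exactly the kind of ambiguity a hard KMeans label hides.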
Please feel free to check out this notebook, which compares KMeans with our fuzzy implementation of it. Note that we change the opacity to indicate how much a data point belongs to a cluster. Below are brief results at various values of m.
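One way to turn memberships into opacity is sketched below. It is not necessarily how the notebook does it: the helper `membership_to_rgba`, the base colors, and the assumed layout of the membership matrix (rows are points, columns are clusters) are illustrative choices.

```python
import numpy as np


def membership_to_rgba(fmm, base_colors):
    """Color each point by its strongest cluster, with alpha = that membership."""
    fmm = np.asarray(fmm, dtype=float)
    labels = fmm.argmax(axis=1)
    rgba = np.ones((fmm.shape[0], 4))
    rgba[:, :3] = np.asarray(base_colors, dtype=float)[labels]  # RGB per cluster
    rgba[:, 3] = fmm.max(axis=1)  # opacity from the highest membership
    return rgba


# Three made-up points and two clusters (red and blue).
colors = membership_to_rgba(
    [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]],
    base_colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
)
```

The resulting array can be passed as the `c` argument of matplotlib's `scatter`, so that points lying between clusters appear more transparent.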
sklearn 1.0.2
numpy 1.19.5
- Github: Fuzzy-K-Means
- Email: ammarsherif90 [at] gmail [dot] com