"ValueError: Clustering algorithm could not initialize" Getting this error on having the number of clusters for any value above 6
sumitkrmahato opened this issue · 2 comments
The dataset that I am using has 3547 rows and 47 columns. It contains 5 categorical columns.
I am trying to find the optimal number of clusters by finding and plotting the cost of the model for each number of clusters.
Following is the code snippet:
    from kmodes.kprototypes import KPrototypes

    # X is the dataset described above (3547 rows, 47 columns)
    cost = []
    clusters = []
    range_clusters = list(range(2, 10))
    for i in range_clusters:
        if i == 7:
            print()
        print("Running k-prototypes for num clusters = {}...".format(i))
        kproto = KPrototypes(n_clusters=i, init='Huang', verbose=0)
        # Columns 42-46 are the five categorical features
        clusters_i = kproto.fit_predict(X, categorical=[42, 43, 44, 45, 46])
        cost_i = kproto.cost_
        print('Cost = {}'.format(cost_i))
        cost.append(cost_i)
        clusters.append(clusters_i)
It generates clusters for up to 6 clusters but fails for any value above that. Can you please help me understand why this is happening?
I tried clearing the local variables at the end of each iteration. Now it generates clusters for num of clusters = 7 but still fails for 8 and onwards.
    # Clearing the local variables
    kproto = None
    clusters_i = None
    cost_i = None
See relevant entry in FAQ: https://github.com/nicodv/kmodes/blob/master/README.rst#faq
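While investigating the cause, it may help to keep the sweep running when a particular number of clusters fails, so the costs that could be computed are still available for plotting. The following is a minimal sketch, not part of kmodes: `sweep_costs` and `fake_fit_cost` are hypothetical names, and `fake_fit_cost` is a stand-in that only imitates the failure reported above so the example runs without the dataset.

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_costs(
    fit_cost: Callable[[int], float], ks: Iterable[int]
) -> Tuple[Dict[int, float], Dict[int, str]]:
    """Call fit_cost(k) for each k; collect costs and any ValueError messages."""
    costs: Dict[int, float] = {}
    failed: Dict[int, str] = {}
    for k in ks:
        try:
            costs[k] = fit_cost(k)
        except ValueError as exc:
            # Record the failure and continue with the next k
            failed[k] = str(exc)
    return costs, failed


# Hypothetical stand-in for the real fit: it raises for k > 6 purely to
# imitate the behaviour described in this issue.
def fake_fit_cost(k: int) -> float:
    if k > 6:
        raise ValueError("Clustering algorithm could not initialize")
    return 1000.0 / k


costs, failed = sweep_costs(fake_fit_cost, range(2, 10))
print(sorted(costs))   # k values that produced a cost
print(sorted(failed))  # k values that raised ValueError
```

With the real data, `fit_cost` could be a small function that builds `KPrototypes(n_clusters=k, init='Huang', verbose=0)`, calls `fit_predict(X, categorical=[42, 43, 44, 45, 46])` as in the original snippet, and returns `kproto.cost_`. This only makes the failures visible per value of k; it does not address why initialization fails.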