nathanieljevans/cyclicIF_registration

k-means-constrained failing - `There was an issue with the min cost flow input.`

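I suspect the size constraint is infeasible: there are more observations than the clusters can hold, because k is set by the number of R0 components and each cluster is capped at `size_max` (the number of rounds).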

Number of components in each round: (78, 80, 81)
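
A quick sanity check of that suspicion, as a sketch only. It assumes `X` stacks the components from all three rounds and that R0 is the first count listed (78); the helper name `capacity_check` is made up for illustration and is not part of `match.py` or `k_means_constrained`.

```python
def capacity_check(n_samples, n_clusters, size_max):
    """Return (total capacity, whether every sample can be assigned)."""
    capacity = n_clusters * size_max
    return capacity, n_samples <= capacity


round_counts = (78, 80, 81)      # components per round, from the issue
n_samples = sum(round_counts)    # assumption: X holds all rounds stacked
n_clusters = round_counts[0]     # assumption: k is the R0 count
size_max = len(round_counts)     # size_max=num_of_rounds in match.py

capacity, feasible = capacity_check(n_samples, n_clusters, size_max)
print(n_samples, capacity, feasible)  # 239 234 False
```

Under those assumptions, 239 observations cannot fit into 78 clusters of at most 3 members each, which would leave the min cost flow problem without a feasible solution.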

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-55-152602d503a2> in <module>
----> 1 cluster_labels = match.match_cores_across_rounds(res)
      2 res = res.assign(cluster = cluster_labels)

/home/exacloud/lustre1/NGSdev/evansna/cyclicIF/cyclicIF_registration/workflow/libs/match.py in match_cores_across_rounds(info)
     64     # https://github.com/joshlk/k-means-constrained
     65     clus = KMeansConstrained(n_clusters=num_R0_components, init=seeds, size_max=num_of_rounds, n_init=1, tol=1e-8, max_iter=1000)
---> 66     _ = clus.fit( X )
     67 
     68     return clus.labels_ + 1

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in fit(self, X, y)
    629 
    630         self.cluster_centers_, self.labels_, self.inertia_, self.n_iter_ = \
--> 631             k_means_constrained(
    632                 X, n_clusters=self.n_clusters,
    633                 size_min=self.size_min, size_max=self.size_max,

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in k_means_constrained(X, n_clusters, size_min, size_max, init, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, return_n_iter)
    173         for it in range(n_init):
    174             # run a k-means once
--> 175             labels, inertia, centers, n_iter_ = kmeans_constrained_single(
    176                 X, n_clusters,
    177                 size_min=size_min, size_max=size_max,

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in kmeans_constrained_single(X, n_clusters, size_min, size_max, max_iter, init, verbose, x_squared_norms, random_state, tol)
    325         # labels assignment is also called the E-step of EM
    326         labels, inertia = \
--> 327             _labels_constrained(X, centers, size_min, size_max, distances=distances)
    328 
    329         # computation of the means is also called the M-step of EM

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in _labels_constrained(X, centers, size_min, size_max, distances)
    396 
    397     edges, costs, capacities, supplies, n_C, n_X = minimum_cost_flow_problem_graph(X, C, D, size_min, size_max)
--> 398     labels = solve_min_cost_flow_graph(edges, costs, capacities, supplies, n_C, n_X)
    399 
    400     # cython k-means M step code assumes int32 inputs

/home/exacloud/lustre1/NGSdev/evansna/external/anaconda3/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py in solve_min_cost_flow_graph(edges, costs, capacities, supplies, n_C, n_X)
    483     # Find the minimum cost flow between node 0 and node 4.
    484     if min_cost_flow.Solve() != min_cost_flow.OPTIMAL:
--> 485         raise Exception('There was an issue with the min cost flow input.')
    486 
    487     # Assignment

Exception: There was an issue with the min cost flow input.

Possible workaround: catch the exception and fall back on unconstrained k-means?
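
One way that fallback could look, as a rough sketch rather than what the repository actually does. The function name `fit_with_fallback` is hypothetical; it accepts any two estimators exposing `fit` and `labels_`. Because the library raises a bare `Exception`, the sketch matches on the message text from the traceback and re-raises anything else.

```python
MIN_COST_FLOW_MSG = "min cost flow"  # substring of the message in the traceback


def fit_with_fallback(primary, fallback, X):
    """Fit `primary`; on the min-cost-flow error, fit `fallback` instead.

    Returns (labels, used_fallback).
    """
    try:
        primary.fit(X)
        return primary.labels_, False
    except Exception as err:
        # Only swallow the known infeasibility error; re-raise anything else.
        if MIN_COST_FLOW_MSG not in str(err):
            raise
    fallback.fit(X)
    return fallback.labels_, True


# Intended usage (not executed here; requires the third-party packages):
# from k_means_constrained import KMeansConstrained
# from sklearn.cluster import KMeans
# labels, used_fallback = fit_with_fallback(
#     KMeansConstrained(n_clusters=k, init=seeds, size_max=n_rounds, n_init=1),
#     KMeans(n_clusters=k, init=seeds, n_init=1),
#     X,
# )
```

Note that the unconstrained fallback no longer guarantees at most one component per round in each cluster, so the caller would probably want to log when `used_fallback` is true.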

Switched to DBSCAN, which allows outliers (unmatched components are labeled as noise instead of being forced into a cluster) and works quite a bit better.
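
For illustration, a minimal DBSCAN sketch on toy centroid coordinates. The data, `eps`, and `min_samples` values are made up for the example and are not the settings used in the repository; real values would depend on the image scale and how far components drift between rounds.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy (x, y) centroids: two components seen in all three rounds,
# plus one stray detection that appears in a single round.
X = np.array([
    [10.0, 10.0], [10.2, 9.9], [9.9, 10.1],    # component A, rounds 0-2
    [50.0, 50.0], [50.1, 50.2], [49.8, 49.9],  # component B, rounds 0-2
    [90.0, 15.0],                               # stray detection
])

# eps: max distance between neighbours; min_samples=2 lets a component
# matched in only two rounds still form a cluster.
db = DBSCAN(eps=1.0, min_samples=2).fit(X)
labels = db.labels_  # -1 marks noise (unmatched components)

print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike the constrained k-means setup, there is no need to fix the number of clusters from R0, and the stray detection is reported as noise rather than breaking the assignment.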