Method to rename clusters
gmaze opened this issue · 2 comments
gmaze commented
Cluster IDs are set randomly by the classifier.
So when running multiple configurations of a PCM, the analysis is hard to follow because the cluster IDs change from one run to the next.
A simple solution is to sort cluster IDs using a metric computed from the training set, for instance:
- the vertical average of features,
- a value at a given depth,
- the median latitude or longitude of a cluster,
- etc.
This is available in the Matlab PCM toolbox as the rename_labels function and should be implemented in pyXpcm as well.
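As a rough illustration of the idea (not the Matlab rename_labels code and not a pyXpcm API), the sketch below renumbers labels so that cluster 0 has the lowest mean of a chosen metric. The function name `relabel_by_metric` and the sample values are hypothetical.

```python
import numpy as np

def relabel_by_metric(labels, metric):
    """Renumber clusters so that ID 0 has the lowest mean metric (hypothetical helper)."""
    labels = np.asarray(labels)
    metric = np.asarray(metric, dtype=float)
    old_ids = np.unique(labels)
    # Mean of the metric within each original cluster.
    cluster_means = np.array([metric[labels == k].mean() for k in old_ids])
    # The cluster with the j-th smallest mean receives new ID j.
    order = np.argsort(cluster_means)
    mapping = {old: new for new, old in enumerate(old_ids[order])}
    return np.array([mapping[k] for k in labels])

# Made-up example: six samples, three clusters, one metric value per sample.
labels = np.array([2, 0, 1, 2, 0, 1])
metric = np.array([5.0, 20.0, 12.0, 6.0, 22.0, 11.0])
new_labels = relabel_by_metric(labels, metric)
```

Because the new IDs depend only on the metric, two runs that find the same clusters under different arbitrary IDs end up with the same numbering.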
sdat2 commented
I think this function solves the problem for a fitted sklearn Gaussian mixture model:
import copy

import numpy as np


def sort_gmm_by_mean(gmm):
    # Reorders the components of a fitted GMM in place.
    # The 3-D indexing below assumes covariance_type="full".
    weights = copy.deepcopy(gmm.weights_)
    means = copy.deepcopy(gmm.means_)
    covariances = copy.deepcopy(gmm.covariances_)
    precisions = copy.deepcopy(gmm.precisions_)
    precisions_cholesky = copy.deepcopy(gmm.precisions_cholesky_)
    # Sort on the mean of the first feature so that the lowest becomes component 0
    # (alternative: np.argsort(means.mean(axis=1))).
    new_order = np.argsort(gmm.means_[:, 0])
    for i in range(means.shape[0]):
        # Overwrite component i with the component that belongs in position i.
        gmm.weights_[i] = weights[new_order[i]]
        gmm.means_[i, :] = means[new_order[i], :]
        gmm.covariances_[i, :, :] = covariances[new_order[i], :, :]
        gmm.precisions_[i, :, :] = precisions[new_order[i], :, :]
        gmm.precisions_cholesky_[i, :, :] = precisions_cholesky[new_order[i], :, :]
    return gmm
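Labels predicted before the components are reordered still use the old IDs. A minimal sketch of converting them, assuming the same ordering rule as above (argsort on the first-feature means); the mean values and labels here are made up for illustration:

```python
import numpy as np

# Hypothetical first-feature means of four components, in the original order.
first_feature_means = np.array([3.2, -1.0, 7.5, 0.4])

# Position i of the sorted model holds the old component new_order[i].
new_order = np.argsort(first_feature_means)

# Inverse permutation: old_to_new[old_id] is that component's ID after sorting.
old_to_new = np.empty_like(new_order)
old_to_new[new_order] = np.arange(new_order.size)

# Labels predicted with the unsorted model, converted to the sorted numbering.
old_labels = np.array([0, 1, 2, 3, 1, 2])
new_labels = old_to_new[old_labels]
```

The inverse permutation is needed because `new_order` maps new positions to old IDs, while existing labels have to be mapped the other way.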