obidam/pyxpcm

Method to rename clusters

gmaze opened this issue · 2 comments

gmaze commented

Cluster IDs are assigned arbitrarily by the classifier (they depend on its random initialization).
So when running several configurations of a PCM, the analysis is hard to follow because cluster IDs change from one run to the next.
A simple solution is to sort cluster IDs using a metric computed on the training set, for instance:

  • the vertical average of the features,
  • a value at a given depth,
  • the median latitude or longitude of each cluster,
  • etc.

This is available in the Matlab PCM toolbox as the rename_labels function and should be implemented in pyXpcm as well.
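A minimal sketch of such a relabelling with NumPy, assuming the classifier's labels and one value per training sample are available (for example the vertical average of the features, `X.mean(axis=1)` for a hypothetical profile array `X` of shape (n_samples, n_depth), or the latitude of each profile). The name `sort_labels_by_metric` is hypothetical: it is neither the Matlab rename_labels signature nor an existing pyXpcm API. Each cluster is summarized by the median of the metric over its members, and clusters are renumbered in increasing order of that median.

```python
import numpy as np


def sort_labels_by_metric(labels, metric, descending=False):
    """Renumber clusters so that cluster 0 has the lowest median metric.

    labels : (n_samples,) integer cluster IDs returned by the classifier
    metric : (n_samples,) one value per sample (vertical average, value at
             a given depth, latitude, ...)
    Returns the relabelled array and the {old_id: new_id} mapping.
    """
    labels = np.asarray(labels)
    metric = np.asarray(metric, dtype=float)
    ids = np.unique(labels)  # sorted unique cluster IDs

    # One summary value per cluster: median of the metric over its members
    cluster_stat = np.array([np.median(metric[labels == k]) for k in ids])

    order = np.argsort(cluster_stat)  # indices into ids, lowest stat first
    if descending:
        order = order[::-1]

    # rank[i] is the new ID of the cluster ids[i]
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))

    # searchsorted maps each label to its position in the sorted ids array
    new_labels = rank[np.searchsorted(ids, labels)]
    mapping = dict(zip(ids.tolist(), rank.tolist()))
    return new_labels, mapping


labels = np.array([2, 2, 0, 1, 1, 0])
metric = np.array([10.0, 12.0, 5.0, 1.0, 2.0, 6.0])
new_labels, mapping = sort_labels_by_metric(labels, metric)
print(new_labels.tolist(), mapping)  # [2, 2, 1, 0, 0, 1] {0: 1, 1: 0, 2: 2}
```

Using the median rather than the mean keeps the ordering robust to a few outlying profiles; any other per-cluster statistic could be swapped in.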

sdat2 commented

I think this function solves the problem for a scikit-learn GaussianMixture:

import copy

import numpy as np


def sort_gmm_by_mean(gmm):
    # Assumes a fitted GaussianMixture with covariance_type='full', so that
    # covariances/precisions have shape (n_components, n_features, n_features)
    weights = copy.deepcopy(gmm.weights_)
    means = copy.deepcopy(gmm.means_)
    covariances = copy.deepcopy(gmm.covariances_)
    precisions = copy.deepcopy(gmm.precisions_)
    precisions_cholesky = copy.deepcopy(gmm.precisions_cholesky_)
    # sorts so that the component with the lowest mean on the first feature is 0
    # (np.argsort(means.mean(axis=1)) would sort by the average over all features)
    new_order = np.argsort(gmm.means_[:, 0])

    for i in range(means.shape[0]):
        # altering GMM in place: component i takes the parameters of component new_order[i]
        gmm.weights_[i] = weights[new_order[i]]
        gmm.means_[i, :] = means[new_order[i], :]
        gmm.covariances_[i, :, :] = covariances[new_order[i], :, :]
        gmm.precisions_[i, :, :] = precisions[new_order[i], :, :]
        gmm.precisions_cholesky_[i, :, :] = precisions_cholesky[new_order[i], :, :]

    return gmm

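The same reordering can be sketched with NumPy fancy indexing, which returns reordered copies and so removes the need for deepcopy. The name `sort_gmm_components` and its `key` argument are hypothetical. Indexing along the first axis covers 'full' (K, d, d), 'diag' (K, d) and 'spherical' (K,) covariances, whereas the loop above assumes 'full'; with 'tied', a single covariance matrix is shared by all components, so there is nothing to reorder.

```python
import numpy as np


def sort_gmm_components(gmm, key=None):
    """Reorder the components of a fitted sklearn-style GaussianMixture in place.

    key : optional (n_components,) array used for sorting; defaults to the
          mean of the first feature, as in sort_gmm_by_mean.
    Component 0 ends up with the lowest key value.
    """
    if key is None:
        key = gmm.means_[:, 0]
    new_order = np.argsort(key)

    # Fancy indexing along axis 0 builds reordered copies of each array
    gmm.weights_ = gmm.weights_[new_order]
    gmm.means_ = gmm.means_[new_order]
    if gmm.covariance_type != "tied":
        # 'tied' shares one covariance across components: leave it untouched
        gmm.covariances_ = gmm.covariances_[new_order]
        gmm.precisions_ = gmm.precisions_[new_order]
        gmm.precisions_cholesky_ = gmm.precisions_cholesky_[new_order]
    return gmm
```

On a fitted GaussianMixture, `sort_gmm_components(gmm, key=gmm.means_.mean(axis=1))` would sort by the average over all features (the vertical average when features are depth levels). Since `predict` relies on `weights_`, `means_` and `precisions_cholesky_`, later predictions should come out with the sorted IDs.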
gmaze commented

Thanks @sdat2 for pointing this out!
This could indeed be much simpler to implement, and it would return sorted clusters by default.
Let's give this a try.