covartech/PRT

Proposal - All clustering objects sort their cluster indices by increasing first dim

peterTorrione opened this issue · 2 comments

Most clustering objects return unsorted cluster centers, which makes interpretation difficult.

For most clustering algorithms there's no natural ordering of the clusters, but we can enforce a simple one, e.g., sort the clusters so that the first dimension of the cluster centers is increasing.

If there are no complaints, I may implement this in some clustering algorithms.
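A minimal sketch of the proposed ordering, in plain MATLAB and independent of any PRT class. The variable names (`centers`, `labels`, `newIndex`) and the data are hypothetical, chosen only for illustration; this is not the code that ended up in the toolbox.

```matlab
% Hypothetical data: 3 cluster centers (one per row) in 2-D,
% plus the cluster index assigned to each of 6 samples.
centers = [ 2.0  0.5;
           -1.5  1.0;
            0.3 -2.0];
labels  = [1 3 2 2 1 3]';

% Sort the centers by their first dimension (column 1), ascending.
% order(k) is the OLD index of the cluster that ends up in position k.
[~, order] = sort(centers(:,1));
sortedCenters = centers(order,:);

% Build the inverse map (old index -> new index) and relabel the samples
% so that they still point at the same physical cluster.
newIndex = zeros(size(order));
newIndex(order) = 1:numel(order);
sortedLabels = newIndex(labels);
```

Relabeling matters as much as sorting the centers: if only the centers are permuted, every stored assignment silently points at the wrong cluster.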

I have no complaints. It's not always straightforward to re-sort the internals of specific clustering algorithms, but for many it should be relatively simple. This isn't something we can do automatically, or something we can assert that all clusterers do, but we can adopt it as a best practice for commonly used clusterers like k-means and a few others.
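To illustrate why the internals need care: a clusterer that stores several per-cluster quantities has to apply the same permutation to all of them. The sketch below uses hypothetical stand-ins for a 3-component mixture model (`means`, `weights`, `covs`); these are not PRT property names, and the real classes may store their parameters differently.

```matlab
% Hypothetical per-component parameters of a 3-component mixture in 2-D.
means   = [ 1.0 0;
           -2.0 1;
            0.5 3];                              % one row per component
weights = [0.5 0.2 0.3];                         % mixing proportions
covs    = cat(3, eye(2), 2*eye(2), 3*eye(2));    % one 2x2 covariance per component

% Permutation that sorts the components by the first dimension of their means.
[~, order] = sort(means(:,1));

% Apply the SAME permutation to every per-component quantity.
means   = means(order,:);
weights = weights(order);
covs    = covs(:,:,order);
```

Missing any one of these arrays would leave the model internally inconsistent, which is presumably why this can only be a per-algorithm best practice rather than something enforced automatically.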

This is implemented in 00db823 for all the clustering algorithms we have that I can run (see the two recent issues #70 and #71 for details).

It works:

%%
ds = prtDataGenBimodal;
ds = rt(prtPreProcZmuv,ds);
clusterers = {prtClusterDpMeans('lambda',1);
    prtClusterGmm;
    prtClusterKmeans;
%     prtClusterKmodes;
%     prtClusterMeanShift;
    prtClusterMeanShiftEuclidean;
    prtClusterSpectralKmeans;
    prtClusterSphericalKmeans;};

[mm,nn] = prtUtilGetSubplotDimensions(length(clusterers));

for cEnum = cvrEnumerate(clusterers)
    % Train each clusterer and keep it, indexed by its position in the list
    c{cEnum.index} = cEnum.value.train(ds);
    subplot(nn,mm,cEnum.index)
    plot(c{cEnum.index});
end

The clusters are in the right order (blue, red, green) along the 1st dimension.

[Image: clusterers_sorted]