Silhouette aggregation revisited: A study of the Silhouette Coefficient when it is micro- and macro-averaged to assess clustering solutions

The Silhouette coefficient is an established internal clustering evaluation measure that produces a score for each data point, assessing the quality of its cluster assignment. To assess the clustering of the whole dataset, the scores of all points can either be averaged directly into a single value (micro-averaging) or be averaged first at the cluster level and then across clusters (macro-averaging). As we illustrate in this work with a synthetic example, micro-averaging is sensitive to both cluster imbalance and outliers (background noise), while macro-averaging is far more robust to both. Therefore, to avoid reporting misleading results, the aggregation strategy should not be chosen arbitrarily, as is currently common practice. The problem is exacerbated when sampling is used with the Silhouette coefficient, because current implementations result in micro-averaged spaces even when macro-averaging is used. To bypass this problem, we propose a per-cluster sampling method. Furthermore, we conduct an experimental study on real-world datasets, analysing both the micro- and macro-averaged coefficients and investigating which of the two better suits each dataset.
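The two aggregation strategies, and the idea of sampling per cluster, can be illustrated with a minimal NumPy sketch. It computes the per-point score s(i) = (b(i) - a(i)) / max(a(i), b(i)), then averages it either over all points (micro) or per cluster and then across clusters (macro). All function names (`silhouette_per_point`, `micro_average`, `macro_average`, `per_cluster_sample`) are hypothetical, the synthetic data is made up for illustration and is not the paper's example, and `per_cluster_sample` is only one plausible reading of per-cluster sampling (a fixed number of points drawn from each cluster), not necessarily the authors' method. For reference, scikit-learn's `silhouette_score` returns the mean over all points, i.e. the micro-average, and its `sample_size` option draws points uniformly from the whole dataset.

```python
import numpy as np


def silhouette_per_point(X, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), Euclidean.

    Needs at least two clusters. Builds the full n x n distance matrix,
    so it is only meant for small n.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    own = np.searchsorted(clusters, labels)  # column index of each point's cluster
    sizes = np.bincount(own)[own]            # size of each point's own cluster
    # Mean distance from every point to every cluster, shape (n, k).
    mean_d = np.stack([D[:, labels == c].mean(axis=1) for c in clusters], axis=1)
    rows = np.arange(n)
    # a(i): mean distance to the *other* members of its cluster (drops the zero self-distance).
    a = mean_d[rows, own] * sizes / np.maximum(sizes - 1, 1)
    # b(i): smallest mean distance to any other cluster.
    other = mean_d.copy()
    other[rows, own] = np.inf
    b = other.min(axis=1)
    denom = np.maximum(a, b)
    s = np.divide(b - a, denom, out=np.zeros(n), where=denom > 0)
    s[sizes == 1] = 0.0  # singleton clusters get s(i) = 0 by convention
    return s


def micro_average(s):
    """Micro-average: one mean over all points, so each cluster weighs by its size."""
    return float(np.mean(s))


def macro_average(s, labels):
    """Macro-average: mean per cluster first, then an unweighted mean over clusters."""
    labels = np.asarray(labels)
    return float(np.mean([s[labels == c].mean() for c in np.unique(labels)]))


def per_cluster_sample(labels, n_per_cluster, seed=0):
    """Hypothetical sampler: up to n_per_cluster indices drawn from *each* cluster."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        k = min(n_per_cluster, members.size)
        picked.append(rng.choice(members, size=k, replace=False))
    return np.sort(np.concatenate(picked))


# Illustrative data (not the paper's example): one large, well-separated cluster
# plus two small clusters that overlap each other heavily.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(1000, 2)),
    rng.normal([10.0, 0.0], 0.5, size=(20, 2)),
    rng.normal([10.5, 0.0], 0.5, size=(20, 2)),
])
labels = np.repeat([0, 1, 2], [1000, 20, 20])

s = silhouette_per_point(X, labels)
print(f"micro = {micro_average(s):.3f}, macro = {macro_average(s, labels):.3f}")

# Per-cluster sampling keeps every cluster in the sample, so the sampled
# macro-average still gives each cluster equal weight.
idx = per_cluster_sample(labels, n_per_cluster=15)
s_sub = silhouette_per_point(X[idx], labels[idx])
print(f"sampled macro = {macro_average(s_sub, labels[idx]):.3f}")
```

On data like this, the micro-average is dominated by the 1,000-point cluster, while the macro-average comes out markedly lower because the two small, overlapping clusters count as much as the large one. Building the full distance matrix keeps the sketch short but costs O(n²) memory, which is why sampling is needed on large datasets in the first place.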

ipavlopoulos/revisiting-silhouette-aggregation