wvangansbeke/Unsupervised-Classification

Clustering CIFAR-100

Hi,
Thanks for your great work.
I was wondering whether you have tried clustering CIFAR-100.
I tried to do so by making the following changes:

  1. Change the number of clusters in the config from 20 to 100.
  2. Comment out the lines in cifar.CIFAR20 that convert CIFAR-100 indices to CIFAR-20 indices.
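Here is a minimal sketch of what step 2 amounts to, assuming the dataset exposes integer CIFAR-100 fine labels. With a fine-to-coarse mapping, the targets collapse into 20 superclasses. Without one, the 100 fine labels are kept, so the number of clusters in the config has to be 100 as well. The function name remap_targets and the three-entry mapping are hypothetical illustrations, not the repository's code.

```python
from typing import Dict, List, Optional


def remap_targets(fine_targets: List[int],
                  fine_to_coarse: Optional[Dict[int, int]] = None) -> List[int]:
    """Return the labels used for evaluation (hypothetical helper).

    With a fine-to-coarse mapping, the CIFAR-100 fine labels are collapsed
    into superclasses (the CIFAR-20 setting). Passing None keeps the fine
    labels, which is what commenting out the conversion effectively does.
    """
    if fine_to_coarse is None:
        return list(fine_targets)
    return [fine_to_coarse[t] for t in fine_targets]


# Toy mapping with only three entries, for illustration only.
toy_mapping = {0: 4, 1: 1, 2: 14}
targets = [0, 2, 1, 2]
print(remap_targets(targets, toy_mapping))  # [4, 14, 1, 14]
print(remap_targets(targets))               # [0, 2, 1, 2]
```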

This resulted in a lower ARI than K-Means:
SCAN (no pseudo-labeling): 15.154
K-Means on SimCLR: 22.953
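ARI (Adjusted Rand Index) compares cluster assignments with ground-truth labels without first matching cluster ids to classes. The numbers above are presumably ARI scaled by 100. Below is a minimal, dependency-free sketch of the standard formula; scikit-learn's adjusted_rand_score computes the same quantity. The function name is our own.

```python
from collections import Counter
from math import comb
from typing import Sequence


def adjusted_rand_index(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Adjusted Rand Index between ground-truth classes and cluster ids.

    Invariant to permuting cluster ids: 1.0 means identical partitions,
    values near 0.0 mean chance-level agreement, negative means worse than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: trivially identical partitions
    # Number of same-group pairs per contingency cell, per class, per cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster on both sides
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))            # 1.0
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 3))  # 0.571
```

Because ARI is permutation invariant, it can be applied directly to K-Means or SCAN cluster ids.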

I also tried changing the SCAN hyperparameters to match the ImageNet setup (frozen backbone, 10 heads, different optimizer hyperparameters).
This resulted in:
SCAN (ImageNet params): 23.2
Pseudo-labeling also decreased performance in that case.
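Some context on the "10 heads" setting: as we understand SCAN, several clustering heads are trained in parallel on the same features, and the head with the lowest SCAN loss is kept. The sketch below illustrates that selection using plain Python lists of softmax outputs. The function names and the entropy weight of 5.0 are assumptions, and this is not the repository's implementation.

```python
import math
from typing import List, Sequence, Tuple

Probs = List[List[float]]  # one softmax vector per sample


def scan_loss(anchor_probs: Probs, neighbor_probs: Probs, entropy_weight: float = 5.0) -> float:
    """SCAN-style objective for one clustering head (sketch).

    Consistency term: -mean log <p(anchor), p(neighbor)> pulls mined
    neighbors into the same cluster. The entropy of the mean prediction is
    subtracted so the head is discouraged from collapsing into one cluster.
    """
    eps = 1e-8
    n = len(anchor_probs)
    consistency = -sum(
        math.log(sum(a * b for a, b in zip(pa, pb)) + eps)
        for pa, pb in zip(anchor_probs, neighbor_probs)
    ) / n
    k = len(anchor_probs[0])
    mean_p = [sum(p[c] for p in anchor_probs) / n for c in range(k)]
    entropy = -sum(m * math.log(m + eps) for m in mean_p)
    return consistency - entropy_weight * entropy


def select_head(heads: Sequence[Tuple[Probs, Probs]]) -> int:
    """Index of the head with the lowest SCAN loss; each head is (anchor, neighbor) probs."""
    losses = [scan_loss(anchors, neighbors) for anchors, neighbors in heads]
    return min(range(len(losses)), key=losses.__getitem__)


balanced = ([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]])
collapsed = ([[0.99, 0.01], [0.99, 0.01]], [[0.99, 0.01], [0.99, 0.01]])
print(select_head([balanced, collapsed]))  # 0
```

The collapsed head is slightly more consistent, but its low entropy makes its total loss higher, so the balanced head is selected.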

Have you seen this phenomenon?
Do you have ideas on how to improve clustering performance?

Thanks a lot!

Hi @avihu111,

It is certainly possible for the third step to reduce performance if the alignment with the ground-truth classes is not good enough after step 2. This can be the case for CIFAR-100, so the improvements might be small. The problem with CIFAR-100 is that it has fewer samples per class than, for example, ImageNet. I haven't run this myself, because we were more interested in ImageNet and prior work used the 20 superclasses of CIFAR-100. The best advice I can give you is to use the same settings we used for ImageNet, but I would personally focus on ImageNet if possible. (Issue #21 might also be helpful.)
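Here is why the third step can hurt. Self-labeling fine-tunes the network on its own confident predictions, so if those pseudo-labels are poorly aligned with the ground-truth classes after step 2, the cross-entropy reinforces the errors. A minimal sketch follows. It assumes a confidence threshold of 0.99, which we believe matches the SCAN paper's setting, and it omits SCAN's strong augmentations and class-balanced weighting. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def select_confident(probs: List[List[float]], threshold: float = 0.99) -> Tuple[List[int], List[int]]:
    """Return (sample indices, pseudo-labels) for predictions above the threshold."""
    indices, labels = [], []
    for i, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)  # argmax cluster
        if p[best] > threshold:
            indices.append(i)
            labels.append(best)
    return indices, labels


def self_label_loss(probs_aug: List[List[float]], indices: List[int], labels: List[int]) -> float:
    """Mean cross-entropy of (augmented-view) predictions against the pseudo-labels."""
    if not indices:
        return 0.0
    return -sum(math.log(probs_aug[i][y] + 1e-8) for i, y in zip(indices, labels)) / len(indices)


probs = [[0.995, 0.005], [0.6, 0.4], [0.001, 0.999]]
idx, labels = select_confident(probs)
print(idx, labels)  # [0, 2] [0, 1]
```

Only samples 0 and 2 pass the threshold. If their pseudo-labels disagree with the true classes, minimizing this loss pushes the model further toward the wrong assignment.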

Thanks a lot @wvangansbeke - that was helpful!