facebookresearch/swav

[Question] compare deepclustering v1 and v2

pp1016 opened this issue · 2 comments

Hi, I have some questions regarding the implementation of deep clustering v1 and v2,

(1) I found that in the previous version (v1), the projection head is reset at each epoch, but in v2 it is not reset every epoch?
(2) In the previous version, there are two optimizers, for the lower layers and the projection head respectively, but v2 has only one optimizer for all the parameters in the model? Is it necessary to build a separate optimizer for the projection head? Or can we set requires_grad = False for the parameters in the projection head and use only one optimizer?

I am trying to reproduce the results on CIFAR, but the cluster quality (NMI) fails to increase over epochs. I am not sure where the problem is, so I am looking for the key steps in your work. Thanks for your help, and I look forward to your reply!

Hi, pp1016,
If you read the SwAV paper you can see the difference between DeepCluster-v1 and v2. From the paper:

"Training phase in DeepCluster-v2. In the original DeepCluster work, both the classification head c and the convnet weights are trained to classify the images into their corresponding pseudo-label between two assignments. Intuitively, this classification head is optimized to represent prototypes for the different pseudo-classes. However, since there is no mapping between two consecutive assignments, the classification head learned during an assignment becomes irrelevant for the following one. Thus, this classification head needs to be re-set at each new assignment, which considerably disrupts the convnet training. For this reason, we propose to simply use for classification head c the centroids given by k-means clustering."
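
A minimal sketch of that idea (not the repository's actual code): instead of re-initializing and training a classification head after every new assignment, the head's weights are set directly to the k-means centroids. Names such as `run_kmeans`, `feat_dim`, and `num_prototypes` are placeholders, not SwAV APIs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_prototypes = 128, 3000                # assumed sizes
prototypes = nn.Linear(feat_dim, num_prototypes, bias=False)  # plays the role of the head c

def reassign(features: torch.Tensor) -> torch.Tensor:
    """features: (N, feat_dim) L2-normalized embeddings of the whole dataset."""
    # run_kmeans is a hypothetical helper returning (num_prototypes, feat_dim) centroids
    # and a length-N tensor of cluster assignments (pseudo-labels).
    centroids, assignments = run_kmeans(features, num_prototypes)
    with torch.no_grad():
        # head <- centroids: no classifier re-initialization, no extra training of c
        prototypes.weight.copy_(F.normalize(centroids, dim=1))
    return assignments
```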

Hi @pp1016

  1. In DeepCluster-v2 we do not backpropagate through the projection head prototypes and simply set them to the centroids of k-means. As mentioned in our paper, this improves the model stability.

  2. I think it is better to have only one optimizer. In the first version I was using two optimizers, but that was mainly a hack. See facebookresearch/deepcluster#82. A rough sketch of the single-optimizer setup is shown below.
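
An illustrative sketch only (not the repository's training loop), assuming `model`, `prototypes`, `loader`, and the `assignments` pseudo-labels from the previous sketch already exist: the prototype weights are frozen with requires_grad = False so a single optimizer drives all remaining parameters.

```python
import torch
import torch.nn.functional as F

# Centroids are not backpropagated: freeze the prototype layer.
for p in prototypes.parameters():
    p.requires_grad = False

# One optimizer for everything that is still trainable (backbone + projection head).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.1, momentum=0.9, weight_decay=1e-6,
)

for images, idx in loader:                           # loader assumed to yield (images, sample indices)
    emb = F.normalize(model(images), dim=1)          # backbone + projection head output
    scores = prototypes(emb) / 0.1                   # cosine similarity to centroids, temperature 0.1 (assumed)
    loss = F.cross_entropy(scores, assignments[idx]) # classify against the k-means pseudo-labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```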