facebookresearch/swav

Question about loss and hyperparameters

TopTea1 opened this issue · 4 comments

Hi, first, thanks for sharing your great work !

I'm trying to train my network with SwAV in the 2x160 + 4x96 multi-crop setting, using the hyperparameters provided in the bs_256 script. The loss starts to decrease, but seems to get stuck after 3-4 epochs. Do I need to adapt the hyperparameters, or make some other changes?

Thanks for your help


Reduce the queue length. I encountered this issue in some of my training runs, and using a smaller queue solved the problem.

xcvil commented

I had the same issue: during warm-up training, the loss stopped decreasing and stayed at the same value.

Hi @TopTea1
Thanks for your interest and your kind words.
As suggested by @Erfun76 I would suggest to reduce the queue length or starting the queue later on in training (i.e. --epoch_queue_starts 50 for example). Also feel free to take a look at this section for tips on how to get the model training https://github.com/facebookresearch/swav#common-issues
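Concretely, starting from the bs_256 script, that advice could look like the following (flag names as they appear in main_swav.py; the exact values below are hypothetical starting points to experiment with, not tested settings):

```shell
# Either shrink the queue (default in the bs_256 script is 3840) ...
# ... or delay its introduction with --epoch_queue_starts.
python -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
    --nmb_crops 2 4 \
    --size_crops 160 96 \
    --queue_length 1920 \
    --epoch_queue_starts 50 \
    ...
```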

If I understood correctly, you keep the same number of prototypes (3000) and batch size (64) while reducing the queue size from 3840 to, e.g., 50? How does that influence the equipartition constraint?
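For intuition on that question: the equipartition constraint is enforced by the Sinkhorn-Knopp normalization over all features available for assignment (the current batch plus the queue), so shrinking the queue from 3840 to 50 leaves only ~114 samples to spread over 3000 prototypes, making the per-prototype target mass B/K much smaller. A minimal NumPy sketch of the normalization (simplified from the repo's distributed_sinkhorn; the function name, sizes, and epsilon here are illustrative):

```python
import numpy as np

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Sinkhorn-Knopp normalization as used in SwAV's assignment step.

    scores: (B, K) similarities between B samples and K prototypes.
    Returns soft assignments Q: each row (sample) sums to 1, and the
    column sums (prototype usage) are pushed toward B/K each.
    """
    Q = np.exp(scores / eps).T          # (K, B)
    Q /= Q.sum()                        # normalize total mass
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)  # equalize prototype rows
        Q /= K
        Q /= Q.sum(axis=0, keepdims=True)  # renormalize each sample
        Q /= B
    return (Q * B).T                    # (B, K), rows sum to 1

rng = np.random.default_rng(0)
# Toy sizes: 64 batch samples + a queue of 384, vs 300 prototypes
# (the setting discussed above is 64 + 3840 features vs 3000 prototypes).
Q = sinkhorn(rng.normal(size=(64 + 384, 300)))
usage = Q.sum(axis=0)          # total assignment mass per prototype
print(round(usage.mean(), 3))  # mean is exactly B/K = 448/300 -> 1.493
```

With B well below K, many prototypes necessarily receive near-zero mass in any single step, which is presumably why the queue size and the number of prototypes are usually balanced together.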