wvangansbeke/Unsupervised-Classification

About the hyper-parameter settings.

blue-blue272 opened this issue · 3 comments

It is a wonderful work!

But I see that you use different hyper-parameters (epochs/batch size/learning rate) for different datasets, such as CIFAR and ImageNet. How do you set these hyper-parameters if you do not have access to a labeled validation set?

Hi @blue-blue272,

This is due to practical reasons. ImageNet contains larger images, so it can be hard to fit the same batch size into memory. If you change the batch size, you should change the learning rate as well: as a rule of thumb, reduce the learning rate by a factor of x if you reduce the batch size by a factor of x (the same applies when increasing it). Please read the README; you can determine these parameters based on the loss. If the loss doesn't go down, increase or decrease the learning rate. You don't need a labeled validation set to set the learning rate. A similar argument can be made for the number of epochs. As a result, this shouldn't be considered an issue.
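To make the scaling rule concrete, here is a minimal sketch. The base values (batch size 512, learning rate 0.4) are placeholders for illustration, not the repository's actual defaults:

```python
# Linear scaling rule: scale the learning rate by the same factor
# as the batch size. The base values below are hypothetical.
BASE_BATCH_SIZE = 512
BASE_LR = 0.4

def scaled_lr(batch_size: int) -> float:
    """Return a learning rate scaled linearly with the batch size."""
    return BASE_LR * batch_size / BASE_BATCH_SIZE

print(scaled_lr(256))   # half the batch size -> half the lr: 0.2
print(scaled_lr(1024))  # double the batch size -> double the lr: 0.8
```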

@wvangansbeke Thanks for your reply! It seems that we can rely on experience to set some hyper-parameters. It also seems that some unsupervised works, such as IIC (I am not very sure about that), use a labeled validation set to select training hyper-parameters. Is this practice (selecting hyper-parameters using a labeled validation set) reasonable in unsupervised learning? I am very confused about how to select hyper-parameters in the unsupervised learning area.

You could argue that once these hyper-parameters are determined for complex datasets (e.g., ImageNet, COCO), the same set is applicable to other datasets (of a similar domain), allowing you to train on a new dataset in an "unsupervised" fashion. Most parameters can be set based on the loss function and stay fairly consistent in my experience. But I agree that some approaches use uncommon augmentations and various custom thresholds to make the approach "work". You could question whether it is still "unsupervised" in this case. That's why I mention in the README to use a validation set to report results, at the very least. So, I do get your point.