Tian0426/CL-HAR

Should we freeze the pretrained model in main_supervised_baseline.py?

xiqxin1 opened this issue · 4 comments

Hi,

I'm curious how to compare the performance of self-supervised learning (SSL) against supervised learning.

There are several possible ways:

  1. Train the SSL model on the pretraining task, then freeze the pretrained model and run the downstream task, training only the classification layer (a rough sketch of this is shown after the list).
  2. Train the SSL model on the pretraining task, then fine-tune the whole model on the downstream task.
  3. Train the whole model from scratch with randomly initialized parameters.
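
For concreteness, here is a minimal PyTorch-style sketch of how (1), (2), and (3) differ. The `backbone`, `classifier`, shapes, and checkpoint path are hypothetical placeholders for illustration, not the actual code in main_supervised_baseline.py.

```python
import torch
import torch.nn as nn

# Hypothetical HAR backbone and linear classifier (placeholders only).
backbone = nn.Sequential(
    nn.Conv1d(9, 64, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
classifier = nn.Linear(64, 6)

# (1) Linear evaluation: load SSL weights, freeze the backbone,
#     and optimize only the classification layer.
backbone.load_state_dict(torch.load("ssl_backbone.pt"))  # illustrative path
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()  # also fixes BatchNorm/Dropout behavior, if present
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# (2) Fine-tuning: load SSL weights but keep all parameters trainable.
# optimizer = torch.optim.Adam(
#     list(backbone.parameters()) + list(classifier.parameters()), lr=1e-3)

# (3) From scratch: skip the load_state_dict call and train all parameters.
```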

What I want to know is whether we should compare (1) or (2) against (3) to measure the performance of SSL.

However, as for (1) and (2), I'm not sure whether (1) can (or usually does) perform better than (3), even when the pretrained model is good enough.

Thanks for your answers.

Hi @xiqxin1,

We are comparing (1) with (3).

In the SSL setting, we evaluate the quality of the representations (the output of the backbone network) that the model learns when labels are not available. In (2), the backbone network has access to the labels, since the whole model is retrained on the downstream task; this is not consistent with the SSL setting. In (1), however, only the classifier has access to the labels during evaluation, which is in accord with the SSL setting.
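
As an illustration of that evaluation protocol, here is a minimal sketch of a linear probe on frozen representations. The `encoder` argument, the data arrays, and the scikit-learn probe are assumptions for illustration, not the repository's actual evaluation code.

```python
import torch
from sklearn.linear_model import LogisticRegression

def linear_probe(encoder, x_train, y_train, x_test, y_test):
    """Evaluate frozen SSL representations with a linear classifier.

    `encoder` is assumed to be a pretrained torch module; its weights are
    never updated here, so only the linear probe ever sees the labels.
    """
    encoder.eval()
    with torch.no_grad():
        f_train = encoder(torch.as_tensor(x_train, dtype=torch.float32)).numpy()
        f_test = encoder(torch.as_tensor(x_test, dtype=torch.float32)).numpy()
    probe = LogisticRegression(max_iter=1000).fit(f_train, y_train)
    return probe.score(f_test, y_test)
```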

To address your concern: even when the pretrained SSL model is good enough, (1) doesn’t necessarily perform better than (3).

Hi Tian,

Sorry for the late reply. Thank you very much.

I suppose that only when the SSL model extracts features similar to those of a purely supervised model will the performance of (1) and (3) be similar. However, in many cases, if the input data is very complicated, the performance of SSL in case (1) will be poor.

But SSL could still be considered helpful if the result in case (2) is better than in case (3).

I'm not sure if I understand correctly.

Hi @xiqxin1,

The features of an SSL model don’t necessarily need to be similar to those of a supervised model to achieve good performance, because the two use different objective functions. Two models producing features with different distributions can still achieve similar performance.

For (2), you might consider using it in a real application as long as labels for the downstream task are available. However, if (2) performs better than (3), you need to decide whether the credit should go to the SSL model or to the downstream labels you used to retrain the model.

Hope this helps!

Thank you for your answer; you may be right. I just wonder how the SSL model makes sense here. It is different from computer vision, where SSL is combined with data augmentation to guide feature extraction.