no-pretraining case
kocemir opened this issue · 2 comments
Hello,
Thank you very much for this study. I would like to kindly ask why you freeze most of the parameters in the "dist_finetune_nopretraining.py" script. The model does not load weights from a pretrained model, yet you still freeze most of them.
Hi and thanks for your interest!
scBERT consists of 6 transformer layers, followed by a 3-layer neural network that is appended during fine-tuning to predict the cell type label. During fine-tuning, the majority of the transformer layers are frozen, leaving only the final 2 layers with trainable weights (see GitHub). To set up an experiment that evaluates the benefit of pretraining, we left everything about the fine-tuning setup unchanged except that we replaced the pretrained transformer weights with random weights. This means the setup matches the authors' main fine-tuning setup, where the majority of the transformer weights are frozen and not updated during fine-tuning (a rough sketch of this scheme follows below).
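For concreteness, here is a minimal PyTorch sketch of that setup using a toy stand-in model; the class and attribute names are illustrative, not the ones in the scBERT repository. The point is that the random initialization is kept (no checkpoint is loaded), while the freezing scheme is identical to ordinary fine-tuning.

```python
import torch
import torch.nn as nn

# Toy stand-in for the scBERT backbone: 6 transformer layers plus a
# 3-layer classification head appended at fine-tuning time. Names and
# sizes here are illustrative only.
class ToyBackbone(nn.Module):
    def __init__(self, dim=64, n_layers=6, n_classes=10):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.head = nn.Sequential(          # 3-layer classifier for cell type labels
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, n_classes),
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.head(x.mean(dim=1))     # pool over tokens, then classify

model = ToyBackbone()

# "No pretraining" ablation: skip loading a pretrained checkpoint, i.e.
# keep the random initialization above instead of calling
# model.load_state_dict(torch.load(...)).

# Freezing scheme kept identical to the ordinary fine-tuning setup:
# freeze everything, then unfreeze only the last 2 transformer layers
# and the classification head.
for p in model.parameters():
    p.requires_grad = False
for layer in model.layers[-2:]:
    for p in layer.parameters():
        p.requires_grad = True
for p in model.head.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total}")
```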
By the way, you might notice that the scBERT authors presented results from a similar pretraining ablation in their Extended Data Figure 1a, which showed qualitatively different results than ours. I believe the authors unfroze all weights in the transformer during this ablation experiment, which meant that their "no pretraining" model was larger (i.e. had more trainable weights) than the fine-tuning model used in their main results. Given that their setup led to worse results without pretraining and ours did not, it seems their "no pretraining" model performed worse on cell type annotation because of the larger trainable capacity, not because pretraining was removed.
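Continuing the toy sketch above, the ablation I believe the authors ran would look like the snippet below; this is my reading of their setup, not code from their repository. The trainable-parameter count makes the capacity difference explicit.

```python
# My understanding of the authors' ablation: the entire transformer is
# unfrozen, so the "no pretraining" model has far more trainable weights
# than the fine-tuning model used in their main results.
for p in model.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params with everything unfrozen: {trainable}")
```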
Hi,
Thank you very much for the detailed answer.