Fine-Tune Downstream Tasks
yuanenming opened this issue · 2 comments
I am trying to reproduce the downstream evaluation results. (e.g. 0.68 spearman rho for the fluorescence task).
I use the command below:
tape-train-distributed transformer fluorescence \
--from_pretrained bert-base \
--batch_size 64 \
--learning_rate 0.00001 \
--warmup_steps 1000 \
--nproc_per_node 8 \
--gradient_accumulation_steps 1 \
--num_train_epochs 20 \
--patience 3 \
--save_freq improvement
then I evaluate the model using:
tape-eval transformer fluorescence results/fluorescence_transformer_21-04-12-06-55-02_240214 --metrics spearmanr
But the Spearman rho is much lower (~0.3 across several trials).
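For anyone sanity-checking the reported number: `--metrics spearmanr` is rank correlation between predicted and true fluorescence values (TAPE computes it via `scipy.stats.spearmanr`, as far as I can tell). A dependency-free sketch of the same quantity, in case you want to verify the metric independently on your own prediction/target arrays:

```python
# Minimal Spearman rho: Pearson correlation of the (average) ranks.
# Standalone illustration only -- for real evaluation use
# scipy.stats.spearmanr or tape-eval itself.

def _ranks(xs):
    """1-based average ranks; ties share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearmanr(pred, true):
    """Spearman rho = Pearson correlation computed on the ranks."""
    rp, rt = _ranks(pred), _ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    sp = sum((a - mp) ** 2 for a in rp) ** 0.5
    st = sum((b - mt) ** 2 for b in rt) ** 0.5
    return cov / (sp * st)
```

Any monotone relationship gives rho = 1.0 regardless of scale, which is why this metric is used for fluorescence: only the ranking of brightness matters, not the absolute values.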
(BTW, I use learning rate 1e-5 here, because the result of using 1e-4 is even worse.)
So did I get something wrong in the command above?
Do you have any suggestions about the hyperparameters?
Thanks a lot!
Hi @rmrao,
I suspect the issue is that the pre-training task is MLM, a pseudo task that only learns amino-acid-level signal and never an explicit protein-level signal. That makes it somewhat hard for the BERT model to fine-tune on a downstream task, so the fine-tuning process may be unstable and sensitive to hyperparameters.
In NLP this problem is alleviated because word-level information is much richer.
First, yes, MLM does not use a protein-level signal. Second, the hyperparameters for fluorescence are quite tricky. See here for a hyperparameter sweep on this task.
Changes I suggest: use a total batch size of 32 (large batches are worse for fluorescence), so you probably don't need more than 1 GPU. A warmup_constant learning rate schedule with 10000 warmup steps also seems best, and a learning rate of 1e-5 is best.
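Putting those suggestions together, a single-GPU run might look like the sketch below. This only uses flags that appear earlier in this thread; the flag for selecting the warmup_constant schedule varies by TAPE version, so check `tape-train --help` rather than trusting anything here:

```shell
# Single GPU, total batch size 32, lr 1e-5, 10000 warmup steps.
# Add the schedule option your TAPE version exposes for warmup_constant.
tape-train transformer fluorescence \
  --from_pretrained bert-base \
  --batch_size 32 \
  --learning_rate 0.00001 \
  --warmup_steps 10000 \
  --gradient_accumulation_steps 1 \
  --num_train_epochs 20 \
  --patience 3 \
  --save_freq improvement
```

Since batch_size × num_gpus × gradient_accumulation_steps is the effective batch size, the original 8-GPU command above was training at 64 × 8 = 512, far above the 32 that works best here.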