Fine-Tune Downstream Tasks
yuanenming opened this issue · 2 comments
I am trying to reproduce the downstream evaluation results. (e.g. 0.68 spearman rho for the fluorescence task).
I use the command below:
tape-train-distributed transformer fluorescence \
--from_pretrained bert-base \
--batch_size 64 \
--learning_rate 0.00001 \
--warmup_steps 1000 \
--nproc_per_node 8 \
--gradient_accumulation_steps 1 \
--num_train_epochs 20 \
--patience 3 \
--save_freq improvement
then I evaluate the model using:
tape-eval transformer fluorescence results/fluorescence_transformer_21-04-12-06-55-02_240214 --metrics spearmanr
But the Spearman rho is much lower (~0.3 across several trials).
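For anyone sanity-checking the reported number: `--metrics spearmanr` is rank correlation between predicted and true fluorescence values (TAPE computes it via `scipy.stats.spearmanr`, as far as I can tell). A dependency-free sketch of the same quantity, in case you want to verify the metric independently on your own prediction/target arrays:

```python
# Minimal Spearman rho: Pearson correlation of the (average) ranks.
# Standalone illustration only -- for real evaluation use
# scipy.stats.spearmanr or tape-eval itself.

def _ranks(xs):
    """1-based average ranks; ties share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearmanr(pred, true):
    """Spearman rho = Pearson correlation computed on the ranks."""
    rp, rt = _ranks(pred), _ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    sp = sum((a - mp) ** 2 for a in rp) ** 0.5
    st = sum((b - mt) ** 2 for b in rt) ** 0.5
    return cov / (sp * st)
```

Any monotone relationship gives rho = 1.0 regardless of scale, which is why this metric is used for fluorescence: only the ranking of brightness matters, not the absolute values.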
(BTW, I use learning rate 1e-5 here, because the result of using 1e-4 is even worse.)
So did I get something wrong in the command above?
Do you have any suggestions about the hyperparameters?
Thanks a lot!
Hi @rmrao,
I suspect the issue is that the pre-training task is MLM, a pseudo task that only learns amino-acid-level signal and never an explicit protein-level signal. That makes it somewhat hard for the BERT model to fine-tune on a downstream task, so the fine-tuning process may be unstable and sensitive to hyperparameters.
In NLP this problem is alleviated because word-level information is much richer.
First, yes, MLM does not use a protein-level signal. Second, the hyperparameters for fluorescence are quite tricky. See here for a hyperparameter sweep on this task.
Changes I suggest: use a total batch size of 32 (large batches are worse for fluorescence), so you probably don't need more than 1 GPU. A warmup_constant learning rate schedule with 10000 warmup steps also seems best, and a learning rate of 1e-5 is best.
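Putting those suggestions together, a single-GPU run might look like the sketch below. This only uses flags that appear earlier in this thread; the flag for selecting the warmup_constant schedule varies by TAPE version, so check `tape-train --help` rather than trusting anything here:

```shell
# Single GPU, total batch size 32, lr 1e-5, 10000 warmup steps.
# Add the schedule option your TAPE version exposes for warmup_constant.
tape-train transformer fluorescence \
  --from_pretrained bert-base \
  --batch_size 32 \
  --learning_rate 0.00001 \
  --warmup_steps 10000 \
  --gradient_accumulation_steps 1 \
  --num_train_epochs 20 \
  --patience 3 \
  --save_freq improvement
```

Since batch_size × num_gpus × gradient_accumulation_steps is the effective batch size, the original 8-GPU command above was training at 64 × 8 = 512, far above the 32 that works best here.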