What are the hyperparameters for RL training?
Zyq-scut opened this issue · 2 comments
Zyq-scut commented
Hi, thanks for the nice work. I am trying to reproduce the results reported in the paper. However, I couldn't find the training details (e.g. learning rate, number of epochs) for the second-stage fine-tuning (RL). I trained the RL stage with the same hyperparameters as the first-stage fine-tuning (SL), but the performance degraded a lot. I think this is due to the wrong hyperparameters. Could you share those details? Thanks in advance.
henryhungle commented
@Zyq-scut RL fine-tuning can be quite sensitive to hyperparameters. In my experience, you should experiment with a larger batch size (e.g. 256 samples per training step) and with lower learning rates.
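
As a rough illustration of the advice above (not the settings used in the paper), here is a minimal sketch of one way to reach about 256 samples per optimizer step with gradient accumulation and to build a short list of lower learning rates to try. The per-device batch size, the device count, the SL learning rate, and the scaling factors are all hypothetical placeholders; substitute the values from your own SL run.

```python
import math


def accumulation_steps(target_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Micro-batches to accumulate so one optimizer step sees at least target_batch samples."""
    samples_per_pass = per_device_batch * num_devices
    return math.ceil(target_batch / samples_per_pass)


def lower_learning_rates(sl_lr: float, factors=(0.5, 0.1, 0.05)) -> list:
    """Candidate RL learning rates, each smaller than the SL learning rate."""
    return [sl_lr * f for f in factors]


# Hypothetical setup: 4 devices with 8 samples each per forward/backward pass.
steps = accumulation_steps(target_batch=256, per_device_batch=8, num_devices=4)

# Hypothetical SL learning rate; replace with the one from your first-stage run.
candidates = lower_learning_rates(sl_lr=2e-5)

print("gradient accumulation steps:", steps)
print("learning rates to try:", candidates)
```

Accumulating gradients keeps memory use at the micro-batch level while the optimizer still sees the larger effective batch, so it is a common way to try the 256-sample setting without more hardware.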
Zyq-scut commented
Thanks. I will try again.