salesforce/CodeRL

What are the hyperparameters for RL training?

Zyq-scut opened this issue · 2 comments

Hi, thanks for the nice work. I am trying to reproduce the results reported in the paper. However, I couldn't find details about the training parameters (e.g., learning rate, number of epochs) for the second-stage fine-tuning (RL). I trained the RL stage with the same parameters as the first-stage fine-tuning (SL), but the performance degraded a lot. I suspect this is due to wrong hyperparameters. Could you share the details? Thanks in advance.

@Zyq-scut RL fine-tuning can be quite sensitive to hyperparameters. Based on my experience, you should experiment with a larger batch size (e.g., 256 samples per training step) and with lower learning rates than in the SL stage.
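As a concrete starting point, this is a minimal sketch of deriving an RL-stage config from an SL-stage one along the lines suggested above: reach a 256-sample effective batch via gradient accumulation and scale the learning rate down. All names and values here are illustrative assumptions, not the authors' actual settings.

```python
# Hypothetical SL-stage config; the values are placeholders, not from the paper.
SL_CONFIG = {
    "learning_rate": 2e-5,
    "batch_size": 32,
}

def rl_config(sl_config, target_batch=256, lr_scale=0.1, per_device_batch=32):
    """Derive an RL fine-tuning config: larger effective batch, lower LR.

    Uses gradient accumulation so the effective batch size reaches
    `target_batch` even when per-device memory limits the micro-batch.
    """
    accum_steps = max(1, target_batch // per_device_batch)
    return {
        "learning_rate": sl_config["learning_rate"] * lr_scale,
        "per_device_batch_size": per_device_batch,
        "gradient_accumulation_steps": accum_steps,
        "effective_batch_size": per_device_batch * accum_steps,
    }

cfg = rl_config(SL_CONFIG)
print(cfg)
```

With these placeholder numbers, the effective batch size comes out to 256 and the learning rate is an order of magnitude below the SL stage; sweeping `lr_scale` and `target_batch` is still advisable, since the right values depend on the model and reward setup.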

Thanks. I will try again.