salesforce/CodeRL

What are the hyperparameters for RL training?

Zyq-scut opened this issue · 2 comments

Hi, thanks for the nice work. I am trying to reproduce the results reported in the paper. However, I couldn't find details about the training parameters (e.g., learning rate, number of epochs) for the second-stage fine-tuning (RL). I trained the RL stage with the same parameters as the first-stage fine-tuning (SL), but the performance degraded a lot. I suspect this is due to wrong hyperparameters. Could you share the details? Thanks in advance.

@Zyq-scut RL fine-tuning can be quite sensitive to hyperparameters. Based on my experience, you should experiment with a larger batch size (e.g., 256 samples per training step) and with lower learning rates than in the SL stage.
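As a concrete starting point, this is a minimal sketch of deriving an RL-stage config from an SL-stage one along the lines suggested above: reach a 256-sample effective batch via gradient accumulation and scale the learning rate down. All names and values here are illustrative assumptions, not the authors' actual settings.

```python
# Hypothetical SL-stage config; the values are placeholders, not from the paper.
SL_CONFIG = {
    "learning_rate": 2e-5,
    "batch_size": 32,
}

def rl_config(sl_config, target_batch=256, lr_scale=0.1, per_device_batch=32):
    """Derive an RL fine-tuning config: larger effective batch, lower LR.

    Uses gradient accumulation so the effective batch size reaches
    `target_batch` even when per-device memory limits the micro-batch.
    """
    accum_steps = max(1, target_batch // per_device_batch)
    return {
        "learning_rate": sl_config["learning_rate"] * lr_scale,
        "per_device_batch_size": per_device_batch,
        "gradient_accumulation_steps": accum_steps,
        "effective_batch_size": per_device_batch * accum_steps,
    }

cfg = rl_config(SL_CONFIG)
print(cfg)
```

With these placeholder numbers, the effective batch size comes out to 256 and the learning rate is an order of magnitude below the SL stage; sweeping `lr_scale` and `target_batch` is still advisable, since the right values depend on the model and reward setup.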

Thanks. I will try again.