OFA-Sys/gsm8k-ScRel

Questions about RFT Inference

waterhorse1 opened this issue · 5 comments

Thanks for this great work. I have two questions. First, the generation code for 7b/13b seems to be missing. Second, regarding the hyperparameter settings: the defaults in single_inference_30b.py do not seem reasonable for generating different reasoning paths.

Thank you for your help!

See group_7b_13b.sh. As we discuss in the paper, if you use temp=0.7 for 33b, you get about 2 different paths per 100 samples. If you use temp=1.0, you get about 4 different paths per 100 samples.
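
To make "different paths per 100 samples" concrete, here is a minimal sketch of counting distinct reasoning paths among sampled outputs. It assumes GSM8K-style calculator annotations such as <<48/2=24>>, and it assumes two paths count as the same when they contain the same equations in the same order. The repository's actual criterion may differ, and the names equation_key and distinct_paths are hypothetical.

```python
import re

# Assumption: equations appear as GSM8K-style annotations, e.g. <<48/2=24>>.
EQUATION_RE = re.compile(r"<<([^<>]*)>>")


def equation_key(path):
    """Return the equations found in a path, in order, with spaces removed."""
    return tuple(eq.replace(" ", "") for eq in EQUATION_RE.findall(path))


def distinct_paths(paths):
    """Keep the first sampled path for each distinct equation list."""
    seen = set()
    kept = []
    for path in paths:
        key = equation_key(path)
        if key not in seen:
            seen.add(key)
            kept.append(path)
    return kept


# Hypothetical samples for one question; the first three share one equation list.
samples = [
    "Half of 48 is <<48/2=24>>24. Total is <<48+24=72>>72. #### 72",
    "She sold <<48/2=24>>24 in May, so <<48+24=72>>72 overall. #### 72",
    "In May she sold <<48 / 2 = 24>>24. Altogether <<48+24=72>>72. #### 72",
    "Total is <<48*1.5=72>>72. #### 72",
]
unique = distinct_paths(samples)
print(len(unique), "distinct paths out of", len(samples), "samples")
```

Under this criterion, rewording the surrounding text does not create a new path. Only a change in the calculation does, which is one reason the count of distinct paths can stay small even after many samples.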

I will upload gen_train.sh later.

@GanjinZero What kind of decoding strategy are you using, direct sampling or beam search?

sampling
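
For readers unfamiliar with the term, temperature sampling draws each next token at random from the softmax of the logits divided by the temperature, whereas beam search deterministically keeps the highest-scoring candidates. The sketch below illustrates only the temperature step in plain Python. The logits are made-up values and the function names are hypothetical; this is not the repository's inference code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores
p_low = softmax_with_temperature(logits, 0.7)
p_high = softmax_with_temperature(logits, 1.0)

rng = random.Random(0)
token = sample_token(logits, 1.0, rng)
```

A lower temperature concentrates probability on the top-scoring token, so repeated samples tend to coincide. A higher temperature flattens the distribution and makes differing continuations more likely, consistent with the higher count of different paths reported above for temp=1.0.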

Thanks for your answer! I will close the issue.