Questions about RFT Inference
waterhorse1 opened this issue · 5 comments
Thanks for this great work. I have two questions. First, the generation code for 7b/13b seems to be missing. Second, about the specific hyperparameter settings: the defaults in single_inference_30b.py do not look reasonable for generating different reasoning paths.
Thank you for your help!
For 7b/13b, check group_7b_13b.sh. As discussed in the paper, with temp=0.7 the 33b model produces only about 2 different reasoning paths per 100 samples; with temp=1.0 it produces about 4 different paths per 100 samples.
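For anyone who wants to check how many different paths their own samples contain, here is a minimal sketch. It assumes two paths count as the same when they contain the same sequence of equations; that criterion, the regex, and the names `equation_signature` and `count_distinct_paths` are illustrative assumptions, not code from this repo.

```python
import re

# Assumption: an "equation" is an arithmetic expression followed by "= <number>".
EQUATION_RE = re.compile(r"[\d\.\s\+\-\*/\(\)]+=\s*\d+(?:\.\d+)?")


def equation_signature(path: str) -> tuple:
    """Return the ordered equations in a reasoning path, with whitespace removed."""
    equations = EQUATION_RE.findall(path)
    return tuple(re.sub(r"\s+", "", eq) for eq in equations)


def count_distinct_paths(samples: list) -> int:
    """Count samples that differ in their equation sequence (hypothetical criterion)."""
    return len({equation_signature(s) for s in samples})


samples = [
    "She has 3 * 4 = 12 apples. Then 12 - 2 = 10.",
    "First, 3*4=12. After eating, 12-2=10 remain.",  # same equations, different wording
    "2 + 10 = 12 and 12 - 2 = 10.",                  # different first equation
]
print(count_distinct_paths(samples))
```

Running this over the 100 generations for a single question gives a rough per-question count to compare against the figures above.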
I will upload gen_train.sh later.
@GanjinZero What kind of decoding strategy are you using, direct sampling or beam search?
Sampling.
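To illustrate what temperature does under sampling, here is a small standard-library sketch. The logits are made-up numbers and the function names are hypothetical; this is not the repo's generation code, just the usual temperature-scaled softmax followed by a random draw.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens
rng = random.Random(0)

for temp in (0.7, 1.0):
    probs = softmax_with_temperature(logits, temp)
    draws = [sample_token(logits, temp, rng) for _ in range(100)]
    print(temp, [round(p, 3) for p in probs], len(set(draws)))
```

A lower temperature puts more probability on the top token, so repeated draws agree more often. Beam search, by contrast, deterministically keeps the highest-scoring candidates rather than drawing at random.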
Thanks for your answer! I will close the issue.