Feature: Dr. GRPO Support
Opened this issue · 0 comments
giladfrid009 commented
It would be nice if ART supported the Dr. GRPO loss.
Basically, Dr. GRPO removes the response-length bias present in the original GRPO objective while retaining GRPO's performance benefits. See the Dr. GRPO paper for more info.
Important note: TRL's GRPOTrainer already supports this via loss_type="dr_grpo" in GRPOConfig.
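For reference, enabling it in TRL looks roughly like this (a sketch, assuming trl is installed; output_dir and the surrounding trainer setup are placeholders):

```python
# Sketch: switch TRL's GRPO trainer to the Dr. GRPO loss.
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="out",       # placeholder
    loss_type="dr_grpo",    # length-unbiased Dr. GRPO loss instead of the default
)
```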
I believe this technique could be especially beneficial in multi-turn RL scenarios, where response lengths grow more significantly.
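To illustrate the bias being removed, here is a toy sketch in plain Python (the function names and the constant normalizer are made up for the example, not part of any library): GRPO averages each response's loss over its own length, so tokens in long responses are down-weighted, while Dr. GRPO divides by a constant budget so every token counts equally.

```python
def grpo_aggregate(per_token_loss):
    # Original GRPO: mean over each response's own length, then mean
    # across responses. Tokens in longer responses get smaller weight,
    # which is the length bias Dr. GRPO removes.
    return sum(sum(seq) / len(seq) for seq in per_token_loss) / len(per_token_loss)

def dr_grpo_aggregate(per_token_loss, max_len):
    # Dr. GRPO: normalize by a constant (e.g. the generation budget),
    # so per-token weight no longer depends on response length.
    return sum(sum(seq) for seq in per_token_loss) / (len(per_token_loss) * max_len)

# One short and one long response with identical per-token loss.
losses = [[1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
print(grpo_aggregate(losses))                 # 1.0
print(dr_grpo_aggregate(losses, max_len=4))   # 0.75
```

Note how under GRPO both responses contribute equally despite one having twice the tokens, whereas under the constant normalizer the total token count matters.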