OpenPipe/ART

Feature: Dr. GRPO Support


It would be nice if ART supported the Dr. GRPO loss.
Dr. GRPO ("GRPO Done Right") removes the response-length bias present in the original GRPO objective by normalizing token losses with a constant instead of each response's own length (and by dropping the standard-deviation scaling of the group advantages), while retaining GRPO's performance benefits. See the Dr. GRPO paper for more info.
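To make the difference concrete, here is a minimal sketch of the two aggregation schemes. The function names, tensor shapes, and the use of the maximum completion length as the normalization constant are illustrative assumptions, not ART or TRL internals.

```python
import torch

def grpo_loss(per_token_loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Original GRPO: average each response's token losses over that response's
    # own length, which makes the gradient scale depend on response length.
    per_response = (per_token_loss * mask).sum(-1) / mask.sum(-1).clamp(min=1)
    return per_response.mean()

def dr_grpo_loss(
    per_token_loss: torch.Tensor, mask: torch.Tensor, max_completion_length: int
) -> torch.Tensor:
    # Dr. GRPO: sum token losses and divide by a constant (e.g. the maximum
    # completion length), so a response's length no longer rescales its gradient.
    return (per_token_loss * mask).sum(-1).mean() / max_completion_length
```

Dr. GRPO also computes the group advantage without dividing by the reward standard deviation, but the aggregation change above is the part that addresses the length bias.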

Important note: TRL's GRPOTrainer already supports this by setting loss_type="dr_grpo" in GRPOConfig.
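For reference, a minimal sketch of how this is enabled in TRL, assuming a recent TRL version where GRPOConfig exposes loss_type; the model, dataset, and toy reward function are placeholders for illustration, not a proposal for ART's API:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward that prefers shorter completions (illustration only).
def reward_len(completions, **kwargs):
    return [-float(len(completion)) for completion in completions]

training_args = GRPOConfig(
    output_dir="qwen-dr-grpo-demo",
    loss_type="dr_grpo",  # use the length-bias-free Dr. GRPO loss
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=load_dataset("trl-lib/tldr", split="train"),
)
trainer.train()
```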

I believe this technique could be especially beneficial in multi-turn RL scenarios, where completion lengths grow substantially over a trajectory and the length bias therefore has a larger effect.