reinforcement learning improvement
Ski-ing opened this issue · 2 comments
Ski-ing commented
How significant is the improvement in code generation performance metrics attributed to the Group Relative Policy Optimization (GRPO) within the reinforcement learning component?
DeepSeekPH commented
GRPO's impact varies across test sets. In general, GRPO improves scores on code generation test sets by approximately 0.5 points, while the gains on math-related benchmarks are more substantial.
Ski-ing commented
Thanks for your reply.