[FEATURE REQUEST] Dynamic Rewarding with Prompt Optimization (DRPO)
rrfaria commented
Feature Request
I would like to suggest:
Dynamic Rewarding with Prompt Optimization (DRPO):
https://arxiv.org/html/2411.08733v1#S1
Motivation
This is another approach worth trying.
I don't know if it works, but it could be a starting point:
https://github.com/Singla17/dynamic-alignment-optimization/tree/master
Additional Context
No response