huggingface/autotrain-advanced

[FEATURE REQUEST] Dynamic Rewarding with Prompt Optimization (DRPO)

Feature Request

I would like to suggest Dynamic Rewarding with Prompt Optimization (DRPO):
https://arxiv.org/html/2411.08733v1#S1

Motivation

DRPO is another approach worth trying.

I don't know if it works, but this repository could be a starting point:
https://github.com/Singla17/dynamic-alignment-optimization/tree/master

Additional Context

No response