RLHFlow/Online-RLHF

More RLHF algorithms in the implementation

WayXG opened this issue · 1 comment

I saw that the loss type option suggests several other loss functions can be used, such as hinge, ipo, raft, and so on.

I am wondering whether we only need to change the loss type, without modifying any other part of the code.

Yes. That is also our purpose.

We expose the loss type API so that users and researchers can develop other loss functions beyond the KL-induced ones.

We hope the community can find better recipes and solutions for the whole pipeline.
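
To illustrate the idea of switching only the loss type, here is a minimal sketch in plain Python. It is not the repository's code: the function name `preference_loss`, its arguments, and the default `beta` are hypothetical, and the three formulas are the commonly used sigmoid (DPO), hinge, and IPO forms computed on a single pair of log-probabilities. `raft` is left out because this thread does not describe its form.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,          # hypothetical default, not taken from the repo
    loss_type: str = "sigmoid",
) -> float:
    """Loss for one (chosen, rejected) pair; only `loss_type` selects the formula."""
    # Log-ratio margin of the policy relative to the reference model.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    if loss_type == "sigmoid":   # standard DPO objective
        return -log_sigmoid(beta * logits)
    if loss_type == "hinge":     # margin-based variant
        return max(0.0, 1.0 - beta * logits)
    if loss_type == "ipo":       # squared distance to the target margin 1/(2*beta)
        return (logits - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"Unknown loss_type: {loss_type!r}")


if __name__ == "__main__":
    # Same inputs, different loss types: nothing else in the call changes.
    for name in ("sigmoid", "hinge", "ipo"):
        print(name, preference_loss(-12.0, -15.0, -13.0, -14.0, loss_type=name))
```

Under these assumptions, adding a new loss means adding one more branch on `loss_type`, while the inputs to the function stay the same.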