More RLHF algorithms in the implementation
WayXG opened this issue · 1 comment
WayXG commented
I saw that the loss type option suggests several other loss functions can be used, such as hinge, ipo, raft ...
I am wondering whether we only need to change the loss choice, without modifying other parts of the code.
hendrydong commented
Yes. That is also our purpose.
We leave the loss type API open so that users and researchers can develop other loss functions beyond the KL-induced ones.
We hope the community can find better recipes and solutions for the whole pipeline.
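
To make the idea of a swappable loss type concrete, here is a minimal sketch of how such a switch could look. It is not taken from this repository: the function name `preference_loss`, its arguments, and the default `beta` are hypothetical, and the formulas are the commonly used DPO-style definitions (sigmoid, hinge, ipo) as found in libraries such as TRL. `raft` is left out because its definition in this codebase is not stated in the thread. Plain Python floats are used instead of tensors to keep the example self-contained.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    loss_type: str = "sigmoid",
) -> float:
    """Hypothetical per-pair loss; only the final branch depends on loss_type."""
    # Shared part: difference of policy and reference log-ratios.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    if loss_type == "sigmoid":  # standard DPO objective
        return -log_sigmoid(beta * logits)
    if loss_type == "hinge":  # margin-based variant
        return max(0.0, 1.0 - beta * logits)
    if loss_type == "ipo":  # squared distance to the target margin 1/(2*beta)
        return (logits - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unknown loss_type: {loss_type!r}")


if __name__ == "__main__":
    for name in ("sigmoid", "hinge", "ipo"):
        print(name, preference_loss(-1.0, -3.0, -1.5, -2.5, loss_type=name))
```

Under these assumptions, everything up to the computation of `logits` is shared, so adding a new loss means adding one branch and leaving data loading, log-probability computation, and the training loop untouched.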