pytorch/rl

[Feature Request] multi-turn reward for RLHF

vmoens opened this issue · 1 comment

Implement rewards as proposed in https://arxiv.org/pdf/2405.14655

I am very interested in multi-turn RLHF. Could you share some sample code?