facebookresearch/RLCD

Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"

Python · MIT

Issues

Share the checkpoint of the reward model and rl-tuned model?
#5 opened a year ago by chenweixin107
1
Argument meaning.
#4 opened a year ago by LanShanPi
2
How to find file?
#3 opened a year ago by LanShanPi
2