facebookresearch/RLCD

Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"

Language: Python · License: MIT