RLCD

Reproduction of "RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment"

Primary language: Python. License: MIT.
