Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"
Primary language: Python. License: MIT.