/dpo_toxic

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.

Primary Language: Jupyter Notebook
License: MIT
