/refined-dpo

Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs

Primary Language: Jupyter Notebook
License: MIT
