Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
Primary language: Jupyter Notebook. License: MIT.