/Dr_DPO

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Primary LanguagePython

Watchers