dpo

Implementation of Direct Preference Optimization
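The core of Direct Preference Optimization is a simple per-pair loss: maximize the log-sigmoid of the policy's log-probability margin on the chosen vs. rejected response, measured relative to a frozen reference model. A minimal sketch in plain Python (assuming the summed per-response log-probabilities have already been computed; the function name and `beta` default are illustrative, not from this repo):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    loss = -log sigmoid(beta * ((log pi(y_w) - log pi_ref(y_w))
                               - (log pi(y_l) - log pi_ref(y_l))))
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) computed stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference on both responses, the loss is log 2; it shrinks as the policy assigns relatively more probability to the chosen response than the rejected one.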

Primary language: Jupyter Notebook
