/trl-dpo-alpaca-farm-demo

Demo of DPO training on the Alpaca Farm dataset using trl

Primary Language: Python
