ADVANCED FINE-TUNING REPO

The repo is split across branches. This branch is for Direct Preference Optimization (DPO).
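
For reference, DPO trains the policy to widen the gap between the implicit rewards of the preferred and rejected completions, measured relative to a frozen reference model. A minimal PyTorch sketch of the objective (not taken from this repo's notebooks; the function name, argument names, and beta value are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss sketch.

    Each argument is a tensor of summed log-probabilities of the chosen /
    rejected completions under the trainable policy or the frozen reference.
    """
    # Implicit reward = beta * log-ratio of policy vs. reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean()
```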

Helper scripts

  • hh_rlhf_dpo.ipynb generates the preference dataset used for DPO training (see the sketch below)
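
The notebook itself is not reproduced here. Assuming it draws on Anthropic's hh-rlhf dataset (as the name suggests), a minimal sketch of turning that data into the prompt / chosen / rejected triples DPO expects might look like the following; the helper name and splitting logic are illustrative:

```python
from datasets import load_dataset

def split_prompt_and_response(example):
    """Split an hh-rlhf record into prompt / chosen / rejected fields.

    Each record stores a full conversation; the text after the final
    'Assistant:' turn is the response being compared.
    """
    chosen, rejected = example["chosen"], example["rejected"]
    # The prompt is everything up to and including the last 'Assistant:' marker
    idx = chosen.rfind("Assistant:") + len("Assistant:")
    return {
        "prompt": chosen[:idx],
        "chosen": chosen[idx:].strip(),
        "rejected": rejected[rejected.rfind("Assistant:") + len("Assistant:"):].strip(),
    }

dataset = load_dataset("Anthropic/hh-rlhf", split="train")
dpo_dataset = dataset.map(split_prompt_and_response,
                          remove_columns=dataset.column_names)
print(dpo_dataset[0])
```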