$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

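β-DPO replaces the fixed trade-off parameter $\beta$ in Direct Preference Optimization with one calibrated at the batch level. Below is a minimal PyTorch sketch of that idea, assuming the standard DPO objective; the linear calibration rule and the names `beta0`, `alpha`, and `running_margin` are illustrative assumptions, not necessarily this repository's exact method or API.

```python
import torch
import torch.nn.functional as F


def dynamic_beta_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi(y_w | x), shape (B,)
    policy_rejected_logps: torch.Tensor,  # log pi(y_l | x), shape (B,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (B,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (B,)
    beta0: float = 0.1,                   # base beta (assumed default)
    alpha: float = 0.5,                   # sensitivity of beta to the margin (assumed)
    running_margin: float = 0.0,          # running mean of past batch margins (assumed state)
) -> torch.Tensor:
    # Implicit reward margin per example: difference of the
    # policy-vs-reference log-ratios for chosen and rejected responses.
    margins = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps
    )
    # Batch-level calibration: scale beta with how far this batch's mean
    # margin sits from a running mean (detached so beta itself is not trained).
    batch_margin = margins.mean().detach()
    beta = beta0 * (1.0 + alpha * (batch_margin - running_margin))
    beta = beta.clamp(min=1e-3)  # keep beta strictly positive
    # Standard DPO objective: -log sigmoid(beta * margin), averaged over the batch.
    return -F.logsigmoid(beta * margins).mean()
```

In a training loop, the per-sequence log-probabilities would come from the policy and frozen reference models, and `running_margin` would be updated (e.g., as an exponential moving average of `batch_margin`) between steps.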