Primary Language: Python

PPO-DPO-RLFH

DPO

Bradley-Terry model (backbone of DPO): [image]

Loss function of DPO: [image]

Objective function: [image]

Optimization: [image]

Policy: [image]
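The images for these items are not reproduced here. As a hedged reference, the block below gives the standard forms from the DPO literature, which may differ in notation from the original figures. The symbols are assumptions, not taken from this README: `r` is a reward function, `\pi_{\mathrm{ref}}` the frozen reference policy, `\pi_\theta` the policy being trained, `\beta` the KL penalty strength, `y_w` and `y_l` the preferred and rejected responses, and `\sigma` the logistic function.

```latex
% Bradley-Terry preference model
p^*(y_w \succ y_l \mid x)
  = \frac{\exp\big(r^*(x, y_w)\big)}{\exp\big(r^*(x, y_w)\big) + \exp\big(r^*(x, y_l)\big)}
  = \sigma\big(r^*(x, y_w) - r^*(x, y_l)\big)

% KL-constrained reward maximization objective
\max_{\pi_\theta} \;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)} \big[ r(x, y) \big]
  - \beta \, \mathbb{D}_{\mathrm{KL}}\big[ \pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x) \big]

% Closed-form optimum of that objective
\pi_r(y \mid x) = \frac{1}{Z(x)} \, \pi_{\mathrm{ref}}(y \mid x) \exp\!\Big( \tfrac{1}{\beta} r(x, y) \Big),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x) \exp\!\Big( \tfrac{1}{\beta} r(x, y) \Big)

% DPO loss
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = - \mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \Big[
      \log \sigma \Big(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \Big)
    \Big]
```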

From the policy to the Bradley-Terry model: [image]
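The step from the policy to the Bradley-Terry model is usually derived as sketched below. This follows the standard DPO derivation and is not necessarily the exact content of the missing image; the notation (`\pi_r`, `\pi_{\mathrm{ref}}`, `\beta`, `Z(x)`, `y_w`, `y_l`) is assumed. The idea is to solve the optimal-policy expression for the reward, then substitute it into the Bradley-Terry model, where the intractable partition function `Z(x)` cancels because it depends only on the prompt `x`.

```latex
% Step 1: invert the optimal policy to express the reward through the policy
r(x, y) = \beta \log \frac{\pi_r(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x)

% Step 2: substitute into the Bradley-Terry model; the \beta \log Z(x) terms cancel
p^*(y_w \succ y_l \mid x)
  = \sigma\big( r(x, y_w) - r(x, y_l) \big)
  = \sigma\Big(
      \beta \log \frac{\pi_r(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_r(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \Big)
```

Maximizing the log-likelihood of this probability over a preference dataset, with a trainable `\pi_\theta` in place of `\pi_r`, gives the DPO loss.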

Final architecture: [image]

Code implementation: [image]
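Since the implementation image is not available, here is a minimal, framework-free sketch of the DPO loss in Python. It is not the repository's code. The function names `log_sigmoid` and `dpo_loss`, the argument names, and the default `beta=0.1` are all hypothetical. The inputs are assumed to be per-example sequence log-probabilities (summed over tokens) under the trained policy and the frozen reference model.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # For x >= 0 use -log(1 + exp(-x)); for x < 0 use x - log(1 + exp(x)).
    # Both branches avoid calling exp() on a large positive number.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    ref_chosen_logps: Sequence[float],
    ref_rejected_logps: Sequence[float],
    beta: float = 0.1,  # assumed default; tune per task
) -> float:
    """Mean DPO loss over a batch of preference pairs (hypothetical helper)."""
    if len(policy_chosen_logps) == 0:
        raise ValueError("batch must contain at least one preference pair")

    losses = []
    for pc, pr, rc, rr in zip(
        policy_chosen_logps, policy_rejected_logps,
        ref_chosen_logps, ref_rejected_logps,
    ):
        # Implicit rewards: beta * log(pi_theta / pi_ref) for each response.
        chosen_reward = beta * (pc - rc)
        rejected_reward = beta * (pr - rr)
        # Negative log-likelihood of the preference under Bradley-Terry.
        losses.append(-log_sigmoid(chosen_reward - rejected_reward))
    return sum(losses) / len(losses)
```

When the policy equals the reference model, both implicit rewards are zero and the loss is `log(2)`; it drops below that as the policy raises the chosen response's likelihood relative to the rejected one. A real training loop would compute the same expression with an autodiff library so that gradients flow into the policy.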