JongKook-Heo/DPPO
Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
Language: Python; License: MIT