liziniu/policy_optimization
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
Python