Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
Primary Language: Python