/RLHF-APA

RL algorithm: Advantage-Induced Policy Alignment (APA)

Primary language: Python
License: MIT
