/RPBT

Implementation of RPPO (Risk-sensitive PPO) and RPBT (Population-based self-play with RPPO)

Primary Language: Python
License: MIT
