/CleanRL

Reinforcement learning algorithms and use cases, including implementations of DQN, PG, A3C, PPO, RLHF, and AlphaZero. Designed for clarity, ease of use, and educational purposes.

Primary language: Python | License: MIT
