firechecking/CleanRL
Reinforcement learning algorithms and use cases, including DQN, PG, A3C, and PPO, plus RLHF and AlphaZero implementations. Designed for clarity, ease of use, and educational purposes.
Python · MIT
Stargazers
- ssdutyuyang199401
- aboycoder
- huoliangyu
- soldatjiang (Beijing)
- ddz-mark
- qwsdeef
- TENGBINN
- jaykay233 (China)
- Maxwell-R (Beijing, China)
- BellmanTimeHut
- huozqqq
- Zephyn-W
- wananshadan
- ruziniuuuuu (Singapore)
- jkznst
- ACkuku
- georgethrax
- lmz-0915
- SongTunes
- SylvanLiu
- TTanJian
- chenchangjin-ccj
- hiplayer (Shanghai)
- ccola-ice (Beijing, China)
- Code-Tyro
- Whitemillet