Learning OpenAI Spinning Up
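To make the code more approachable, the implementations here are further decoupled. The goal is to learn not only the algorithms but also how an RL run works from start to finish. Only PyTorch implementations are provided.
Each algorithm below lives in its own folder and can be run on its own.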
- Policy Gradient (PG)
- Vanilla Policy Gradient (VPG)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed DDPG (TD3)
- Soft Actor-Critic (SAC)
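
As a rough illustration of the RL run loop that all of the algorithms above share, here is a minimal sketch in plain Python (no PyTorch, so it has no dependencies). The names `CoinFlipEnv`, `collect_episode`, and `rewards_to_go` are hypothetical and chosen for illustration only; they are not taken from this repository or from Spinning Up, and the toy environment is an assumption standing in for a real one.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip for a fixed number of steps."""

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # single dummy observation

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return 0, reward, done


def collect_episode(env, policy):
    """Run one episode: observe, act, step, and record until the episode ends."""
    obs = env.reset()
    observations, actions, rewards = [], [], []
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        obs = next_obs
    return observations, actions, rewards


def rewards_to_go(rewards, gamma=0.99):
    """Discounted sum of future rewards at every step of one episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        out[i] = running
    return out


if __name__ == "__main__":
    env = CoinFlipEnv()
    policy_rng = random.Random(1)
    # A random policy stands in for a learned one in this sketch.
    _, _, rews = collect_episode(env, lambda o: policy_rng.randint(0, 1))
    print(rewards_to_go(rews))
```

In a policy gradient method, the per-step returns from `rewards_to_go` would typically weight the log-probabilities of the actions taken; the off-policy methods in the list would instead store the collected transitions in a replay buffer.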