This repository contains implementations of common reinforcement learning algorithms, including:
- Q-learning for the wind-blow example (see page 25 of David Silver's slides)
- (Double) DQN for the cart-pole example
- (Actor-critic) policy gradient for the cart-pole example
- Proximal Policy Optimization (PPO) with the clipped surrogate objective for the BipedalWalker example
- Soft Actor-Critic, modern version (following *Soft Actor-Critic Algorithms and Applications*), for the walker example
- Soft Actor-Critic, standard version (following *Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor*), for the walker example
- Diversity is All You Need, DIAYN (following *Diversity is All You Need: Learning Skills without a Reward Function*), for the BipedalWalker example

DIAYN skills: Skill 8, Skill 16, Skill 18.
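
To illustrate the Q-learning entry in the list above, here is a minimal tabular sketch. It assumes the "wind-blow" example is the windy gridworld from Sutton & Barto that Silver's lecture reproduces: a 7 x 10 grid, per-column upward wind strengths `0 0 0 1 1 1 2 2 1 0`, and a reward of -1 per step until the goal. The layout, hyperparameters, and helper names (`step`, `q_learning`, `greedy_path_length`) are assumptions for illustration, not code from this repository.

```python
import random

# Assumed windy-gridworld layout (Sutton & Barto style), not taken from this repo.
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]          # upward push per column
START, GOAL = (3, 0), (3, 7)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Apply the move plus the wind of the current column, clipped to the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    r = min(max(r + dr - WIND[c], 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    return (r, c), -1, (r, c) == GOAL

def greedy(q, state):
    """Index of the highest-valued action (first one wins ties)."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])

def q_learning(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Epsilon-greedy behaviour policy.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

def greedy_path_length(q, max_steps=100):
    """Follow the learned greedy policy from START; returns max_steps if the goal is never reached."""
    state, steps = START, 0
    while state != GOAL and steps < max_steps:
        state, _, _ = step(state, greedy(q, state))
        steps += 1
    return steps
```

Because the target uses `max(q[nxt])` rather than the action actually taken next, this is Q-learning (off-policy) rather than Sarsa, even though the behaviour policy is epsilon-greedy.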
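
For the (Double) DQN entry, the two variants differ only in how the bootstrap target is formed. The sketch below assumes the next-state Q-values from the online and target networks have already been computed for a batch; the function names are hypothetical and plain lists stand in for tensors.

```python
def dqn_targets(rewards, dones, q_target_next, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    return [r + (0.0 if d else gamma * max(q)) for r, d, q in zip(rewards, dones, q_target_next)]

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network selects the next action, the target network evaluates it."""
    targets = []
    for r, d, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        best = max(range(len(q_on)), key=q_on.__getitem__)  # argmax under the online net
        targets.append(r + (0.0 if d else gamma * q_tg[best]))
    return targets
```

Decoupling selection from evaluation is what reduces the overestimation bias of taking `max` over noisy target-network estimates.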
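
For the actor-critic policy gradient entry, one common formulation uses the one-step TD error as the advantage. This is a sketch under that assumption; the repository may use a different advantage estimator, and `actor_critic_losses` is a hypothetical helper working on scalars.

```python
def actor_critic_losses(log_prob, value, next_value, reward, done, gamma=0.99):
    """One-step advantage actor-critic losses for a single transition.

    log_prob:   log pi(a|s) of the action taken
    value:      critic estimate V(s)
    next_value: critic estimate V(s')
    """
    td_target = reward + (0.0 if done else gamma * next_value)
    advantage = td_target - value
    # In an autodiff framework the advantage is detached here so only the actor term moves the policy.
    actor_loss = -log_prob * advantage
    critic_loss = 0.5 * advantage ** 2
    return actor_loss, critic_loss
```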
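
For the PPO entry, the clipped surrogate objective is the mean of `min(r * A, clip(r, 1 - eps, 1 + eps) * A)`, where `r = exp(log pi_new - log pi_old)` is the probability ratio and `A` the advantage. A minimal sketch over plain lists, assuming ratios and advantages are precomputed; `ppo_clip_objective` is a hypothetical name.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Clipped surrogate objective (to be maximized; negate it for a loss)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the min makes the objective a pessimistic bound on the unclipped one.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

With `A > 0` the ratio gains nothing beyond `1 + eps`; with `A < 0` it gains nothing below `1 - eps`, which keeps each update close to the old policy.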
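
The two Soft Actor-Critic entries mainly differ in their critic targets. In the modern version, a soft Q target takes the minimum of two target critics and subtracts an explicit temperature-weighted log-probability. In the standard version, a separate state-value network is regressed toward `min(Q1, Q2) - log pi`, and the Q target bootstraps from a slow-moving copy of that value network, with the temperature folded into the reward scale. This is a scalar sketch under those assumptions; the function names are hypothetical and the sampled actions and log-probabilities are assumed to be given.

```python
def sac_modern_q_target(reward, done, q1_next, q2_next, next_log_prob, alpha=0.2, gamma=0.99):
    """Modern SAC soft Q target (clipped double-Q with explicit temperature alpha).

    q1_next, q2_next: target-critic values Q(s', a') for a' sampled from the current policy
    next_log_prob:    log pi(a'|s') of that sampled action
    """
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    return reward + (0.0 if done else gamma * soft_value)

def sac_standard_targets(reward, done, next_v_target, q1, q2, log_prob, gamma=0.99):
    """Standard SAC targets with a separate state-value network.

    v_target: min(Q1(s,a), Q2(s,a)) - log pi(a|s) for a ~ pi(.|s), regression target for V(s)
    q_target: r + gamma * V_bar(s'), where V_bar is the slow-moving target value network
    No alpha here: the temperature is assumed to be absorbed into the reward scale.
    """
    v_target = min(q1, q2) - log_prob
    q_target = reward + (0.0 if done else gamma * next_v_target)
    return v_target, q_target
```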
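
For the DIAYN entry, the agent is trained on a pseudo-reward `log q_phi(z|s) - log p(z)` instead of an environment reward, where `q_phi` is a discriminator that predicts the skill `z` from the state and `p(z)` is a fixed uniform prior over skills. A minimal sketch, assuming the discriminator outputs one logit per skill; `log_softmax` and `diayn_pseudo_reward` are hypothetical helpers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of discriminator logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def diayn_pseudo_reward(disc_logits, skill):
    """r_z(s) = log q_phi(z|s) - log p(z), with a uniform prior p(z) = 1 / num_skills."""
    num_skills = len(disc_logits)
    log_q = log_softmax(disc_logits)[skill]
    log_p = -math.log(num_skills)
    return log_q - log_p
```

The reward is positive when the discriminator identifies the active skill better than chance and is bounded above by `log(num_skills)`, which pushes different skills toward distinguishable states.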