Implementations of reinforcement learning (RL) agents written from scratch. Discrete-action agents are tested on CartPole, and continuous-action agents are tested on Pendulum.
- Q-Learning
  - DQN
  - Double DQN
  - Dueling DQN
  - Prioritized Experience Replay DDQN
- Policy Optimization
  - Vanilla Policy Gradient (REINFORCE)
  - Actor-Critic
  - Advantage Actor-Critic (A2C)
  - Deep Deterministic Policy Gradient (DDPG)
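
To illustrate how two of the value-based methods listed above differ, here is a minimal sketch of the bootstrapped target each one regresses toward. It is not taken from this repository's code: the function names `dqn_target` and `double_dqn_target`, the plain-list representation of Q-values, and the default `gamma` are assumptions made for the example.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN target: r + gamma * max_a Q_target(s', a).

    next_q_target is a list of the target network's Q-values for the
    next state, one entry per action (hypothetical representation).
    """
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    # Action selection uses the online network's estimates ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while the value of that action comes from the target network.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    online = [3.0, 1.0]   # online network prefers action 0
    target = [0.5, 2.0]   # target network rates action 1 highest
    print(dqn_target(1.0, target, done=False, gamma=0.5))
    print(double_dqn_target(1.0, online, target, done=False, gamma=0.5))
```

Splitting action selection from action evaluation is the only change between the two targets; it is intended to reduce the overestimation that comes from taking a max over noisy Q-value estimates.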