RL gym

Implementing different reinforcement learning algorithms on different gym environments and comparing results.

Primary language: Jupyter Notebook · License: MIT

These algorithms are implemented in this repo:

A2C
DDPG
Double DQN
Dueling DQN
TD3

And tested on these environments:

CartPole
Pendulum
Acrobot
LunarLanderContinuous

A2C

A2C is an on-policy, model-free reinforcement learning algorithm. Here is the pseudocode for A3C, which is closely related to A2C (A2C is its synchronous variant).
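The core of that update can be sketched in plain NumPy: estimate the advantage with a one-step TD error, move the critic toward the TD target, and take a policy-gradient step weighted by the advantage. The linear actor/critic, feature size, and learning rates below are illustrative assumptions, not the implementation in the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 4, 2
theta = np.zeros((n_features, n_actions))  # actor weights (softmax policy)
w = np.zeros(n_features)                   # critic weights (linear V(s))
gamma, lr_actor, lr_critic = 0.99, 0.01, 0.05

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def a2c_update(s, a, r, s_next, done):
    """One-step advantage actor-critic update with linear approximators."""
    global theta, w
    v_s = w @ s
    v_next = 0.0 if done else w @ s_next
    # TD target and advantage estimate: A(s,a) ~ r + gamma*V(s') - V(s)
    target = r + gamma * v_next
    advantage = target - v_s
    # Critic: move V(s) toward the TD target
    w += lr_critic * advantage * s
    # Actor: grad of log softmax policy is outer(s, onehot(a) - probs)
    probs = softmax(s @ theta)
    grad_log_pi = -np.outer(s, probs)
    grad_log_pi[:, a] += s
    theta += lr_actor * advantage * grad_log_pi

# One illustrative transition with a positive reward for action 1
s = rng.normal(size=n_features)
s_next = rng.normal(size=n_features)
a2c_update(s, a=1, r=1.0, s_next=s_next, done=False)
```

After a single update with a positive advantage, the policy's probability of the taken action at that state rises above its initial 0.5.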

Agent trained using A2C playing the Acrobot game.

DDPG

DDPG is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for DDPG.
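The two pieces that distinguish DDPG in that pseudocode are the bootstrapped target computed with target networks, y = r + γ·Q′(s′, μ′(s′)), and the Polyak soft update of those target networks. Below is a minimal NumPy sketch of both; the tiny linear actor/critic and the specific weights are assumptions made purely for illustration.

```python
import numpy as np

gamma, tau = 0.99, 0.005

# Toy linear actor/critic and their target copies (illustrative weights)
actor_w = np.array([0.5, -0.2])
critic_w = np.array([0.1, 0.3, 0.2])   # acts on concatenated [s, a]
target_actor_w = actor_w.copy()
target_critic_w = critic_w.copy()

def mu(w, s):
    """Deterministic policy: a = w . s"""
    return w @ s

def q(w, s, a):
    """Critic: Q(s, a) as a linear function of [s, a]."""
    return w @ np.concatenate([s, [a]])

def td_target(r, s_next, done):
    """DDPG bootstrapped target: y = r + gamma * Q'(s', mu'(s'))."""
    if done:
        return r
    a_next = mu(target_actor_w, s_next)
    return r + gamma * q(target_critic_w, s_next, a_next)

def soft_update(target, online):
    """Polyak averaging: target <- tau*online + (1-tau)*target."""
    return tau * online + (1.0 - tau) * target

s_next = np.array([1.0, 2.0])
y = td_target(r=1.0, s_next=s_next, done=False)
target_critic_w = soft_update(target_critic_w, critic_w)
```

With these numbers, μ′(s′) = 0.5·1 − 0.2·2 = 0.1, Q′ = 0.1 + 0.6 + 0.02 = 0.72, so y = 1 + 0.99·0.72 = 1.7128.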


Agent trained using DDPG playing the Lunar Lander Continuous game.

Double DQN

Double DQN is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for Double DQN.
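The key line in that pseudocode is the target: the online network selects the next action, while the target network evaluates it, which reduces the overestimation bias of vanilla DQN. A small NumPy sketch (the Q-values are made-up numbers for illustration):

```python
import numpy as np

gamma = 0.99

def double_dqn_target(r, q_online_next, q_target_next, done):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return r
    a_star = int(np.argmax(q_online_next))    # action selection: online net
    return r + gamma * q_target_next[a_star]  # action evaluation: target net

# Illustrative Q-values for the next state
q_online_next = np.array([1.0, 3.0, 2.0])  # online net prefers action 1
q_target_next = np.array([2.5, 0.5, 4.0])  # target net scores action 1 as 0.5
y = double_dqn_target(1.0, q_online_next, q_target_next, done=False)
# Vanilla DQN would bootstrap from max(q_target_next) = 4.0 instead of 0.5
```

Here y = 1 + 0.99·0.5 = 1.495, visibly lower than the vanilla DQN target of 1 + 0.99·4.0.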

Agent trained using Double DQN playing the CartPole game.

Dueling DQN

Similar to Double DQN, the dueling network contains two separate estimators: one for the state value function and one for the state-dependent action advantage function.

Formula for the decomposition of the Q-value:

Q(s, a; θ, α, β) = V(s; θ, β) + ( A(s, a; θ, α) − (1/|A|) Σ_{a′} A(s, a′; θ, α) )

  • θ is the shared parameter of the network.
  • α parameterizes the output stream for the advantage function A.
  • β parameterizes the output stream for the value function V.
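The aggregation step above is simple enough to sketch directly; subtracting the mean advantage keeps V and A identifiable (otherwise a constant could shift freely between the two streams). The numbers below are illustrative, not taken from the notebooks.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine the two streams: Q(s,a) = V(s) + (A(s,a) - mean_a' A(s,a'))."""
    advantages = np.asarray(advantages, dtype=float)
    return value + (advantages - advantages.mean())

v = 2.0                   # value-stream output V(s)
adv = [1.0, -1.0, 0.0]    # advantage-stream outputs A(s, a)
q_values = dueling_q(v, adv)
print(q_values)  # [3. 1. 2.]
```

Note that the mean of the resulting Q-values equals V(s), since the centered advantages sum to zero.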
Agent trained using Dueling DQN playing the Acrobot game.

TD3

TD3 is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for TD3.
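TD3's target computation combines its three tricks in a few lines: clipped noise added to the target action (target policy smoothing), the minimum over two target critics (clipped double-Q learning), and, elsewhere in the loop, delayed actor updates. A NumPy sketch of the target; the toy actor/critics and the noise hyperparameters here are illustrative assumptions.

```python
import numpy as np

gamma = 0.99
noise_std, noise_clip, act_limit = 0.2, 0.5, 1.0  # assumed hyperparameters

def td3_target(r, s_next, done, target_actor, target_q1, target_q2, rng):
    """TD3 target: smooth the target action with clipped noise, then
    bootstrap from the minimum of the two target critics."""
    if done:
        return r
    eps = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    a_next = np.clip(target_actor(s_next) + eps, -act_limit, act_limit)
    return r + gamma * min(target_q1(s_next, a_next), target_q2(s_next, a_next))

# Toy deterministic pieces, purely for illustration
rng = np.random.default_rng(0)
target_actor = lambda s: 0.5 * s.sum()
target_q1 = lambda s, a: 1.0 + a   # the two critics disagree,
target_q2 = lambda s, a: 2.0 - a   # so the min caps the bootstrap value
y = td3_target(1.0, np.array([0.4, 0.2]), False, target_actor, target_q1, target_q2, rng)
```

With these critics, min(1 + a, 2 − a) never exceeds 1.5 for any bounded action, so the target is capped at 1 + 0.99·1.5 regardless of the sampled noise.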


Agent trained using TD3 playing the Pendulum game.

© V I S H W A S