RL gym

Implementing different reinforcement learning algorithms on different gym environments and comparing results.

Primary language: Jupyter Notebook · License: MIT

These algorithms are implemented in this repo:

A2C
DDPG
Double DQN
Dueling DQN
TD3

And tested on these environments:

CartPole
Pendulum
Acrobot
LunarLanderContinuous

A2C

A2C is an on-policy, model-free reinforcement learning algorithm. Here is the pseudocode for A3C, which is closely related to A2C (A2C is its synchronous variant).
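The core of that update can be sketched in plain NumPy: estimate the advantage with a one-step TD error, move the critic toward the TD target, and take a policy-gradient step weighted by the advantage. The linear actor/critic, feature size, and learning rates below are illustrative assumptions, not the implementation in the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 4, 2
theta = np.zeros((n_features, n_actions))  # actor weights (softmax policy)
w = np.zeros(n_features)                   # critic weights (linear V(s))
gamma, lr_actor, lr_critic = 0.99, 0.01, 0.05

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def a2c_update(s, a, r, s_next, done):
    """One-step advantage actor-critic update with linear approximators."""
    global theta, w
    v_s = w @ s
    v_next = 0.0 if done else w @ s_next
    # TD target and advantage estimate: A(s,a) ~ r + gamma*V(s') - V(s)
    target = r + gamma * v_next
    advantage = target - v_s
    # Critic: move V(s) toward the TD target
    w += lr_critic * advantage * s
    # Actor: grad of log softmax policy is outer(s, onehot(a) - probs)
    probs = softmax(s @ theta)
    grad_log_pi = -np.outer(s, probs)
    grad_log_pi[:, a] += s
    theta += lr_actor * advantage * grad_log_pi

# One illustrative transition with a positive reward for action 1
s = rng.normal(size=n_features)
s_next = rng.normal(size=n_features)
a2c_update(s, a=1, r=1.0, s_next=s_next, done=False)
```

After a single update with a positive advantage, the policy's probability of the taken action at that state rises above its initial 0.5.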

Agent trained using A2C playing the Acrobot game.

DDPG

DDPG is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for DDPG.
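The two pieces that distinguish DDPG in that pseudocode are the bootstrapped target computed with target networks, y = r + γ·Q′(s′, μ′(s′)), and the Polyak soft update of those target networks. Below is a minimal NumPy sketch of both; the tiny linear actor/critic and the specific weights are assumptions made purely for illustration.

```python
import numpy as np

gamma, tau = 0.99, 0.005

# Toy linear actor/critic and their target copies (illustrative weights)
actor_w = np.array([0.5, -0.2])
critic_w = np.array([0.1, 0.3, 0.2])   # acts on concatenated [s, a]
target_actor_w = actor_w.copy()
target_critic_w = critic_w.copy()

def mu(w, s):
    """Deterministic policy: a = w . s"""
    return w @ s

def q(w, s, a):
    """Critic: Q(s, a) as a linear function of [s, a]."""
    return w @ np.concatenate([s, [a]])

def td_target(r, s_next, done):
    """DDPG bootstrapped target: y = r + gamma * Q'(s', mu'(s'))."""
    if done:
        return r
    a_next = mu(target_actor_w, s_next)
    return r + gamma * q(target_critic_w, s_next, a_next)

def soft_update(target, online):
    """Polyak averaging: target <- tau*online + (1-tau)*target."""
    return tau * online + (1.0 - tau) * target

s_next = np.array([1.0, 2.0])
y = td_target(r=1.0, s_next=s_next, done=False)
target_critic_w = soft_update(target_critic_w, critic_w)
```

With these numbers, μ′(s′) = 0.5·1 − 0.2·2 = 0.1, Q′ = 0.1 + 0.6 + 0.02 = 0.72, so y = 1 + 0.99·0.72 = 1.7128.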


Agent trained using DDPG playing the Lunar Lander Continuous game.

Double DQN

Double DQN is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for Double DQN.
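The key line in that pseudocode is the target: the online network selects the next action, while the target network evaluates it, which reduces the overestimation bias of vanilla DQN. A small NumPy sketch (the Q-values are made-up numbers for illustration):

```python
import numpy as np

gamma = 0.99

def double_dqn_target(r, q_online_next, q_target_next, done):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return r
    a_star = int(np.argmax(q_online_next))    # action selection: online net
    return r + gamma * q_target_next[a_star]  # action evaluation: target net

# Illustrative Q-values for the next state
q_online_next = np.array([1.0, 3.0, 2.0])  # online net prefers action 1
q_target_next = np.array([2.5, 0.5, 4.0])  # target net scores action 1 as 0.5
y = double_dqn_target(1.0, q_online_next, q_target_next, done=False)
# Vanilla DQN would bootstrap from max(q_target_next) = 4.0 instead of 0.5
```

Here y = 1 + 0.99·0.5 = 1.495, visibly lower than the vanilla DQN target of 1 + 0.99·4.0.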

Agent trained using Double DQN playing the CartPole game.

Dueling DQN

Similar to Double DQN, the dueling network contains two separate estimators: one for the state value function and one for the state-dependent action advantage function.

Formula for the decomposition of the Q-value:

Q(s, a; θ, α, β) = V(s; θ, β) + ( A(s, a; θ, α) − (1/|A|) Σ_{a′} A(s, a′; θ, α) )

  • θ is the shared parameter of the network.
  • α parameterizes the output stream for the advantage function A.
  • β parameterizes the output stream for the value function V.
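The aggregation step above is simple enough to sketch directly; subtracting the mean advantage keeps V and A identifiable (otherwise a constant could shift freely between the two streams). The numbers below are illustrative, not taken from the notebooks.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine the two streams: Q(s,a) = V(s) + (A(s,a) - mean_a' A(s,a'))."""
    advantages = np.asarray(advantages, dtype=float)
    return value + (advantages - advantages.mean())

v = 2.0                   # value-stream output V(s)
adv = [1.0, -1.0, 0.0]    # advantage-stream outputs A(s, a)
q_values = dueling_q(v, adv)
print(q_values)  # [3. 1. 2.]
```

Note that the mean of the resulting Q-values equals V(s), since the centered advantages sum to zero.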
Agent trained using Dueling DQN playing the Acrobot game.

TD3

TD3 is an off-policy, model-free reinforcement learning algorithm. Here is the pseudocode for TD3.
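TD3's target computation combines its three tricks in a few lines: clipped noise added to the target action (target policy smoothing), the minimum over two target critics (clipped double-Q learning), and, elsewhere in the loop, delayed actor updates. A NumPy sketch of the target; the toy actor/critics and the noise hyperparameters here are illustrative assumptions.

```python
import numpy as np

gamma = 0.99
noise_std, noise_clip, act_limit = 0.2, 0.5, 1.0  # assumed hyperparameters

def td3_target(r, s_next, done, target_actor, target_q1, target_q2, rng):
    """TD3 target: smooth the target action with clipped noise, then
    bootstrap from the minimum of the two target critics."""
    if done:
        return r
    eps = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    a_next = np.clip(target_actor(s_next) + eps, -act_limit, act_limit)
    return r + gamma * min(target_q1(s_next, a_next), target_q2(s_next, a_next))

# Toy deterministic pieces, purely for illustration
rng = np.random.default_rng(0)
target_actor = lambda s: 0.5 * s.sum()
target_q1 = lambda s, a: 1.0 + a   # the two critics disagree,
target_q2 = lambda s, a: 2.0 - a   # so the min caps the bootstrap value
y = td3_target(1.0, np.array([0.4, 0.2]), False, target_actor, target_q1, target_q2, rng)
```

With these critics, min(1 + a, 2 − a) never exceeds 1.5 for any bounded action, so the target is capped at 1 + 0.99·1.5 regardless of the sampled noise.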


Agent trained using TD3 playing the Pendulum game.

© V I S H W A S