alfoudari/prop

A library of Reinforcement Learning agents

Python, MIT License

prop

prop is a library of Reinforcement Learning agents implemented in PyTorch.

Algorithms

Algorithm	Model	Policy
DQN	Model-Free	Off-Policy
A2C	Model-Free	On-Policy

DQN

Deep Q-Learning is a variant of Q-learning that uses a deep neural network to estimate Q-values. The network is called a Deep Q-Network, hence DQN.

Both DQN and DDQN (Double DQN) are implemented.
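
The difference between the two variants lies in how the bootstrap target is computed. The following is a minimal sketch in plain Python, not the code used by prop: the function names `dqn_target` and `ddqn_target`, their parameters, and the sample Q-values are all hypothetical and chosen for illustration. It assumes the common setup with an online network and a periodically synchronized target network.

```python
from typing import Sequence


def dqn_target(reward: float, done: bool,
               next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward: float, done: bool,
                next_q_online: Sequence[float],
                next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state, one entry per action.
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [3.0, 1.5, 0.2]

print(dqn_target(1.0, False, next_q_target, gamma=0.5))
print(ddqn_target(1.0, False, next_q_online, next_q_target, gamma=0.5))
```

With these sample values the DQN target bootstraps from the target network's largest estimate (action 0), while the Double DQN target bootstraps from the target network's estimate for the action the online network prefers (action 1). Decoupling selection from evaluation in this way is what reduces the overestimation bias of plain DQN.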

A2C

Advantage Actor Critic is a variant of Actor-Critic that:

Uses a neural network to approximate a policy and a value function.
Computes the advantage of an action and uses it to scale the policy gradient. This acts as a vote of confidence (or skepticism) on actions produced by the actor: a positive advantage makes the action more likely, a negative one makes it less likely.
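
The two points above can be sketched without any deep learning framework. This is an illustration under stated assumptions, not prop's implementation: it assumes a one-step advantage estimate (the library may use n-step returns or another estimator) and a softmax policy over discrete actions, and the helper names `softmax`, `advantage`, and `actor_logit_gradient` are hypothetical. The gradient is written out by hand so the scaling by the advantage is visible; in PyTorch the same quantity would come from automatic differentiation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def advantage(reward: float, value: float, next_value: float,
              done: bool, gamma: float = 0.99) -> float:
    """One-step advantage estimate: A = r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def actor_logit_gradient(logits: Sequence[float], action: int,
                         adv: float) -> List[float]:
    """Gradient of the actor loss -log pi(action) * adv w.r.t. the logits.

    For a softmax policy, d(-log pi(a)) / d z_i = p_i - 1[i == a].
    The advantage is treated as a constant that scales this gradient.
    """
    probs = softmax(logits)
    return [adv * (p - (1.0 if i == action else 0.0))
            for i, p in enumerate(probs)]


logits = [0.0, 0.0]  # hypothetical actor output: two equally likely actions

# Outcome better than the critic predicted: positive advantage.
good = advantage(reward=1.0, value=0.5, next_value=0.0, done=True)
# Outcome worse than the critic predicted: negative advantage.
bad = advantage(reward=0.0, value=0.5, next_value=0.0, done=True)

print(good, actor_logit_gradient(logits, action=0, adv=good))
print(bad, actor_logit_gradient(logits, action=0, adv=bad))
```

A gradient descent step subtracts this gradient from the logits, so a positive advantage raises the logit of the chosen action and a negative advantage lowers it. The critic would typically be trained alongside by minimizing the squared advantage.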