prop is a library of Reinforcement Learning agents implemented in PyTorch.
Algorithm | Model | Policy |
---|---|---|
DQN | Model-Free | Off-Policy |
A2C | Model-Free | On-Policy |
Deep Q-Learning is a variant of Q-learning that uses a deep neural network to estimate Q-values (hence DQN, for Deep Q-Network).
Both DQN and DDQN (Double DQN) are implemented.
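DQN and DDQN differ mainly in how they build the bootstrapped learning target. The sketch below is plain Python for readability (a PyTorch agent would operate on batched tensors instead). It assumes the common setup of an online network plus a separate target network, and the function names and numbers are hypothetical, not prop's API.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN: y = r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online net selects a', the target net evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def td_loss(q_values, action, target):
    """Squared TD error for the action that was actually taken."""
    return (q_values[action] - target) ** 2


# Hypothetical values for one non-terminal transition with two actions.
q_online_s = [2.0, 1.5]      # Q_online(s, .)
next_q_online = [2.0, 1.0]   # Q_online(s', .): prefers action 0
next_q_target = [0.5, 3.0]   # Q_target(s', .): its own max is action 1

y_dqn = dqn_target(1.0, next_q_target, done=False, gamma=0.5)
y_ddqn = double_dqn_target(1.0, next_q_online, next_q_target, done=False, gamma=0.5)
print(y_dqn, y_ddqn)                   # 2.5 1.25
print(td_loss(q_online_s, 0, y_ddqn))  # 0.5625
```

Decoupling action selection (online network) from action evaluation (target network) is what lets Double DQN reduce the overestimation bias introduced by taking the max over noisy Q-value estimates.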
Advantage Actor-Critic (A2C) is a variant of Actor-Critic that:
- Uses a neural network to approximate both a policy (the actor) and a value function (the critic).
- Computes the advantage of each action, i.e. how much better it turned out than the critic's value estimate for that state, and uses it to scale the policy gradient. This acts as a vote of confidence (or skepticism) in the actions chosen by the actor.
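One way to compute the A2C quantities from a short rollout is sketched below in plain Python. It assumes n-step discounted returns bootstrapped from the critic's estimate of the state after the last step, with the advantage estimated as return minus V(s_t); the source does not say which advantage estimator prop uses, and the function names and numbers are hypothetical.

```python
def discounted_returns(rewards, bootstrap_value, dones, gamma=0.99):
    """n-step returns R_t = r_t + gamma * R_{t+1}, bootstrapped from V(s_T).

    dones[t] is True if the episode ended after step t; in that case
    nothing is bootstrapped across the episode boundary.
    """
    returns = []
    running = bootstrap_value
    for r, done in zip(reversed(rewards), reversed(dones)):
        running = r + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs, values, returns):
    """Advantages plus the actor (policy) and critic (value) losses."""
    advantages = [R - v for R, v in zip(returns, values)]
    n = len(advantages)
    # Actor: raise log pi(a|s) for positive advantage, lower it for negative.
    # With autograd, the advantage must be a constant (detached) here.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: regress V(s_t) toward the n-step return (mean squared advantage).
    critic_loss = sum(a * a for a in advantages) / n
    return advantages, actor_loss, critic_loss


# Hypothetical 3-step rollout whose episode terminates at the last step.
rewards = [1.0, 0.0, 1.0]
dones = [False, False, True]
values = [1.0, 1.0, 1.0]        # critic's V(s_t)
log_probs = [-1.0, -2.0, -0.5]  # actor's log pi(a_t | s_t)

returns = discounted_returns(rewards, bootstrap_value=5.0, dones=dones, gamma=0.5)
advantages, actor_loss, critic_loss = a2c_losses(log_probs, values, returns)
print(returns)     # [1.25, 0.5, 1.0]
print(advantages)  # [0.25, -0.5, 0.0]
print(actor_loss)  # -0.25
```

In a PyTorch implementation these would be tensors, and the advantage in the actor loss would typically be detached (e.g. with `.detach()`) so that only the critic loss trains the value estimate; the two losses are then usually combined with a weighting coefficient before calling `backward()`.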